Improvement for file comparison dialog

ghisler(Author)
Site Admin
Posts: 50865
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

My reasoning for not having a codepage dialog is similar to what MVV writes: It makes no sense to compare 2 ANSI files with different codepages, because there will be no matches of foreign characters, e.g. between Cyrillic and accented Latin characters. The only application would be comparing 2 ANSI files with the same codepage, or 1 ANSI with one Unicode - and the codepage doesn't matter when using Unicode.

And when you do this comparison, you should choose a font which supports this encoding well - so the encoding can just as well be set via that font...
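
A minimal Python sketch of this point (Python is used purely for illustration; the sample text and the cp1252/cp1251 pair are assumptions, not anything TC does internally): identical bytes decode to different characters under different ANSI code pages, so a byte-wise match between such files says nothing about the text.

```python
# Illustration only: one byte sequence, read under two ANSI code pages.
data = b"caf\xe9"

as_western = data.decode("cp1252")   # 0xE9 is "é" in Windows-1252
as_cyrillic = data.decode("cp1251")  # 0xE9 is "й" in Windows-1251

# The bytes are identical, yet the decoded texts differ in the last character.
same_text = as_western == as_cyrillic
print(same_text)  # False
```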
Author of Total Commander
https://www.ghisler.com
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

ghisler(Author) wrote: or 1 ANSI with one Unicode - and the codepage doesn't matter when using Unicode.
Actually, the codepage does matter here for the ANSI file (if it uses a non-standard one).
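
A small sketch of why the code page still matters on the ANSI side (Python for illustration only; the sample word and the cp1251/cp1252 pair are assumed): the ANSI bytes have to be decoded with the right code page before they can match the Unicode file.

```python
# Illustration only: compare a Unicode text with an ANSI file's bytes.
unicode_text = "Привет"                      # content of the Unicode file
ansi_bytes = unicode_text.encode("cp1251")   # same text saved as Cyrillic ANSI

# Decoded with an assumed Western default code page, the text is garbled.
with_default_cp = ansi_bytes.decode("cp1252")
# Decoded with the file's real code page, it matches the Unicode side.
with_real_cp = ansi_bytes.decode("cp1251")

print(with_default_cp == unicode_text)  # False
print(with_real_cp == unicode_text)     # True
```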
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

ghisler(Author) wrote: It makes no sense to compare 2 ANSI files with different codepages, because there will be no matches of foreign characters, e.g. between Cyrillic and accented Latin characters.
First of all, Windows has more than just the few system code pages (125x): it may have code page tables for plenty of others (ISO 8859-X, Mac pages 100xx, EBCDIC), depending on the Windows version and configuration, and AFAIK it always has the old OEM code page tables available. Just look at the number of NLS files in the Windows system dir.
So there are situations where you can easily compare two files with different ANSI/OEM code pages in a practical way.
For example ISO 8859-2 vs. Mac 10029; or I may want to compare old OEM text files to a recoded copy (recoded to ANSI or Unicode), for example OEM 850 text files vs. ANSI 1252 files, as AntonDudarenko already pointed out (not possible in the current CBC tool).

Therefore: why still try to compare on a "per code page" basis, while you can easily recode both sides' characters to the Unicode plane in memory before comparing (and just compare in Unicode plane) ?
The available NLS tables let you specify which ANSI/OEM code page to use for MultiByteToWideChar (and WideCharToMultiByte). So just enumerate all code page translations available in the current environment and let the user either use the system default ANSI cp (as CBC does now) or select a custom cp from these available pages for each side (enumerating them is easy; I wrote code for this purpose myself).
This would also avoid the font rendering issues, as you just need a Unicode-capable font (at least on Vista and newer, the default system fonts are able to render all chars from those code pages).
Sure, implementing this may not be that easy, especially considering TC's support for Windows 9x, but it would make it possible to compare two non-Unicode files with a custom code page for each side.
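
The approach above can be sketched roughly as follows. This is a Python illustration, not TC code: Python's codec registry stands in for the Windows NLS tables and `bytes.decode` for MultiByteToWideChar. The helper names `available_codepages` and `diff_decoded` and the candidate list are made up for the example, and the comparison is a naive line-by-line one rather than a real diff.

```python
import codecs
from itertools import zip_longest

def available_codepages(candidates):
    """Return the candidate code pages this environment can actually decode."""
    usable = []
    for name in candidates:
        try:
            codecs.lookup(name)
            usable.append(name)
        except LookupError:
            pass  # no conversion table for this name
    return usable

def diff_decoded(left, right, left_cp, right_cp):
    """Decode each side with its own code page, then compare as Unicode."""
    left_lines = left.decode(left_cp, errors="replace").splitlines()
    right_lines = right.decode(right_cp, errors="replace").splitlines()
    pairs = zip_longest(left_lines, right_lines, fillvalue="")
    return [(n, a, b) for n, (a, b) in enumerate(pairs, start=1) if a != b]

# Hypothetical input: an old OEM 850 file and a recoded, slightly edited
# ANSI 1252 copy of it.
old_oem = "Grüße\nMenü\n".encode("cp850")
recoded = "Grüße\nMenu\n".encode("cp1252")

pages = available_codepages(["cp850", "cp1252", "iso8859_2", "no_such_page"])
differences = diff_decoded(old_oem, recoded, "cp850", "cp1252")
print(len(differences))  # 1: only line 2 really differs
```

Compared byte by byte, both lines would be reported as different, because ü and ß have different byte values in the two code pages; after decoding, only the genuine edit remains.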
TC plugins: PCREsearch and RegXtract
AntonyD
Power Member
Posts: 1664
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Post by *AntonyD »

2milo1012
Brilliant! Huge thanks for the mutual understanding, bro ;)
The files that matter most to me come from an IBM mainframe server,
which has its own view on "Windows-like" codepages....
#146217 personal license
browny
Senior Member
Posts: 370
Joined: 2007-09-10, 13:19 UTC

Post by *browny »

milo1012 wrote: you can easily recode both sides' characters to the Unicode plane in memory before comparing (and just compare in Unicode plane) ?
Please note that you used the word recode twice in your message.
When you change code page recoding would be necessary; and that implies recompare.
Also when compare tool starts it should request to choose correct code page; otherwise initial comparison might be wasted.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

browny wrote: When you change code page recoding would be necessary; and that implies recompare.
Well, of course; the same happens now when you switch one side in CBC from UTF-8 to ANSI or UTF-16 or whatever.
TC reads the file data into memory anyway (file mapping), so no additional I/O is implied.
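
As a rough illustration of that point (a Python sketch, not TC's implementation; `decode_views` and the temporary sample file are invented for the example): once the bytes are mapped, switching the code page only means decoding the same buffer again.

```python
import mmap
import os
import tempfile

def decode_views(path, codepages):
    """Map the file once and decode the same bytes with each code page."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            raw = mapped[:]  # copy of the mapped bytes; the file is read once
    return {cp: raw.decode(cp, errors="replace") for cp in codepages}

# Hypothetical sample: a small OEM 850 text file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("Menü\n".encode("cp850"))

views = decode_views(path, ["cp850", "cp1252"])
os.remove(path)

print(views["cp850"] == views["cp1252"])  # False: same bytes, different text
```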
browny wrote: Also when compare tool starts it should request to choose correct code page; otherwise initial comparison might be wasted.
You just take the last selection when TC identifies a non-Unicode and non-binary file for a side (a specific ANSI page or the default system page), but maybe an additional option could be added, such as optionally always falling back to the default cp when restarting CBC.
browny wrote: Please note that you used the word recode twice in your message.
The first time was just to explain the example of comparing an old DOS/OEM file to a manually recoded file (e.g. recoded in an external text editor), where I want to check for additional differences between these files besides the encoding.
TC plugins: PCREsearch and RegXtract
browny
Senior Member
Posts: 370
Joined: 2007-09-10, 13:19 UTC

Post by *browny »

I pointed out the repeated use of the word "recode" because it contradicts the original poster's idea of reusing the existing comparison results and only repainting the text with different characters (just like the Windows/DOS charset switch in Lister).

For that reason the idea of recoding and recomparing text deserves to be discussed as a separate suggestion.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

2browny
Well, I was replying to the statement "It makes no sense to compare 2 ANSI files with different codepages", and I explained that the whole idea of "the font determines the visual encoding" (like in Lister) is outdated, which is why I suggested switching to a full-blown Unicode compare, as it would free you of a whole bunch of additional problems.

Sure, it might not be exactly what the OP (2nd post, that is) suggested in terms of "how to achieve this", but it would lead to the same goal (and I'd use the same thread title for this kind of suggestion).
TC plugins: PCREsearch and RegXtract
AntonyD
Power Member
Posts: 1664
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Post by *AntonyD »

2browny
you have isolated not the most important part in my post. sorry.

main idea is - to have the ability to quickly apply charset's changing on panels separately (for some human purposes of files comparison). that's all.

If achieving this goal is better and more correctly served by changing the whole comparison process (moving it onto the rails of a full-fledged Unicode comparison), then let's do it.
milo1012 explained it very clearly and logically.
#146217 personal license
browny
Senior Member
Posts: 370
Joined: 2007-09-10, 13:19 UTC

Post by *browny »

AntonDudarenko wrote: you have isolated not the most important part in my post. sorry.
That was one of the key points, and now you have changed your opinion on the subject.
AntonDudarenko wrote: for some human purposes of files comparison). that's all.
The phrase is very unclear.
As far as I can guess, it means: throw away all comparison results and use your eyes to spot the differences instead.
In that case you hardly need a comparison tool at all.
AntonyD
Power Member
Posts: 1664
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Post by *AntonyD »

2browny
looks like i do have enough proper english words to explain my view on this suggestion.

Again. My main point is - to add ability to change charset for panels with compared files. In order to add additional step for human analysis of compared files. I have such needs.

Where in this case the compare process itself should be pinned - I don't know. I had a guess that to add such ability AFTER the comparing process - should be rather easy, quickly and correctly.
That's why, in my second post in this thread, I said that a "repaint of text" is the only expected operation.
But later milo1012 explained very easily and clearly that the whole compare process can be tuned up/improved/....
And of course I agree with this approach. Improvements are always welcome.

But in any case, during the compare process or after it, I want to open a combobox with a configurable list of charsets (like Lister does) for the left panel OR for the right panel and choose a charset for the related file. And my expectations here - to see "the same colored" file but in a different font encoding only.

Whether the comparison process has to be run again with such an approach or not is, as you wrote, a good topic for a parallel discussion.
But it seems I have now precisely expressed the request for improvement, haven't I?
#146217 personal license
browny
Senior Member
Posts: 370
Joined: 2007-09-10, 13:19 UTC

Post by *browny »

AntonDudarenko wrote: My main point is - to add ability to change charset for panels with compared files.
That is clear enough.
AntonDudarenko wrote: I had a guess that to add such ability AFTER the comparing process - should be rather easy, quickly and correctly.
It would be easy, quick and correct only in the case of ANSI text with the same code page for both panes.
Otherwise a recomparison would be unavoidable, contrary to your expectation of keeping the results and just substituting characters.
AntonDudarenko wrote: And my expectations here - to see "the same colored" file but in a different font encoding only.
If different code pages were chosen for the left and right panes after the comparison, the coloration would be incorrect.
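
A short sketch of why the old coloration goes stale (Python for illustration only; `line_flags` and the sample data are assumptions): the per-line equal/different flags depend on the code pages chosen, so changing one pane's code page changes which lines should be highlighted.

```python
def line_flags(left, right, left_cp, right_cp):
    """Return one True/False per line: True where the decoded lines are equal."""
    left_lines = left.decode(left_cp, errors="replace").splitlines()
    right_lines = right.decode(right_cp, errors="replace").splitlines()
    return [a == b for a, b in zip(left_lines, right_lines)]

# Hypothetical files: the same text stored as ANSI 1252 and as OEM 850.
left = "café\nmenu\n".encode("cp1252")
right = "café\nmenu\n".encode("cp850")

# First comparison: both panes read with the same default code page.
before = line_flags(left, right, "cp1252", "cp1252")
# The user then switches the right pane to OEM 850.
after = line_flags(left, right, "cp1252", "cp850")

print(before, after)  # [False, True] [True, True]
```

Line 1 was marked as different in the first pass but is equal once the right pane uses its real code page, so keeping the first pass's colors would mislead.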