LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences


Dalai
Power Member
Posts: 9387
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *Dalai »

Hi there :).

What is LineBreakInfo?
LineBreakInfo is a Total Commander content plugin (WDX) that provides various pieces of information about file contents like line break type, type of BOM (Byte Order Mark), and number of each type of line break character(s) (CR/LF/CRLF/FF/VT).

Features:
  • Determine type of line breaks in a file, i.e. N/A, CR (Carriage Return), LF (Line Feed), CRLF, FF (Form Feed), VT (Vertical Tab) or Mixed
  • Determine type of BOM (Byte Order Mark) in a file, i.e. None, UTF-8, UTF-16 LE/BE, UTF-32 LE/BE
  • Provide the number of CR, LF, FF, VT and combined CRLF sequences as well as the number of "binary" (non-printable) bytes
  • Supports Unicode and long paths (> 259 characters)
Download on totalcmd.net, Mirror

Enjoy! Please feel free to test it, comment, discuss, report bugs and so on :D. Feedback is highly appreciated!

Regards
Dalai
Last edited by Dalai on 2023-12-01, 15:07 UTC, edited 3 times in total.
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror

white
Power Member
Posts: 4618
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: LineBreakInfo - Content plugin that provides information about line break type, BOM type, number of CR/LF/CRLF seque

Post by *white »

Dalai wrote: 2023-11-15, 21:56 UTC Download on totalcmd.net, Mirror
Congrats on your release.
history.txt wrote: Version 0.2.0 [???]
- Initial public release
Was ??? supposed to be replaced with a date?

Suggestion for topic title:
LineBreakInfo 0.2.0 - Content plugin for info about line break type, BOM type, number of CR/LF/CRLF occurrences

Dalai
Power Member
Posts: 9387
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: LineBreakInfo - Content plugin that provides information about line break type, BOM type, number of CR/LF/CRLF seque

Post by *Dalai »

white wrote: 2023-11-15, 22:12 UTC
history.txt wrote: Version 0.2.0 [???]
- Initial public release
Was ??? supposed to be replaced with a date?
Yes, and it is now. Thanks for spotting this.

Releasing a new plugin means a lot to do within a very short period of time. The plugin page on totalcmd.net links to the TC discussion thread, and that link isn't known before the thread has been created in this forum. The same link is also in the plugin's readme file. So it's a bit of work to avoid a catch-22 ;).
Suggestion for topic title:
LineBreakInfo 0.2.0 - Content plugin for info about line break type, BOM type, number of CR/LF/CRLF occurrences
Thanks for the suggestion. Although I'm aware that it has some advantages, I don't like the version number in the thread title. But I'm going to change the title to something clearer.

Regards
Dalai

petermad
Power Member
Posts: 14795
Joined: 2003-02-05, 20:24 UTC
Location: Denmark
Contact:

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *petermad »

2Dalai

Thanks for the very well documented plugin.

Here is the Danish translation:

Code: Select all

[dan]
;--- Line Breaks
Line Breaks=Linjeskift
N/A|Binary|LF|CR|CRLF|Mixed=Ingen|Binær|LF|CR|CRLF|Blandet

;--- BOM Type
BOM Type=BOM Type
None|UTF-8|UTF-16 LE|UTF-16 BE=Ingen|UTF-8|UTF-16 LE|UTF-16 BE

Binary Count=Antal Binære tegn
CR Count=CR-antal
LF Count=LF-antal
CRLF Count=CRLF-antal
FF Count=FF-antal
VT Count=VT-antal
Bytes Read=Bytes Læst
In the readme.htm file you have:

Code: Select all

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But the file is not a UTF-8 file, and therefore the EdgeViewer, HTMLView and MarkdownView plugins cannot open the file when using "Define view method by file type" in Lister.

Use either of these two instead:

Code: Select all

  <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
License #524 (1994)
Danish Total Commander Translator
TC 11.03 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1371a
TC 3.50 on Android 6 & 13
Try: TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar

Dalai
Power Member
Posts: 9387
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *Dalai »

petermad wrote: 2023-11-16, 00:15 UTC Thanks for the very well documented plugin.
Thanks!
Here is the Danish translation:
Thanks, added to the .lng file. And I hope that the encoding is correct since it's been a while since I last did that...
In the readme.htm file you have:

Code: Select all

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But the file is not a UTF-8 file, and therefore the EdgeViewer, HTMLView and MarkdownView plugins cannot open the file when using "Define view method by file type" in Lister.
Hm, I've never had a problem with specifying UTF-8 charset in HTML files, even if there's not a single UTF-8 encoded character in there. HTMLView is what I use to view the readme files while writing them, but I don't use "Define view method by file type". I'm assuming the same problem exists in the readme.htm files of my other plugins, right?
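
To illustrate the point about charset declarations: a file that contains only ASCII bytes is also valid UTF-8, so declaring charset=utf-8 for it is not wrong in itself. Here is a minimal Python sketch of that check. The function name `utf8_status` is made up for this example and says nothing about how Lister or any plugin actually detects encodings.

```python
def utf8_status(data: bytes) -> str:
    """Classify raw file bytes by how they relate to UTF-8 (illustrative only)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # At least one byte sequence is illegal in UTF-8,
        # e.g. a lone 0xFC from a Windows-1252 "ü".
        return "not valid UTF-8"
    if text.isascii():
        # ASCII is a strict subset of UTF-8.
        return "pure ASCII (also valid UTF-8)"
    return "valid UTF-8 with non-ASCII characters"


if __name__ == "__main__":
    print(utf8_status(b"<p>readme</p>"))
    print(utf8_status("Südthüringen".encode("cp1252")))
```

A readme saved as Windows-1252 with umlauts or similar characters would fall into the first category, which is the case where the declared charset and the actual bytes disagree.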

Regards
Dalai

petermad
Power Member
Posts: 14795
Joined: 2003-02-05, 20:24 UTC
Location: Denmark
Contact:

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *petermad »

2Dalai
Thanks, added to the .lng file. And I hope that the encoding is correct since it's been a while since I last did that
You can download the modified LineBreakInfo.lng file here: https://tcmd.madsenworld.dk/LineBreakInfoDan.zip so that copy-pasting from the forum doesn't change the encoding.


but I don't use "Define view method by file type". I'm assuming the same problem exists in the readme.htm files of my other plugins, right?
I checked Services2 and Startups, which are the ones I have, and yes: the same problem.

I use this for *.htm* files in "Define view method by file type":

Code: Select all

1,5,EdgeViewer.wlx,htmlview.wlx
because I usually want to see the code, not the rendered HTML page. And for your files, when I press 4, TC only cycles between text mode and its own "HTML as text" mode (mode 1 and 5). If I disable "Define view method by file type", TC cycles through all available plugins, including EdgeViewer and HTMLView.

If I change:

Code: Select all

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
in your readme.htm file to:

Code: Select all

  <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
then EdgeViewer and HTMLView also work with "Define view method by file type" enabled. Maybe it is caused by a bug in TC?

Dalai
Power Member
Posts: 9387
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *Dalai »

petermad wrote: 2023-11-16, 01:32 UTC I use this for *.htm* files in "Define view method by file type":

Code: Select all

1,5,EdgeViewer.wlx,htmlview.wlx
because I usually want to see the code, not the rendered HTML page. And for your files, when I press 4, TC only cycles between text mode and its own "HTML as text" mode (mode 1 and 5). If I disable "Define view method by file type", TC cycles through all available plugins, including EdgeViewer and HTMLView.
I just tried it with TC 10.52 and HTMLView and can confirm the behavior. No idea if Lister is supposed to behave that way.

It has something to do with the auto-detection of UTF-8. Pressing 5 disables UTF-8 (Encoding menu gets enabled) and pressing 4 afterwards switches to the plugin. The same can be observed with other HTML files like those of TCMediaInfo, NLInfo and Uninstaller64 plugins.

Using Lister's Plugins menu to switch to the plugin also works as expected.
Maybe it is caused by a bug in TC?
It could be. But it's also possible that Lister blocks the switch to a plugin on purpose when (HTML plus) UTF-8 is enabled.

Regards
Dalai

AntonyD
Power Member
Posts: 1246
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: Content plugin translation issue

Post by *AntonyD »

2Dalai
How can I force this new plugin to use translated strings?
I edited the LANG file in the LineBreakInfo folder (the plugin's install location) correctly
and restarted Total Commander, and... nothing. I tried to find an option in the INI files to force
the use of a particular LANG id, but no luck...

By the way, what about:
NEL (U+0085): next line
LS (U+2028): line separator
PS (U+2029): paragraph separator

WIKI says that these are line break characters too!

And do you have an example of a file with mixed line break types?

P.S.
By the way, how can one distinguish a UTF-8 file that has no BOM marker from a file
that has no marker because it is NOT a UTF-8 file at all? Right now I see only None
for all cases. But that is definitely not the right logic. None should be used when the file IS indeed
UTF-8. And when the file is NOT UTF-8 encoded at all, it should be N/A, imho.
#146217 personal license

Dalai
Power Member
Posts: 9387
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: Content plugin translation issue

Post by *Dalai »

AntonyD wrote: 2023-11-16, 09:41 UTC How can I force this new plugin to use translated strings?
The plugin doesn't do any translation here, but TC does.
I edited the LANG file in the LineBreakInfo folder (the plugin's install location) correctly
and restarted Total Commander, and... nothing.
The language TC uses for content plugins depends on the language used for TC itself. It's the same two- or three-letter code that's used for TC language files. For example, if you have set Russian language in TC, i.e. use wcmd_rus.lng, TC looks for a [rus] section in the plugin language file. For German it's wcmd_deu.lng and the section [deu], for Czech it's wcmd_cz.lng and section [cz] and so on.

Every string not found in that section is not translated and is let through 'unfiltered'. And if you use TC's internal English language, strings from content plugins won't be translated.
I tried to find an option in the INI files to force the use of a particular LANG id, but no luck...
As I said, TC does the translation.
By the way, what about:
NEL (U+0085): next line
LS (U+2028): line separator
PS (U+2029): paragraph separator
I'm aware of these, but they're Unicode only. For now I'm going to ignore them. I even wanted to ignore FF until I saw it being used in a GPL license file; that was easy to add.
And do you have an example of a file with mixed line break types?
Sure. Search your %ProgramFiles% directory and you'll probably find quite a few. I've found them among installations of LibreOffice, VMware Workstation, Mozilla Thunderbird and PDFCreator. You can also create one yourself by appending a file with LFs to a file with CRs (or vice versa) via TC's copy and append feature.
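
For anyone who wants to build such a test file without TC, here is a rough Python sketch that counts line breaks and classifies the result. It is only an illustration, not the plugin's code: the names `count_breaks` and `break_type` are made up, FF and VT are left out, and it assumes that a CR or LF belonging to a CRLF pair is not counted again on its own.

```python
def count_breaks(data: bytes) -> dict:
    """Count CRLF pairs and the CR/LF bytes that are not part of a pair."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # lone CRs only (assumption)
        "LF": data.count(b"\n") - crlf,  # lone LFs only (assumption)
    }


def break_type(data: bytes) -> str:
    """Return N/A, CR, LF, CRLF or Mixed for the given bytes."""
    kinds = [name for name, n in count_breaks(data).items() if n > 0]
    if not kinds:
        return "N/A"
    return kinds[0] if len(kinds) == 1 else "Mixed"


if __name__ == "__main__":
    cr_file = b"first\rsecond\r"
    lf_file = b"third\nfourth\n"
    # Appending one file to the other mimics TC's copy and append feature.
    print(break_type(cr_file + lf_file))
```

Writing `cr_file + lf_file` to disk gives a small file that should show up as mixed.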
By the way, how can one distinguish a UTF-8 file that has no BOM marker from a file
that has no marker because it is NOT a UTF-8 file at all? Right now I see only None
for all cases. But that is definitely not the right logic. None should be used when the file IS indeed
UTF-8. And when the file is NOT UTF-8 encoded at all, it should be N/A, imho.
The plugin just checks for the existence of a BOM; for UTF-8 that's the first three bytes (the first two bytes for UTF-16). It doesn't check the file's encoding (i.e. whether there are any Unicode characters in the file) because that's irrelevant for counting line breaks and subsequently determining the line break type. Use a different plugin like EncInfo if you want to know a file's encoding.
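
As a sketch of what such a BOM check can look like (in Python, not the plugin's actual code; `bom_type` and the table order are assumptions of this example), the first bytes of the file are compared against the known signatures. The longer UTF-32 signatures are tested first because the UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).

```python
import codecs

# Longest signatures first, so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def bom_type(head: bytes) -> str:
    """Return the BOM name for the first bytes of a file, or "None"."""
    for signature, name in BOMS:
        if head.startswith(signature):
            return name
    return "None"


if __name__ == "__main__":
    print(bom_type(codecs.BOM_UTF8 + b"hello"))
    print(bom_type(b"plain text without a BOM"))
```

Such a check only reports whether a signature is present. A file without one yields "None" regardless of its real encoding. A UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE by this sketch.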

To moderators: This post and the previous one should probably be moved to the LineBreakInfo thread.

Regards
Dalai

white
Power Member
Posts: 4618
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: Content plugin translation issue

Post by *white »

Moderator message from: white » 2023-11-16, 15:14 UTC

Dalai wrote: 2023-11-16, 12:40 UTC To moderators: This post and the previous one should probably be moved to the LineBreakInfo thread.
Done.

petermad
Power Member
Posts: 14795
Joined: 2003-02-05, 20:24 UTC
Location: Denmark
Contact:

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *petermad »

2AntonyD
How can I force this new plugin to use translated strings?
Use a language file that has the language abbreviation at the end of the file name, before the extension, for example wcmd_rus.lng or wcmd_mylang_rus.lng (but NOT wcmd_mylang.lng), and put the translation in a corresponding [rus] section.
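
The lookup described in this thread can be pictured with a small Python sketch. This is a guess at the logic based on the posts above and not TC's actual implementation. `section_for` and `translate` are hypothetical names, and the plugin's .lng sections are modelled as a plain dictionary.

```python
from pathlib import PurePath


def section_for(lng_file: str) -> str:
    """Guess the section name from a TC language file name (assumed rule)."""
    stem = PurePath(lng_file).stem           # e.g. "wcmd_mylang_rus"
    return stem.rsplit("_", 1)[-1].lower()   # text after the last underscore


def translate(text: str, sections: dict, lng_file: str) -> str:
    """Look up text in the matching section; pass it through if not found."""
    return sections.get(section_for(lng_file), {}).get(text, text)


if __name__ == "__main__":
    sections = {"rus": {"Line Breaks": "Тип Переноса строк"}}
    print(translate("Line Breaks", sections, "wcmd_mylang_rus.lng"))
    print(translate("Line Breaks", sections, "wcmd_mylang.lng"))
```

With wcmd_mylang.lng the derived section would be [mylang], which does not exist in the plugin's language file, so the string stays untranslated.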

AntonyD
Power Member
Posts: 1246
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *AntonyD »

2white
2petermad
Done.
IMHO something went wrong while the posts were being moved: I open the topic "Content plugin translation issue",
where we wrote our latest posts, but I'm taken to this topic instead. So what happened? Why can't I get into the original topic?
Yes, I will post further questions here, which is the right place for them. BUT! Shouldn't that topic have remained where it was?
Now I don't see it at all.

AntonyD
Power Member
Posts: 1246
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *AntonyD »

Use a language file that has the language abbreviation at the end of the file name, before the extension
Thanks! Now I get it!

Code: Select all

[rus]
;--- Line Breaks
Line Breaks=Тип Переноса строк
N/A|Binary|LF|CR|CRLF|Mixed=Нет данных|Двоичный файл|LF(\n)|CR(\r)|CRLF(\r\n)|Смесь типов

;--- BOM Type
BOM Type=Тип BOM-маркера
None|UTF-8|UTF-16 LE|UTF-16 BE=Нет данных|UTF-8|UTF-16 LE|UTF-16 BE

Binary Count=Кол-во непечатных символов
CR Count=Кол-во CR (возврат каретки)
LF Count=Кол-во LF (перевод каретки)
CRLF Count=Кол-во CRLF (и CR и LF)
FF Count=Кол-во FF (перевод страницы)
VT Count=Кол-во VT (верт-ая табуляция)
Bytes Read=Байт проанализировано
Here is my translation into Russian ;)

AntonyD
Power Member
Posts: 1246
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *AntonyD »

Search your %ProgramFiles% directory and you'll probably find quite a few
Not a single one!
As a matter of fact, that's why I asked you to provide at least one example of such a file, one where you or your plugin would find these mixed line breaks.

AntonyD
Power Member
Posts: 1246
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: LineBreakInfo - Content plugin for information about line break type, BOM type, number of CR/LF/CRLF occurrences

Post by *AntonyD »

The plugin just checks for the existence of a BOM; for UTF-8 that's the first three bytes (the first two bytes for UTF-16). It doesn't check the file's encoding (i.e. whether there are any Unicode characters in the file) because that's irrelevant for counting line breaks and subsequently determining the line break type. Use a different plugin like EncInfo if you want to know a file's encoding.
No, again: I don't care about the encoding. Finding it out is not the task. BUT! As I understand it, the BOM marker is ONLY related to UTF-8 encoded files!!! So it is very strange to see the same "None" value almost always, for ANY file! IT WAS ONLY EXPECTED FOR REAL UTF-8 FILES THAT LACK THIS MARKER! For all OTHER FILES, `N/A` was expected!