Searching for files containing only 0x00 bytes does not work
-
- Junior Member
- Posts: 3
- Joined: 2010-06-08, 11:29 UTC
Hi,
for some unknown reason, some files on my system have been overwritten entirely with 0x00 bytes (the file length is unchanged).
I tried to search for those files with the following settings:
-Search text = "[\x01-\xFF]"
-Find files which do NOT contain the search text (called "Finde Dateien, die den Text NICHT enthalten" in the German version)
-Reg.Expression enabled
-File size > 0 bytes
This didn't work for me! It didn't find any files, not even those that are known to contain only 0x00 bytes!
Then I did some more tests:
Searching for files containing "\x00" (RegEx enabled) found all files, even those that don't contain any 0x00 byte!
I worked around this by setting the search text to "[\x02-\xFF]", which is not fully correct but does almost what I wanted.
Unfortunately, that also finds files which contain only "0x0D 0x0A" sequence(s), but this seems to be a consequence of the RegEx search working line by line rather than on the whole file.
Conclusion:
There is a bug when searching for the regex "\x00".
PS: Any hints on how to improve my search for files which contain only 0x00 bytes?
The problem with regex is that \x00 is treated as an end of line. So I think any regex engine will find no matches for the \x00 character. Only the hex search can find zero bytes, but it cannot check whether any other bytes exist.
I think this plugin can help you. All you need to do is write a script that returns Yes if the file contains only zero bytes. You can then search using the script.wdx plugin's field.
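As a rough illustration of the check such a script would have to perform, here is a minimal standalone Python sketch. It is not written against the script.wdx interface, and the names `is_all_zero` and `find_zero_filled` are made up for this example. It reads each file in chunks and reports files that are non-empty and consist only of 0x00 bytes.

```python
import os


def is_all_zero(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is 0x00."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            # A chunk is all zeros only if the count of 0x00 bytes
            # equals the chunk length.
            if chunk.count(b"\x00") != len(chunk):
                return False
    return total > 0


def find_zero_filled(root):
    """Yield paths below root whose content is only 0x00 bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if is_all_zero(path):
                    yield path
            except OSError:
                # Unreadable files are skipped in this sketch.
                continue
```

Calling `list(find_zero_filled("C:\\some\\folder"))` (the folder name is only a placeholder) would list the affected files. Reading in chunks keeps memory use bounded and stops at the first non-zero byte.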
- ghisler(Author)
- Site Admin
- Posts: 50263
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
-
- Junior Member
- Posts: 3
- Joined: 2010-06-08, 11:29 UTC
MVV wrote: Problem is that he need to search files that contains 00 bytes only (entire file filled with 00 bytes), but not files that contain at least one 00 byte.
Exactly.
MVV wrote: Problem with regex is that \x00 is an end of line. So I think any regex engine will find no matches for \x00 character.
That is not fully true. I tried two other tools ("Notepad++" ("Find In Files") and "PowerGREP") and they both handle it as one would expect.
As the source code of Notepad++ is available, you might want to look into their implementation.
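For comparison, here is a small sketch of how a regex engine that works on length-counted byte strings treats 0x00 as an ordinary character. It uses Python's `re` module with bytes patterns, not the engines of Notepad++ or PowerGREP, and `contains_only_nul` is a name invented for this example. It applies the same idea as the original search: a non-empty buffer with no match for "[\x01-\xFF]" contains only 0x00 bytes.

```python
import re

# Any byte other than 0x00.
NON_ZERO = re.compile(rb"[\x01-\xFF]")
# One or more 0x00 bytes; used with fullmatch() below.
ONLY_ZERO = re.compile(rb"\x00+")


def contains_only_nul(data: bytes) -> bool:
    """True if data is non-empty and has no byte in 0x01..0xFF."""
    return len(data) > 0 and NON_ZERO.search(data) is None


if __name__ == "__main__":
    samples = {
        "all zeros": b"\x00" * 16,
        "zeros with CRLF": b"\x00\x00\r\n\x00",
        "plain text": b"some text",
    }
    for label, data in samples.items():
        print(label, contains_only_nul(data), ONLY_ZERO.fullmatch(data) is not None)
```

With such an engine, searching for "\x00" in data without any 0x00 byte finds nothing, and the "does not contain [\x01-\xFF]" test also rejects files that hold only CR/LF bytes.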
- ghisler(Author)
- Site Admin
- Posts: 50263
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
Unfortunately, the RegEx library used doesn't allow searching for NULL bytes, sorry.
Author of Total Commander
https://www.ghisler.com
-
- Junior Member
- Posts: 3
- Joined: 2010-06-08, 11:29 UTC
If the file contents contain null characters, searching using regex is not reliable.
To give some more insight, here's my beta test report from some time ago:
white (Posted on beta forum: Lister regex search and null-characters) wrote:
When searching in Lister using Regex, it is known that you cannot find null characters (with hex value 00) using "\x00". Instead, searching for "\x00" seems to find the end of a line. This is because the Regex library was implemented using null-terminated strings.
But also, if the file contents contain null characters, searching using regex is not reliable. So searching binaries of any kind using regex is not reliable. Here are some quirks I found:
1) Searching for "\x01" finds the correct characters in lines without null characters. Searching for "\x01" does not match the end of a line.
But in lines with one or more null characters it also finds null characters and the end of the line. TC finds things that aren't there.
2) When searching in a line containing a null character, searching for "$", "\Z" or "\x00" finds the end of the line one or two characters after the actual end of the line.
Example:
Create the following file:
some text some[NULL]text some text
Repeatedly search for "t$".
Also try to search repeatedly for "..$".
3) When searching backwards, TC does not find matches after a null character.
Example:
Open the file mentioned in the example above in Lister.
Go to the end of the file.
Repeatedly search backwards for "e". (regex checked!)
So searching in binaries can find things that aren't there and it may fail to find things that are there. I think this should be fixed or the user should be made aware of this.
By the way, the website supporting the RegEx library, http://www.regexpstudio.com/, seems to have been dead for quite some time. Does anyone know what happened?
Unfortunately these are known limitations of the RegEx library and cannot be helped. I hope in the future Christian will find a better library to use.
The website is online again.
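The report attributes these quirks to the library working on null-terminated strings. The general effect can be sketched in Python with `ctypes`, whose `.value` accessor reads a buffer the way a C string API would, stopping at the first 0x00. This only illustrates the principle and is not the library's actual code path. The helper names `c_string_view` and `search_as_c_string` are invented for this example.

```python
import ctypes
import re


def c_string_view(data: bytes) -> bytes:
    """Return what a null-terminated string API sees: bytes up to the first 0x00."""
    return ctypes.create_string_buffer(data).value


def search_as_c_string(pattern: bytes, data: bytes):
    """Search only the part of data that is visible as a C string."""
    return re.search(pattern, c_string_view(data))


if __name__ == "__main__":
    line = b"some\x00text"
    # The C-string view ends at the embedded null byte.
    print(c_string_view(line))                         # b'some'
    # Text after the null byte is invisible to the C-string search ...
    print(search_as_c_string(rb"text", line))          # None
    # ... while a length-counted search over all bytes finds it.
    print(re.search(rb"text", line) is not None)       # True
```

An engine built this way has no means of telling an embedded 0x00 from the end of its input, which is consistent with "\x00" behaving like an end-of-line match in the report above.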
- ghisler(Author)
- Site Admin
- Posts: 50263
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
You can't do it, it's impossible with the currently used RegEx library because it is line based.
Author of Total Commander
https://www.ghisler.com