nsp's comment about the 260-byte buffer size seems to be spot on.
The string
Code:
H:\share\download\anime\BD\ソードアート・オンライン アリシゼーション War of Underworld\[アニメ BD] ソードアート・オンライン アリシゼーション War of Underworld 第01話 「北の地にて」 (1920x1080 x264 AAC+コメ).mp4
encoded in UTF-8 requires 264 bytes. With a buffer of only 260 bytes, the last 4 bytes are lost.
If one also considers that the string in the buffer is terminated with a null byte (in other words, the buffer can hold at most 259 bytes of text data), then in total the last 5 bytes of the original string go missing. Which is exactly what we are seeing here: ").mp4" is the missing 5-byte tail of the original string.
I think this topic should be moved into the bugs section.
Important note to ghisler (just in case you are not already aware of this):
If you are going to address and fix this issue, do
NOT mistakenly assume that a string encoded in UTF-8 always uses no more bytes than the same string encoded in UTF-16. For characters in the Unicode range U+0800–U+FFFF, a UTF-8-encoded character uses 3 octets/bytes, whereas in UTF-16 the same character uses only 2. As you might already suspect, many CJK characters fall into this range. It is thus quite possible that a file path encoded in UTF-8 requires more bytes than the same file path encoded in UTF-16. To deal with this, you could query the UTF-8-encoded byte size of the string and dynamically allocate/resize the buffer accordingly before writing the UTF-8-encoded string to it. If you prefer sticking with pre-allocated constant-size buffers, it should be safe to use a buffer size of
3 * 32 K = 96 KB for UTF-8 file paths.