Sublime Forum

Bug: Non UTF-8 character causes file truncation

#1

When Sublime Text loads a file and encounters a multi-byte UTF-8 character, I assume it “switches on” UTF-8 display. If a non-UTF-8 byte is encountered later in the file, Sublime Text stops loading at that point, and the file appears truncated in the edit window.

If you open a Notepad window (low-level editing), paste in the following text, save the file, and then load it in Sublime Text, the bug will appear. The first special character on line 2 is a multi-byte UTF-8 a-e ligature (æ), while the one on line 4 is a non-UTF-8 double low-9 quotation mark („).

This is a new file. æ This text will be displayed. „ Sublime Text will not display this line, nor any line beneath it. Lorem ipsum dolor sit amet.
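For anyone who wants a byte-exact test file without relying on what Notepad does on save, here is a minimal Python sketch. It rests on assumptions: the stray byte is taken to be 0x84 (the Windows-1252 encoding of the double low-9 quotation mark), the line breaks are a guess at the original layout, and the names `build_repro_bytes` and `repro.txt` are made up for illustration.

```python
from pathlib import Path

# U+00E6 (a-e ligature) encoded as UTF-8: two bytes, 0xC3 0xA6.
AE_UTF8 = "\u00e6".encode("utf-8")

# Assumption: the "non UTF-8" quotation mark is the single byte 0x84,
# which is how Windows-1252 encodes U+201E. On its own, 0x84 is not a
# valid UTF-8 sequence.
LOW9_CP1252 = "\u201e".encode("cp1252")


def build_repro_bytes() -> bytes:
    """Return file contents mixing a valid UTF-8 sequence with a stray byte."""
    lines = [
        b"This is a new file.",
        AE_UTF8 + b" This text will be displayed.",
        LOW9_CP1252 + b" Sublime Text will not display this line, nor any line beneath it.",
        b"Lorem ipsum dolor sit amet.",
    ]
    # Windows-style line endings; the exact line layout is assumed.
    return b"\r\n".join(lines) + b"\r\n"


if __name__ == "__main__":
    # Hypothetical output file name.
    Path("repro.txt").write_bytes(build_repro_bytes())
```

Opening the resulting `repro.txt` should exercise the same mix of bytes described above.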

My fallbackEncoding hasn’t been changed from the default.

Expected behaviour: Load the entire file and display at least a question mark or broken character box instead of the non UTF-8 character.


#2

Thanks for the bug report. This will be fixed in the next beta.


#3

This is fixed in 20091029.

If a file containing an invalid UTF-8 sequence is loaded, it’ll now be correctly identified as invalid UTF-8, and the fallback encoding will be used. If it’s explicitly chosen to be UTF-8 via File/Open with Encoding, then invalid sequences will be substituted with a replacement character.
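The behaviour described above can be sketched roughly in Python. This is an illustration only, not Sublime Text's actual code: the function name `decode_file_bytes`, the `force_utf8` flag, and the choice of `cp1252` as the fallback encoding are all assumptions.

```python
from typing import Tuple


def decode_file_bytes(
    data: bytes,
    fallback_encoding: str = "cp1252",  # assumed fallback; not stated in the thread
    force_utf8: bool = False,
) -> Tuple[str, str]:
    """Return (text, encoding_used) for raw file contents.

    force_utf8=True stands in for explicitly opening the file as UTF-8:
    invalid sequences become U+FFFD instead of aborting the load.
    """
    if force_utf8:
        return data.decode("utf-8", errors="replace"), "utf-8"
    try:
        # Strict decode: any invalid sequence means the file is not UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so reinterpret the whole file in the fallback.
        return data.decode(fallback_encoding, errors="replace"), fallback_encoding


if __name__ == "__main__":
    sample = b"\xc3\xa6 shown\r\n\x84 also shown\r\n"
    print(decode_file_bytes(sample))
    print(decode_file_bytes(sample, force_utf8=True))
```

Either way the whole file is loaded. With the fallback path, the earlier valid UTF-8 bytes for the a-e ligature are also read in the fallback encoding, so under these assumptions they show up as two characters rather than one.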


#4

Fixed. Thank you :smile:
