Sublime Forum

Encoding questions

#1

Which encodings does ST2 try to detect before falling back to fallback_encoding?
And is there a way to find out the encoding actually used for a file in a view (something like view.settings().get(???))?

Like the ‘Show Unsaved Changes…’ command, which only works for UTF-8 files, my plugins run into a lot of issues when they interact with files of unknown encoding.
About 95% of my files are UTF-8 or windows-1252 (my fallback_encoding), so trying the first and, if that raises an exception, the second works pretty well.
For the others (UTF-16, …), bad luck… Maybe something like chardet (http://chardet.feedparser.org) is a solution, with the understanding that there is no definitive solution to this problem.
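The try-UTF-8-then-fallback approach described above can be sketched in a few lines of plain Python. The function name `decode_with_fallback` is made up for illustration and is not part of any Sublime Text API; windows-1252 is used as the default only because it is the fallback mentioned above.

```python
def decode_with_fallback(data, fallback="windows-1252"):
    """Return (text, encoding_used) for a byte string."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # windows-1252 leaves a few byte values undefined, so decode
        # with "replace" to avoid a second exception.
        return data.decode(fallback, "replace"), fallback
```

A UTF-16 file would usually fail the UTF-8 attempt and then be decoded as windows-1252 without an error but with garbage text, which is the "bad luck" case.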

Is anything planned for ST2 regarding encoding detection, with an API to check a file without opening it in a view?

#2

I have the same question, posted in this topic. I saw a topic (file encoding), but I assume this is not a top priority :frowning:. For me, the ability to work with other encodings is a must before I can consider Sublime a production-ready editor.

#3

I’ll try to guess (probably **wrong**). I suspect the editor works this way:

The editor tries to open the file as UTF-8; if that fails, it silently tries the fallback_encoding.
The file is converted from the fallback_encoding to UTF-8.
The editor edits the file in UTF-8.
When you save the file, the editor encodes it with the fallback_encoding (if the file was not UTF-8).

Guessed right?

#4

[quote=“tito”]I’ll try to guess (probably **wrong**). I suspect the editor works this way:

The editor tries to open the file as UTF-8; if that fails, it silently tries the fallback_encoding.
The file is converted from the fallback_encoding to UTF-8.
The editor edits the file in UTF-8.
When you save the file, the editor encodes it with the fallback_encoding (if the file was not UTF-8).

Guessed right?[/quote]

It should work that way, but it doesn’t.
It opens a non-UTF-8 file by trying the fallback_encoding.
It converts and edits in UTF-8 (a guess?).
It saves as UTF-8. To save with the same (non-UTF-8) encoding, you need to use “Reopen with Encoding”.
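To make the consequence concrete, here is a rough model of the open-then-save behaviour described in this post. It is a sketch of the poster's description only, not of Sublime Text's internals, and `open_then_save` is a hypothetical name.

```python
def open_then_save(raw, fallback="windows-1252"):
    """Model: decode with UTF-8 or the fallback, then always save as UTF-8."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback, "replace")
    # The save step ignores the encoding the file was opened with.
    return text.encode("utf-8")
```

Under this model a windows-1252 file containing `é` (byte `0xE9`) comes back from a save as the two-byte UTF-8 sequence `0xC3 0xA9`, so the file on disk silently changes encoding.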

#5

The way file encoding works right now is really not optimal.

What I would like:

  • Having an API method or a setting to get the actual encoding of a buffer.
  • Having the choice to change the encoding of a buffer at any time, or at least at save time.
  • Having an ‘encoding’ parameter for the open_file API command (and for the ‘open’ command, but maybe it is already there; I have not tested).

And an idea that just came to mind:
Create a new Buffer class that is the underlying buffer of a view, a kind of invisible view.
A Buffer could be linked to zero or more views (clones).
The Buffer class would implement most of the View class (substr, insert, erase, …).

This new class could be used to open a file ‘the ST2 way’ (encoding, indentation settings (indent guessing), …) without showing it in the editor.
It could replace all the open(‘filename’) calls in plugins that need data from another file (Diff command, …).
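As a rough illustration of the idea, a headless buffer could look something like the sketch below. Everything here is hypothetical: no such `Buffer` class exists in the Sublime Text API, the `from_file` constructor and its encoding list are assumptions, and the methods take plain integer offsets rather than the Region and edit objects the real View methods use.

```python
import io


class Buffer(object):
    """Hypothetical headless text buffer; not part of the Sublime Text API."""

    def __init__(self, text=u"", encoding="utf-8"):
        self.text = text
        self.encoding = encoding

    @classmethod
    def from_file(cls, path, encodings=("utf-8", "windows-1252")):
        # Try each candidate in order and remember which one decoded cleanly.
        with io.open(path, "rb") as handle:
            raw = handle.read()
        for name in encodings:
            try:
                return cls(raw.decode(name), name)
            except UnicodeDecodeError:
                continue
        # Last resort: accept replacement characters with the final candidate.
        return cls(raw.decode(encodings[-1], "replace"), encodings[-1])

    def size(self):
        return len(self.text)

    def substr(self, begin, end):
        return self.text[begin:end]

    def insert(self, point, string):
        self.text = self.text[:point] + string + self.text[point:]
        return len(string)

    def erase(self, begin, end):
        self.text = self.text[:begin] + self.text[end:]
```

A plugin such as a diff command could then call `Buffer.from_file(path)` instead of `open(path)` and get both the decoded text and the encoding that was used.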

#6

Some encoding related topics:

sublimetext.userecho.com/topic/3 … -encoding/
sublimetext.userecho.com/topic/5 … -encoding/
sublimetext.userecho.com/topic/2 … tatus-bar/
sublimetext.userecho.com/topic/5 … as-option/

I think we need a few things:

  • An easy way to set a file’s encoding, save with it, and have that encoding remembered for the file
  • Save as with encoding (and at least the ability to detect the current encoding) for conversion
  • Some API methods (not so important once the first two are done)
  • Encoding and line-ending info in the status bar (minor)

The first one is top priority IMHO.

#7

Thanks for searching userecho; it looks like a lot of people have this issue.
This problem is probably very minor in English-speaking countries, but for us Europeans who work with a lot of different languages, it’s very important.
In a perfect world, everybody would use UTF files and there would be no encoding issues :smile:

The problem is that reliably keeping/remembering the encoding of a file is not possible because, except for UTF files with a BOM, there is no way to know a file’s encoding for certain.
The setting could be saved in the session and remembered that way, but when you close the file you lose this information, and the next time you open it you have to set the encoding again.
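The BOM case is the one that can be checked mechanically. A minimal sketch using the BOM constants from Python’s standard `codecs` module follows; the function name `encoding_from_bom` is made up for illustration, and the returned names are Python codec names chosen so that decoding with them strips the BOM.

```python
import codecs

# Check UTF-32 before UTF-16: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def encoding_from_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Anything that returns `None` here is where guessing has to start.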

Some editors guess the encoding, and even if it’s not perfect, it’s nice to have.

#8

I wrote the EncodingHelper plugin a few days ago: EncodingHelper (encoding in the status bar, convert to UTF-8)

In summary: it is an unobtrusive plugin that runs on its own thread with some optimizations and uses the Python chardet library (chardet.feedparser.org/) to show the encoding in the status bar and convert to UTF-8 from a variety of encodings.
It provides an API: GuessEncoding(file_name, [list_of_fallback_encodings], False, aCallback).start()
The callback receives the encoding as its first argument. BTW, I’m not sure how you can include the code and use the API.

“Save as…” with encoding should be relatively easy to add to this plugin if the encoding of the buffer is known.

BTW, I’m looking for improvements: the Python chardet library is currently a dead project and should probably be replaced. If someone knows of well-maintained software for guessing encodings, let me know. Almost all of them are based on Mozilla’s Universal Charset Detector.
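For readers unfamiliar with the thread-plus-callback pattern, here is a minimal stand-in written with the standard library only. It is not the EncodingHelper implementation: the class name `EncodingGuesser` is hypothetical, it tries a fixed list of candidate encodings instead of calling chardet, and it omits the third argument of the real API, whose meaning is not stated in this thread.

```python
import threading


class EncodingGuesser(threading.Thread):
    """Hypothetical stand-in; not the EncodingHelper implementation."""

    def __init__(self, file_name, candidates, callback):
        threading.Thread.__init__(self)
        self.file_name = file_name
        self.candidates = candidates
        self.callback = callback

    def run(self):
        # Runs off the main thread, so a large file does not block the editor.
        with open(self.file_name, "rb") as handle:
            raw = handle.read()
        for name in self.candidates:
            try:
                raw.decode(name)
            except UnicodeDecodeError:
                continue
            self.callback(name)
            return
        # No candidate decoded cleanly.
        self.callback(None)
```

Usage would look like `EncodingGuesser(path, ["utf-8", "windows-1252"], on_done).start()`. Inside a Sublime plugin the callback would probably need to hand its result back to the main thread, for example through `sublime.set_timeout`, before touching a view.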

#9

Dev build 2144 introduces two important API features: https://forum.sublimetext.com/t/dev-build-2144/3234/1

  • Added view.encoding() and view.set_encoding()
  • Added view.line_endings() and view.set_line_endings()
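A short sketch of how a plugin might use these calls. The helper names and the status key are made up for illustration, and it is an assumption that "UTF-8" is an accepted name for `view.set_encoding()`; in a real plugin the `view` object is supplied by Sublime Text (for example `self.view` inside a TextCommand).

```python
def status_text(view):
    # Both calls return display strings describing the view's file.
    return "%s, %s" % (view.encoding(), view.line_endings())


def show_in_status_bar(view, key="encoding_info"):
    # "encoding_info" is an arbitrary key chosen for this sketch.
    view.set_status(key, status_text(view))


def convert_to_utf8(view):
    # Assumption: "UTF-8" is a valid encoding name for set_encoding().
    if view.encoding() != "UTF-8":
        view.set_encoding("UTF-8")
```

The functions only rely on the view methods, so they can be exercised outside the editor with a small stub object.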

#10

[quote=“Mylith”]Dev build 2144 introduces two important API features: https://forum.sublimetext.com/t/dev-build-2144/3234/1

  • Added view.encoding() and view.set_encoding()
  • Added view.line_endings() and view.set_line_endings()[/quote]

Thanks for the answer; I had already noticed that and started using it.
Some things are still missing, but it’s a good start.