Encoding questions

Post by bizoo on Wed Sep 28, 2011 8:09 am

Which encodings does ST2 currently try to detect before using the fallback_encoding?
And is there a way to find out the actual encoding used for a file in a view (view.settings().get(???))?

Like the 'Show Unsaved Changes...' command, which only works for UTF-8 files, I have a lot of issues in plugins when interacting with files of unknown encoding.
Actually, 95% of my files are UTF-8 or Windows-1252 (fallback_encoding), so trying the first and, in case of an exception, the second works pretty well.
For the others (UTF-16, ...), bad luck... Maybe using something like chardet (http://chardet.feedparser.org) is a solution, understanding that there is no definitive solution to this problem.
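
For anyone who wants to try the "UTF-8 first, then the fallback" approach in a plugin, here is a minimal sketch in plain Python. The function name decode_with_fallback and the default encoding list are only illustrative assumptions, not part of any Sublime Text API.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first encoding that decodes cleanly.

    `data` is the raw bytes of the file. The default list mirrors the
    "UTF-8, then Windows-1252" strategy; it is an assumption, adjust as needed.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of %r could decode the data" % (encodings,))


def read_text(path, encodings=("utf-8", "cp1252")):
    """Read a file as bytes and decode it with decode_with_fallback."""
    with open(path, "rb") as f:
        return decode_with_fallback(f.read(), encodings)
```

The order matters: UTF-8 is strict, so it rejects most non-UTF-8 input, while Windows-1252 accepts almost any byte sequence and therefore has to come last. A file in some other single-byte encoding will usually still decode "successfully" as Windows-1252, just with the wrong characters.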

Is anything planned for ST2 regarding encoding detection, with an API to check a file without opening it in a view?
bizoo
 
Posts: 886
Joined: Wed Dec 08, 2010 6:53 am
Location: Switzerland

Re: Encoding questions

Post by Mylith on Wed Oct 19, 2011 9:57 am

I have the same question, posted in this topic. I saw a topic (file encoding), but I assume that this is not a top-priority thing :(. For me, the possibility to work with other encodings is a must to consider Sublime a production-ready editor.
Mylith
 
Posts: 30
Joined: Mon Oct 17, 2011 12:58 pm

Re: Encoding questions

Post by tito on Thu Oct 20, 2011 9:37 am

I'll try to guess (probably wrong). I suspect the editor works this way:

The editor tries to open the file as UTF-8; if that fails, it tries the fallback_encoding (failing silently).
The file is converted from the fallback_encoding to UTF-8.
The editor edits the file in UTF-8.
When you save the file, the editor encodes it with the fallback_encoding (if the file was not UTF-8).

Guessed right?
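
To make the guess concrete, here is a small sketch of that load/edit/save round trip in plain Python. It only illustrates the guessed workflow; nothing here is confirmed to be how the editor behaves, and the names load and save are made up for the example.

```python
def load(data, fallback_encoding="cp1252"):
    """Decode raw file bytes: UTF-8 first, then the fallback.

    Returns (text, encoding) so the caller can remember which one worked.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback_encoding), fallback_encoding


def save(text, encoding):
    """Encode the edited text with the encoding remembered at load time.

    Raises UnicodeEncodeError if the edit introduced characters that the
    original encoding cannot represent.
    """
    return text.encode(encoding)
```

The point of returning the encoding from load is the last step of the guess: without it, save has no way to write the file back in the encoding it came in.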
Give APIs, let the community build the rest!
https://github.com/titoBouzout
tito
 
Posts: 855
Joined: Thu Sep 29, 2011 2:27 pm
Location: Montevideo, Uruguay

Re: Encoding questions

Post by Mylith on Thu Oct 20, 2011 1:32 pm

tito wrote: I'll try to guess (probably wrong). I suspect the editor works this way:

The editor tries to open the file as UTF-8; if that fails, it tries the fallback_encoding (failing silently).
The file is converted from the fallback_encoding to UTF-8.
The editor edits the file in UTF-8.
When you save the file, the editor encodes it with the fallback_encoding (if the file was not UTF-8).

Guessed right?


It should work that way, but it doesn't.
Open a non-UTF-8 file: the editor tries the fallback_encoding.
Convert and edit in UTF-8 (guess?).
Save as UTF-8. To save with the same (non-UTF-8) encoding, you need to use "Reopen with Encoding".
Mylith
 
Posts: 30
Joined: Mon Oct 17, 2011 12:58 pm

Re: Encoding questions

Post by bizoo on Thu Oct 20, 2011 2:41 pm

The way file encoding works right now is really not optimal.

What I would like:
- An API method or a setting to get the actual encoding of a buffer.
- The choice to change the encoding of a buffer at any time, or at least at save time.
- An 'encoding' parameter for the open_file API command (and for the 'open' command, but maybe it is already there; not tested).

And an idea that just came to mind:
Create a new Buffer class that is the underlying buffer of a view, so it's a kind of invisible view.
A Buffer could be linked to zero or more views (clones).
The Buffer class implements most of the View class (substr, insert, erase, ...).

This new class could be used to open a file 'the ST2 way' (encoding, indentation settings (indent guessing), ...) without showing it in the editor.
This could replace all the open('filename') calls in plugins when you need data from another file (Diff command, ...).
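
A rough sketch of what such a class could look like, in plain Python. This is purely hypothetical: no such Buffer class exists in the ST2 API, and the methods take plain integer offsets here rather than the Region objects the real View methods use.

```python
class Buffer(object):
    """Hypothetical headless buffer: decoded text plus its source encoding."""

    def __init__(self, text, encoding="utf-8"):
        self.text = text          # already-decoded content
        self.encoding = encoding  # encoding the content was read with

    def size(self):
        """Number of characters in the buffer."""
        return len(self.text)

    def substr(self, begin, end):
        """Return the characters in the half-open range [begin, end)."""
        return self.text[begin:end]

    def insert(self, point, string):
        """Insert string at point and return the number of characters added."""
        self.text = self.text[:point] + string + self.text[point:]
        return len(string)

    def erase(self, begin, end):
        """Remove the characters in the half-open range [begin, end)."""
        self.text = self.text[:begin] + self.text[end:]
```

A plugin could then build a Buffer from a file's decoded contents and work on it without ever creating a visible view.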
bizoo
 
Posts: 886
Joined: Wed Dec 08, 2010 6:53 am
Location: Switzerland

Re: Encoding questions

Post by Mylith on Fri Oct 21, 2011 5:42 am

Some encoding-related topics:

http://sublimetext.userecho.com/topic/3 ... -encoding/
http://sublimetext.userecho.com/topic/5 ... -encoding/
http://sublimetext.userecho.com/topic/2 ... tatus-bar/
http://sublimetext.userecho.com/topic/5 ... as-option/

I think we need a few things:
- An easy way to set the encoding, save a file with a different encoding, and keep/remember that encoding for the file
- Save As with encoding (and at least the ability to detect the current encoding) for conversion
- Some API methods (not so important once the first two are done)
- Encoding and line-ending info in the status bar (minor)

The first one is top priority, IMHO.
Mylith
 
Posts: 30
Joined: Mon Oct 17, 2011 12:58 pm

Re: Encoding questions

Post by bizoo on Fri Oct 21, 2011 7:19 am

Thanks for searching userecho; it looks like a lot of people have this issue.
Actually, this problem is probably very minor for English-speaking countries, but for us Europeans, who work with a lot of different languages, it's very important.
In a perfect world, everybody would use UTF files and there would be no issue with encoding :)
Mylith wrote:- Easy way to set and save file with different encoding and keep/remember that encoding for file

The problem is that keeping/remembering the encoding of a file is not possible because, except for UTF files with a BOM, there is no reliable way to know a file's encoding from its contents.
The setting could be saved in the session and remembered that way, but when you close the file you lose this information, and the next time you open it you have to set the encoding again.
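
For the BOM case, which is the one that can be detected reliably, a check could look roughly like this in plain Python. The function name encoding_from_bom is just for illustration; the BOM constants come from the standard codecs module.

```python
import codecs

# Longer BOMs are listed first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so checking UTF-16 first would misfire.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def encoding_from_bom(data):
    """Return a Python codec name if data starts with a known BOM, else None.

    The returned codecs ("utf-8-sig", "utf-16", "utf-32") consume the BOM
    themselves when decoding, so the decoded text does not start with it.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

When this returns None, nothing has been learned: the file may be UTF-8 without a BOM, Windows-1252, or anything else, which is exactly the case where guessing or a remembered setting is needed.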

Some editors guess the encoding, and even if it's not perfect, it's nice to have.
bizoo
 
Posts: 886
Joined: Wed Dec 08, 2010 6:53 am
Location: Switzerland

Re: Encoding questions

Post by tito on Fri Oct 21, 2011 5:02 pm

I wrote the EncodingHelper plugin some days ago: viewtopic.php?f=5&t=3453

Summarizing: it is a non-obstructive plugin that runs in its own thread with some optimizations and uses the Python chardet library http://chardet.feedparser.org/ to show the encoding in the status bar and convert to UTF-8 from a variety of encodings.
It provides an API:
GuessEncoding(file_name, [list_of_fallback_encodings], False, aCallback).start()

The callback receives the encoding as its first argument. BTW, I'm not sure how you can include the code and use the API.

Save As with encoding should be relatively easy to add to this plugin if the encoding of the buffer is known.

BTW, I'm looking for improvements; the Python chardet library currently looks like an abandoned project and should probably be replaced. If someone knows of well-maintained software to guess encodings, let me know. Almost all of them are based on the Universal Charset Detector from Mozilla.
Give APIs, let the community build the rest!
https://github.com/titoBouzout
tito
 
Posts: 855
Joined: Thu Sep 29, 2011 2:27 pm
Location: Montevideo, Uruguay

Re: Encoding questions

Post by Mylith on Mon Nov 28, 2011 8:10 am

Dev build 2144 introduces two important features in the API: http://www.sublimetext.com/forum/viewtopic.php?f=2&t=3862
* Added view.encoding() and view.set_encoding()
* Added view.line_endings() and view.set_line_endings()
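
A possible way to use view.encoding() from a plugin that reads the same file with Python is to translate the editor's label into a Python codec name. This is only a sketch: the label strings in the table are assumptions about what view.encoding() returns, so check them against your build, and the helper name python_codec_for is made up for the example.

```python
# Assumed mapping from Sublime Text encoding labels to Python codec names.
# The labels on the left are assumptions; extend or correct as needed.
ST_TO_PYTHON = {
    "UTF-8": "utf-8",
    "UTF-8 with BOM": "utf-8-sig",
    "UTF-16 LE": "utf-16-le",
    "UTF-16 BE": "utf-16-be",
    "Western (Windows 1252)": "cp1252",
    "Western (ISO 8859-1)": "iso-8859-1",
}


def python_codec_for(view, default="utf-8"):
    """Return a Python codec name for the view's encoding label.

    `view` only needs an encoding() method. Labels missing from the
    table (for example a buffer with no known encoding) map to `default`.
    """
    return ST_TO_PYTHON.get(view.encoding(), default)
```

With that in place, something like open(view.file_name(), "rb").read().decode(python_codec_for(view)) would read the file with the same encoding the editor reports for it.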
Mylith
 
Posts: 30
Joined: Mon Oct 17, 2011 12:58 pm

Re: Encoding questions

Post by bizoo on Mon Nov 28, 2011 8:23 am

Mylith wrote: Dev build 2144 introduces two important features in the API: http://www.sublimetext.com/forum/viewtopic.php?f=2&t=3862
* Added view.encoding() and view.set_encoding()
* Added view.line_endings() and view.set_line_endings()

Thanks for the answer; I had already noticed that and started to use it.
There is still stuff missing, but it's a good start.
bizoo
 
Posts: 886
Joined: Wed Dec 08, 2010 6:53 am
Location: Switzerland
