Sublime Forum

How can I use utf-8 codes in a buffer with view.substr()?

#1

Hi,

When I tried to use view.substr() to extract Japanese characters (UTF-8) from a buffer, it didn’t work.
Is it possible to handle this correctly?

Text on a buffer:

あいうえお

and I tried view.substr() in the console:

print view.substr(sublime.Region(0,2))

Then codecs.py raised the following error:

>>> print view.substr(sublime.Region(0,2))
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/codecs.py", line 352, in write
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/codecs.py", line 351, in write
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

Any help would be much appreciated.
Thank you,


#2

It works on Windows.
It looks like the encoding used for the console is ASCII, which means that you can’t print these characters.
So the command works fine, but you can’t print the result to the console.

Try to type only:

view.substr(sublime.Region(0,2))

I don’t know how to change the console encoding on OS X.
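
To illustrate the failure mode, here is a minimal sketch that can be run outside Sublime Text. It assumes the buffer text reaches the console as UTF-8 bytes, which is consistent with the 0xe3 byte in the traceback but is not confirmed in this thread. The helper name try_decode is made up for this example.

```python
# -*- coding: utf-8 -*-

# Assumption: the first two characters of the buffer, as UTF-8 bytes.
raw = u"あい".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


ascii_result = try_decode(raw, "ascii")   # fails: the first byte is 0xe3, outside ASCII
utf8_result = try_decode(raw, "utf-8")    # succeeds

# An ASCII-safe form that an ASCII-only console can still display.
escaped = utf8_result.encode("ascii", "backslashreplace")
```

If the goal is only to inspect the value on an ASCII-only console, the escaped form is one possible option.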


#3

Thanks for your prompt reply.

Yes, you may be right: substr() works without any problem when I omit ‘print’, and my environment is indeed OS X.

However, this leads me to another question about handling UTF-8 characters in Python.

The webbrowser module does not seem to handle UTF-8 characters either, just like the console.
The following code fails to Google a query such as 寿司 (sushi):

import webbrowser
webbrowser.open_new_tab('http://www.google.com/search?q=寿司')

Given that Python can handle UTF-8 source files that start with the following header lines,
is there a way to handle UTF-8 correctly in a Sublime Text 2 or 3 plugin that uses the webbrowser module?

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Thank you,


#4

You must declare the source encoding in the header (as in your example) AND use a unicode string for the URL by prefixing it with u:

# -*- coding: utf-8 -*-
import sublime, sublime_plugin
import webbrowser

class ExampleCommand(sublime_plugin.WindowCommand):
    def run(self):
        webbrowser.open_new_tab(u'http://www.google.com/search?q=寿司')


#5

Thanks again, bizoo.

Now I suspect that this problem occurs ONLY on OS X, because:

a. My original code works fine on Windows, even though it doesn’t work on OS X
b. The sample code you provided doesn’t work on OS X either

The difference in behavior might come from the Python interpreter itself: only the OS X version of Sublime Text uses the system Python.

Is there any workaround for this? Any ideas?
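
One way to compare the two platforms might be to print which interpreter and encodings each one reports. This is only a rough diagnostic sketch: the function name describe_python_encoding is hypothetical, and the getattr fallback is there because a console's stdout object may not expose an encoding attribute.

```python
import locale
import sys


def describe_python_encoding():
    """Collect the interpreter version and the encodings it reports."""
    return {
        "version": sys.version.split()[0],
        "default_encoding": sys.getdefaultencoding(),
        # The stdout replacement used by an embedded console may lack 'encoding'.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


info = describe_python_encoding()
for key in sorted(info):
    print("%s: %s" % (key, info[key]))
```

Running this in the console on both Windows and OS X should show whether the reported encodings actually differ.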


#6

URLs need to be escaped, and non-ASCII characters typically need to be encoded as UTF-8 first. The following worked for me on OS X:

# -*- coding: utf-8 -*-
import sublime, sublime_plugin
import webbrowser
import urllib

class ExampleCommand(sublime_plugin.WindowCommand):
    def run(self):
        # Encode the query as UTF-8 bytes, then percent-escape it for use in a URL.
        quoted = urllib.quote_plus(u'寿司'.encode('utf-8'))
        webbrowser.open_new_tab('http://www.google.com/search?q=' + quoted)
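
Since the question mentions both Sublime Text 2 and 3, here is a hedged sketch of the same escaping step that should run under either Python 2 or Python 3, where quote_plus lives in urllib.parse instead of urllib. The helper name build_search_url and its base parameter are made up for this example, and the browser call is left out so the snippet can be tried outside a plugin.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2


def build_search_url(query, base="http://www.google.com/search?q="):
    """Percent-escape a unicode query as UTF-8 and append it to the base URL."""
    return base + quote_plus(query.encode("utf-8"))


url = build_search_url(u"寿司")
print(url)  # http://www.google.com/search?q=%E5%AF%BF%E5%8F%B8
```

The resulting string contains only ASCII characters, so it could then be passed to webbrowser.open_new_tab() as in the plugin above.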

#7

Thanks sapphirehamster,

Now I understand the issue clearly, and I can consider the problem solved!

Thanks again, sapphirehamster, bizoo.
Kind regards,
