Sublime Forum

Unicode issues with subprocess

#1

Hi all,

I’m a bit stymied by an issue that seems to have to do with the ST3 version of Python or its environment, because it doesn’t happen when I try the same function in the Python 3 REPL. I’m doing some work with git and passing it a diff to stage. Here’s the relevant code:

new_diff = "\n".join("\n".join(hunk) for i, hunk in enumerate(chunk(lines(diff))) if i in choices)
if final_line and new_diff.splitlines()[-1] != final_line:
    new_diff += (u"\n" + final_line)
p = subprocess.Popen(['git', 'apply', '--cached', '--recount', '--allow-overlap'],
                     stderr=subprocess.STDOUT, stdin=subprocess.PIPE,
                     universal_newlines=True)
p.communicate(input=new_diff)

In the REPL, that does what is intended: it stages the changes from the diff that I pass. In Sublime Text 3, the same code fails with the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in position 106: ordinal not in range(128)
So something in Sublime Text is assuming that one of my Unicode strings is ASCII, which is surprising to me, since Python 3 is Unicode throughout. I would pass the diff as UTF-8 encoded bytes, but subprocess needs a str, not bytes. Does anyone have a suggestion for dealing with this issue?


#2

It’s hard to tell without knowing what the contents of diff and final_line are. I get the same effect when I do this:

new_diff = u"\n" + '\ufeff'
p = subprocess.Popen(['git', 'status'], stderr=subprocess.STDOUT, stdin=subprocess.PIPE, universal_newlines=True)
p.communicate(input=new_diff)

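It works when I change “final_line” to a raw string, although r'\ufeff' is then the six literal characters \ufeff rather than the BOM character itself: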

new_diff = u"\n" + r'\ufeff'
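A quick way to see why the raw string avoids the error: it contains nothing non-ASCII, so there is nothing left for the 'ascii' codec to choke on. A minimal sketch (the helper name `encodes_as_ascii` is made up for illustration):

```python
# '\ufeff' is a single character (U+FEFF, the byte order mark);
# r'\ufeff' is six ASCII characters: backslash, u, f, e, f, f.
bom = '\ufeff'
raw = r'\ufeff'


def encodes_as_ascii(text):
    """Return True if text can be encoded with the 'ascii' codec."""
    try:
        text.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False


print(len(bom), encodes_as_ascii(bom))  # 1 False
print(len(raw), encodes_as_ascii(raw))  # 6 True
```

So the raw-string version runs cleanly, but it sends different data to the subprocess than the original string would.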

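AFAIK, you can convert bytes to str like this (a bytes literal does not interpret \u escapes, so the BOM is written here as its UTF-8 bytes):

new_diff = str(b'\xef\xbb\xbf', 'utf-8')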
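For reference, a small sketch of the bytes/str round trip for the BOM. One thing to watch is that bytes literals do not interpret `\u` escapes, so the safest way to get the UTF-8 bytes of U+FEFF is to encode the str. The `utf-8-sig` codec is a standard one that drops a leading BOM while decoding; whether stripping the BOM is the right thing for this particular diff is an assumption, and the variable names are only illustrative.

```python
# The UTF-8 encoding of U+FEFF (the BOM) is the three bytes EF BB BF.
bom_bytes = '\ufeff'.encode('utf-8')

# bytes -> str: these two spellings are equivalent.
as_str = str(bom_bytes, 'utf-8')
also_str = bom_bytes.decode('utf-8')

# 'utf-8-sig' removes a leading BOM while decoding.
stripped = (bom_bytes + b'diff --git').decode('utf-8-sig')

print(as_str == also_str, stripped)
```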

#3

Thanks. I ended up addressing it by decoding the subprocess output from bytes to str, doing my work with str, and then encoding back to bytes to pass as input to subprocess. The quirk is that when universal_newlines is enabled, subprocess ONLY accepts str as input, and when it’s disabled, it ONLY accepts bytes.
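Roughly, that approach looks like the sketch below. As far as I understand it, with universal_newlines=True the stdin pipe is wrapped in a text stream that uses locale.getpreferredencoding(False), which is presumably ASCII inside Sublime Text's embedded interpreter; leaving universal_newlines off and encoding explicitly avoids depending on that. The helper name `run_with_utf8_input` is made up for illustration, and the child process is a stand-in Python one-liner rather than git so the example runs anywhere.

```python
import subprocess
import sys


def run_with_utf8_input(cmd, text):
    """Run cmd, feeding text to its stdin as UTF-8 bytes.

    Returns (returncode, output decoded from UTF-8).
    """
    # No universal_newlines here, so the pipes carry bytes, not str.
    p = subprocess.Popen(cmd,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    out, _ = p.communicate(input=text.encode('utf-8'))
    return p.returncode, out.decode('utf-8')


# Stand-in child (not git): reports how many bytes it read from stdin.
child = [sys.executable, '-c',
         'import sys; print(len(sys.stdin.buffer.read()))']

# BOM (3 bytes in UTF-8) + 'diff' (4 bytes) + newline (1 byte).
code, output = run_with_utf8_input(child, '\ufeffdiff\n')
print(code, output.strip())  # 0 8
```

With git, cmd would be the list from the first post, i.e. ['git', 'apply', '--cached', '--recount', '--allow-overlap'].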
