Sublime Forum

New to Python, looking for critique. Select tag contents

#1

So I’m completely new to Python and finally bothered to mess around with it and plugins. Perhaps I picked something not so beginner friendly to start with, but oh well. The plugin functions perfectly though I pretty much have no confidence in the code. It feels horrendous and is probably incredibly inefficient. Not being familiar with either Python or the Sublime API probably didn’t help much either.

Any who, I’m looking for any advice on how I can make it better, anything I did wrong, or any part of it that’s inefficient or the likes. Thanks to any one who has the know how and doesn’t mind helping me out.

[code]class selectTagContents(sublimeplugin.TextCommand):
def run(self, view, args):
current = view.sel()[0].a

	closedPoints = ]
	closedTags = ]
	openedPoints = ]
	openedTags = ]
	
	for region in view.findAll('</[a-zA-Z0-9]+>'):
		if region.a >= current:
			closedPoints.append(region.a)
	
	for region in view.findAll('(?<=</)[a-zA-Z0-9]+(?=>)'):
		if region.a >= current:
			closedTags.append(view.substr(region))
	
	for region in view.findAll('<^/].+?^/]>'):
		if region.b <= current:
			openedPoints.append(region.b)
	
	for region in view.findAll('(?<=<)[a-zA-Z0-9]+(?!.*?/>)'):
		if region.b <= current:
			openedTags.append(view.substr(region))
	
	oLen = len(openedPoints)
	cLen = len(closedPoints)
	
	if cLen > 0 and oLen > 0:
		openedTags.reverse()
		openedPoints.reverse()
		
		closed = ([closedTags[i], closedPoints*]) for i in range(0, cLen)]
		opened = ([openedTags[i], openedPoints*]) for i in range(0, oLen)]
		
		done = None
		
		for cTag, cPoint in closed:
			for oTag, oPoint in opened:
				if cTag == oTag:
					selection = sublime.Region(oPoint, cPoint)
					
					view.sel().clear()
					view.sel().add(selection)
					
					done = True
					
					break
			
			if done is True:
				break[/code]

Some specific questions I had if anyone can answer them.

1)* I REALLY REALLY hate Pythons data structures. They are driving me nuts. I searched everywhere and couldn’t find anything, is there anyway to get the index of the current item when looping through a list? That would make it so I could cut out the loop comprehensions converting my lists to a list of tuples but I couldn’t find how to do it anywhere.

The possible solutions I came across was .index(x) but that doesn’t work because it finds the nearest and that doesn’t work when tags repeat and I may be needing one further down the list. The other was converting them to a dictionary but that didn’t work because the only thing I could really use as a key was the tag and that doesn’t work because tags aren’t unique and there may need to be multiple entries for the same tag.

2) Is there anyway to break out of multiple loops? Such as in PHP you can just do break for the x amount of loops you want to break out of. I’d think there’d be a better way to stop the outer for loop at the end rather than having to set a check.

3) Lastly, and it’s more of a general question I guess, when I’m converting between data structures why does it mess with the types of the values in them? For instance in this plugin when I first pull the tag names and points, they are strings and integers. However, when I convert those lists into tuples it converts the strings to unicode strings (I think it does at least, that’s what the u in front of a string like u’my string’ means right?) and integers into long(?) integers (integers show up like 5476L). It doesn’t break anything I’m just wondering why it messes with them in the first place.

Thanks again for any help anyone can offer.*

0 Likes

#2

First things first: can you please explain what the command does? It will help me critique it.

Second, just some quick general notes (I have to run, so I’ll do a "full’ critique later):
By the way, I love Python, so I might be a little biased, but if you’re having trouble with Data Structures, well let’s just say they really rock. I’ll try and answer some of your questions.

  1. “is there anyway to get the index of the current item when looping through a list?” Sure, when you go over the list, do this:
for i, item in enumerate(list_name): print i

This basically makes a new list of tuples, where the first item is the index and the next is the original item. Then it puts it in the variables “i” and “item” respectively.
There’s a lot of other cool stuff when it comes to going over lists, check this: http://docs.python.org/tutorial/datastructures.html#looping-techniques (you should read the whole chapter, and in fact the entire tutorial if you have the time).

  1. Breaking out of multiple loops is not really easy, like in most languages. The usual advice is to refactor the code, so that the inner loop is a function (and you can use return). When I go over your code I might be able to suggest more.

  2. I’m not sure where it converts things to unicode in your code. It might be after using the API, which works only with unicode (and so everything passed there is automatically turned into unicode).

Hope that helps, I’ll give some more thoughts later (when I know what the code is supposed to do! :smile:

0 Likes

#3

Hey thanks for the response.

It selects the contents of a tag (html, xml) etc… The challenges and what ended up making it be more robust, is making sure it captures the tag the cursor is actually inside of, and making sure to ignore things like self closing tags.

Given the following:

[code]

Title

Some text, with a link.

[/code] If the cursor was on the word title and I used the command, it would select 'Title'. If it was on the empty line above the closing head tag, it would select, \nTitle\n\n. If it was on the empty line below the closing head tag, it would select everything inside the html tag.

As you can see for the command to be useful it gets a bit complicated and you can’t just have it select everything between the nearest opening and closing tags.

No problem with me if you have any bias, might be good so I can hear the things that are really great about it as I’m new and haven’t explored Python that much.

Pythons data structures are driving me nuts because they’re all so similar. As far as I can tell the major difference between lists and tuples is that tuples are immutable and lists aren’t. The biggest response I’ve seen for issues with Python is ‘well if you’re running into that you’re doing it wrong’. So I just wonder well, if you need a list to be immutable why just not edit it? Then you have dictionaries, which are basically just lists with named keys. Then you have sets that are unordered tuples. To me they all just seem like arrays with minor differences.

  1. If Python had foreach, that’d be awesome. WTB foreach $list as $key => $value ! I think basically what I did was I created that list of tuples manually instead of having enumerate do it.

  2. It probably does need a good bit of refactoring, but with my limited knowledge of Python, and since I just got it working, I haven’t really come up with anything.

  3. I’m pretty sure it’s Python. I print the data as I first got it and it’s fine, then I print it again after running it through the list comprehension, which is pretty much just tuple() and has nothing to do with the API, and it’s changed.

Also I really hate not having to declare variables. Feels so messy. Though I’m loving not having PHP’s magical wisdom trying to fiddle with converting between types and the annoyances between loose and strict comparisons because of it.

Thanks again.

0 Likes

#4

What do you mean? The way I see it, Python’s “for” is actually a “foreach”. Python doesn’t have the traditionnal “for”. You can really loop through lists or dicts in any way you see fit.

By the way, you might want to look at sublimator’s WebDevelopment package, and perhaps the Beautiful Soup parser.
If you can make something work with malformed XML/HTML, that’d be awesome!

Oh and beware the Regex Aneurysm :open_mouth:

0 Likes

#5

As gpfsmurf said, python’s “for” is like a foreach, since you can just loop over anything.

About data types, I wouldn’t worry too much about Tuples vs Lists.
The main difference is the “meaning” you understand from the fact something is a tuple. I.e., functions will usually return a tuple of information, which you obviously wouldn’t want to modify.
You can just usually use a list.
Dictionaries are something completely different, and they have much different uses.

One quick note: you keep using region.a. Why not use region.begin()/region.end()? They guarantee to always return the beginning and ending points, no matter which way the selection is “aimed”.

0 Likes

#6

By foreach I meant having the ability to grab the key / index of any array type data structure just by specifying it. In PHP you can grab the key / index of any array just by specifying as such > foreach arr as KEY => var. If you don’t it’s just the same as Python > foreach arr as var. Having to convert lists into tuples whenever you want to access the index of it, or creating a variable to act as a counter, seems silly.

Is there any syntactic shortcut to append in Python? Anytime you want to add something to a list having to call append is a bit annoying. Something similar to PHP’s ]?

@gpfsmurf Not sure I get how you’d want it to work with malformed markup. Could you give me a quick example?

@edanm I suppose I could but I don’t think it would change the functionality of it at all. I suppose it could matter if you have a big selection going through multiple tags but I don’t know why’d you do that and then run this command, and if you did I’m not sure what the expected result would be. For the rest of the time I’m pulling one or the other it’s just from the results of findAll which is always going to select in one direction as far as I know, so it’s just shorter. If begin() or end() is more efficient though or something I can swap to them.

e: Actually I could see a use for it sort of, but it lead me to see a way I could extend the command a bit. Making it work with multiple selections! Gonna work on that real quick.

0 Likes

#7

I’m not sure what you mean by “having to convert lists into tuples”. When do you need to do that?

The “enumerate” function works exactly like PHP’s foreach from what I see, except you have to think about using it (and type more :smile:.
It doesn’t work on dictionaries, since in Python they’re considered fundamentally different. There, you have things like my_dict.iteritems() to get keys and values.
Does that help you any?

0 Likes

#8

It’s not giving me trouble anymore. Just a little bone I have to pick with the language I suppose :stuck_out_tongue:

Also the command now works with multiple selections and if you have a big selection spanning over tags it will now consider the selections boundaries when looking for what tag’s contents to select.

[code]class selectTagContents(sublimeplugin.TextCommand):
def run(self, view, args):
selections = ]

	closedPoints = ]
	closedTags = ]
	openedPoints = ]
	openedTags = ]
	
	for region in view.findAll('</[a-zA-Z0-9]+>'):
		closedPoints.append(region.a)
	
	for region in view.findAll('(?<=</)[a-zA-Z0-9]+(?=>)'):
		closedTags.append(view.substr(region))
	
	for region in view.findAll('<^/].+?^/]>'):
		openedPoints.append(region.b)
	
	for region in view.findAll('(?<=<)[a-zA-Z0-9]+(?!.*?/>)'):
		openedTags.append(view.substr(region))
	
	closed = ]
	opened = ]
	
	cLen = len(closedPoints)
	oLen = len(openedPoints)
	
	if cLen > 0 and oLen > 0:
		done = None
		
		openedTags.reverse()
		openedPoints.reverse()
		
		closed = ([closedTags[i], closedPoints*]) for i in range(0, cLen)]
		opened = ([openedTags[i], openedPoints*]) for i in range(0, oLen)]
		
		for region in view.sel():
			curBegin = region.begin()
			curEnd = region.end()
			
			for cTag, cPoint in closed:
				for oTag, oPoint in opened:
					if cTag == oTag and cPoint >= curEnd and oPoint <= curBegin:
						selections.append(sublime.Region(oPoint, cPoint))
						
						done = True
						
						break
				
				if done is True:
					break
			
			done = None
		
		view.sel().clear()
		
		for region in selections:
			view.sel().add(region)[/code]

One issue I had though, I was trying to use RegionSet.addAll() instead of .add() but it kept throwing an error at me.

for region in selections: view.sel().add(region)
works but,

view.sel().addAll(selections)

doesn’t

(67, 319), (185, 311)] Traceback (most recent call last): File ".\sublimeplugin.py", line 119, in execTextCommand File ".\textManip.py", line 71, in run Boost.Python.ArgumentError: Python argument types in RegionSet.addAll(RegionSet, list) did not match C++ signature: addAll(class SelectionSet {lvalue}, class SelectionSet)
That was the error and the first line is the contents of selections.**

0 Likes

#9

In Python, you usually shouldn’t ask what’s more efficient. Just assume that everything works efficiently enough, and only ask questions when things are too slow. :smile:

I think the problem you’re talking about is that you’re trying to send a list to the addAll function, which is supposed to take a RegionSet, not a list. Since the api itself is written in C++, it actually checks the function signature and tells you when there’s a problem (that’s the whole thing with the “Boost” mentioned in the error message; Boost is what translates the Python to C++ code).

0 Likes

#10

Yeah I suppose it’s just sort of the OCD in me. Whenever I’m learning a new language, or something new about a language, I usually try to find out which way is the best to do whatever it is I’m doing. I’m familiar with the handful of quotes regarding premature optimization but the OCD in me can’t help it when I’m learning something new. Most of the time it helps me discover a handful of ways to do one thing. Then, if I ever run into a similar situation again I have a better understanding of the tools available to me than if I had just used the first way that worked. It makes starting off a bit verbose so-to-speak, but down the road I think it puts me in a better position.

Yeah I figured as much. How exactly do I go about turning a list of regions into a RegionSet? I looked through the API reference and nothing jumped out at me. No methods of RegionSet return a RegionSet, and the only other method that returns a RegionSet is view.sel(). Or is addAll() not intended for what I’m trying to use it for?

0 Likes

#11

Technically, you could just create one by doing “rs = sublime.RegionSet()” like you create a new Region.
But I tried this, and I got an error saying that you can’t create a RegionSet from Python. So supposedly, it’s only for internal use in Sublime, not for plugins.

0 Likes

#12

Ah ok. Thanks.

0 Likes

#13

I think I refactored this as best I can. Much cleaner than any of the previous iterations. Any last comments or critiques?

[code]class selectTagContents(sublimeplugin.TextCommand):
def run(self, view, args):
selections = ]

	closed = zip(
		[view.substr(tag) for tag in view.findAll('(?<=</)[a-zA-Z0-9]+(?=>)')],
		[region.begin() for region in view.findAll('</[a-zA-Z0-9]+>')]
	)
	
	opened = zip(
		[view.substr(tag) for tag in reversed(view.findAll('(?<=<)[a-zA-Z0-9]+(?!.*?/>)'))],
		[region.end() for region in reversed(view.findAll('<[^/].*?^/]>'))]
	)
	
	if len(closed) > 0 and len(opened) > 0:
		for region in view.sel():
			curBegin = region.begin()
			curEnd = region.end()
			
			for cTag, cPoint in closed:
				for oTag, oPoint in opened:
					if cTag == oTag and cPoint >= curEnd and oPoint <= curBegin:
						selections.append(sublime.Region(oPoint, cPoint))
						
						break
				else:
					continue
				
				break
		
		view.sel().clear()
		
		for region in selections:
			view.sel().add(region)[/code]

e: Not sure how I missed how the command is fundamentally broken : / It’s easily broken with any repeated tags in a file. I’ll have to mess with it later to see if I can rethink how it goes about figuring out what tag the cursor is in.

0 Likes