I'm trying to sort through a file line by line, comparing the beginning of each line with a string from a list, like so:
for line in lines:
    skip_line = True
    for tag in tags:
        if line.startswith(tag) is False:
            continue
        else:
            skip_line = False
            break
    if skip_line is False:
        pass  # do stuff
While the code works just fine, I'm wondering if there's a neater way to check for this condition. I have looked at any(), but it seems to only let me check whether any of my lines start with a fixed tag, which doesn't eliminate the for loop needed to go through my list.
So, essentially I'm asking this:
Is there a better, sleeker option than using a for loop to iterate over my tags list to check if the current line starts with one of its elements?
As Paradox pointed out in his answer: using a dictionary to look up whether the string exists has O(1) complexity on average and actually makes the entire code look a lot cleaner, while being faster than looping through a list. Like so:
tags = {'ticker': 0, 'orderBook': 0, 'tradeHistory': 0}
for line in lines:
    if line.split('\t')[0] in tags:
        pass  # do stuff
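A minimal sketch of that lookup, using made-up sample lines (the contents of `lines` are an assumption, not taken from a real file). A set is used here in place of the dictionary, since the values are never read. Note that this tests the first tab-separated field for an exact match rather than testing a prefix, so 'tickerExtra' is not picked up.

```python
# Hypothetical sample data; real lines would come from the file being read.
tags = {'ticker', 'orderBook', 'tradeHistory'}
lines = [
    'ticker\t1.23',
    'heartbeat\t0',
    'orderBook\tbid\t10',
    'tickerExtra\t9',
]

matched = []
for line in lines:
    # Exact match on the first tab-separated field; average O(1) per lookup.
    if line.split('\t')[0] in tags:
        matched.append(line)

print(matched)
```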
If you're determined to pull this down into a one-liner, you can use a generator:
tagged_lines = (line for line in lines if any(line.startswith(tag) for tag in tags))
for line in tagged_lines:
    pass  # Do something with line here
Of course, how readable this is is a different question.
You've probably seen syntax like [x*x for x in range(10)] before, but by swapping the [] for (), we instead generate each item only when it's asked for.
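To make the difference concrete, here is a small sketch comparing the two forms. The variable names are only for illustration.

```python
# List comprehension: all ten squares are computed and stored right away.
squares_list = [x * x for x in range(10)]

# Generator expression: nothing is computed until an item is requested.
squares_gen = (x * x for x in range(10))

first = next(squares_gen)      # computes only the first square
second = next(squares_gen)     # computes only the second square
rest = list(squares_gen)       # consumes the remaining eight squares
leftover = list(squares_gen)   # the generator is now exhausted
```

One consequence is that a generator can only be iterated once, so `tagged_lines` would have to be rebuilt if you needed a second pass.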
Instead of iterating over your tags list, you can put all your tags inside a hash map and do a simple lookup, something like myMap.exists("word") in pseudocode. This would be much faster than iterating through your tags list and works in O(1) complexity. In Python, it's the dictionary data structure, and the lookup is written as "word" in myMap. http://progzoo.net/wiki/Python:Hash_Maps
This has been asked before. Take a look at this post for more solutions. I would flag this post as a duplicate but I still do not have the reputation.
https://stackoverflow.com/a/10477481/5016492
You'll need to modify the regular expression so that it is anchored to the start of the line. Something like '^tag' should work for you.
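One way this could look, as a rough sketch rather than the approach from the linked post: the tags are joined into a single alternation anchored with '^'. The sample `tags` and `lines` are assumptions, and re.escape is used in case a tag contains regex metacharacters.

```python
import re

# Hypothetical sample data for illustration.
tags = ['ticker', 'orderBook', 'tradeHistory']
lines = [
    'ticker\t1.23',
    'heartbeat\t0',
    'tradeHistory\tx',
    'my ticker',
]

# Build '^(?:ticker|orderBook|tradeHistory)' with each tag escaped.
pattern = re.compile('^(?:' + '|'.join(re.escape(tag) for tag in tags) + ')')

# Keep only the lines that begin with one of the tags.
tagged = [line for line in lines if pattern.search(line)]
print(tagged)
```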
In fact, any() will do the job.
Looping over each line:
for line in lines:
    tagged = any(line.startswith(tag) for tag in tags)
Check whether any line starts with any tag:
any(any(x.startswith(y) for y in tags) for x in lines)
Filter the tagged lines:
filter(lambda x: any(x.startswith(y) for y in tags), lines)
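As an aside that none of the answers mention: str.startswith also accepts a tuple of prefixes, so the inner loop over the tags can be dropped altogether. A short sketch with made-up sample data (the argument must be a tuple, so a list of tags would need tuple(tags) first):

```python
# Hypothetical sample data for illustration.
tags = ('ticker', 'orderBook', 'tradeHistory')
lines = ['ticker\t1.23', 'heartbeat\t0', 'orderBook\tbid']

# startswith returns True if the line begins with any prefix in the tuple.
tagged = [line for line in lines if line.startswith(tags)]

# The same check, collapsed to a single boolean over all lines.
any_tagged = any(line.startswith(tags) for line in lines)

print(tagged)
print(any_tagged)
```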
How about a combination of any() and filter(), like in this example:
# use your data here ...
mytags = ('hello', 'world')
mylines = ('hello friend', 'you are great', 'world is cruel')
# filter() is lazy in Python 3, so wrap it in list() before printing
result = list(filter(lambda line: any(map(lambda tag: line.startswith(tag), mytags)), mylines))
print(result)