
Python: Use list.index with regular expression

I have lists of strings from which I want to extract a certain value:

["bla","blabla","blablabla","time taken to build model: 5.1 seconds", "blabla"]

Normally I would find the index of the element I am looking for with

list.index("time taken")

But since the time changes, I think of using a regular expression. I just can't figure out how to do this.

So how can I find the index of a list element that matches a certain regex, e.g. with re.match()? (Without iterating through the list, because that would take too long.)

I'm not sure there is a built-in method, but it's easy to do this with a list comprehension in O(n) time.

With regular expressions:

import re
your_list = ["bla","blabla","blablabla","time taken to build model: 5.1 seconds", "blabla"]
# Compile once, then test each element; ^ anchors the match at the start of the string
regex = re.compile(r"^time taken")
idxs = [i for i, item in enumerate(your_list) if regex.search(item)]
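To get at the value itself (the 5.1 in the example), a capturing group can return the number together with its index. This is a minimal sketch that assumes the line always has the form `time taken to build model: <number> seconds`; the pattern and variable names are illustrative.

```python
import re

your_list = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]

# Assumed line format: "time taken to build model: <number> seconds"
regex = re.compile(r"^time taken to build model: ([\d.]+) seconds")

# Collect (index, captured value) pairs for every matching element
matches = []
for i, item in enumerate(your_list):
    m = regex.match(item)
    if m:
        matches.append((i, float(m.group(1))))

print(matches)  # [(3, 5.1)]
```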

And without regular expressions:

your_list = ["bla","blabla","blablabla","time taken to build model: 5.1 seconds", "blabla"]
query_term = 'time taken'
idxs = [i for i, item in enumerate(your_list) if item.startswith(query_term)]

You can adapt it to return only the first or the last matching index, or wrap it in a function with a parameter that selects which one you want.
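One way to parameterise it is a small helper that returns the first or last matching index and stops as soon as it finds one. The function name `find_index` and its `last` flag are hypothetical, chosen for illustration; it returns `None` when nothing matches.

```python
import re

def find_index(items, pattern, last=False):
    """Return the first (or, with last=True, the last) index whose item
    matches the regex `pattern` via re.search, or None if none match."""
    regex = re.compile(pattern)
    # Walk the indices forwards or backwards depending on `last`
    indices = range(len(items) - 1, -1, -1) if last else range(len(items))
    for i in indices:
        if regex.search(items[i]):
            return i
    return None

your_list = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]
print(find_index(your_list, r"^time taken"))     # 3
print(find_index(your_list, r"^bla", last=True))  # 4
print(find_index(your_list, r"^missing"))         # None
```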

To find an element in a list, you have to iterate through it unless you have extra information (such as the elements being sorted). If you really want to go faster, change the data structure, use a database, or use another language.
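As an example of using ordering information: if the list is kept sorted, all strings sharing a prefix form one contiguous run, so `bisect` can find where that run starts in O(log n) comparisons instead of scanning everything. This sketch assumes a plain prefix query (not an arbitrary regex) and a list sorted with Python's default string ordering; the helper name `prefix_indices` is hypothetical, and the returned indices refer to the sorted list, not the original order.

```python
import bisect

def prefix_indices(sorted_items, prefix):
    """Return the indices in sorted_items of all strings starting with prefix.
    Assumes sorted_items is sorted in ascending order."""
    # Every string that starts with prefix sorts at or after this position
    start = bisect.bisect_left(sorted_items, prefix)
    end = start
    # Matches are contiguous, so only that run is scanned: O(log n + k)
    while end < len(sorted_items) and sorted_items[end].startswith(prefix):
        end += 1
    return list(range(start, end))

your_list = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]
sorted_list = sorted(your_list)
idxs = prefix_indices(sorted_list, "time taken")
print([sorted_list[i] for i in idxs])  # ['time taken to build model: 5.1 seconds']
```

Sorting costs O(n log n) once, so this only pays off when the same list is searched many times.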

A regex solution needs to iterate through the sequence. If you want to get the strings with a given prefix or suffix, you should implement a trie, which is typically the fastest solution for this kind of query. You can also implement a solution with cyclic hashes of different lengths, but in some cases it will be inefficient.
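A minimal trie sketch for the prefix case: each node stores the indices of all strings that pass through it, so a lookup costs time proportional to the prefix length (plus copying the result). The class and method names are hypothetical; storing index lists at every node trades memory for lookup speed, and this handles prefixes only, not general regexes. For suffixes, the same structure can be built over the reversed strings.

```python
class PrefixTrie:
    """Character trie mapping every prefix to the list indices of the
    strings that start with it."""

    def __init__(self, items):
        self.root = {"children": {}, "indices": []}
        for idx, item in enumerate(items):
            self._insert(item, idx)

    def _insert(self, word, idx):
        node = self.root
        node["indices"].append(idx)  # the empty prefix matches every string
        for ch in word:
            node = node["children"].setdefault(ch, {"children": {}, "indices": []})
            node["indices"].append(idx)

    def find(self, prefix):
        """Return indices (in original list order) of items starting with prefix."""
        node = self.root
        for ch in prefix:
            node = node["children"].get(ch)
            if node is None:
                return []
        return list(node["indices"])

your_list = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]
trie = PrefixTrie(your_list)
print(trie.find("time taken"))  # [3]
print(trie.find("blab"))        # [1, 2, 4]
```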

If your priority is to get the first match in the sequence, index() is useful. This is how you can combine a regex with the index() method:

import re
lst = ["bla","blabla","blablabla","time taken to build model: 5.1 seconds", "blabla"]
# Take the first string matching the regex, then let index() find its position
lst.index([s for s in lst if re.match(r'time taken', s)][0])
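The list comprehension above scans the whole list once and then `index()` scans it again up to the match, and it raises `IndexError` if nothing matches. A sketch that stops at the first match uses `next()` over a generator, with `None` as an assumed default for the no-match case:

```python
import re

lst = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]
pattern = re.compile(r"time taken")

# next() pulls from the generator only until the first match, then stops;
# the second argument is returned instead of raising StopIteration.
idx = next((i for i, s in enumerate(lst) if pattern.match(s)), None)
print(idx)  # 3

missing = next((i for i, s in enumerate(lst) if re.match(r"no such line", s)), None)
print(missing)  # None
```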
