I want to use fuzzy matching to check whether a dataframe column contains keywords.
However, it is very slow to use apply.
Are there any faster methods?
Can we use str or re?
import regex
result = df['sentence'].apply(lambda x: regex.compile('(keyword){e<4}').findall(x)) #slow
Thank you very much.
Why are you compiling inside the apply? That defeats the purpose of compiling. Also, the best way to speed up an apply call is to not use apply.
Without context on what you're actually trying to match, I present to you:
p = regex.compile('(keyword){e<4}')
result = [p.findall(x) for x in df['sentence']]
My tests show that a list-comprehension-based regex match outperforms str methods. Take that with a grain of salt, though, because it always depends on your data and what you're trying to match.
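If you want to check this on your own data, a rough timing sketch along these lines may help. The sample sentences, the row count, and the function names are made up for illustration, and the numbers you get will depend on your data and pattern.

```python
import timeit

import pandas as pd
import regex  # third-party module (pip install regex), not the stdlib re

# Hypothetical sample data; substitute your own column.
sentences = ["a keywrd here", "nothing relevant", "keyword twice keyword"] * 100
df = pd.DataFrame({"sentence": sentences})

# Compile once, outside any loop.
p = regex.compile(r"(keyword){e<4}")

def with_recompile():
    # The form from the question: compile is called for every row.
    return df["sentence"].apply(
        lambda x: regex.compile(r"(keyword){e<4}").findall(x)
    ).tolist()

def with_apply():
    # apply, but reusing the precompiled pattern.
    return df["sentence"].apply(p.findall).tolist()

def with_listcomp():
    # Plain list comprehension over the column.
    return [p.findall(x) for x in df["sentence"]]

for fn in (with_recompile, with_apply, with_listcomp):
    print(fn.__name__, timeit.timeit(fn, number=3))
```

All three functions return the same matches, so only the timings differ.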
You may want to consider using search (p.search or regex.search; the standard re module does not support the fuzzy {e<4} syntax) instead of findall if you just want a single match (for more performance).
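As a minimal sketch of the search variant, assuming you only need to know whether each sentence contains a fuzzy match: the two sample sentences and the has_keyword column name are invented for illustration.

```python
import pandas as pd
import regex  # third-party module (pip install regex)

# Hypothetical sample data.
df = pd.DataFrame({"sentence": ["this has a keywrd in it", "zzz"]})

p = regex.compile(r"(keyword){e<4}")

# search stops at the first match, so this yields one boolean per row.
df["has_keyword"] = [p.search(x) is not None for x in df["sentence"]]
print(df)
```

The boolean column can then be used as a mask, e.g. df[df["has_keyword"]].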