Regex that recognises certain words in a sentence, or only their first two words
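I have a problem. I want to use regular expressions to recognise certain text modules in a text, for example "beach vibe some". The problem is that some text modules are three words long (or even longer). However, most people only use the first two words, sometimes with the second word abbreviated.
Is it possible to tell the regex that it should count as a hit if only the first two words are recognised? And that it should only look at the first three letters of the second word?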
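A minimal sketch of how a single regular expression can say "first word, then whitespace, then the first three letters of the second word". The helper name abbrev_pattern is hypothetical, and the sketch assumes every element has at least two words and that the first word should start at a word boundary.

```python
import re


def abbrev_pattern(element: str) -> re.Pattern:
    r"""Hypothetical helper: compile a case-insensitive regex for the first
    word of element followed by the first three letters of its second word,
    e.g. "beach vibe some" -> \bbeach\s+vib."""
    words = element.split()
    if len(words) < 2:
        raise ValueError("element needs at least two words")
    # \b: the first word must start at a word boundary
    # \s+: one or more whitespace characters between the two words
    pattern = r"\b" + re.escape(words[0]) + r"\s+" + re.escape(words[1][:3])
    return re.compile(pattern, re.IGNORECASE)


pat = abbrev_pattern("beach vibe some")
for text in ["please use beach vibe some", "you should use beach vibe",
             "right use beach vib", "use floating pow"]:
    print(text, "->", bool(pat.search(text)))  # True, True, True, False
```

Because any text that contains the full element, or just its first two words, also contains this prefix, the one pattern covers all three cases. Using \s+ rather than a single space is a design choice that also tolerates repeated spaces.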
   customerId                          text          element  code
0           1    please use beach vibe some  beach vibe some     0
1           1     you should use beach vibe  beach vibe some     0
2           1           right use beach vib  beach vibe some     0
3           3              use floating pow   floating power     1
4           3  use floating stuff right now   floating stuff     2
import re

import pandas as pd

d = {
    "customerId": [1, 1, 1, 3, 3],
    "text": ["please use beach vibe some",
             "you should use beach vibe",
             "right use beach vib",
             "use floating pow",
             "use floating stuff right now"],
    "element": ["beach vibe some", "beach vibe some", "beach vibe some",
                "floating power", "floating stuff"],
}
df = pd.DataFrame(data=d)
df["code"] = df["element"].astype("category").cat.codes
print(df)


def f(x):
    match = 999  # returned when no element is recognised
    for element in df["element"].unique():
        code = df["code"].loc[df["element"] == element].iloc[0]
        # 1) the whole element, e.g. "beach vibe some"
        if re.search(re.escape(element), x["text"], re.IGNORECASE):
            match = code
            break
        # 2) only the first two words, e.g. "beach vibe"
        elif re.search(re.escape(" ".join(element.split()[:2])), x["text"], re.IGNORECASE):
            match = code
            break
        # 3) first word + first three letters of the second word, e.g. "beach vib"
        else:
            s = element.split()
            s[1] = s[1][:3]
            string = " ".join(s[:2])
            if re.search(re.escape(string), x["text"], re.IGNORECASE):
                match = code
                break
    x["test"] = match
    return x


df["test"] = None
df = df.apply(f, axis=1)
print(df)
   customerId                          text          element  code  test
0           1    please use beach vibe some  beach vibe some     0     0
1           1     you should use beach vibe  beach vibe some     0     0
2           1           right use beach vib  beach vibe some     0     0
3           3              use floating pow   floating power     1     1
4           3  use floating stuff right now   floating stuff     2     2
Why use regular expressions at all? A plain substring test works:
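# inside the loop over df["element"].unique() in f():
element_parts = element.lower().split()
# first word + first three letters of the second word, e.g. "beach vib"
lookup_key = element_parts[0] + " " + element_parts[1][:3]
if lookup_key in x["text"].lower():
    # here we go ...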
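The substring idea can be sketched as a complete run over the sample data from the question. This assumes, as the snippet does, that every element has at least two words; lookup and lookup_code are hypothetical names, and the key is built once per element instead of once per row.

```python
import pandas as pd

df = pd.DataFrame({
    "customerId": [1, 1, 1, 3, 3],
    "text": ["please use beach vibe some", "you should use beach vibe",
             "right use beach vib", "use floating pow",
             "use floating stuff right now"],
    "element": ["beach vibe some", "beach vibe some", "beach vibe some",
                "floating power", "floating stuff"],
})
df["code"] = df["element"].astype("category").cat.codes

# Hypothetical lookup table: lowercase key -> code, where the key is the first
# word plus the first three letters of the second word (e.g. "beach vib").
lookup = {}
for element, code in zip(df["element"], df["code"]):
    parts = element.lower().split()
    lookup.setdefault(parts[0] + " " + parts[1][:3], code)


def lookup_code(text, default=999):
    """Return the code of the first key found in text, else default."""
    text = text.lower()
    for key, code in lookup.items():
        if key in text:
            return code
    return default


df["test"] = df["text"].map(lookup_code)
print(df)
```

Plain "in" does not respect word boundaries, so "beach vib" would also be found inside "longbeach vibes"; if that matters, a regex with \b is the safer choice. Two elements that share the same key (for example "floating power" and a hypothetical "floating powder") cannot be told apart by this scheme; setdefault simply keeps the first one.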