Regex that recognises certain words in a sentence and only the first two words
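I have a question. I want to use a regular expression to recognise certain text modules in a text, for example beach vibe some. The problem is that some text modules are three words long (or even longer). However, most people only use the first two words, and possibly an abbreviation of the second word.
Is there a way to tell the regex that it should count as a hit when only the first two words are recognised? And that it should only look at the first three letters of the second word?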
   customerId                          text          element  code
0           1    please use beach vibe some  beach vibe some     0
1           1     you should use beach vibe  beach vibe some     0
2           1           right use beach vib  beach vibe some     0
3           3              use floating pow   floating power     1
4           3  use floating stuff right now   floating stuff     2
import pandas as pd
import re

d = {
    "customerId": [1, 1, 1, 3, 3],
    "text": ["please use beach vibe some",
             "you should use beach vibe",
             "right use beach vib",
             "use floating pow",
             "use floating stuff right now"],
    "element": ["beach vibe some", "beach vibe some", "beach vibe some",
                "floating power", "floating stuff"],
}
df = pd.DataFrame(data=d)
# Give every distinct element an integer code
df['code'] = df['element'].astype('category').cat.codes
print(df)
def f(x):
    match = 999  # sentinel: no element matched
    for element in df['element'].unique():
        # 1) the full element occurs in the text
        if re.search(re.escape(element), x['text'], re.IGNORECASE):
            match = df['code'].loc[df['element'] == element].iloc[0]
            break
        # 2) only the first two words occur
        elif re.search(re.escape(' '.join(element.split()[:2])), x['text'], re.IGNORECASE):
            match = df['code'].loc[df['element'] == element].iloc[0]
            break
        # 3) first word plus the first three letters of the second word
        else:
            s = element.split()
            s[1] = s[1][:3]
            string = ' '.join(s[:2])
            if re.search(re.escape(string), x['text'], re.IGNORECASE):
                match = df['code'].loc[df['element'] == element].iloc[0]
                break
    x['test'] = match
    return x

df['test'] = None
df = df.apply(f, axis=1)
print(df)
   customerId                          text          element  code  test
0           1    please use beach vibe some  beach vibe some     0     0
1           1     you should use beach vibe  beach vibe some     0     0
2           1           right use beach vib  beach vibe some     0     0
3           3              use floating pow   floating power     1     1
4           3  use floating stuff right now   floating stuff     2     2
Why use a regular expression at all?
element_parts = element.lower().split()
# first word plus the first three letters of the second word
lookup_key = element_parts[0] + " " + element_parts[1][:3]
if lookup_key in x["text"].lower():
    pass  # here we go ...
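For illustration, the substring check above could be wrapped into small helpers and run over the example texts from the question. This is only a sketch: the names lookup_key (as a function) and find_element are hypothetical and not part of the original answer, and it assumes that collapsing repeated whitespace and handling one-word elements is wanted.

```python
def lookup_key(element):
    """Return the first word plus the first three letters of the second word."""
    parts = element.lower().split()
    if not parts:
        return ""
    if len(parts) == 1:
        # Assumption: a one-word element is matched by that word alone
        return parts[0]
    return parts[0] + " " + parts[1][:3]


def find_element(text, elements):
    """Return the first element whose lookup key occurs in text, or None."""
    # Lower-case and collapse runs of whitespace so "beach   vib" still matches
    normalised = " ".join(text.lower().split())
    for element in elements:
        key = lookup_key(element)
        if key and key in normalised:
            return element
    return None


elements = ["beach vibe some", "floating power", "floating stuff"]
texts = [
    "please use beach vibe some",
    "you should use beach vibe",
    "right use beach vib",
    "use floating pow",
    "use floating stuff right now",
]
matches = [find_element(text, elements) for text in texts]
for text, matched in zip(texts, matches):
    print(text, "->", matched)
```

Note that a plain substring test does not respect word boundaries, so a text such as "xbeach vib" would also match. If that matters, a regex built from re.escape(key) with a leading \b is one possible alternative.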