Extract a word from regular expression using python data frame
This is the data I am working with:
Topic | About | Group Discussion
microwave is not working | i tried turning on the microwave and it wont turn on | [[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]
the light of the oven wont turn on | i have tried to press the light on the oven and nothing | [[person3 did you power on the oven], [person4 it was powered on], ...]
water will not come out of sink | i turn the valve and nothing comes out of the sink | [[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]
What I want is this:
Topic | About | Group Discussion | Topic_Extract | About_Extract | Group_Discussion_Extract
microwave is not working | i tried turning on the microwave and it wont turn on | [[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...] | microwave | microwave | microwave
the light of the oven wont turn on | i have tried to press the light on the oven and nothing | [[person3 did you power on the oven], [person4 it was powered on], ...] | oven | oven | oven
water will not come out of sink | i turn the valve and nothing comes out of the sink | [[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...] | sink | sink | sink
Edit: OK, now it says everything is "unclassified" and I am not sure how to fix this:
df['Title_Extract'] = ''
def loop(data):
    for i,j in data['Topic'].iteritems():
        if (re.search(r'microwave|microwave will not turn on|microwave is not working|microwave wont work|microwave will not work|microwave is broken', j) == True):
            return(data['Topic_Extract'].str.replace('', 'microwave'))
        elif (re.search(r'oven|oven will not turn on|oven is not working|oven wont work|oven will not work|oven is broken|oven wont turn on', j) == True):
            return(data['Topic_Extract'].str.replace('', 'oven'))
        elif (re.search(r'sink|sink will not turn on|sink is not working|sink wont work|sink will not work|sink is broken|sink wont turn on', j) == True):
            return(data['Topic_Extract'].str.replace('', 'sink'))
        else:
            return 'unclassified'
loop(df)
When I try to extract the word I get the following result - nothing is classified correctly:
0 unclassified
...
2 unclassified
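One likely reason every row comes back as "unclassified": `re.search` returns a match object or `None`, never the boolean `True`, so a test written as `re.search(...) == True` is always false and control falls through to the `else` branch. The short check below is only a sketch using one sample string from the data above; the variable names are chosen for illustration.

```python
import re

topic = "microwave is not working"
match = re.search(r"microwave", topic)

# re.search returns a re.Match object on success and None on failure
found_with_eq = (match == True)      # False even though the pattern matched
found_truthy = bool(match)           # True when the pattern matched
found_not_none = match is not None   # equivalent, more explicit test
```

Testing the result directly (`if re.search(...):`) or with `is not None` avoids the problem. Separately, a `return` inside the `for` loop ends the function on the first row it inspects.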
Create a list of the values to search for. Then use findall to return the values found in the df column:
topic_terms = ['microwave', 'sink', 'oven']
# join the terms into one alternation pattern: 'microwave|sink|oven'
df['term'] = df['Topic'].str.findall("|".join(topic_terms))
df
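As a sketch of how this approach might be extended to produce the three `_Extract` columns asked for in the question, the same pattern can be applied to each text column in turn. `Series.str.findall` returns a list per row, so `.str[0]` is used here to keep only the first hit. The helper name `add_extract_columns` and the shortened sample rows are assumptions made for illustration, not part of the original answer.

```python
import re
import pandas as pd

topic_terms = ["microwave", "sink", "oven"]
# Escape each term so it is matched literally, then join into one alternation
pattern = "|".join(re.escape(t) for t in topic_terms)

def add_extract_columns(frame, columns):
    """Hypothetical helper: add '<col>_Extract' holding the first matching term per row."""
    out = frame.copy()
    for col in columns:
        new_name = col.replace(" ", "_") + "_Extract"
        # findall lists every hit; .str[0] takes the first one (NaN if the list is empty)
        out[new_name] = out[col].str.findall(pattern).str[0]
    return out

# Shortened sample rows based on the question's data
sample = pd.DataFrame({
    "Topic": ["microwave is not working",
              "the light of the oven wont turn on"],
    "About": ["i tried turning on the microwave and it wont turn on",
              "i have tried to press the light on the oven and nothing"],
    "Group Discussion": ["[[person1 yeah the microwave wont turn on i tested it]]",
                         "[[person3 did you power on the oven]]"],
})
result = add_extract_columns(sample, ["Topic", "About", "Group Discussion"])
```

Rows where none of the terms appear end up as NaN in the new columns, which can be filled afterwards if a default label is wanted.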
Data used:
import pandas as pd
data = {'Topic': {0: 'microwave is not working ',
1: 'the light of the oven wont turn on ',
2: 'water will not come out of sink '},
'About': {0: 'i tried turning on the microwave and it wont turn on ',
1: 'i have tried to press the light on the oven and nothing ',
2: 'i turn the valve and nothing comes out of the sink '},
'Group Discussion': {0: '[[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]',
1: '[[person3 did you power on the oven], [person4 it was powered on], ...]',
2: '[[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...'}}
df=pd.DataFrame(data)
df
Topic About Group Discussion term
0 microwave is not working i tried turning on the microwave and it wont t... [[person1 yeah the microwave wont turn on i te... microwave
1 the light of the oven wont turn on i have tried to press the light on the oven an... [[person3 did you power on the oven], [person4... oven
2 water will not come out of sink i turn the valve and nothing comes out of the ... [[person5 okay it looks like water is not comi... sink
I figured it out, thanks everyone for the help. This is what my solution looks like:
import re

def loop(data):
    # Assumes every cell of data['Topic Extract'] already holds a list.
    # .items() is used because .iteritems() was removed in pandas 2.0.
    for i, j in data['Topic'].items():
        if re.search(r'\bmicrowave\b', j):
            data['Topic Extract'][i].append('microwave')
        elif re.search(r'\boven\b', j):
            data['Topic Extract'][i].append('oven')
        elif re.search(r'\bsink\b', j):
            data['Topic Extract'][i].append('sink')
        else:
            data['Topic Extract'][i].append('unclassified')
    return data

df = loop(df)
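A loop-free variant of the same idea is sketched below, assuming the goal is a single label per row with "unclassified" as the fallback. It relies on `Series.str.extract` with one capture group, followed by `fillna`. The column name `Topic_Extract` follows the desired output in the question, `df_demo` is a stand-in frame, and the fourth row is an invented example added only to show the fallback.

```python
import pandas as pd

df_demo = pd.DataFrame({"Topic": [
    "microwave is not working",
    "the light of the oven wont turn on",
    "water will not come out of sink",
    "the fridge is noisy",  # invented row that contains none of the terms
]})

# One capture group with word boundaries, mirroring \bmicrowave\b and friends
pattern = r"\b(microwave|oven|sink)\b"

# expand=False returns a Series with the first match, NaN where nothing matched
df_demo["Topic_Extract"] = (
    df_demo["Topic"].str.extract(pattern, expand=False).fillna("unclassified")
)
```

This produces plain strings rather than lists, so it does not need the column to be initialised with empty lists beforehand.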