Extract a word with a regular expression from a Python data frame
This is the data I'm working with:
Topic | About | Group Discussion
microwave is not working | i tried turning on the microwave and it wont turn on | [[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]
the light of the oven wont turn on | i have tried to press the light on the oven and nothing | [[person3 did you power on the oven], [person4 it was powered on], ...]
water will not come out of sink | i turn the valve and nothing comes out of the sink | [[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]
What I would like is this:
Topic | About | Group Discussion | Topic_Extract | About_Extract | Group_Discussion_Extract
microwave is not working | i tried turning on the microwave and it wont turn on | [[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...] | microwave | microwave | microwave
the light of the oven wont turn on | i have tried to press the light on the oven and nothing | [[person3 did you power on the oven], [person4 it was powered on], ...] | oven | oven | oven
water will not come out of sink | i turn the valve and nothing comes out of the sink | [[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...] | sink | sink | sink
EDIT: Okay, now it's saying everything is 'unclassified', and I'm not sure how to fix this:
df['Title_Extract'] = ''
def loop(data):
    for i,j in data['Topic'].iteritems():
        if (re.search(r'microwave|microwave will not turn on|microwave is not working|microwave wont work|microwave will not work|microwave is broken', j) == True):
            return(data['Topic_Extract'].str.replace('', 'microwave'))
        elif (re.search(r'oven|oven will not turn on|oven is not working|oven wont work|oven will not work|oven is broken|oven wont turn on', j) == True):
            return(data['Topic_Extract'].str.replace('', 'oven'))
        elif (re.search(r'sink|sink will not turn on|sink is not working|sink wont work|sink will not work|sink is broken|sink wont turn on', j) == True):
            return(data['Topic_Extract'].str.replace('', 'sink'))
        else:
            return 'unclassified'
loop(df)
When I try to extract a word, nothing is classified correctly. This is the output I get:
0 unclassified
...
2 unclassified
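A likely cause of the all-'unclassified' result, offered as a sketch rather than a confirmed diagnosis: `re.search` returns a match object or `None`, never the boolean `True`, so each `== True` comparison in the code above is false and the `else` branch runs. Separately, a `return` inside the `for` loop ends the function on the first row. The sample string below is taken from the question's data; the variable names are made up for illustration.

```python
import re

topic = 'microwave is not working'
match = re.search(r'microwave', topic)

# re.search returns a re.Match object on success and None on failure,
# so comparing the result with True never succeeds.
compared_to_true = (match == True)

# Testing "is not None" (or plain truthiness) is the usual way to branch.
found = match is not None

print(compared_to_true, found)
```

Dropping the `== True` and writing `if re.search(...):` is the conventional form.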
Create a list of values to search for, then use findall to return the values that are found in the df column:
topic_terms = ['microwave', 'sink', 'oven']
df['term'] = df['Topic'].str.findall("|".join(topic_terms))
df
Data used:
import pandas as pd

data = {'Topic': {0: 'microwave is not working ',
                  1: 'the light of the oven wont turn on ',
                  2: 'water will not come out of sink '},
        'About': {0: 'i tried turning on the microwave and it wont turn on ',
                  1: 'i have tried to press the light on the oven and nothing ',
                  2: 'i turn the valve and nothing comes out of the sink '},
        'Group Discussion': {0: '[[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]',
                             1: '[[person3 did you power on the oven], [person4 it was powered on], ...]',
                             2: '[[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]'}}
df = pd.DataFrame(data)
df
Topic About Group Discussion term
0 microwave is not working i tried turning on the microwave and it wont t... [[person1 yeah the microwave wont turn on i te... microwave
1 the light of the oven wont turn on i have tried to press the light on the oven an... [[person3 did you power on the oven], [person4... oven
2 water will not come out of sink i turn the valve and nothing comes out of the ... [[person5 okay it looks like water is not comi... sink
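Note that `str.findall` returns a list of every match per row. To get the three `*_Extract` columns from the question as plain strings, one option is `str.extract`, which keeps only the first match. This is a sketch under assumptions: the sample rows are shortened, and `pattern`, `new_col` and the loop over column names are illustrative rather than part of the answer above.

```python
import pandas as pd

# Shortened sample rows, assumed for illustration.
df = pd.DataFrame({
    'Topic': ['microwave is not working',
              'the light of the oven wont turn on',
              'water will not come out of sink'],
    'About': ['i tried turning on the microwave and it wont turn on',
              'i have tried to press the light on the oven and nothing',
              'i turn the valve and nothing comes out of the sink'],
    'Group Discussion': ['[[person1 yeah the microwave wont turn on i tested it], ...]',
                         '[[person3 did you power on the oven], ...]',
                         '[[person5 okay it looks like water is not coming out of the sink], ...]'],
})

terms = ['microwave', 'sink', 'oven']
# str.extract needs a capture group; this one matches any of the terms.
pattern = '(' + '|'.join(terms) + ')'

for col in ['Topic', 'About', 'Group Discussion']:
    new_col = col.replace(' ', '_') + '_Extract'
    # expand=False returns a Series holding the first match (NaN if none).
    df[new_col] = df[col].str.extract(pattern, expand=False)

print(df[['Topic_Extract', 'About_Extract', 'Group_Discussion_Extract']])
```

Rows with no match end up as NaN, which can be replaced with a label such as 'unclassified' via `fillna` if needed.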
I figured it out, thanks for the help everyone. This is what my solution looks like:
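import re

# Assumed setup, not shown in the original post: every cell must
# start as an empty list so that .append() below works.
df['Topic Extract'] = [[] for _ in range(len(df))]

def loop(data):
    # Series.iteritems() was removed in pandas 2.0; items() replaces it.
    for i, j in data['Topic'].items():
        if re.search(r'\bmicrowave\b', j):
            data['Topic Extract'][i].append('microwave')
        elif re.search(r'\boven\b', j):
            data['Topic Extract'][i].append('oven')
        elif re.search(r'\bsink\b', j):
            data['Topic Extract'][i].append('sink')
        else:
            data['Topic Extract'][i].append('unclassified')
    return data

df = loop(df)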
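The same classification can probably be done without an explicit loop. The sketch below assumes one label per row is enough (the first match wins) and that unmatched rows should read 'unclassified'. The fourth sample row is made up to exercise the fallback, and the result is a plain string per row rather than a list.

```python
import pandas as pd

df = pd.DataFrame({'Topic': ['microwave is not working',
                             'the light of the oven wont turn on',
                             'water will not come out of sink',
                             'fridge is too warm']})  # hypothetical unmatched row

# Word boundaries (\b) keep the terms from matching inside longer words.
pattern = r'\b(microwave|oven|sink)\b'

# extract() yields NaN where nothing matches; fillna() supplies the fallback label.
df['Topic Extract'] = (
    df['Topic'].str.extract(pattern, expand=False).fillna('unclassified')
)

print(df)
```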