Extract a word from regular expression using python data frame

This is the data I'm working with:

Topic                                 About                                                     Group Discussion
microwave is not working              i tried turning on the microwave and it wont turn on      [[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]
the light of the oven wont turn on    i have tried to press the light on the oven and nothing   [[person3 did you power on the oven], [person4 it was powered on], ...]
water will not come out of sink       i turn the valve and nothing comes out of the sink        [[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]

What I would like is this:

Topic                                 About                                                     Group Discussion                                                                                                                           Topic_Extract         About_Extract        Group_Discussion_Extract
microwave is not working              i tried turning on the microwave and it wont turn on      [[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]                                      microwave             microwave            microwave
the light of the oven wont turn on    i have tried to press the light on the oven and nothing   [[person3 did you power on the oven], [person4 it was powered on], ...]                                                                    oven                  oven                 oven
water will not come out of sink       i turn the valve and nothing comes out of the sink        [[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]     sink                  sink                 sink

EDIT: Okay, now it's saying everything is 'unclassified' and I'm not sure how to fix this:

df['Title_Extract'] = ''
def loop(data):
    for i,j in data['Topic'].iteritems():
        if (re.search(r'microwave|microwave will not turn on|microwave is not working|microwave wont work|microwave will not work|microwave is broken', j) == True):
            return(data['Topic_Extract'].str.replace('', 'microwave'))
        elif (re.search(r'oven|oven will not turn on|oven is not working|oven wont work|oven will not work|oven is broken|oven wont turn on', j) == True):
            return(data['Topic_Extract'].str.replace('', 'oven'))
        elif (re.search(r'sink|sink will not turn on|sink is not working|sink wont work|sink will not work|sink is broken|sink wont turn on', j) == True):
            return(data['Topic_Extract'].str.replace('', 'sink'))
        else:
            return 'unclassified'

loop(df)

I am running into the following problem when I try to extract a word; the rows are not classified correctly:

0        unclassified
...
2        unclassified
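
A likely cause of the all-'unclassified' result: `re.search` returns a match object or `None`, never the boolean `True`, so each `== True` comparison is false and the `else` branch runs. A minimal sketch of the difference, using one sample string from the data above:

```python
import re

topic = "microwave is not working"
match = re.search(r"microwave", topic)

# re.search returns a re.Match object (or None), so comparing it to True fails.
compared_to_true = (match == True)

# Checking `is not None` (or plain truthiness) is the usual way to test for a match.
found = match is not None

print(compared_to_true, found)
```

Note also that a `return` inside the `for` loop ends the function on the first row, so the remaining rows are never examined.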

Create a list of values to search for, then use findall to return the values that are found in the df column:

topic_terms = ['microwave', 'sink', 'oven']
df['term'] = df['Topic'].str.findall("|".join(topic_terms))
df


Data used:

data = {'Topic': {0: 'microwave is not working ',
  1: 'the light of the oven wont turn on ',
  2: 'water will not come out of sink '},
 'About': {0: 'i tried turning on the microwave and it wont turn on ',
  1: 'i have tried to press the light on the oven and nothing ',
  2: 'i turn the valve and nothing comes out of the sink '},
 'Group Discussion': {0: '[[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]',
  1: '[[person3 did you power on the oven], [person4 it was powered on], ...]',
  2: '[[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]'}}

import pandas as pd

df = pd.DataFrame(data)
df
    Topic                                About                                               Group Discussion                                    term
0   microwave is not working             i tried turning on the microwave and it wont t...   [[person1 yeah the microwave wont turn on i te...   microwave
1   the light of the oven wont turn on   i have tried to press the light on the oven an...   [[person3 did you power on the oven], [person4...   oven
2   water will not come out of sink      i turn the valve and nothing comes out of the ...   [[person5 okay it looks like water is not comi...   sink
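
The question asks for one extracted word in each of three columns, whereas `str.findall` returns a list per row. A minimal sketch of one way to get a single value per cell, assuming the first match is enough and that all three columns hold plain strings: `Series.str.extract` with one capture group and `expand=False`. The new column names follow the question's desired output; the sample strings are shortened copies of the data above.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Topic": ["microwave is not working",
              "the light of the oven wont turn on",
              "water will not come out of sink"],
    "About": ["i tried turning on the microwave and it wont turn on",
              "i have tried to press the light on the oven and nothing",
              "i turn the valve and nothing comes out of the sink"],
    "Group Discussion": ["[[person1 yeah the microwave wont turn on i tested it], ...]",
                         "[[person3 did you power on the oven], ...]",
                         "[[person5 okay it looks like water is not coming out of the sink], ...]"],
})

topic_terms = ["microwave", "sink", "oven"]
# One capture group with word boundaries: \b(microwave|sink|oven)\b
pattern = r"\b(" + "|".join(map(re.escape, topic_terms)) + r")\b"

for col in ["Topic", "About", "Group Discussion"]:
    new_col = col.replace(" ", "_") + "_Extract"
    # expand=False returns a Series with the first match per row (NaN if none).
    df[new_col] = df[col].str.extract(pattern, expand=False)

print(df[["Topic_Extract", "About_Extract", "Group_Discussion_Extract"]])
```

If `Group Discussion` actually holds nested Python lists rather than strings, the `.str` accessor would not match anything and the cells would need to be joined into strings first.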


I figured it out, thanks for the help everyone. This is what my solution looks like:

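import re

# Assumes df['Topic Extract'] already exists and holds one list per row,
# since .append is called on each cell.
def loop(data):
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for i, j in data['Topic'].items():
        if re.search(r'\bmicrowave\b', j):
            data['Topic Extract'][i].append('microwave')
        elif re.search(r'\boven\b', j):
            data['Topic Extract'][i].append('oven')
        elif re.search(r'\bsink\b', j):
            data['Topic Extract'][i].append('sink')
        else:
            data['Topic Extract'][i].append('unclassified')
    return data

df = loop(df)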
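
For comparison, a loop-free sketch of the same idea. The helper name `classify` and the extra "fridge" row are made up for illustration, and unlike the solution above this stores a plain string per row instead of appending to a list:

```python
import re
import pandas as pd

TERMS = ["microwave", "oven", "sink"]  # same keywords as the loop above

def classify(text, terms=TERMS):
    """Return the first term found as a whole word, else 'unclassified'."""
    for term in terms:
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            return term
    return "unclassified"

df = pd.DataFrame({"Topic": ["microwave is not working",
                             "the light of the oven wont turn on",
                             "water will not come out of sink",
                             "fridge is making noise"]})  # last row is hypothetical

# Series.map applies classify to each Topic string.
df["Topic Extract"] = df["Topic"].map(classify)
print(df)
```

Keeping the terms in a list means a new appliance can be added without another `elif` branch.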


 