
Find words in a column per row after a list of specific words in Python

I have a pandas data frame with a column called warranty. It records ways to fix different issues. It looks something like the attached picture.

[Image: screenshot of the data frame]

The goal is to find the words that follow the words listed below.

word_list=['replace', 'clean', 'remove']

How can I get the expected output: a new column added to the df above, with the values replace battery wire, clean fuel tank, remove nail?

pandas can use a regex to search strings, and you could use this pattern:

(?:replace|clean|remove) (\w+)

You can use Python to generate this pattern:

words = "|".join(word_list)
pattern = rf'(?:{words}) (\w+)'  # raw f-string, so \w is not read as a string escape

print('pattern:', pattern)
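
If the keyword list might contain regex metacharacters, or a keyword could appear inside a longer word, the generated pattern can be hardened. The following is a minimal sketch rather than part of the original answer; the `re.escape` call, the `\b` word boundary, and the sample sentence are assumptions added for illustration.

```python
import re

word_list = ['replace', 'clean', 'remove']

# re.escape guards against keywords that contain regex metacharacters.
words = "|".join(re.escape(w) for w in word_list)

# \b stops "clean" from matching inside "unclean";
# \s+ tolerates more than one space after the keyword.
pattern = rf'\b(?:{words})\s+(\w+)'

print('pattern:', pattern)

# Hypothetical sample sentence, only to show the effect of \b.
matches = re.findall(pattern, 'unclean tank, then clean fuel tank')
print(matches)
```

Without the word boundary, the same sentence would also yield `tank` from `unclean tank`.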

And later:

df['word'] = df['warranty'].str.lower().str.findall(pattern).str[0]

To be safe, I convert the text with lower(), because the pattern uses lowercase words.
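
If the original casing of the extracted word should be kept, one alternative (an assumption on my part, not the approach used above) is to skip lower() and pass `re.IGNORECASE` to `str.extract`. The sample rows below are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical rows with mixed case and one row that has no keyword.
df = pd.DataFrame({
    'warranty': ['Replace Battery wire', 'CLEAN fuel tank', 'inspect brakes'],
})

pattern = r'(?:replace|clean|remove) (\w+)'

# expand=False returns a Series for a single capture group;
# rows without a match become NaN.
df['word'] = df['warranty'].str.extract(pattern, flags=re.IGNORECASE, expand=False)

print(df)
```

Note that `str.extract` returns only the first match per row, which is the same thing `findall(...).str[0]` selects.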


If replace, clean, or remove is always the first word, you can simply split(" ") the text and take the second element:

df['word'] = df['warranty'].str.split(' ').str[1]
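
The split approach is sensitive to irregular spacing and to rows with a single word. A small sketch of how that could be handled, assuming such rows can occur in the data (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical rows: a double space, and a row with only one word.
s = pd.Series(['replace  battery wire', 'clean', 'remove nail'])

# split(' ') produces an empty string between two consecutive spaces.
naive = s.str.split(' ').str[1]

# split() with no argument splits on any run of whitespace;
# .str.get(1) yields NaN when there is no second element.
second = s.str.split().str.get(1)

print(pd.DataFrame({'naive': naive, 'second': second}))
```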

If you need more complex logic, you can use .apply():

def function(text):
    # ... complex code ...
    return text.split(' ')[1]

df['word'] = df['warranty'].apply(function)
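
As one illustration of what the "complex code" could look like, here is a sketch of a function that finds the first keyword anywhere in the text and returns the word after it. The name `word_after_keyword` and the sample rows are hypothetical, not from the original answer.

```python
import pandas as pd

word_list = ['replace', 'clean', 'remove']

def word_after_keyword(text, keywords=word_list):
    """Return the word following the first keyword found, or None."""
    tokens = text.lower().split()
    # Skip the last token: nothing can follow it.
    for i, token in enumerate(tokens[:-1]):
        if token in keywords:
            return tokens[i + 1]
    return None

# Hypothetical rows: keyword not in first position, and a keyword with no following word.
df = pd.DataFrame({
    'warranty': ['please replace battery wire', 'clean fuel tank', 'remove'],
})

df['word'] = df['warranty'].apply(word_after_keyword)

print(df)
```

Unlike the split version, this does not assume the keyword is the first word and does not raise an error on short rows.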

Minimal working code

import pandas as pd

data = {
    'warranty': [
        'replace battery wire from car',
        'clean fuel tank',
        'remove nail from tire',
    ], 
}

word_list=['replace', 'clean', 'remove']

df = pd.DataFrame(data)

words = "|".join(word_list)
pattern = rf'(?:{words}) (\w+)'  # raw f-string, so \w is not read as a string escape
print('pattern:', pattern)

def function(text):
    # ... complex code ...
    return text.split(' ')[1]

df['method1'] = df['warranty'].str.lower().str.findall(pattern).str[0]
df['method2'] = df['warranty'].str.split(' ').str[1]
df['method3'] = df['warranty'].apply(function)

print(df)

Result:

pattern: (?:replace|clean|remove) (\w+)

                        warranty  method1  method2  method3
0  replace battery wire from car  battery  battery  battery
1                clean fuel tank     fuel     fuel     fuel
2          remove nail from tire     nail     nail     nail
