
Python - Search and present strings within a column in a dataframe from a list

How can I show, for each row of a text column in Python, the words from a given list that appear in that row's text?

If several words are found in the text column, I want to present them separated by ", ".

Example:

I have the following list:

color_list = ['White','Yellow','Blue','Red']

which I need to search for within a dataframe (df):

     doc   text
0    3000  'colors White Yellow'
1    3001  'Green Black'
2    3002  'I want the color Red'

and add a new column that holds, for each row, the matching words from the list:

     doc   text                    words
0    3000  'colors White Yellow'   'White, Yellow'
1    3001  'Green Black'
2    3002  'I want the color Red'  'Red'

I used the following code to extract the matching words, but it returns only one word per row:

df['words'] = df.text.str.extract('(?i)({0})'.format('|'.join(color_list )))

I can't figure out how to do this in Python (I have done it in R).

This question is not a duplicate, because the challenge is to present more than one string from a list and not just one value.

Thanks in advance for your help.

You need to extract the words with str.findall and then join the results with ", ".join:

import pandas as pd

color_list = ['White','Yellow','Blue','Red']
df = pd.DataFrame({"doc": [3000, 3001, 3002], "text": ["colors White Yellow", "Green Black", "I want the color Red"]})
df['words'] = df['text'].str.findall(r'(?i)\b(?:{})\b'.format('|'.join(color_list))).apply(', '.join)

Output:

    doc                  text          words
0  3000   colors White Yellow  White, Yellow
1  3001           Green Black               
2  3002  I want the color Red            Red

This assumes all the terms in color_list consist only of word characters (letters, digits, and underscores), so that \b matches at both ends of each term and no term contains regex metacharacters.
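If that assumption does not hold, one possible workaround is to escape each term with re.escape and replace \b with lookarounds. The sketch below is not part of the original answer: the terms list and the sample text are made up for illustration, and sorting longer terms first is an extra precaution so that a shorter term cannot shadow a longer one in the alternation.

```python
import re
import pandas as pd

# Hypothetical terms: "C++" and "light-blue" contain non-word characters.
terms = ["C++", "light-blue", "Red"]
df = pd.DataFrame({"text": ["I like C++ and red", "light-blue sky", "Green"]})

# Escape regex metacharacters such as "+", and try longer terms first.
escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)

# (?<!\w) and (?!\w) stand in for \b, which cannot match between
# a non-word character (like "+") and whitespace.
pattern = r"(?i)(?<!\w)(?:{})(?!\w)".format("|".join(escaped))

# findall returns a list of matches per row; join them with ", ".
df["words"] = df["text"].str.findall(pattern).apply(", ".join)
print(df)
```

As in the answer above, the matches keep the casing found in the text (here "red"), not the casing used in the list.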

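# Keep only the rows whose text contains any listed word (isin would require the whole cell to equal a list entry)
result = df[df.text.str.contains('|'.join(color_list), case=False)]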
