
Search for a partial string match in a data frame column from a list - Pandas - Python

I have a list:

things = ['A1','B2','C3']

I have a pandas data frame with a column containing values separated by semicolons. Some of the rows will contain a match with one of the items in the list above. It won't be an exact match, since the column holds other parts of a string as well; for example, a row in that column may have 'Wow;Here;This= A1 ;10001;0'.

I want to save the rows that contain a match with items from the list, and then create a new data frame with those selected rows (it should have the same headers). This is what I tried:

import re

for_new_df =[]

for x in df['COLUMN']:
    for mp in things:
        if df[df['COLUMN'].str.contains(mp)]:
            for_new_df.append(mp)  #This won't save the whole row - help here too, please.

This code gave me an error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I'm very new to coding, so the more explanation and detail in your answer, the better! Thanks in advance.

You can avoid the loop by joining your list of words to create a regex and using str.contains:

pat = '|'.join(things)
for_new_df = df[df['COLUMN'].str.contains(pat)]

This should just work.

So the regex pattern becomes 'A1|B2|C3', and this will match any string that contains one of these substrings anywhere in it.

Example:

In [65]:
things = ['A1','B2','C3']
pat = '|'.join(things)
df = pd.DataFrame({'a':['Wow;Here;This=A1;10001;0', 'B2', 'asdasda', 'asdas']})
df[df['a'].str.contains(pat)]

Out[65]:
                          a
0  Wow;Here;This=A1;10001;0
1                        B2
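
One caveat with joining the list into a pattern: str.contains treats the pattern as a regular expression by default, so items containing characters such as `.` or `+` would be read as regex syntax rather than literal text. A minimal sketch, assuming a hypothetical list with such characters (the values below are made up for illustration), escapes each item with re.escape before joining:

```python
import re

import pandas as pd

# Hypothetical items containing regex metacharacters ('.' and '+').
things = ['A.1', 'B+2']

df = pd.DataFrame({'a': ['x;This=A.1;0', 'x;This=AX1;0', 'B+2', 'BB2']})

# Unescaped: '.' matches any character, so 'AX1' matches too, and
# 'B+2' means "one or more B followed by 2", so 'BB2' matches but 'B+2' does not.
raw_pat = '|'.join(things)
raw_matches = df[df['a'].str.contains(raw_pat)]

# Escaped: each item is matched as literal text.
safe_pat = '|'.join(re.escape(t) for t in things)
safe_matches = df[df['a'].str.contains(safe_pat)]
```

For the original list ['A1','B2','C3'] escaping changes nothing, so it is only needed when the items might contain special characters.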

As to why it failed:

if df[df['COLUMN'].str.contains(mp)]

this line:

df[df['COLUMN'].str.contains(mp)]

returns a DataFrame filtered by the boolean mask from your inner str.contains. The if statement doesn't know how to evaluate a whole DataFrame as a single truth value, hence the error. If you think about it, what should it do if only one value is True, or all but one are True? It expects a scalar, not an array-like value.
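
To make the failure concrete, here is a small sketch (the frame below is a stand-in for the asker's data, not taken from it) showing that a filtered DataFrame has no single truth value, and that reducing the mask with .any() or .all() gives if the scalar it needs:

```python
import pandas as pd

# Stand-in data; only the column name comes from the question.
df = pd.DataFrame({'COLUMN': ['Wow;Here;This=A1;10001;0', 'asdasda']})

mask = df['COLUMN'].str.contains('A1')  # boolean Series, one value per row
filtered = df[mask]                     # DataFrame holding only the matching rows

# Using a DataFrame directly as a condition raises the error from the question.
try:
    if filtered:
        pass
    raised = False
except ValueError:
    raised = True

# Reducing the mask to a single boolean is unambiguous.
has_match = bool(mask.any())  # True if at least one row matched
all_match = bool(mask.all())  # True only if every row matched
```

Note that no if is needed at all to select rows: df[mask] already is the new data frame, with the same headers as df.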

Pandas is actually amazing, but I don't find it very easy to use. However, it does have many functions designed to make life easy, including tools for searching through huge data frames.

Though it may not be a full solution to your problem, this may help set you off on the right foot. I have assumed that you know which column you are searching in, column A in my example.

import pandas as pd

df = pd.DataFrame({'A' : pd.Categorical(['Wow;Here;This=A1;10001;0', 'Another;C3;Row=Great;100', 'This;D6;Row=bad100']),
                   'B' : 'foo'})
print(df)  # Original data frame
print()
print(df['A'].str.contains('A1|B2|C3'))  # Boolean array showing matches for col A
print()
print(df[df['A'].str.contains('A1|B2|C3')])  # Matching rows

The output:

                          A    B
0  Wow;Here;This=A1;10001;0  foo
1  Another;C3;Row=Great;100  foo
2        This;D6;Row=bad100  foo

0     True
1     True
2    False
Name: A, dtype: bool

                          A    B
0  Wow;Here;This=A1;10001;0  foo
1  Another;C3;Row=Great;100  foo
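
If the column may contain missing values, str.contains can return a missing value for those rows, which in many pandas versions makes the mask unusable for filtering. A hedged sketch, assuming a hypothetical frame with one None entry, passes na=False so that missing rows count as non-matches, then copies the result into a new data frame that keeps the original headers:

```python
import pandas as pd

things = ['A1', 'B2', 'C3']
pat = '|'.join(things)

# Hypothetical data: the second row has no value in column 'A'.
df = pd.DataFrame({'A': ['Wow;Here;This=A1;10001;0', None, 'This;D6;Row=bad100'],
                   'B': ['foo', 'bar', 'baz']})

# na=False treats missing values as "no match" instead of propagating them.
mask = df['A'].str.contains(pat, na=False)

# .copy() gives an independent frame; the column headers are unchanged.
new_df = df[mask].copy()
```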


