
Compare two string columns in a DataFrame row-wise

Problem description: I need to set a variable for each row, but only if its value appears in the list stored in a second column of the same row.

Sample DataFrame:

import pandas as pd
df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'], 'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})

I need to get all rows where the value in col1 is one of the comma-separated items in col2. Expected result:

col1    col2
'A'     'A, B, C'
'P'     'G, H, I, P'
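Why the Z row is not in the expected result: 'Z' occurs as a substring of 'M, N, R, ZGTR' but is not one of its items, so a plain substring test is not enough and the col2 values have to be split first. A small sketch reusing the sample data above:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})

# Naive substring test: 'Z' is found inside 'ZGTR', so row 3 matches wrongly
substring = [b in a for a, b in zip(df['col2'], df['col1'])]

# Item test: split on ', ' first so only whole items are compared
item = [b in a.split(', ') for a, b in zip(df['col2'], df['col1'])]

print(substring)  # [True, False, True, True]
print(item)       # [True, False, True, False]
```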

My approach, which raises a TypeError saying that Series objects are mutable and cannot be hashed:

df[df['col2'].str.match(df['col1'])]

As far as I understand, I somehow have to specify that the comparison should be done within each row. I know iterrows would be a solution, but I would prefer something without looping.
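Why the attempt fails: Series.str.match takes a single regular-expression pattern (a string) and checks whether each element starts with a match for it; it does not pair row i with pattern i. Passing df['col1'] hands it the whole Series as one pattern, which, at least for object-dtype string columns, ends up in re.compile and produces the "cannot be hashed" TypeError quoted above. The sketch below only shows the scalar form that str.match does accept:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})

# str.match applies ONE pattern to every element, anchored at the start
# of each string, so a scalar pattern works:
starts_with_a = df['col2'].str.match('A')
print(starts_with_a.tolist())  # [True, False, False, False]

# There is no per-row pattern argument, so a row-wise comparison needs
# the two columns to be paired explicitly (e.g. with zip) instead.
```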

Use a list comprehension that splits each col2 value on ', ' and tests membership with the in operator:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T' , 'P', 'Z'], 
                   'col2': ['A, B, C', 'D, E, F' , 'G, H, I, P', 'M, N, R, ZGTR']})
# keep rows whose col1 value is one of the ', '-separated items of col2
df = df[[b in a.split(', ') for a, b in df[['col2', 'col1']].values]]
print(df)
  col1        col2
0    A     A, B, C
2    P  G, H, I, P
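If you want to avoid the explicit Python-level loop, one option (not part of the original answer; a sketch assuming pandas 0.25 or newer for Series.explode) is to split col2, explode it to one item per row, compare each item with col1 from the same original row, and collapse back to one boolean per row with groupby(level=0).any(). Because explode repeats the original index labels, the mask lines up with df:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})

# One row per col2 item; explode repeats the original index label
items = df['col2'].str.split(', ').explode()

# col1 repeated so it lines up with each exploded item (same index labels)
col1_repeated = df['col1'].loc[items.index]

# True where an item equals col1 of its own row, then any() per original row
mask = (items == col1_repeated).groupby(level=0).any()

print(df[mask])
```

This is not guaranteed to be faster than the list comprehension, since explode builds a larger intermediate Series; time both on your real data.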
