Compare two string columns in a DataFrame row-wise
Problem description: I need to set a variable for each row, but only if the value in one column appears in the list stored in a second column of the same row.
Sample DataFrame:
df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'], 'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})
I need to retrieve all rows where col1 is one of the items in col2.
Expected result:
col1 col2
'A' 'A, B, C'
'P' 'G, H, I, P'
My approach, which raises a TypeError saying that Series objects are mutable and cannot be hashed:
df[df['col2'].str.match(df['col1'])]
As far as I understand, I have to indicate somehow that the comparison should be done within each row. I know iterrows would be a solution, but I would prefer something without explicit looping.
Use a list comprehension that splits each col2 value and tests membership with in:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})
# Keep rows where col1 is exactly one of the ', '-separated items in col2
df = df[[b in a.split(', ') for a, b in df[['col2', 'col1']].values]]
print(df)
col1 col2
0 A A, B, C
2 P G, H, I, P
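Splitting before the test matters because a plain substring check would also keep the 'Z' row ('Z' occurs inside 'ZGTR'). The following is a minimal sketch of the same idea, not a definitive implementation: it uses zip over the two columns and a hypothetical helper, in_list, which is not part of pandas and which strips whitespace around each item so that inconsistent spacing after the commas does not break the match.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})


def in_list(value, items, sep=','):
    """Hypothetical helper: True if value equals one of the separated items."""
    # Strip each token so 'A,B' and 'A, B' are treated the same way.
    tokens = [token.strip() for token in items.split(sep)]
    return value in tokens


# Exact-item test, evaluated pairwise for each row.
mask = [in_list(value, items) for value, items in zip(df['col1'], df['col2'])]
result = df[mask]

# For comparison only: a substring test also matches 'Z' inside 'ZGTR'.
substring_mask = [value in items for value, items in zip(df['col1'], df['col2'])]

print(result)
print(substring_mask)
```

The comprehension still iterates in Python, but it avoids the per-row overhead of iterrows, which is usually the slow part.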