Compare two string columns in a dataframe row by row

Question

Problem description: I need to set a variable for each row, but only if the value in one column appears in the comma-separated list held in a second column of the same row.

Sample DataFrame:

df = pd.DataFrame({'col1': ['A', 'T' , 'P', 'Z'], 'col2': ['A, B, C', 'D, E, F' , 'G, H, I, P', 'M, N, R, ZGTR']})

I need to retrieve all rows where col1 is one of the items in col2. Expected result:

col1    col2
'A'     'A, B, C'
'P'     'G, H, I, P'

My approach, which raises a TypeError about Series objects being mutable and therefore unhashable:

df[df['col2'].str.match(df['col1'])]

As far as I understand, I have to indicate somehow that the comparison should be done within each row. I know iterrows would be a solution, but I would prefer something without looping.

Answer 1

Use a list comprehension that tests membership with in against the split values:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T' , 'P', 'Z'], 
                   'col2': ['A, B, C', 'D, E, F' , 'G, H, I, P', 'M, N, R, ZGTR']})
df = df[[b in a.split(', ') for a, b in df[['col2', 'col1']].values]]
print (df)
  col1        col2
0    A     A, B, C
2    P  G, H, I, P
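
Splitting before the test is what keeps row 3 out: 'Z' is a substring of 'M, N, R, ZGTR' but not one of its items. The sketch below contrasts the two tests on the sample data and also shows the same check written with DataFrame.apply. The names substring_mask, token_mask and apply_mask are illustrative only, and the code assumes the separator is always a comma followed by one space.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})

# Plain substring test: 'Z' occurs inside 'ZGTR', so row 3 is kept by mistake.
substring_mask = [b in a for a, b in zip(df['col2'], df['col1'])]

# Item test: split on ', ' first, so col1 must equal a whole list item.
token_mask = [b in a.split(', ') for a, b in zip(df['col2'], df['col1'])]

# The same item test with DataFrame.apply; this is still a row-wise Python loop.
apply_mask = df.apply(lambda row: row['col1'] in row['col2'].split(', '), axis=1)

print(df[token_mask])
```

Neither variant is vectorized in the strict sense; both iterate over rows in Python, just without an explicit iterrows call.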

Answer 1 (2, accepted): 2020-06-19 06:57:02
