Python, Pandas: how to check whether a row contains values found in another row?

Question

I want an output that identifies ids 1 and 2 as duplicates, because id 2 has the value 1, which is also found in id 1 (id 1 contains both values 1 and 2). I.e. the values of id 2 are a subset of the values of id 1.

I tried using the duplicated function, but it does not identify ids 1 and 2 as duplicates.

import pandas as pd

#check by id if value is a duplicate
test_df = pd.DataFrame({'id':['1', '2', '3', '4'],
                   'value':['1, 2', '1', '18', '19']})

print(test_df)
duplicateRowsDF = test_df['value'].duplicated() #returns boolean values
duplicateRowsDF

These are the boolean values it should return:

duplicateRowsDF
0     True
1     True
2    False
3    False
Name: value, dtype: bool
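For context, here is a minimal sketch of why the attempt above does not produce those values. It assumes the same sample frame as in the question; the name whole_cell is only illustrative. duplicated() compares each cell as a whole string, so '1, 2' and '1' are simply two different strings.

```python
import pandas as pd

# Sample frame copied from the question.
test_df = pd.DataFrame({'id': ['1', '2', '3', '4'],
                        'value': ['1, 2', '1', '18', '19']})

# duplicated() compares whole cell contents; keep=False would mark every
# repeated cell, but no cell string occurs twice here.
whole_cell = test_df['value'].duplicated(keep=False)
print(whole_cell.tolist())  # [False, False, False, False]
```

So the comparison has to happen per individual value rather than per cell.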

Expected output table as shown below:

expected_output = pd.DataFrame({'id':['1', '2', '3', '4'],
                   'value':['1, 2', '1', '18', '19'], 'duplicate':['Yes', 'Yes', 'No', 'No']}) 
expected_output
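One way to read the rule behind this expected table is: a row counts as a duplicate when at least one of its values also appears in a different row. The sketch below spells that reading out with plain Python sets. It is only an interpretation of the question, and the names rows, value_sets and shares_value_with_another_row are made up for illustration.

```python
# Sample data copied from the question: id -> comma-separated values.
rows = {'1': '1, 2', '2': '1', '3': '18', '4': '19'}

# Parse each cell into a set of individual values.
value_sets = {row_id: set(cell.split(', ')) for row_id, cell in rows.items()}

def shares_value_with_another_row(row_id):
    """True if this row has at least one value that also appears in a different row."""
    own = value_sets[row_id]
    return any(own & other
               for other_id, other in value_sets.items()
               if other_id != row_id)

flags = {row_id: shares_value_with_another_row(row_id) for row_id in value_sets}
print(flags)  # {'1': True, '2': True, '3': False, '4': False}
```

If only full containment should count (one row's values being a subset of another's), the intersection test could be swapped for a subset comparison such as own <= other or other <= own.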

Answer 1

For pandas 0.25+, use Series.explode:

import numpy as np

#split by ', ' and create a Series indexed by the id column, one row per value
s = test_df.set_index('id')['value'].str.split(', ').explode()

#flag every value that occurs more than once, mark an id True if at least one
#of its values is flagged, and finally convert to a dict
d = s.duplicated(keep=False).groupby(level=0).transform('any').to_dict()
print (d)
{'1': True, '2': True, '3': False, '4': False}

#map the id column through the dictionary and set 'yes'/'no' by the resulting mask
test_df['duplicate'] = np.where(test_df['id'].map(d), 'yes','no')
print (test_df)
  id value duplicate
0  1  1, 2       yes
1  2     1       yes
2  3    18        no
3  4    19        no
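The same idea can be wrapped in a small helper that works on the row index instead of going through a dictionary. This is a sketch under assumptions, not part of the original answer: the function name flag_shared_values and its parameters are hypothetical, and it assumes the frame has a unique index and pandas 0.25+ for explode.

```python
import numpy as np
import pandas as pd

def flag_shared_values(df, value_col='value', sep=', '):
    """Return a 'yes'/'no' Series marking rows that share a value with another row."""
    # One entry per individual value; the original row index repeats per value.
    tokens = df[value_col].str.split(sep).explode()
    # keep=False marks every occurrence of a repeated value, not only the later ones.
    shared = tokens.duplicated(keep=False)
    # Collapse back to one boolean per original row.
    per_row = shared.groupby(level=0).any()
    return pd.Series(np.where(per_row, 'yes', 'no'), index=per_row.index)

test_df = pd.DataFrame({'id': ['1', '2', '3', '4'],
                        'value': ['1, 2', '1', '18', '19']})
test_df['duplicate'] = flag_shared_values(test_df)
print(test_df['duplicate'].tolist())  # ['yes', 'yes', 'no', 'no']
```

One caveat of this sketch: a row that repeats a value inside its own cell (for example '1, 1') would be flagged even if no other row contains that value.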

Answer 1
0 2019-09-08 09:08:37
