Python/Pandas - Identify values in one column that map to exactly the same set of unique values in another column

Question

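I'm still quite new to working with DataFrames. I'm trying to identify the values in one column that have exactly the same set of unique values in another column. For example, if element "a" in column "A" has the unique values "x", "y" and "z" in column "B", how can I find the other elements of column "A" whose unique values in column "B" are also exactly "x", "y" and "z"?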

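The solution in this post got me halfway there, but it still takes manual inspection to pull out this information: Pandas, for each unique value in one column, get unique values in another column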

Applying that solution to an example:

import pandas as pd

df1 = pd.DataFrame({
'name': ['John', 'Jane', 'John', 'Bill', 'Sue', 'Fred', 'Bill'],
'response': [23, 29, 21, 21, 34, 18, 23]})

print(df1.groupby('name').apply(lambda x: 
x['response'].sort_values().unique()).reset_index())

produces the following result:

   name         0
0  Bill  [21, 23]
1  Fred      [18]
2  Jane      [29]
3  John  [21, 23]
4   Sue      [34]

I'd like a solution that determines, without manual inspection, that Bill and John have the same responses.
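One way to get there directly, sketched under the assumption that each name's responses should be compared as a set (order and repeats ignored): store each name's sorted unique responses as a tuple, which is hashable, then use `Series.duplicated(keep=False)` to flag every name whose tuple appears more than once. The final grouping into a dict is plain Python, not a pandas feature.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['John', 'Jane', 'John', 'Bill', 'Sue', 'Fred', 'Bill'],
    'response': [23, 29, 21, 21, 34, 18, 23]})

# One entry per name: its sorted unique responses as a tuple of Python ints.
# Tuples are hashable, so pandas can compare them as whole values.
keys = df1.groupby('name')['response'].agg(
    lambda s: tuple(sorted(s.unique().tolist())))

# keep=False marks every occurrence of a repeated tuple, not just the later ones
shared = keys[keys.duplicated(keep=False)]

# Collect the names that share each response tuple
groups = {}
for name, resp in shared.items():
    groups.setdefault(resp, []).append(name)

print(groups)  # {(21, 23): ['Bill', 'John']}
```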

Thanks, everyone!

PS: Any suggestions on how to rename the "0" column in the output would also be much appreciated!
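For the column name, a minimal sketch: if you select the `response` column before aggregating, the result is a Series, and `Series.reset_index(name=...)` names the value column in the same call. Keeping the original `DataFrameGroupBy.apply` call and chaining `.rename(columns={0: 'responses'})` also works. The name `responses` is only an illustrative choice, and this version stores each name's responses as a sorted Python list rather than a NumPy array.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['John', 'Jane', 'John', 'Bill', 'Sue', 'Fred', 'Bill'],
    'response': [23, 29, 21, 21, 34, 18, 23]})

# Selecting 'response' first gives a Series indexed by name
per_name = df1.groupby('name')['response'].agg(
    lambda s: sorted(s.unique().tolist()))

# reset_index(name=...) names the value column directly, so no '0' column appears
out = per_name.reset_index(name='responses')
print(out)
```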

Answer 1

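You're almost there. Assign the grouped result back to `df` and rename the `0` column (which also covers your PS), then turn each array of responses into a string. Arrays are unhashable, so without that conversion the next `groupby` on this column would fail with a `TypeError`.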

import pandas as pd

df = pd.DataFrame({
'name': ['John', 'Jane', 'John', 'Bill', 'Sue', 'Fred', 'Bill'],
'response': [23, 29, 21, 21, 34, 18, 23]})

df = (df.groupby('name')
        .apply(lambda x: x['response'].sort_values().unique())
        .reset_index()
        .rename(columns={0: 'response'}))

# Convert each array to a string such as "[21, 23]" so the column is hashable
# and can be used as a groupby key in the next step
df['response'] = [str(x.tolist()) for x in df['response']]

Output:

|    | name   | response   |
|---:|:-------|:-----------|
|  0 | Bill   | [21, 23]   |
|  1 | Fred   | [18]       |
|  2 | Jane   | [29]       |
|  3 | John   | [21, 23]   |
|  4 | Sue    | [34]       |
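The string conversion is there because lists and NumPy arrays are unhashable and cannot serve as group keys. A tuple is hashable too, so a variant of the same two steps can skip the string round-trip. This is a sketch of that alternative, not the answer's own code:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'John', 'Bill', 'Sue', 'Fred', 'Bill'],
    'response': [23, 29, 21, 21, 34, 18, 23]})

# Step 1: one row per name, responses stored as a sorted tuple of Python ints.
# Unlike a list or NumPy array, a tuple is hashable, so it can be a group key.
df = (df.groupby('name')['response']
        .agg(lambda s: tuple(sorted(s.unique().tolist())))
        .reset_index())

# Step 2: group the names by that tuple, as the answer does with the string column
by_response = df.groupby('response')['name'].agg(list).reset_index()
print(by_response)
```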

Now group a second time, as before, but by the string-valued `response` column:

df.groupby('response').apply(lambda x: x['name'].sort_values().unique()).reset_index().rename(columns={0:'name'})

|    | response   | name            |
|---:|:-----------|:----------------|
|  0 | [18]       | ['Fred']        |
|  1 | [21, 23]   | ['Bill' 'John'] |
|  2 | [29]       | ['Jane']        |
|  3 | [34]       | ['Sue']         |
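The table above still lists every response set, including those held by only one name. To keep just the matches the question asks for, one option is to filter on the length of each `name` entry. This sketch rebuilds the answer's two steps (collecting names into Python lists rather than arrays) and then applies the filter:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'John', 'Bill', 'Sue', 'Fred', 'Bill'],
    'response': [23, 29, 21, 21, 34, 18, 23]})

# Answer's step 1: string key built from each name's sorted unique responses
df = (df.groupby('name')['response']
        .agg(lambda s: str(sorted(s.unique().tolist())))
        .reset_index())

# Answer's step 2: collect the names per string key
out = df.groupby('response')['name'].agg(list).reset_index()

# Keep only the response sets that more than one name shares
shared = out[out['name'].map(len) > 1]
print(shared)
```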

0 2022-12-20 19:01:54

Python/Pandas - 识别一列中与另一列中完全相同的唯一值匹配的唯一值

问题描述

1 个解决方案

解决方案1 0 2022-12-20 19:01:54

解决方案1
0 2022-12-20 19:01:54