How to find dataframe values from column B in column A and, if found, replace the value in column B with the value from column A?

Question

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
                   'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H']})

My end goal is to combine all duplicate values in column A, and to find out whether each row's value in column B also exists in column A. If it does, I want to update column B to include the values associated with it. Example below:

Index	Col A	Col B
0	A:A	A:A, F:F, E:E
1	B:B	B:B
2	C:C	C:C
3	D:D	D:D
4	E:E	E:E
5	F:F	F:F, E:E
6	G:G	G:G
7	H:H	H:H

I've tried applying a depth-first search:

visited = set()
def dfs(visited, graph, node):
    if node not in visited:
        print (node)
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited, graph, neighbour)

However, I get a KeyError when I try that:

data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])
ser = {'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'}
ser = pd.Series(data)

df = df.groupby(['Col1'])['Col2'].apply(' , '.join).reset_index()
for i in df:
    dfs(visited, df, i)



KeyError: 'A:A'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-66-e47c7e4cac17> in <module>
     26 print(df)
     27 for i in df:
---> 28     dfs(visited, df, i)

<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
     17         visited.add(node)
     18         for neighbour in graph[node]:
---> 19             dfs(visited, graph, neighbour)
     20 
     21 data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])

<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
     16         print (node)
     17         visited.add(node)
---> 18         for neighbour in graph[node]:
     19             dfs(visited, graph, neighbour)
     20

Unfortunately, my experience in Python is limited. What is the best way to go about achieving my goal here?

Answer 1

A short, fast solution would be to group by Col A, aggregate Col B into a list, and join each list into a comma-separated string:

new_df = df.groupby('Col A')['Col B'].agg(list).str.join(', ').reset_index()

Output:

>>> new_df
  Col A     Col B
0   A:A  A:A, F:F
1   B:B       B:B
2   C:C       C:C
3   D:D       D:D
4   E:E       E:E
5   F:F  E:E, F:F
6   G:G       G:G
7   H:H       H:H
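
Note that the grouped output above lists only direct matches: A:A gets A:A, F:F, but not the E:E that F:F leads to, which the expected table in the question includes. The KeyError in the question comes from passing the DataFrame itself as the graph: indexing a DataFrame with a string such as 'A:A' looks up a column label, not a row value. If the transitive result is what is wanted, one possible approach is to turn the grouped lists into a plain dict and walk that instead. The following is only a sketch under that assumption; the names graph, reachable and expanded are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
    'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H'],
})

# Adjacency mapping: each Col A value -> list of its Col B values, in row order.
# sort=False keeps the groups in order of first appearance.
graph = df.groupby('Col A', sort=False)['Col B'].agg(list).to_dict()


def reachable(graph, start):
    """Return every value reachable from `start`, in depth-first visiting order.

    Hypothetical helper: an iterative DFS, so deep chains do not hit
    Python's recursion limit. `start` itself is always included.
    """
    seen = []            # a list keeps visiting order; fine for small inputs
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # .get() avoids a KeyError for values that never appear in Col A.
        # reversed() makes neighbours come off the stack in their original order.
        stack.extend(reversed(graph.get(node, [])))
    return seen


expanded = pd.DataFrame({
    'Col A': list(graph),
    'Col B': [', '.join(reachable(graph, key)) for key in graph],
})
print(expanded)
```

With the sample data, the A:A row becomes A:A, F:F, E:E and the F:F row becomes F:F, E:E, matching the table in the question. For large inputs, tracking visited nodes in a set alongside the list would make the membership check faster.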
