
How do I find whether a value in column B exists in column A of a dataframe, and if so, replace the value in column B with column A's value?

I have a dataframe:

import pandas as pd

df = pd.DataFrame({
    'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
    'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H'],
})

My end goal is to combine all duplicate values in column A, and to find out whether each row's value in column B also exists in column A. If it does, I want to update column B for that row by adding the matching row's column B values to it. Example below:

Index  Col A  Col B
0      A:A    A:A, F:F, E:E
1      B:B    B:B
2      C:C    C:C
3      D:D    D:D
4      E:E    E:E
5      F:F    F:F, E:E
6      G:G    G:G
7      H:H    H:H

I've tried applying a depth-first search:

visited = set()
def dfs(visited, graph, node):
    if node not in visited:
        print (node)
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited, graph, neighbour)

However, I get a KeyError when I try that:

data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])
ser = {'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'}
ser = pd.Series(data)

df = df.groupby(['Col1'])['Col2'].apply(' , '.join).reset_index()
for i in df:
    dfs(visited, df, i)



KeyError: 'A:A'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-66-e47c7e4cac17> in <module>
     26 print(df)
     27 for i in df:
---> 28     dfs(visited, df, i)

<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
     17         visited.add(node)
     18         for neighbour in graph[node]:
---> 19             dfs(visited, graph, neighbour)
     20 
     21 data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])

<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
     16         print (node)
     17         visited.add(node)
---> 18         for neighbour in graph[node]:
     19             dfs(visited, graph, neighbour)
     20 

Unfortunately, my experience in Python is limited. What is the best way to achieve my goal here?
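On the KeyError itself: the traceback is consistent with the DataFrame being passed to dfs as the graph. Iterating over a DataFrame yields column labels, and df[label] is a column lookup, so the recursion ends up asking for a column named 'A:A'. A minimal sketch of that behaviour, using a cut-down two-row frame that is only an illustration and not the original data:

```python
import pandas as pd

# Cut-down illustrative frame (an assumption, not the asker's full data).
df = pd.DataFrame({'Col A': ['A:A', 'F:F'], 'Col B': ['F:F', 'E:E']})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = list(df)

# df[label] selects a column, so this works and gives the column's values.
first_values = list(df['Col A'])

# A cell value such as 'A:A' is not a column label, so this lookup fails.
try:
    df['A:A']
    raised = False
except KeyError:
    raised = True
```

This suggests the search needs a plain mapping from each Col A value to its Col B values, rather than the DataFrame itself.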

A short, fast solution would be to group by Col A and then aggregate Col B into a list:

new_df = df.groupby('Col A')['Col B'].agg(list).str.join(', ').reset_index()

Output:

>>> new_df
  Col A     Col B
0   A:A  A:A, F:F
1   B:B       B:B
2   C:C       C:C
3   D:D       D:D
4   E:E       E:E
5   F:F  E:E, F:F
6   G:G       G:G
7   H:H       H:H
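Note that this output only collects direct matches: A:A gets F:F, but not the E:E that F:F points to in the desired table. If the chained values are needed, one possible approach is to turn the grouped result into a dict and walk it depth-first. The sketch below is an assumption about the intended behaviour, not part of the answer above; the helper name reachable is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
    'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H'],
})

# Plain dict: each Col A value -> the Col B values on its rows, in row order.
graph = df.groupby('Col A', sort=False)['Col B'].agg(list).to_dict()

def reachable(graph, start):
    """Return start followed by every value reachable from it, depth-first.

    Hypothetical helper; uses a list for `seen` to keep first-visit order,
    which is fine for small inputs like this one.
    """
    seen = [start]
    stack = list(reversed(graph.get(start, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Values that never appear in Col A have no outgoing edges.
        stack.extend(reversed(graph.get(node, [])))
    return seen

new_df = pd.DataFrame({
    'Col A': list(graph),
    'Col B': [', '.join(reachable(graph, key)) for key in graph],
})
```

With the sample data, the A:A row becomes 'A:A, F:F, E:E' and the F:F row becomes 'F:F, E:E', matching the table in the question. An explicit stack is used instead of recursion so that long chains do not hit Python's recursion limit.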
