How do I check whether a value in column B of a dataframe exists in column A and, if so, replace the value in column B with column A's value?
I have a dataframe:
import pandas as pd

df = pd.DataFrame({'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
                   'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H']})
My end goal is to combine all duplicate values in Col A and, for each row, find out whether its value in Col B also exists in Col A. If it does, I want to update the Col B value to include it, as in the example below:
Index | Col A | Col B |
---|---|---|
0 | A:A | A:A, F:F, E:E |
1 | B:B | B:B |
2 | C:C | C:C |
3 | D:D | D:D |
4 | E:E | E:E |
5 | F:F | F:F, E:E |
6 | G:G | G:G |
7 | H:H | H:H |
I've tried applying a depth-first search:
visited = set()

def dfs(visited, graph, node):
    if node not in visited:
        print(node)
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited, graph, neighbour)
However, I get a KeyError when I try that:
data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])
ser = {'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'}
ser = pd.Series(data)
df = df.groupby(['Col1'])['Col2'].apply(' , '.join).reset_index()
for i in df:
    dfs(visited, df, i)
KeyError: 'A:A'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-66-e47c7e4cac17> in <module>
26 print(df)
27 for i in df:
---> 28 dfs(visited, df, i)
<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
17 visited.add(node)
18 for neighbour in graph[node]:
---> 19 dfs(visited, graph, neighbour)
20
21 data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])
<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
16 print (node)
17 visited.add(node)
---> 18 for neighbour in graph[node]:
19 dfs(visited, graph, neighbour)
20
Unfortunately, my experience in Python is limited. What is the best way to achieve my goal here?
A short, fast solution would be to group by Col A, aggregate Col B into a list, and join each list into a comma-separated string:
new_df = df.groupby('Col A')['Col B'].agg(list).str.join(', ').reset_index()
Output:
>>> new_df
  Col A     Col B
0   A:A  A:A, F:F
1   B:B       B:B
2   C:C       C:C
3   D:D       D:D
4   E:E       E:E
5   F:F  E:E, F:F
6   G:G       G:G
7   H:H       H:H
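Note that this output lists only the direct Col B values for each group, whereas the expected table in the question also follows them transitively (A:A picks up E:E through F:F). The KeyError in the question arises because iterating over a DataFrame yields its column labels and `df[node]` looks up a column, so a cell value such as 'A:A' is not a valid key. One possible way around both issues, as a minimal sketch, is to turn the grouped result into a plain dict and run an iterative depth-first search over it. The names `graph`, `reachable`, and `closure_df` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
    'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H'],
})

# Adjacency mapping: each Col A value -> list of the Col B values on its rows.
graph = df.groupby('Col A')['Col B'].agg(list).to_dict()


def reachable(graph, start):
    """Return every node reachable from `start`, in depth-first visiting order."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Values that never occur in Col A have no outgoing edges, hence .get().
        # Reversing makes the first-listed neighbour the next one popped.
        stack.extend(reversed(graph.get(node, [])))
    return order


# One row per distinct Col A value, with Col B holding everything reachable from it.
closure_df = pd.DataFrame({
    'Col A': list(graph),
    'Col B': [', '.join(reachable(graph, key)) for key in graph],
})
print(closure_df)
```

With the sample data, the A:A row becomes "A:A, F:F, E:E" and the F:F row becomes "F:F, E:E", which matches the table in the question. The explicit stack avoids Python's recursion limit on long chains, and the `seen` set keeps self-references such as A:A -> A:A from looping forever.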