I have a dataframe:
import pandas as pd

df = pd.DataFrame({
    'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
    'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H'],
})
My end goal is to combine all duplicate values in Col A and, for each row, check whether its Col B value also exists in Col A. If it does, I want to add that value's own Col B entries to the row's Col B. Expected result:
Index | Col A | Col B |
---|---|---|
0 | A:A | A:A, F:F, E:E |
1 | B:B | B:B |
2 | C:C | C:C |
3 | D:D | D:D |
4 | E:E | E:E |
5 | F:F | F:F, E:E |
6 | G:G | G:G |
7 | H:H | H:H |
I've tried applying a depth first search:
visited = set()

def dfs(visited, graph, node):
    if node not in visited:
        print(node)
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited, graph, neighbour)
However, I get a KeyError when I try that:
data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])
ser = {'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'}
ser = pd.Series(data)
df = df.groupby(['Col1'])['Col2'].apply(' , '.join).reset_index()
for i in df:
    dfs(visited, df, i)
KeyError: 'A:A'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-66-e47c7e4cac17> in <module>
26 print(df)
27 for i in df:
---> 28 dfs(visited, df, i)
<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
17 visited.add(node)
18 for neighbour in graph[node]:
---> 19 dfs(visited, graph, neighbour)
20
21 data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])
<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
16 print (node)
17 visited.add(node)
---> 18 for neighbour in graph[node]:
19 dfs(visited, graph, neighbour)
20
Unfortunately, my experience with Python is limited. What is the best way to achieve my goal here?
A short, fast solution would be to group by Col A and then aggregate Col B into a list:
new_df = df.groupby('Col A')['Col B'].agg(list).str.join(', ').reset_index()
Output:
>>> new_df
  Col A     Col B
0   A:A  A:A, F:F
1   B:B       B:B
2   C:C       C:C
3   D:D       D:D
4   E:E       E:E
5   F:F  E:E, F:F
6   G:G       G:G
7   H:H       H:H
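The grouped output stops at direct matches: A:A picks up F:F, but not the E:E that F:F points to, which the expected table in the question asks for. One way to follow those chains is the depth-first search from the question, run over a plain dict instead of the DataFrame (indexing a DataFrame with df['A:A'] looks for a column named 'A:A', which is where the KeyError comes from). The following is only a minimal sketch under a few assumptions: the helper names build_graph and dfs are made up for illustration, the two columns are given as plain lists so the snippet runs without pandas, and each value is assumed to list itself first, as it does in the sample data.

```python
from collections import defaultdict


def build_graph(pairs):
    """Map each Col A value to the Col B values it appears with (first-seen order)."""
    graph = defaultdict(list)
    for a, b in pairs:
        if b not in graph[a]:
            graph[a].append(b)
    return dict(graph)


def dfs(graph, node, visited):
    """Recursive depth-first search; `visited` is a list so discovery order is kept."""
    if node not in visited:
        visited.append(node)
        # .get() returns an empty list for values that never occur in Col A,
        # so an unknown node does not raise a KeyError.
        for neighbour in graph.get(node, []):
            dfs(graph, neighbour, visited)
    return visited


# Same data as the DataFrame in the question, written as plain lists.
col_a = ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H']
col_b = ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H']

graph = build_graph(zip(col_a, col_b))

# One joined string per distinct Col A value, following chains of references.
result = {key: ', '.join(dfs(graph, key, [])) for key in graph}

for key, value in result.items():
    print(key, '|', value)
```

With the DataFrame from the question, the pairs could come from zip(df['Col A'], df['Col B']), and result can be turned back into a two-column frame with something like pd.DataFrame(list(result.items()), columns=['Col A', 'Col B']). The list-based visited keeps the output order readable but is slow for large inputs; a dict or set would likely be a better fit there.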