How do I find whether a value in column B of a dataframe exists in column A and, if so, replace the value in column B with column A's value?

Question

I have a dataframe:

import pandas as pd

df = pd.DataFrame({
    'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
    'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H'],
})

My end goal is to combine the rows that share a value in Col A, then check whether each of that row's Col B values also exists in Col A. If it does, I want to add the Col B values belonging to that entry to the row's Col B as well. Expected output:

Index	Col A	Col B
0	A:A	A:A, F:F, E:E
1	B:B	B:B
2	C:C	C:C
3	D:D	D:D
4	E:E	E:E
5	F:F	F:F, E:E
6	G:G	G:G
7	H:H	H:H

I've tried applying a depth-first search:

visited = set()
def dfs(visited, graph, node):
    if node not in visited:
        print (node)
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited, graph, neighbour)

However, I get a KeyError when I try that:

data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])
ser = {'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'}
ser = pd.Series(data)

df = df.groupby(['Col A'])['Col B'].apply(' , '.join).reset_index()
for i in df:
    dfs(visited, df, i)



KeyError: 'A:A'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-66-e47c7e4cac17> in <module>
     26 print(df)
     27 for i in df:
---> 28     dfs(visited, df, i)

<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
     17         visited.add(node)
     18         for neighbour in graph[node]:
---> 19             dfs(visited, graph, neighbour)
     20 
     21 data = np.array(['A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'G:G', 'H:H'])

<ipython-input-66-e47c7e4cac17> in dfs(visited, graph, node)
     16         print (node)
     17         visited.add(node)
---> 18         for neighbour in graph[node]:
     19             dfs(visited, graph, neighbour)
     20

Unfortunately, my experience with Python is limited. What is the best way to achieve my goal here?

Answer 1

A short, fast solution would be to group by Col A, aggregate Col B into a list, and join each list into a comma-separated string:

new_df = df.groupby('Col A')['Col B'].agg(list).str.join(', ').reset_index()

Output:

>>> new_df
  Col A     Col B
0   A:A  A:A, F:F
1   B:B       B:B
2   C:C       C:C
3   D:D       D:D
4   E:E       E:E
5   F:F  E:E, F:F
6   G:G       G:G
7   H:H       H:H
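
Note that this output only collects the direct Col B values for each Col A value, so A:A does not pick up E:E through F:F as in the expected table from the question. The KeyError in the question comes from passing the DataFrame itself as `graph`: `df['A:A']` looks up a column labelled 'A:A', and no such column exists. Below is a minimal sketch of one way to get the transitive result, assuming the grouped lists are first converted to a plain dict and then walked depth-first. The helper name `reachable` is made up for illustration, and the sketch assumes each Col A value should appear first in its own result, which holds for this data.

```python
import pandas as pd

df = pd.DataFrame({
    'Col A': ['A:A', 'A:A', 'B:B', 'C:C', 'D:D', 'E:E', 'F:F', 'F:F', 'G:G', 'H:H'],
    'Col B': ['A:A', 'F:F', 'B:B', 'C:C', 'D:D', 'E:E', 'E:E', 'F:F', 'G:G', 'H:H'],
})

# Adjacency dict: each Col A value maps to the Col B values seen in its rows.
graph = df.groupby('Col A')['Col B'].agg(list).to_dict()


def reachable(graph, start):
    """Hypothetical helper: values reachable from start, in depth-first order."""
    visited = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # A value that never appears in Col A has no outgoing edges.
        # reversed() makes the first neighbour the next one popped.
        stack.extend(reversed(graph.get(node, [])))
    return visited


new_df = pd.DataFrame({
    'Col A': list(graph),
    'Col B': [', '.join(reachable(graph, key)) for key in graph],
})
print(new_df)
```

Using an explicit stack instead of recursion avoids hitting Python's recursion limit on long chains, and `graph.get(node, [])` avoids a KeyError for Col B values that have no row of their own in Col A.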
