How do I compare each row with all the others and if it's the same I concatenate to a new dataframe? Python

Question

I have a DataFrame with 3 columns:

import pandas as pd

data = {'Country': ['A',  'A', 'A' ,'B', 'B'],'Capital': ['CC',  'CD','CE','CF','CG'],'Population': [5, 35, 20,34,65]}

df = pd.DataFrame(data,columns=['Country',  'Capital',  'Population'])

I want to compare each row with all the others and, if two rows have the same Country, concatenate the pair into a new DataFrame (and then write it out as a new CSV file).

new_data =  {'Country': ['A',  'A','B'],'Capital': ['CC',  'CD','CF'],'Population': [5, 35,34],'Country_2': ['A', 'A' ,'B'],'Capital_2': ['CD','CE','CG'],'Population_2': [35, 20,65]}

df_new = pd.DataFrame(new_data,columns=['Country',  'Capital',  'Population','Country_2','Capital_2','Population_2'])

NOTE: This is a simplification of my data. I have more than 5000 rows and I would like to do this automatically. I tried comparing dictionaries, and also comparing one row at a time, but I couldn't get it to work. Thank you for your attention.

Answer 1

>>> df.join(df.groupby('Country').shift(-1), rsuffix='_2')\
...   .dropna(how='any')
  Country Capital  Population Capital_2  Population_2
0       A      CC           5        CD          35.0
1       A      CD          35        CE          20.0
3       B      CF          34        CG          65.0

This pairs every row with the next one using join + shift, but restricts the shift to rows within the same country using groupby. Here is what groupby + shift does on its own:

>>> df.groupby('Country').shift(-1)
  Capital  Population
0      CD        35.0
1      CE        20.0
2     NaN         NaN
3      CG        65.0
4     NaN         NaN

Once these values are joined to the right of your data with the _2 suffix, the rows that contain NaN values are dropped with dropna().

Finally, note that Country_2 is not repeated because it is the same as Country, but it would be very easy to add.
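For completeness, here is a minimal sketch of how the missing pieces could be added to the result above: the Country_2 column, an integer Population_2 (shift introduces NaN, which turns the column into float), and the CSV export asked for in the question. The variable names `shifted` and `pairs` and the file name `pairs.csv` are my own placeholders, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Capital': ['CC', 'CD', 'CE', 'CF', 'CG'],
    'Population': [5, 35, 20, 34, 65],
})

# Pair each row with the next row of the same country.
shifted = df.groupby('Country').shift(-1)
pairs = df.join(shifted, rsuffix='_2').dropna(how='any')

# shift() introduced NaN, so Population_2 became float; cast it back to int.
pairs['Population_2'] = pairs['Population_2'].astype(int)

# Country_2 equals Country by construction; place it just before Capital_2.
pairs.insert(pairs.columns.get_loc('Capital_2'), 'Country_2', pairs['Country'])

pairs = pairs.reset_index(drop=True)

# Hypothetical output file name; uncomment to write the CSV.
# pairs.to_csv('pairs.csv', index=False)
```

One caveat: dropna(how='any') also drops rows whose original columns already contain NaN. If your real data has missing values, passing subset=['Capital_2'] to dropna is probably a safer filter.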

Answer 2

To get all combinations you can try:

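import numpy as np
import pandas as pd
from itertools import chain, combinations

# Column names for the paired frame: the originals plus '_2' copies
paired_columns = df.columns.append(df.columns.map(lambda x: x + '_2'))

df = pd.concat(
    [
        pd.DataFrame(
            # every 2-row combination within the group, flattened to one row per pair
            np.array(list(chain(*combinations(k.values, 2)))).reshape(-1, len(df.columns) * 2),
            columns=paired_columns,
        )
        for g, k in df.groupby('Country')
    ]
)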
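If the nested comprehension is hard to follow, the same "all pairs within a country" result can probably be obtained with a self-merge instead. This is a different technique from the one in this answer, shown only as a sketch; the names `left` and `all_pairs` are placeholders of mine. Unlike the version above, it keeps the original dtypes and does not repeat the Country column.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Capital': ['CC', 'CD', 'CE', 'CF', 'CG'],
    'Population': [5, 35, 20, 34, 65],
})

# Expose the row position as a column so each unordered pair can be kept once.
left = df.reset_index()

# Join every row with every row of the same country.
all_pairs = left.merge(left, on='Country', suffixes=('', '_2'))

# Keep only pairs where the first row comes before the second one,
# which removes self-pairs and mirrored duplicates.
all_pairs = all_pairs[all_pairs['index'] < all_pairs['index_2']]

all_pairs = all_pairs.drop(columns=['index', 'index_2']).reset_index(drop=True)
```

Note that the merge builds every row-by-row combination per country before filtering, so memory use grows with the square of the largest group. With a few thousand rows spread over many countries that should be fine, but it is worth checking on the real data.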

2 answers

solution1
0 ACCEPTED 2021-07-06 15:07:58

solution2
0 2021-07-06 18:07:03
