I have a DataFrame with 3 columns:
import pandas as pd
data = {'Country': ['A', 'A', 'A' ,'B', 'B'],'Capital': ['CC', 'CD','CE','CF','CG'],'Population': [5, 35, 20,34,65]}
df = pd.DataFrame(data,columns=['Country', 'Capital', 'Population'])
I want to compare each row with all the others and, where two rows have the same Country, concatenate the pair into a single row of a new DataFrame (and then write it to a new CSV file). The expected result is:
new_data = {'Country': ['A', 'A','B'],'Capital': ['CC', 'CD','CF'],'Population': [5, 35,34],'Country_2': ['A', 'A' ,'B'],'Capital_2': ['CD','CE','CG'],'Population_2': [35, 20,65]}
df_new = pd.DataFrame(new_data,columns=['Country', 'Capital', 'Population','Country_2','Capital_2','Population_2'])
NOTE: This is a simplification of my data. I have more than 5000 rows and I would like to do this automatically. I tried comparing dictionaries, and also comparing one row at a time, but I couldn't get it to work. Thank you for your attention.
>>> df.join(df.groupby('Country').shift(-1), rsuffix='_2')\
... .dropna(how='any')
Country Capital Population Capital_2 Population_2
0 A CC 5 CD 35.0
1 A CD 35 CE 20.0
3 B CF 34 CG 65.0
This pairs every row with the next one using join + shift, but we restrict the shifting to rows within the same country using groupby. See what the groupby + shift does on its own:
>>> df.groupby('Country').shift(-1)
Capital Population
0 CD 35.0
1 CE 20.0
2 NaN NaN
3 CG 65.0
4 NaN NaN
Then, once these values are added to the right of your data with the _2 suffix, the rows that have NaNs are dropped with dropna().
Finally, note that Country_2 is not repeated as it's the same as Country, but it would be very easy to add.
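As a minimal sketch of that last step, the snippet below adds Country_2 back and produces CSV text. The variable names (shifted, pairs, csv_text) and the cast of Population_2 back to int are my own choices for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Capital': ['CC', 'CD', 'CE', 'CF', 'CG'],
    'Population': [5, 35, 20, 34, 65],
})

# Shift each country's rows up by one so every row sees its successor.
shifted = df.groupby('Country')[['Capital', 'Population']].shift(-1)

# Attach the successor's values and drop the last row of each country,
# which has no successor (its Capital_2 is NaN).
pairs = df.join(shifted, rsuffix='_2').dropna(subset=['Capital_2'])

# Country_2 is identical to Country by construction, so copy it in
# right after the first three columns.
pairs.insert(3, 'Country_2', pairs['Country'])

# shift() introduced NaN, which turned Population_2 into floats; cast back.
pairs['Population_2'] = pairs['Population_2'].astype(int)
pairs = pairs.reset_index(drop=True)

# Returns the CSV as a string; pass a path instead, e.g.
# pairs.to_csv('pairs.csv', index=False), to write a file.
csv_text = pairs.to_csv(index=False)
```

The column order then matches the df_new layout from the question. Note that this still pairs each row only with the next row in the same country, not with every other row.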
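To get all combinations within each country, you can try:
import numpy as np
from itertools import combinations, chain

# Result columns: the original names followed by the same names with a _2 suffix
pair_columns = df.columns.append(df.columns.map(lambda x: x + '_2'))
df_pairs = pd.concat(
    [
        pd.DataFrame(
            # Flatten every pair of rows in the group into one row of len(df.columns) * 2 values
            np.array(list(chain(*combinations(k.values, 2)))).reshape(-1, len(df.columns) * 2),
            columns=pair_columns,
        )
        for g, k in df.groupby('Country')
    ]
)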
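An alternative way to get all within-country pairs is a self-merge on Country, which keeps the original column dtypes. This is only a sketch of a different technique, not the approach used above; the helper column row and the names numbered and all_pairs are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Capital': ['CC', 'CD', 'CE', 'CF', 'CG'],
    'Population': [5, 35, 20, 34, 65],
})

# Number the rows so each unordered pair can be kept exactly once.
numbered = df.assign(row=range(len(df)))

# Self-merge: every row is matched with every row of the same country.
all_pairs = numbered.merge(numbered, on='Country', suffixes=('', '_2'))

# Keep only pairs where the left row comes before the right row; this
# removes self-pairs and mirrored duplicates such as (CD, CC).
all_pairs = all_pairs[all_pairs['row'] < all_pairs['row_2']]

# Drop the helper columns and renumber the result.
all_pairs = all_pairs.drop(columns=['row', 'row_2']).reset_index(drop=True)
```

The merge builds every same-country combination before filtering, so the intermediate frame grows with the square of each country's row count. That should be fine for a few thousand rows spread over many countries, but it can get large if one country dominates the data.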