简体   繁体   中英

Transform one row to a data frame with multiple rows

I have a data frame containing one row:

df_1D = pd.DataFrame({'Day1':[5],
                      'Day2':[6],
                      'Day3':[7],
                   'ID':['AB12'],
                    'Country':['US'],
                    'Destination_A':['Miami'],
                     'Destination_B':['New York'],
                      'Destination_C':['Chicago'],
                    'First_Agent':['Jim'],
                      'Second_Agent':['Ron'],
                      'Third_Agent':['Cynthia']},
                       )

  Day1  Day2  Day3    ID  ... Destination_C First_Agent Second_Agent Third_Agent
0     5     6     7  AB12  ...       Chicago         Jim          Ron     Cynthia

I'm wondering if there's an easy way, to transform it into a dataframe with three rows as shown here:

   Day    ID Country Destination   Agent
0    5  AB12      US       Miami     Jim
1    6  AB12      US    New York     Ron
2    7  AB12      US     Chicago  Cynthia

One option using reshaping, which only requires to know the final columns:

# define final columns
cols = ['Day', 'ID', 'Destination', 'Country', 'Agent']

# the part below is automatic
# ------
# extract the keywords
pattern = f"({'|'.join(cols)})"
new = df_1D.columns.str.extract(pattern)[0]

# and reshape
out = (df_1D
 .set_axis(pd.MultiIndex.from_arrays([new, new.groupby(new).cumcount()]), axis=1)
 .loc[0].unstack(0).ffill()[cols]
)

Output:

   Day    ID Destination Country    Agent
0    5  AB12       Miami      US      Jim
1    6  AB12    New York      US      Ron
2    7  AB12     Chicago      US  Cynthia

alternative defining idx/cols separately

idx = ['ID', 'Country']
cols = ['Day', 'Destination', 'Agent']

df2 = df_1D.set_index(idx)

pattern = f"({'|'.join(cols)})"
new = df2.columns.str.extract(pattern)[0]

out = (df2
 .set_axis(pd.MultiIndex.from_arrays([new, new.groupby(new).cumcount().astype(str)],
                                     names=[None, None]),
           axis=1)
 .stack().reset_index(idx)
)
clomuns_day=[col for col in df_1D if col.startswith('Day')]
clomuns_dest=[col for col in df_1D if col.startswith('Destination')]
clomuns_agent=[col for col in df_1D if 'Agent'in col]
 
new_df=pd.DataFrame()
new_df['Day']=df_1D[clomuns_day].values.tolist()[0]
new_df['ID']= list(df_1D['ID'])*len(new_df)
new_df['Country']= list(df_1D['Country'])*len(new_df)
new_df['Destination']=df_1D[clomuns_dest].values.tolist()[0]
new_df['Agent']=df_1D[clomuns_agent].values.tolist()[0]

Out:

Day    ID Country Destination    Agent
0    5  AB12      US       Miami      Jim
1    6  AB12      US    New York      Ron
2    7  AB12      US     Chicago  Cynthia

you can use it whatever destination is repeat

Have you tried to pivot it with.pivot function? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html

I don't think there is a way to treat this fully automated. It requires manual manipulation. This is the shortest code that comes to my mind. Feel free to comment:

d1 = {}

for k in ['Day', 'Destination', 'Agent']:
    d1[k] = [d[i][0] for i in d.keys() if k in i]

for k in ['ID', 'Country']:
    d1[k] = d[k] * len(d1['Day'])

d1 = pd.DataFrame(d1)

Output:

在此处输入图像描述

Hope this help.

One option is with pivot_longer from pyjanitor , where for this case, you pass a list of regexes to names_pattern , and the new column names to names_to :

# pip install pyjanitor
import janitor
import pandas as pd
(df_1D
.pivot_longer(
    index=['ID','Country'], 
    names_to = ['Day','Destination','Agent'], 
    names_pattern=['Day','Destination','Agent'])
)
     ID Country  Day Destination    Agent
0  AB12      US    5       Miami      Jim
1  AB12      US    6    New York      Ron
2  AB12      US    7     Chicago  Cynthia

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM