Pandas- Combine multiple rows into a single row and create new columns

Question

I am working with order data using pandas. Each order can contain up to two rows (one row for each component of the order, which has a max of two components).

My goal is to take two rows and turn them into one.

example input:

Order_Number	INVENTORY CODE	description1
4304	STDROFENHBSM608.511WH	8-1/2 x 11, 60# Soporset Digital HiBright Smooth
4304	STNDEN695WOL	6 x 9.5 DBL WDW ENVELOPE

example output:

Order_Number	INVENTORY CODE	description1	INVENTORY CODE2	description2
4304	STDROFENHBSM608.511WH	8-1/2 x 11, 60# Soporset Digital HiBright Smooth	STNDEN695WOL	6 x 9.5 DBL WDW ENVELOPE

Here is my current code:

#name of infile to be the name of the file being ingested
infile= ('file1.csv')
infile2= ('file2.csv')
infile3= ('file3.csv')
#length of the infile for naming purposes
size = len(infile)
size2 = len(infile2)
#name outfile to be the name of the desired output file
outfile = (infile[:size -4]+"_"+infile2[:size2 -4]+"_output.csv")


#Data read in as dataframe
df = pd.read_csv(infile, encoding = "ISO-8859-1", engine= 'python')
df2 = pd.read_csv(infile2, encoding = "ISO-8859-1", engine= 'python')
df3 = pd.read_csv(infile3, encoding = "ISO-8859-1", engine= 'python')

df_merge =df.merge(df2, on='Order_Number', how='left')
df_final = pd.merge(df_merge,df3[['PkgID?','Printer', 'Inserter']],on='PkgID?', how='left')

The above code combines the various data I am working with into a single dataframe, but has duplicate rows based off order number as mentioned above.

Answer 1

You need to reshape your dataframe. This should work

res = pd.DataFrame(df.drop(columns='Order_Number').values.reshape(len(df)//2, 4), # reshape Inventory code and description columns
                   columns=['INVENTORY CODE', 'description1', 'INVENTORY CODE2', 'description2'], # set new column names
                   index=df['Order_Number'].drop_duplicates()).reset_index() # set index by Order_Number and reset_index for a new column
res

Answer 2

LIMITATIONS: the separator used here is | which mustn't be in your data

import pandas as pd

df = pd.read_csv('df.csv')

# combine based on Order_Number
df_ = df.groupby('Order_Number').agg({'INVENTORY CODE':'|'.join,'description':'|'.join}).reset_index()

# split and expand
df_1 = df_['INVENTORY CODE'].str.split('|', expand=True).add_prefix('INVENTORY CODE_')
df_2 = df_['description'].str.split('|', expand=True).add_prefix('description_')

# combine again and remove the ['INVENTORY CODE', 'description'] columns
df_ = pd.concat([df_, df_1, df_2], axis=1).drop(['INVENTORY CODE', 'description'], axis=1)

# sort the columns to match the output
base_cols = [col for col in df_.columns if col.split('_', 1)[0] not in ['INVENTORY CODE', 'description']]
sort_columns = [col for col in df_.columns if col not in base_cols]

df_ = df_[base_cols + sorted(sort_columns, key=lambda x: (x=='Order_Number', int(x.split('_')[-1])))]

output:

Order_Number	INVENTORY CODE_0	description_0	INVENTORY CODE_1	description_1	INVENTORY CODE_2	description_2	INVENTORY CODE_3	description_3
4304	STDROFENHBSM608.511WH	8-1/2 x 11, 60# Soporset Digital HiBright Smooth	STNDEN695WOL	6 x 9.5 DBL WDW ENVELOPE	STNDEN695WOL	6 x 9.5 DBL WDW ENVELOPE
4305	STNDEN695WOL	6 x 9.5 DBL WDW ENVELOPE	STNDEN695WOL	6 x 9.5 DBL WDW ENVELOPE	STNDEN695WOL	6 x 9.5 DBL WDW ENVELOPE	STNDEN695WOL	6 x 9.5 DBL WDW ENVELOPE

Pandas- Combine multiple rows into a single row and create new columns

Question

2 answers

solution1
1 2022-06-06 21:18:13

solution2
1 2022-06-06 21:35:44

Pandas- Combine multiple rows into a single row and create new columns

Question

2 answers

solution1 1 2022-06-06 21:18:13

solution2 1 2022-06-06 21:35:44

solution1
1 2022-06-06 21:18:13

solution2
1 2022-06-06 21:35:44