
Pandas- Combine multiple rows into a single row and create new columns

I am working with order data using pandas. Each order can contain up to two rows (one row per component of the order, and an order has at most two components).

My goal is to take the two rows of an order and turn them into one.

Example input:

Order_Number  INVENTORY CODE         description1
4304          STDROFENHBSM608.511WH  8-1/2 x 11, 60# Soporset Digital HiBright Smooth
4304          STNDEN695WOL           6 x 9.5 DBL WDW ENVELOPE

Example output:

Order_Number  INVENTORY CODE         description1                                      INVENTORY CODE2  description2
4304          STDROFENHBSM608.511WH  8-1/2 x 11, 60# Soporset Digital HiBright Smooth  STNDEN695WOL     6 x 9.5 DBL WDW ENVELOPE

Here is my current code:

import pandas as pd

# names of the files being ingested
infile = 'file1.csv'
infile2 = 'file2.csv'
infile3 = 'file3.csv'
# name the output file after the first two inputs, dropping their ".csv" extension
outfile = infile[:-4] + "_" + infile2[:-4] + "_output.csv"

# read the data in as DataFrames
df = pd.read_csv(infile, encoding="ISO-8859-1", engine='python')
df2 = pd.read_csv(infile2, encoding="ISO-8859-1", engine='python')
df3 = pd.read_csv(infile3, encoding="ISO-8859-1", engine='python')

# join the three sources: first on Order_Number, then on PkgID?
df_merge = df.merge(df2, on='Order_Number', how='left')
df_final = pd.merge(df_merge, df3[['PkgID?', 'Printer', 'Inserter']], on='PkgID?', how='left')


The above code combines the various data I am working with into a single dataframe, but the result has duplicate rows per order number, as mentioned above.

You need to reshape your dataframe. This should work, provided every order has exactly two rows and the two rows of an order sit next to each other:

res = pd.DataFrame(df.drop(columns='Order_Number').values.reshape(len(df)//2, 4), # reshape Inventory code and description columns
                   columns=['INVENTORY CODE', 'description1', 'INVENTORY CODE2', 'description2'], # set new column names
                   index=df['Order_Number'].drop_duplicates()).reset_index() # set index by Order_Number and reset_index for a new column
res
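
If some orders have only one row, or the rows of an order are not adjacent, the fixed-shape reshape above would not apply. A minimal sketch of a more forgiving variant numbers the rows within each order with groupby().cumcount() and then unstacks that number into columns. The sample frame, the `description` column name, the helper column `component`, and order 4305 are assumptions made up for illustration; they do not come from the question's files.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's input. Order 4305 is an
# invented single-component order, added only to show the uneven case.
df = pd.DataFrame({
    "Order_Number": [4304, 4304, 4305],
    "INVENTORY CODE": ["STDROFENHBSM608.511WH", "STNDEN695WOL", "STNDEN695WOL"],
    "description": [
        "8-1/2 x 11, 60# Soporset Digital HiBright Smooth",
        "6 x 9.5 DBL WDW ENVELOPE",
        "6 x 9.5 DBL WDW ENVELOPE",
    ],
})
fields = ["INVENTORY CODE", "description"]

# Number the rows within each order: 1 for the first component, 2 for the second.
df["component"] = df.groupby("Order_Number").cumcount() + 1

# Move the component number from the rows into the columns (two-level columns).
wide = df.set_index(["Order_Number", "component"])[fields].unstack("component")

# Flatten the two-level column index into names such as "INVENTORY CODE1".
wide.columns = [f"{name}{number}" for name, number in wide.columns]

# Put each component's code and description side by side.
numbers = sorted(df["component"].unique())
ordered = [f"{name}{number}" for number in numbers for name in fields]
res = wide[ordered].reset_index()
print(res)
```

Orders with a single component end up with missing values in the second pair of columns. Note that this sketch names the first pair `INVENTORY CODE1` and `description1`; rename the columns afterwards if the exact headers from the question are needed.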


LIMITATIONS: the separator used here is |, which mustn't appear in your data.

import pandas as pd

df = pd.read_csv('df.csv')

# combine based on Order_Number
df_ = df.groupby('Order_Number').agg({'INVENTORY CODE':'|'.join,'description':'|'.join}).reset_index()

# split and expand
df_1 = df_['INVENTORY CODE'].str.split('|', expand=True).add_prefix('INVENTORY CODE_')
df_2 = df_['description'].str.split('|', expand=True).add_prefix('description_')

# combine again and remove the ['INVENTORY CODE', 'description'] columns
df_ = pd.concat([df_, df_1, df_2], axis=1).drop(['INVENTORY CODE', 'description'], axis=1)

# sort the columns to match the output
base_cols = [col for col in df_.columns if col.split('_', 1)[0] not in ['INVENTORY CODE', 'description']]
sort_columns = [col for col in df_.columns if col not in base_cols]

df_ = df_[base_cols + sorted(sort_columns, key=lambda x: (x=='Order_Number', int(x.split('_')[-1])))]

Output:

Order_Number  INVENTORY CODE_0       description_0                                     INVENTORY CODE_1  description_1             INVENTORY CODE_2  description_2             INVENTORY CODE_3  description_3
4304          STDROFENHBSM608.511WH  8-1/2 x 11, 60# Soporset Digital HiBright Smooth  STNDEN695WOL      6 x 9.5 DBL WDW ENVELOPE  STNDEN695WOL      6 x 9.5 DBL WDW ENVELOPE
4305          STNDEN695WOL           6 x 9.5 DBL WDW ENVELOPE                          STNDEN695WOL      6 x 9.5 DBL WDW ENVELOPE  STNDEN695WOL      6 x 9.5 DBL WDW ENVELOPE  STNDEN695WOL      6 x 9.5 DBL WDW ENVELOPE
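
If the data might contain |, one possible way around the limitation is to collect each order's values into Python lists instead of joined strings, so that no separator is involved at all. The following is only a sketch under assumptions: the sample frame is made up, order 4305 is an invented single-row order, and the | inside one description is inserted deliberately to show that it survives.

```python
import pandas as pd

# Hypothetical sample data. The "|" in the second description is added on
# purpose, and order 4305 is invented to show an order with a single row.
df = pd.DataFrame({
    "Order_Number": [4304, 4304, 4305],
    "INVENTORY CODE": ["STDROFENHBSM608.511WH", "STNDEN695WOL", "STNDEN695WOL"],
    "description": [
        "8-1/2 x 11, 60# Soporset Digital HiBright Smooth",
        "6 x 9.5 | DBL WDW ENVELOPE",
        "6 x 9.5 DBL WDW ENVELOPE",
    ],
})
fields = ["INVENTORY CODE", "description"]

parts = []
for col in fields:
    # Collect each order's values into a list, one list per Order_Number.
    lists = df.groupby("Order_Number")[col].agg(list)
    # Spread the lists over numbered columns; shorter lists are padded with
    # missing values.
    expanded = pd.DataFrame(lists.tolist(), index=lists.index).add_prefix(f"{col}_")
    parts.append(expanded)

wide = pd.concat(parts, axis=1)

# Interleave the columns: code_0, description_0, code_1, description_1, ...
width = max(part.shape[1] for part in parts)
ordered = [f"{col}_{i}" for i in range(width) for col in fields]
res = wide[ordered].reset_index()
print(res)
```

The column names follow the same `_0`, `_1` pattern as the output above, so the result should be a drop-in replacement as long as the input columns are named as assumed here.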


