
How to create new rows from column values of a pandas data frame

I have a dataframe like the one below.

Input

Date         Country    Type         Zip_Incl     Zip_Excl
10/4/2020      FR   Regional        57_67_68    
2/1/2020       GB   Regional                      AB_DD
17/3/2021      GB   Regional        BT_TY         TS_TN
18/3/2021      GB   Regional        
19/1/2021     IN    Regional                      68

I need to transform the input based on the conditions below:

1) If Zip_Incl is not empty, its value should be passed to Zip_Final.

2) If both Zip_Incl and Zip_Excl have values, the value of Zip_Incl should be passed to Zip_Final.

3) If Zip_Incl is empty and Zip_Excl has a value, the value of Zip_Excl should be passed to Zip_Final.

Output

Date      Country   Type    Zip_Incl     Zip_Excl   Zip_Final
10/4/2020   FR  Regional     57                     57
10/4/2020   FR  Regional     67                     67
10/4/2020   FR  Regional     68                     68
2/1/2020    GB  Regional                 AB         AB
2/1/2020    GB  Regional                 DD         DD
17/3/2021   GB  Regional     BT          TS         BT
17/3/2021   GB  Regional     TY          TN         TY
18/3/2021   GB  Regional            
19/1/2021   IN  Regional                 68         68

How can this be done?

In your case we can use bfill with axis=1, then split the string and explode it:

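# First non-null Zip value per row (Zip_Incl first, then Zip_Excl), split on "_"
df['Zip_F'] = df.filter(like='Zip').bfill(axis=1).iloc[:, 0].str.split('_')
# Give every element of the split list its own row
df = df.explode('Zip_F')
df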
        Date Country      Type  Zip_Incl Zip_Excl Zip_F
0  10/4/2020      FR  Regional  57_67_68      NaN    57
0  10/4/2020      FR  Regional  57_67_68      NaN    67
0  10/4/2020      FR  Regional  57_67_68      NaN    68
1   2/1/2020      GB  Regional       NaN    AB_DD    AB
1   2/1/2020      GB  Regional       NaN    AB_DD    DD
2  17/3/2021      GB  Regional     BT_TY    TS_TN    BT
2  17/3/2021      GB  Regional     BT_TY    TS_TN    TY
3  18/3/2021      GB  Regional       NaN      NaN   NaN
4  19/1/2021      IN  Regional       NaN       68    68
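
For anyone who wants to reproduce the steps above, here is a minimal self-contained sketch. It rebuilds the question's input by hand and assumes the blank cells are NaN (as they appear in the printed output). Because `bfill(axis=1)` fills from right to left, the first column of the result holds the first non-null value per row, so the column order Zip_Incl, Zip_Excl is what gives Zip_Incl priority.

```python
import numpy as np
import pandas as pd

# Input rebuilt by hand from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "Date": ["10/4/2020", "2/1/2020", "17/3/2021", "18/3/2021", "19/1/2021"],
    "Country": ["FR", "GB", "GB", "GB", "IN"],
    "Type": ["Regional"] * 5,
    "Zip_Incl": ["57_67_68", np.nan, "BT_TY", np.nan, np.nan],
    "Zip_Excl": [np.nan, "AB_DD", "TS_TN", np.nan, "68"],
})

# First non-null value per row, preferring Zip_Incl over Zip_Excl.
first_zip = df[["Zip_Incl", "Zip_Excl"]].bfill(axis=1).iloc[:, 0]

# Split on "_" into lists, then give every list element its own row.
df["Zip_F"] = first_zip.str.split("_")
df = df.explode("Zip_F")

print(df[["Country", "Zip_Incl", "Zip_Excl", "Zip_F"]])
```

The row with no Zip values at all is kept as a single row with NaN in Zip_F.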

Update: to also replace the values in Zip_Incl and Zip_Excl with the split value:

df[['Zip_Incl','Zip_Excl']]=df[['Zip_Incl','Zip_Excl']].mask(df[['Zip_Incl','Zip_Excl']].notnull(),df.Zip_F,axis=0)
df
        Date Country      Type Zip_Incl Zip_Excl Zip_F
0  10/4/2020      FR  Regional       57      NaN    57
0  10/4/2020      FR  Regional       67      NaN    67
0  10/4/2020      FR  Regional       68      NaN    68
1   2/1/2020      GB  Regional      NaN       AB    AB
1   2/1/2020      GB  Regional      NaN       DD    DD
2  17/3/2021      GB  Regional       BT       BT    BT
2  17/3/2021      GB  Regional       TY       TY    TY
3  18/3/2021      GB  Regional      NaN      NaN   NaN
4  19/1/2021      IN  Regional      NaN       68    68
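
Note that in the output above row 2 shows BT/TY in Zip_Excl, whereas the question's expected output keeps TS/TN there. If that difference matters, one possible approach (a sketch, not part of the original answer) is to split both columns separately, pad the lists in each row to a common length, and explode the two columns together. It assumes blanks are NaN and pandas >= 1.3 for multi-column `explode`; `n_items` and `pad` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country": ["FR", "GB", "GB", "GB", "IN"],
    "Zip_Incl": ["57_67_68", np.nan, "BT_TY", np.nan, np.nan],
    "Zip_Excl": [np.nan, "AB_DD", "TS_TN", np.nan, "68"],
})
zip_cols = ["Zip_Incl", "Zip_Excl"]

def n_items(value):
    """Hypothetical helper: number of split parts, 0 for a missing cell."""
    return len(value) if isinstance(value, list) else 0

def pad(value, n):
    """Hypothetical helper: list of length n, padded with NaN."""
    items = value if isinstance(value, list) else []
    return items + [np.nan] * (n - len(items))

# Split each column on "_"; missing cells stay NaN.
parts = {col: df[col].str.split("_") for col in zip_cols}

# Target length per row: the longest list, and at least 1 so empty rows survive.
sizes = [max(1, n_items(a), n_items(b))
         for a, b in zip(parts["Zip_Incl"], parts["Zip_Excl"])]

# Pad both columns so the lists in each row have matching lengths.
for col in zip_cols:
    df[col] = pd.Series([pad(v, n) for v, n in zip(parts[col], sizes)],
                        index=df.index)

# Explode both columns in lockstep and renumber the rows.
out = df.explode(zip_cols, ignore_index=True)

# Zip_Incl wins when present, otherwise fall back to Zip_Excl.
out["Zip_Final"] = out["Zip_Incl"].where(out["Zip_Incl"].notna(), out["Zip_Excl"])
print(out)
```

Padding is needed because multi-column `explode` requires the lists in a row to have the same number of elements.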

Assuming the dtypes are all strings, I'd consider the following:

import pandas as pd
import numpy as np
df = pd.DataFrame({"Type":["Regional"]*5,
                   "Zip_Incl":["57_67_68", "", "BT_TY", "", ""],
                   "Zip_Excl":["","AB_DD", "TS_TN", "", "68"]})

# this tells us which elements are not ""
(~df[["Zip_Incl", "Zip_Excl"]].eq(""))
   Zip_Incl  Zip_Excl
0      True     False
1     False      True
2      True      True
3     False     False
4     False      True

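The following returns, for every row, the position of the first Zip column that holds a non-empty string:

# restrict to the Zip columns; including "Type" (never empty) would always select position 0
sel = (~df[["Zip_Incl", "Zip_Excl"]].eq("")).values.argmax(1)

Now, with some NumPy indexing, we can get your output: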

mat = df[["Zip_Incl", "Zip_Excl"]].values
df["Zip_Final"] = mat[np.arange(mat.shape[0]), sel]
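
The indexing step above relies on two NumPy behaviours: `argmax` on a boolean array returns the position of the first True in each row (and 0 when a row has no True at all), and indexing with a pair of integer arrays picks one element per row. A small sketch on a toy array, using a few values from the example:

```python
import numpy as np

# Columns play the role of Zip_Incl and Zip_Excl.
mat = np.array([["57_67_68", ""],
                ["", "AB_DD"],
                ["BT_TY", "TS_TN"],
                ["", ""]])

non_empty = mat != ""            # boolean matrix, True where a value exists
sel = non_empty.argmax(axis=1)   # first True per row; 0 if the row is all False
picked = mat[np.arange(mat.shape[0]), sel]  # one element per row

print(sel.tolist())     # [0, 1, 0, 0]
print(picked.tolist())  # ['57_67_68', 'AB_DD', 'BT_TY', '']
```

The last row has no non-empty value, so `argmax` falls back to position 0 and the picked value is simply the empty string from the first column.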

Update: if your df is not that big and you are looking for a non-NumPy solution, you could do:

def fun(row):
    if row["Zip_Incl"] != "":
        return row["Zip_Incl"]
    elif row["Zip_Excl"] != "":
        return row["Zip_Excl"]
    else:
        return ""

df["Zip_Final"] = df.apply(fun, axis=1)

In both cases the output is:

       Type  Zip_Incl Zip_Excl Zip_Final
0  Regional  57_67_68           57_67_68
1  Regional              AB_DD     AB_DD
2  Regional     BT_TY    TS_TN     BT_TY
3  Regional                             
4  Regional                 68        68

Update 2: I just realized you then want to split Zip_Final into separate rows. Using either of the previous methods, you can add these lines:

df["Zip_Final"] = df["Zip_Final"].str.split("_")

# DataFrame.explode requires pandas >= 0.25
df = df.explode("Zip_Final")

print(df)
       Type  Zip_Incl Zip_Excl Zip_Final
0  Regional  57_67_68                 57
0  Regional  57_67_68                 67
0  Regional  57_67_68                 68
1  Regional              AB_DD        AB
1  Regional              AB_DD        DD
2  Regional     BT_TY    TS_TN        BT
2  Regional     BT_TY    TS_TN        TY
3  Regional                             
4  Regional                 68        68
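
Two details of the split-and-explode step are easier to see in isolation. Splitting an empty string yields the one-element list `[""]`, which is why row 3 survives as a single row with an empty Zip_Final. Also, the exploded rows keep their original index labels (0, 0, 0, 1, 1, ...). If a fresh index is preferred, `explode` accepts `ignore_index=True` (available from pandas 1.1.0, to the best of my knowledge). A short sketch on a made-up Series:

```python
import pandas as pd

# Made-up values for illustration; "" stands for a row with no Zip value.
zip_final = pd.Series(["57_67_68", "", "68"], name="Zip_Final")

lists = zip_final.str.split("_")
print(lists.tolist())            # [['57', '67', '68'], [''], ['68']]

# Default: repeated index labels point back to the source row.
exploded = lists.explode()
print(exploded.index.tolist())   # [0, 0, 0, 1, 2]

# ignore_index=True renumbers the rows 0..n-1 instead.
renumbered = lists.explode(ignore_index=True)
print(renumbered.index.tolist()) # [0, 1, 2, 3, 4]
```

Keeping the repeated labels can be handy for joining back to the source rows, while renumbering is safer if later code relies on a unique index.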
