Pandas: Combine string values from 2 columns in a data frame into a new column using rules

I have a Pandas df in which I am trying to combine string values from 2 different columns into a single new column, following a set of rules. I'm running into problems because I can't get the code to pick the right value from each column according to the business logic I need.

Below is an example of the df:


ID      Date        Original    New
ID1000  1/1/2019    High    
ID2000  4/10/2019               Moderate
ID3000  4/15/2019   High        Critical
ID4000  1/30/2019   Low         Moderate


#code to replicate example df

import pandas as pd

lst= [['ID1000','1/1/2019','High',''],
      ['ID2000','4/10/2019','','Moderate'],
      ['ID3000','4/15/2019','High','Critical'],
      ['ID4000','1/30/2019','Low','Moderate'],
     ]

# dtype=float removed: every column holds strings, not floats
df = pd.DataFrame(lst, columns=['ID', 'Date', 'Original', 'New'])
df


From this df I need to create a [Combined] column that follows these rules:

  1. If [Date] is before 4/4/2019, use the value from the [Original] column.
  2. If [Date] is on or after 4/4/2019, use the value from the [New] column.
  3. If [Date] is before 4/4/2019 and the [New] column has a value, use the value from the [New] column (this takes precedence over rule 1).
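The three rules can be sketched as a plain Python function applied row by row. This is a minimal sketch, not the approach used in either answer: combine_row and CUTOFF are hypothetical names, and it assumes a missing value is stored as an empty string, as in the example df. Checking rule 3 before rule 1 is what makes ID4000 come out as Moderate.

```python
import pandas as pd

# Hypothetical cutoff constant for the rules
CUTOFF = pd.Timestamp(2019, 4, 4)

def combine_row(row):
    """Hypothetical helper: apply the three rules to one row."""
    # Rule 3 first: before the cutoff, a non-empty New value wins
    if row['Date'] < CUTOFF and row['New'] != '':
        return row['New']
    # Rule 1: before the cutoff with no New value, fall back to Original
    if row['Date'] < CUTOFF:
        return row['Original']
    # Rule 2: on or after the cutoff, always use New
    return row['New']

df = pd.DataFrame(
    [['ID1000', '1/1/2019', 'High', ''],
     ['ID2000', '4/10/2019', '', 'Moderate'],
     ['ID3000', '4/15/2019', 'High', 'Critical'],
     ['ID4000', '1/30/2019', 'Low', 'Moderate']],
    columns=['ID', 'Date', 'Original', 'New'],
)
df['Date'] = pd.to_datetime(df['Date'])  # parse the M/D/YYYY strings

# axis=1 passes each row to combine_row as a Series
df['Combined'] = df.apply(combine_row, axis=1)
print(df['Combined'].tolist())  # ['High', 'Moderate', 'Critical', 'Moderate']
```

Row-wise apply is easy to read but calls Python once per row, so on large frames the vectorized np.select / np.where versions are usually faster.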

The resulting df should look like this:


ID      Date        Original  New        Combined
ID1000  1/1/2019    High                 High
ID2000  4/10/2019             Moderate   Moderate
ID3000  4/15/2019   High      Critical   Critical
ID4000  1/30/2019   Low       Moderate   Moderate

I tried to apply the rules above the way I would with a nested IF in Excel, but without any luck. This is the code I used:


['Date']=pd.to_datetime(result['Date'])

[Combined]= if {['Date']<4/4/2019,[Original],
                if{['Date']>=4/4/2019,[New],
                if{['Date']<4/4/2019 & ['New']>0,[New]}}}

I expected a new [Combined] column to be created with the values "High", "Moderate", "Critical", "Moderate".

When I ran the code above, I got this 'invalid syntax' error:

File "<ipython-input-13-33cb4e8d5ca7>", line 3
    [Combined]= if {['Date']<4/4/2019,[Original],
                 ^
SyntaxError: invalid syntax
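The error occurs because a Python if statement cannot appear on the right-hand side of =, and [Combined] / ['Date'] are not how DataFrame columns are referenced. The closest vectorized counterpart of an Excel nested IF is nested np.where(cond, a, b) calls, which pick a where cond is True and b elsewhere, element by element. A minimal sketch, assuming the example df and a cutoff of 2019-04-04; the Excel formula in the comment is only an analogy:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['ID1000', '1/1/2019', 'High', ''],
     ['ID2000', '4/10/2019', '', 'Moderate'],
     ['ID3000', '4/15/2019', 'High', 'Critical'],
     ['ID4000', '1/30/2019', 'Low', 'Moderate']],
    columns=['ID', 'Date', 'Original', 'New'],
)
df['Date'] = pd.to_datetime(df['Date'])
cutoff = pd.Timestamp(2019, 4, 4)

# Excel analogy: =IF(AND(Date<cutoff, New<>""), New, IF(Date<cutoff, Original, New))
df['Combined'] = np.where(
    (df['Date'] < cutoff) & (df['New'] != ''),  # rule 3
    df['New'],
    np.where(df['Date'] < cutoff,               # rule 1
             df['Original'],
             df['New']),                        # rule 2
)
print(df[['ID', 'Combined']])
```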

I have spent the past few days going through the documentation, but I can't figure out how to combine values from 2 columns into a new column using rules. I also haven't come across a similar use case involving strings.

Can someone help me with this? Perhaps there is a better approach. Thanks in advance.

Using np.select from NumPy:

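import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
con1 = df.Date < '2019-04-04'
con2 = df.Date >= '2019-04-04'
con3 = con1 & df.New.ne('')
# np.select uses the first True condition per row, so rule 3 (con3)
# must be listed before rule 1 (con1); otherwise ID4000 gets 'Low'
df['Combine'] = np.select([con3, con1, con2], [df.New, df.Original, df.New], default='')
df

       ID       Date Original       New   Combine
0  ID1000 2019-01-01     High                High
1  ID2000 2019-04-10           Moderate  Moderate
2  ID3000 2019-04-15     High  Critical  Critical
3  ID4000 2019-01-30      Low  Moderate  Moderate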
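In np.select, when several conditions are True for the same element, the choice paired with the first of them wins. For ID4000 both con1 and con3 are True, so whichever is listed first decides the result. A tiny sketch of that first-match behavior with made-up arrays (cond_a, cond_b, first and second are illustrative names only):

```python
import numpy as np

# Both conditions are True for element 0; only cond_b is True for element 1
cond_a = np.array([True, False])
cond_b = np.array([True, True])

# np.select takes the choice from the FIRST True condition per element
first = np.select([cond_a, cond_b], ['a', 'b'], default='none')
second = np.select([cond_b, cond_a], ['b', 'a'], default='none')
print(first)   # ['a' 'b']
print(second)  # ['b' 'b']
```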

You can combine conditions 2 and 3 (use New if the date is on or after 4/4/2019 or if New has a value) and then use np.where() to choose between New and Original:

import numpy as np
import pandas as pd
df['Date'] = pd.to_datetime(df.Date)
df['Combine'] = np.where((df.Date >= pd.Timestamp(2019, 4, 4)) | (df.New.ne('') & df.New.notna()), df.New, df.Original)
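For comparison, the same combined condition can be applied with pandas alone through Series.where(cond, other), which keeps the original values where cond is True and takes them from other elsewhere. A minimal sketch, assuming the example df; use_new is a hypothetical name, and pd.Timestamp is used for the cutoff:

```python
import pandas as pd

df = pd.DataFrame(
    [['ID1000', '1/1/2019', 'High', ''],
     ['ID2000', '4/10/2019', '', 'Moderate'],
     ['ID3000', '4/15/2019', 'High', 'Critical'],
     ['ID4000', '1/30/2019', 'Low', 'Moderate']],
    columns=['ID', 'Date', 'Original', 'New'],
)
df['Date'] = pd.to_datetime(df['Date'])

# Rule 2 (on/after the cutoff) or rule 3 (New has a value) selects New
use_new = (df['Date'] >= pd.Timestamp(2019, 4, 4)) | (df['New'].notna() & df['New'].ne(''))

# Keep New where use_new is True, take Original everywhere else
df['Combine'] = df['New'].where(use_new, df['Original'])
print(df['Combine'].tolist())  # ['High', 'Moderate', 'Critical', 'Moderate']
```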

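Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.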

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM