Pandas: Combine string values from 2 columns in data frame into a new column using rules

I have a Pandas df where I am trying to combine string values from 2 different columns into a single new column in the df using rules. I am running into problems because I am not able to get the code to select the appropriate values in the columns based on the business logic/rules I am trying to use.

Below is an example of the df:


ID      Date        Original    New
ID1000  1/1/2019    High    
ID2000  4/10/2019               Moderate
ID3000  4/15/2019   High        Critical
ID4000  1/30/2019   Low         Moderate


#code to replicate example df

import pandas as pd

lst= [['ID1000','1/1/2019','High',''],
      ['ID2000','4/10/2019','','Moderate'],
      ['ID3000','4/15/2019','High','Critical'],
      ['ID4000','1/30/2019','Low','Moderate'],
     ]

df = pd.DataFrame(lst, columns=['ID', 'Date', 'Original', 'New'])
df


From this df I need to create a [Combined] column according to these rules:

  1. If the [Date] is < 4/4/2019 use the value from the [Original] column row
  2. If the [Date] is >= 4/4/2019 use the value from the [New] column row
  3. If the [Date] is < 4/4/2019 and there is a [New] column row value, use the value from the [New] column row.

The resulting df should look like this:


ID      Date        Original  New        Combined
ID1000  1/1/2019    High                 High
ID2000  4/10/2019             Moderate   Moderate
ID3000  4/15/2019   High      Critical   Critical
ID4000  1/30/2019   Low       Moderate   Moderate

I tried applying the rules above similar to an Excel nested IF, but without any luck. This is the code I used.


['Date']=pd.to_datetime(result['Date'])

[Combined]= if {['Date']<4/4/2019,[Original],
                if{['Date']>=4/4/2019,[New],
                if{['Date']<4/4/2019 & ['New']>0,[New]}}}

I was expecting a new column [Combined] to be created and that the values in the column would be: "High","Moderate","Critical", "Moderate".

When I applied the logic above, I got this 'invalid syntax' error below:

File "<ipython-input-13-33cb4e8d5ca7>", line 3
    [Combined]= if {['Date']<4/4/2019,[Original],
                 ^
SyntaxError: invalid syntax

I have searched the documentation over the past few days, but I can't figure out how to combine values from 2 columns into a new column using rules like these. I also haven't come across a similar use case that involves strings.

Can someone help me with this? Perhaps there is a better approach. Thanks in advance.

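I am using np.select from numpy. It takes the first condition that matches for each row, so the more specific rule 3 has to be listed before rule 1:

import numpy as np
df['Date'] = pd.to_datetime(df['Date'])
con1 = df.Date < '2019-04-04'
con2 = df.Date >= '2019-04-04'
con3 = con1 & df.New.ne('')
# con3 goes first so that it takes precedence over con1
df['Combined'] = np.select([con3, con1, con2], [df.New, df.Original, df.New])
df
Out[84]: 
       ID       Date Original       New  Combined
0  ID1000 2019-01-01     High                High
1  ID2000 2019-04-10           Moderate  Moderate
2  ID3000 2019-04-15     High  Critical  Critical
3  ID4000 2019-01-30      Low  Moderate  Moderate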
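
A note on ordering: np.select returns, for each row, the choice belonging to the first condition that is True. Rule 3 overlaps rule 1 (both apply when the date is before 4/4/2019), so the position of the rule 3 condition in the list decides the result for ID4000. The following is a minimal sketch of that effect, assuming the example frame from the question; the variable names (`before`, `after`, `before_with_new`, `general_first`, `specific_first`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame(
    [['ID1000', '1/1/2019', 'High', ''],
     ['ID2000', '4/10/2019', '', 'Moderate'],
     ['ID3000', '4/15/2019', 'High', 'Critical'],
     ['ID4000', '1/30/2019', 'Low', 'Moderate']],
    columns=['ID', 'Date', 'Original', 'New'])
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

cutoff = pd.Timestamp('2019-04-04')
before = df['Date'] < cutoff                  # rule 1
after = df['Date'] >= cutoff                  # rule 2
before_with_new = before & df['New'].ne('')   # rule 3

# General rule listed first: rule 1 wins for ID4000, rule 3 is never reached
general_first = np.select([before, after, before_with_new],
                          [df['Original'], df['New'], df['New']],
                          default='')

# Specific rule listed first: rule 3 wins for ID4000
specific_first = np.select([before_with_new, before, after],
                           [df['New'], df['Original'], df['New']],
                           default='')

print(list(general_first))
print(list(specific_first))
```

Only the second ordering yields "Moderate" for ID4000, which is what the question expects.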

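Rules 2 and 3 both select the [New] value, so you can combine them into one condition and then use np.where():

import numpy as np

df['Date'] = pd.to_datetime(df.Date)
# Take New when the date is on or after the cutoff, or when New is non-empty; otherwise take Original
df['Combined'] = np.where((df.Date >= pd.Timestamp(2019, 4, 4)) | (df.New.ne('') & df.New.notna()), df.New, df.Original)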
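
The same logic can also be written with pandas alone using Series.where, which keeps the values where the condition is True and takes them from the other Series elsewhere. This is only a sketch under the assumptions of the question (dates formatted as month/day/year, missing values stored as empty strings or NaN); `combine_by_cutoff` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def combine_by_cutoff(df, cutoff):
    """Return New where the date is on/after cutoff or New is filled in, else Original."""
    dates = pd.to_datetime(df['Date'], format='%m/%d/%Y')
    has_new = df['New'].notna() & df['New'].ne('')
    use_new = (dates >= pd.Timestamp(cutoff)) | has_new
    # Keep New where use_new is True, fall back to Original elsewhere
    return df['New'].where(use_new, df['Original'])

# Example frame from the question
df = pd.DataFrame(
    [['ID1000', '1/1/2019', 'High', ''],
     ['ID2000', '4/10/2019', '', 'Moderate'],
     ['ID3000', '4/15/2019', 'High', 'Critical'],
     ['ID4000', '1/30/2019', 'Low', 'Moderate']],
    columns=['ID', 'Date', 'Original', 'New'])

df['Combined'] = combine_by_cutoff(df, '2019-04-04')
print(df)
```

Because rules 2 and 3 are merged into a single boolean mask, there is no ordering issue to worry about here.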
