I have a pandas DataFrame (df) in which I am trying to combine string values from two columns into a single new column according to a set of rules. I am running into problems because my code does not select the right values from the columns based on the business logic I am trying to apply.
Below is an example of the df:
ID      Date       Original  New
ID1000  1/1/2019   High
ID2000  4/10/2019            Moderate
ID3000  4/15/2019  High      Critical
ID4000  1/30/2019  Low       Moderate
#code to replicate example df
import pandas as pd
lst= [['ID1000','1/1/2019','High',''],
['ID2000','4/10/2019','','Moderate'],
['ID3000','4/15/2019','High','Critical'],
['ID4000','1/30/2019','Low','Moderate'],
]
df= pd.DataFrame(lst,columns=['ID','Date','Original','New'])
df
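From this df I need to create a [Combined] column that follows these rules:
1. If [Date] is before 4/4/2019, use the value in [Original].
2. If [Date] is on or after 4/4/2019, use the value in [New].
3. If [Date] is before 4/4/2019 and [New] is not empty, use the value in [New].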
The resulting df should look like this:
ID      Date       Original  New       Combined
ID1000  1/1/2019   High                High
ID2000  4/10/2019            Moderate  Moderate
ID3000  4/15/2019  High      Critical  Critical
ID4000  1/30/2019  Low       Moderate  Moderate
I tried applying the rules above similar to an Excel nested IF, but without any luck. This is the code I used.
['Date']=pd.to_datetime(result['Date'])
[Combined]= if {['Date']<4/4/2019,[Original],
if{['Date']>=4/4/2019,[New],
if{['Date']<4/4/2019 & ['New']>0,[New]}}}
I was expecting a new column [Combined] to be created and that the values in the column would be: "High","Moderate","Critical", "Moderate".
When I applied the logic above, I got this 'invalid syntax' error below:
File "<ipython-input-13-33cb4e8d5ca7>", line 3
[Combined]= if {['Date']<4/4/2019,[Original],
^
SyntaxError: invalid syntax
I have looked over the past few days in the documentation, but I can't figure out how to combine values from 2 columns into a new column with the rules. Also, I haven't come across a use case similar to this one with strings.
Can someone help me with this? Perhaps there is a better approach. Thanks in advance.
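The SyntaxError comes from the fact that Python has no Excel-style `if{condition, value, ...}` expression, and a bare `['Date']` is just a list literal rather than a column reference. As a minimal sketch of what the nested IF could look like in plain Python, the rules can be written as an ordinary function and applied row by row. The helper name `pick_rating` and the constant `CUTOFF` are made up for illustration, and the column is named `Combined` here to match the question.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["ID1000", "1/1/2019", "High", ""],
        ["ID2000", "4/10/2019", "", "Moderate"],
        ["ID3000", "4/15/2019", "High", "Critical"],
        ["ID4000", "1/30/2019", "Low", "Moderate"],
    ],
    columns=["ID", "Date", "Original", "New"],
)
# Parse the month/day/year strings into real timestamps before comparing.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

CUTOFF = pd.Timestamp(2019, 4, 4)  # assumed cutoff, taken from the question


def pick_rating(row):
    """Hypothetical helper: the nested IF written as ordinary Python."""
    if row["Date"] >= CUTOFF:
        return row["New"]       # on or after the cutoff: use New
    if row["New"] != "":
        return row["New"]       # before the cutoff, but New is filled in
    return row["Original"]      # before the cutoff and New is empty


df["Combined"] = df.apply(pick_rating, axis=1)
print(df["Combined"].tolist())
# ['High', 'Moderate', 'Critical', 'Moderate']
```

A row-wise `apply` is easy to read but slow on large frames; the vectorized approaches are usually the better choice once the logic is settled.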
I am using np.select from numpy:
import numpy as np
df['Date']=pd.to_datetime(df['Date'])
con1=df.Date<'2019-04-04'
con2=df.Date>='2019-04-04'
con3=con1&df.New.ne('')
# con3 goes first: np.select uses the first condition that matches each row
df['Combine']=np.select([con3,con1,con2],[df.New,df.Original,df.New])
df
Out[84]:
       ID       Date Original       New   Combine
0  ID1000 2019-01-01     High                High
1  ID2000 2019-04-10           Moderate  Moderate
2  ID3000 2019-04-15     High  Critical  Critical
3  ID4000 2019-01-30      Low  Moderate  Moderate
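One thing worth checking with np.select is the order of the conditions: for each row it returns the choice belonging to the first condition that is True. The sketch below runs the same three conditions in two different orders on the sample data; the variable names (`before`, `on_or_after`, `before_with_new`, `cutoff`) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["ID1000", "1/1/2019", "High", ""],
        ["ID2000", "4/10/2019", "", "Moderate"],
        ["ID3000", "4/15/2019", "High", "Critical"],
        ["ID4000", "1/30/2019", "Low", "Moderate"],
    ],
    columns=["ID", "Date", "Original", "New"],
)
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

cutoff = pd.Timestamp("2019-04-04")
before = df["Date"] < cutoff
on_or_after = df["Date"] >= cutoff
before_with_new = before & df["New"].ne("")

# General rule listed first: it shadows the more specific rule for ID4000.
general_first = np.select(
    [before, on_or_after, before_with_new],
    [df["Original"], df["New"], df["New"]],
    default="",
)

# Specific rule listed first: ID4000 now picks up the value from New.
specific_first = np.select(
    [before_with_new, before, on_or_after],
    [df["New"], df["Original"], df["New"]],
    default="",
)

print(general_first.tolist())   # ['High', 'Moderate', 'Critical', 'Low']
print(specific_first.tolist())  # ['High', 'Moderate', 'Critical', 'Moderate']
```

Only the second ordering reproduces the expected "High", "Moderate", "Critical", "Moderate".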
You can combine your conditions 2 and 3 and then use np.where():
df['Date'] = pd.to_datetime(df.Date)
df['Combine'] = np.where((df.Date >= pd.Timestamp(2019,4,4)) | (df.New.ne('') & df.New.notna()), df.New, df.Original)
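The reason rules 2 and 3 collapse into one test is that both of them pick New, so the only case left for Original is "before the cutoff and New is empty". Here is a self-contained sketch of that idea. As an assumption for illustration, ID1000's New value is set to None instead of an empty string, to show why the extra null check matters: a missing value is "not equal to ''", so `ne('')` alone would treat it as filled in.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["ID1000", "1/1/2019", "High", None],  # hypothetical: missing instead of ''
        ["ID2000", "4/10/2019", "", "Moderate"],
        ["ID3000", "4/15/2019", "High", "Critical"],
        ["ID4000", "1/30/2019", "Low", "Moderate"],
    ],
    columns=["ID", "Date", "Original", "New"],
)
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

cutoff = pd.Timestamp(2019, 4, 4)

# New counts as "filled in" only if it is neither missing nor an empty string.
has_new = df["New"].notna() & df["New"].ne("")

# Rules 2 and 3 together: take New on/after the cutoff, or whenever it is filled in.
use_new = (df["Date"] >= cutoff) | has_new

df["Combine"] = np.where(use_new, df["New"], df["Original"])
print(df["Combine"].tolist())
# ['High', 'Moderate', 'Critical', 'Moderate']
```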