Pandas: Combine string values from 2 columns in a data frame into a new column using rules
I have a Pandas df where I am trying to combine string values from 2 different columns into a single new column using rules. I am running into problems because I cannot get the code to select the appropriate values from the columns based on the business logic/rules I am trying to apply.
Below is an example of the df:
ID      Date       Original  New
ID1000  1/1/2019   High
ID2000  4/10/2019            Moderate
ID3000  4/15/2019  High      Critical
ID4000  1/30/2019  Low       Moderate
#code to replicate example df
import pandas as pd
lst= [['ID1000','1/1/2019','High',''],
['ID2000','4/10/2019','','Moderate'],
['ID3000','4/15/2019','High','Critical'],
['ID4000','1/30/2019','Low','Moderate'],
]
df= pd.DataFrame(lst,columns=['ID','Date','Original','New'])  # no dtype=float: every column holds strings
df
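From this df I need to create a [Combined] column that follows these rules (the same ones encoded in my attempt further down):
1. If [Date] is before 4/4/2019, use the value from [Original].
2. If [Date] is on or after 4/4/2019, use the value from [New].
3. If [Date] is before 4/4/2019 and [New] is populated, use the value from [New].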
The resulting df should look like this:
ID      Date       Original  New       Combined
ID1000  1/1/2019   High                High
ID2000  4/10/2019            Moderate  Moderate
ID3000  4/15/2019  High      Critical  Critical
ID4000  1/30/2019  Low       Moderate  Moderate
I tried applying the rules above in the style of an Excel nested IF, but without any luck. This is the code I used:
['Date']=pd.to_datetime(result['Date'])
[Combined]= if {['Date']<4/4/2019,[Original],
if{['Date']>=4/4/2019,[New],
if{['Date']<4/4/2019 & ['New']>0,[New]}}}
I was expecting a new column [Combined] to be created with the values "High", "Moderate", "Critical", "Moderate".
When I applied the logic above, I got the 'invalid syntax' error below:
File "<ipython-input-13-33cb4e8d5ca7>", line 3
[Combined]= if {['Date']<4/4/2019,[Original],
^
SyntaxError: invalid syntax
I have spent the past few days looking through the documentation, but I can't figure out how to combine values from 2 columns into a new column using these rules. I also haven't come across a similar use case that involves strings.
Can someone help me with this? Perhaps there is a better approach. Thanks in advance.
I am using np.select from numpy:
import numpy as np
df['Date']=pd.to_datetime(df['Date'])  # the comparisons below need real datetimes
con1=df.Date<'2019-04-04'
con2=df.Date>='2019-04-04'
con3=con1&df.New.ne('')
# np.select takes the first matching condition, so the more specific con3 goes first
df['Combine']=np.select([con3,con1,con2],[df.New,df.Original,df.New])
df
Out[84]:
       ID       Date Original       New   Combine
0  ID1000 2019-01-01     High                High
1  ID2000 2019-04-10           Moderate  Moderate
2  ID3000 2019-04-15     High  Critical  Critical
3  ID4000 2019-01-30      Low  Moderate  Moderate
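A note on ordering: np.select walks the condition list in order and, for each row, takes the first condition that is True. That means where the "before the cutoff and New is populated" rule sits in the list changes the result for ID4000. Here is a minimal sketch of the effect on a toy frame that mirrors the question's data. The names demo, cutoff, general_first and specific_first are made up for illustration, and it assumes an empty string marks a missing value:

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the question's example (empty string = no value)
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-01', '2019-04-10', '2019-04-15', '2019-01-30']),
    'Original': ['High', '', 'High', 'Low'],
    'New': ['', 'Moderate', 'Critical', 'Moderate'],
})

cutoff = pd.Timestamp('2019-04-04')
before = demo['Date'] < cutoff
on_or_after = demo['Date'] >= cutoff
before_with_new = before & demo['New'].ne('')

# General rule listed first: it shadows the more specific rule
general_first = np.select(
    [before, on_or_after, before_with_new],
    [demo['Original'], demo['New'], demo['New']],
    default='',
)

# Specific rule listed first: it takes precedence over the general one
specific_first = np.select(
    [before_with_new, before, on_or_after],
    [demo['New'], demo['Original'], demo['New']],
    default='',
)

print(list(general_first))   # last row comes from Original
print(list(specific_first))  # last row comes from New
```

With the general rule first, the last row resolves to 'Low'. With the specific rule first, it resolves to 'Moderate', which is what the question expects.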
You can combine your conditions 2 and 3 and then use np.where():
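import numpy as np
df['Date'] = pd.to_datetime(df.Date)
# pd.datetime was removed from recent pandas versions; pd.Timestamp is the supported equivalent
df['Combine'] = np.where((df.Date >= pd.Timestamp('2019-04-04')) | (df.New.ne('') & ~df.New.isnull()), df.New, df.Original)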
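If you would rather stay inside pandas, the same "take New when it applies, otherwise fall back to Original" idea can probably be written with Series.where. This is only a sketch under a few assumptions: dates are in month/day/year format, an empty string (or NaN) means "no value", and the helper names cutoff, has_new and use_new are invented here for readability:

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame(
    [['ID1000', '1/1/2019', 'High', ''],
     ['ID2000', '4/10/2019', '', 'Moderate'],
     ['ID3000', '4/15/2019', 'High', 'Critical'],
     ['ID4000', '1/30/2019', 'Low', 'Moderate']],
    columns=['ID', 'Date', 'Original', 'New'],
)
# Assumption: dates are month/day/year, as in the example
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

cutoff = pd.Timestamp('2019-04-04')
has_new = df['New'].notna() & df['New'].ne('')  # New is populated
use_new = (df['Date'] >= cutoff) | has_new      # rules 2 and 3 combined

# Keep New where use_new is True, otherwise fall back to Original
df['Combined'] = df['New'].where(use_new, df['Original'])
print(df)
```

Series.where keeps the caller's values where the condition holds and substitutes the second argument elsewhere, so the result stays a pandas Series with the original index.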