

Pandas: Create New Column Based on Conditions of Multiple Columns

I have the following dataset:


  ID  AAA                                 BBB                                 CCC                DDD
1234  {'2015-01-01': 1}                   {'2016-01-01': 1, '2016-02-15': 2}  {'2015-01-02': 1}  {'2016-01-02': 1}
1235  {'2017-11-05': 1, '2018-06-05': 1}  {'2018-01-05': 1}                   NaN                {'2017-01-06': 1}

In each cell, the key is the date on which the person was hospitalized and the value is the number of days.
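Because the dates live in the dict keys, any comparison has to start by pulling those keys out of a cell. A minimal sketch, assuming every key is an ISO `YYYY-MM-DD` string and that a missing cell arrives as the float NaN pandas uses (as in the `df` below); `cell_dates` is a hypothetical helper name, not part of pandas:

```python
from datetime import date


def cell_dates(cell):
    """Return the dates stored as keys of one AAA/BBB/CCC/DDD cell.

    Assumes a filled cell is a dict such as {'2015-01-01': 1}
    (ISO date string -> number of days). Anything else, e.g. the float
    NaN that pandas uses for a missing cell, is treated as "no dates".
    """
    if not isinstance(cell, dict):
        return set()
    return {date.fromisoformat(key) for key in cell}


print(sorted(cell_dates({'2016-01-01': 1, '2016-02-15': 2})))
# [datetime.date(2016, 1, 1), datetime.date(2016, 2, 15)]
print(cell_dates(float('nan')))  # set()
```

Treating non-dict cells as an empty set means a NaN cell never needs special-casing later on.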

I need to create a new column, Hospitalized, containing 'Yes' or 'No'.

A row is 'Yes' when:

  1. Column [AAA or BBB] and column [CCC or DDD] both contain filled-in dates.
  2. A date in [CCC or DDD] is the day after a date in [AAA or BBB].

For example, if [AAA or BBB] contains January 1, 2020, the row is 'Yes' only if [CCC or DDD] contains January 2, 2020.
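The two conditions reduce to one set check: shift every [AAA or BBB] date forward by one day and see whether any shifted date appears in [CCC or DDD]. A sketch under the assumption that both sides have already been parsed into `datetime.date` objects; `next_day_match` is a hypothetical name:

```python
from datetime import date, timedelta


def next_day_match(first_dates, second_dates):
    """Return 'Yes' if some date in second_dates is the day after a date in first_dates.

    first_dates  -- dates taken from [AAA or BBB]
    second_dates -- dates taken from [CCC or DDD]
    If either side is empty there is nothing to match, so condition 1
    (both pairs filled in) is enforced automatically.
    """
    day_after = {d + timedelta(days=1) for d in first_dates}
    return 'Yes' if day_after & set(second_dates) else 'No'


print(next_day_match({date(2020, 1, 1)}, {date(2020, 1, 2)}))   # Yes
print(next_day_match({date(2020, 1, 1)}, {date(2020, 1, 3)}))   # No
print(next_day_match({date(2020, 1, 1)}, set()))                # No
print(next_day_match({date(2020, 1, 31)}, {date(2020, 2, 1)}))  # Yes (month rollover)
```

Using `timedelta` rather than string manipulation handles month and year boundaries correctly.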

Desired output:

  ID  AAA                                 BBB                                 CCC                DDD                Hospitalized
1234  {'2015-01-01': 1}                   {'2016-01-01': 1, '2016-02-15': 2}  {'2015-01-02': 1}  {'2016-01-02': 1}  Yes
1235  {'2017-11-05': 1, '2018-06-05': 1}  {'2018-01-05': 1}                   NaN                NaN                No
1236  {'2017-11-05': 1, '2018-06-05': 1}  {'2018-01-05': 1}                   NaN                {'2018-01-06': 1}  Yes

I tried the following code, but it only checks whether each pair of columns contains a date; it doesn't check whether the dates are one day apart.

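import pandas as pd

# Only checks that each column pair has at least one non-missing cell;
# the dates themselves are never compared.
df['hospitalized'] = df.apply(
    lambda r: 'yes' if (pd.notna(r.loc[['AAA', 'BBB']]).any()
                        and pd.notna(r.loc[['CCC', 'DDD']]).any()) else 'no',
    axis=1)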

Any suggestions would be appreciated. Thanks!

df:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1234, {'2015-01-01': 1}, {'2016-01-01': 1, '2016-02-15': 2}, {'2015-01-02': 1}, {'2016-01-02': 1}], [1235, {'2017-11-05': 1, '2018-06-05': 1}, {'2018-01-05': 1}, np.nan, np.nan]], columns=['ID', 'AAA', 'BBB', 'CCC', 'DDD'])

Try:

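import datetime
import itertools

from dateutil import parser


def func(x):
    # Parse every date key found in the AAA and BBB cells
    A_B_dates = [parser.parse(d) for d in itertools.chain(x['AAA'].keys(), x['BBB'].keys())]
    # Parse every date key found in the CCC and DDD cells
    C_D_dates = [parser.parse(d) for d in itertools.chain(x['CCC'].keys(), x['DDD'].keys())]
    # 'yes' as soon as some CCC/DDD date is exactly one day after an AAA/BBB date
    for date1 in A_B_dates:
        if date1 + datetime.timedelta(days=1) in C_D_dates:
            return 'yes'
    return 'no'


# Replace NaN cells with empty dicts so that .keys() works on every cell
df = df.where(df.notna(), lambda x: [{}])
df['Hospitalised'] = df.apply(func, axis=1)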

df:

    ID       AAA                                BBB                                CCC                  DDD                 Hospitalised
0   1234    {'2015-01-01': 1}                   {'2016-01-01': 1, '2016-02-15': 2}  {'2015-01-02': 1}   {'2016-01-02': 1}   yes
1   1235    {'2017-11-05': 1, '2018-06-05': 1}  {'2018-01-05': 1}                   {}                  {'2017-01-06': 1}   no
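A variant of the approach above, sketched under the assumption that every key is an ISO `YYYY-MM-DD` string: it treats non-dict (NaN) cells as empty inside the row function, so the `df.where(...)` step is not needed, and it swaps the list scan for a set intersection. `hospitalized` is a hypothetical name, and this is an alternative sketch rather than the answer's own code:

```python
from datetime import date, timedelta


def hospitalized(row):
    """Row-wise rule, meant for df.apply(hospitalized, axis=1).

    `row` only needs item access by column name, so a pandas row Series
    or a plain dict both work. Cells that are not dicts (e.g. NaN) count
    as having no dates.
    """
    def dates(*cols):
        # Collect the ISO date keys of every dict cell in the given columns
        return {date.fromisoformat(key)
                for col in cols if isinstance(row[col], dict)
                for key in row[col]}

    admitted = dates('AAA', 'BBB')   # [AAA or BBB]
    follow_up = dates('CCC', 'DDD')  # [CCC or DDD]
    day_after = {d + timedelta(days=1) for d in admitted}
    return 'Yes' if day_after & follow_up else 'No'


# Hypothetical usage with the question's DataFrame:
# df['Hospitalized'] = df.apply(hospitalized, axis=1)
row_1236 = {'AAA': {'2017-11-05': 1, '2018-06-05': 1}, 'BBB': {'2018-01-05': 1},
            'CCC': float('nan'), 'DDD': {'2018-01-06': 1}}
print(hospitalized(row_1236))  # Yes
```

Unlike `dateutil.parser.parse`, `date.fromisoformat` accepts only the strict ISO format, so this sketch would raise on keys written any other way.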
