简体   繁体   中英

Multiple If Statements in Python Dataframe

I have a dataframe called df1 and a list of dataframes called list .

In each of them exists columns date, like 2019-01-01 and another columns ID (not unique), and some other stuff.

Example:

df1

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam

list = [df_A, df_B, df_C]

# Example from a list:

df_A 

ID   date        Name
111  2019-01-02  Katrin
222  2019-01-02  Ivan
333  2019-01-02  Leo

df_B

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam

df_C

ID   date        Name
111  2019-01-09  Sam_1
222  2019-01-09  Leo_1
333  2019-01-09  Marcel

I want to append values to df1 based on ID and Date from this list of dataframes.

Conditions are:

  • If the max date for ID 111 in df1 are equal than the max date for ID 111 in one of the df from list, then do nothing.
  • If the max date for ID 222 in df1 are less than the max date for ID 222 in one of the df from list, then do some stuff.

How the putput should look like:

df1

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam
111  2019-01-02  Katrin
222  2019-01-02  Ivan
333  2019-01-02  Leo
111  2019-01-09  Sam_1
222  2019-01-09  Leo_1
333  2019-01-09  Marcel

The date from df_B is equal to df1, so we don't update df1, but for other 2 dfs we need to append their values to df1

Don't use list as a variable name as it's a Python builtin data type. I'd also concatenate df_A , df_B , and df_C into a single dataframe for easier manipulation:

# Concatenate df_A, df_B, df_C into a single frame, called df2
df2 = pd.concat([df_A, df_B, df_C], ignore_index=True)

# Line up df1 and df2 by ID so we can compare their dates
compare = df1[['ID', 'date']].merge(df2, on='ID', suffixes=('1', '2'))

# For cases where date1 < date2, append them to df1
new_df = compare.query('date1 < date2').rename(columns={'date2': 'date'})[['ID', 'date', 'Name']]
df1 = df1.append(new_df, ignore_index=True)

i would use pandas.Dataframe.groupby and pandas.Dataframe.append(assuming your date column is in date format) and do something like:

# i don't think you should use list so i renamed it to my_list
my_list = [df_A, df_B, df_C]

for cdf in my_list:
    # in original dataframe group by ID get max date
    group_df1 =  df1.groupby(['ID']).max()['date']
    # in other dataframe group by ID get max date
    group_cdf = cdf.groupby(['ID']).max()['date']
    # get IDs to add 
    res = group_cdf > group_df1
    group_cdf = group_cdf.loc[res[res==True].index]    
    df1 = df1.append(cdf.loc[cdf['ID'].isin(group_cdf.index) & cdf['date'].isin(group_cdf)])
print(df1)

here is a full code with your example:

df1 = pd.DataFrame(          \
[[111,'2019-01-01','John'], \
[222,'2019-01-01','Smith'], \
[333,'2019-01-01','Sam']])
df1.columns = ['ID','date','Name']
df1['date'] = pd.to_datetime(df1['date'])


df_A = pd.DataFrame(          \
[[111,'2019-01-02','Katrin'], \
[222,'2019-01-02','Ivan'], \
[333,'2019-01-02','Leo']])
df_A.columns = ['ID','date','Name']
df_A['date'] = pd.to_datetime(df_A['date'])

df_B = pd.DataFrame(          \
[[111,'2019-01-01','John'], \
[222,'2019-01-01','Smith'], \
[333,'2019-01-01','Sam']])
df_B.columns = ['ID','date','Name']
df_B['date'] = pd.to_datetime(df_B['date'])


df_C = pd.DataFrame(          \
[[111,'2019-01-09','Sam_1'], \
[222,'2019-01-09','Leo_1'], \
[333,'2019-01-09','Marcel']])
df_C.columns = ['ID','date','Name']
df_C['date'] = pd.to_datetime(df_C['date'])

my_list = [df_A, df_B, df_C]

for cdf in my_list:
    group_df1 =  df1.groupby(['ID']).max()['date']
    group_cdf = cdf.groupby(['ID']).max()['date'] 
    res = group_cdf > group_df1
    group_cdf = group_cdf.loc[res[res==True].index]    
    df1 = df1.append(cdf.loc[cdf['ID'].isin(group_cdf.index) & cdf['date'].isin(group_cdf)])
print(df1)

i get the following result:

   ID  date       Name   
0  111 2019-01-01    John
1  222 2019-01-01   Smith
2  333 2019-01-01     Sam
0  111 2019-01-02  Katrin
1  222 2019-01-02    Ivan
2  333 2019-01-02     Leo
0  111 2019-01-09   Sam_1
1  222 2019-01-09   Leo_1
2  333 2019-01-09  Marcel  

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM