Multiple If statements in a Python dataframe

Question

I have a dataframe called df1 and a list of dataframes called list.

Each of them has a date column (e.g. 2019-01-01), an ID column (not unique), and some other columns.

Example:

df1

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam

list = [df_A, df_B, df_C]

# Example from a list:

df_A 

ID   date        Name
111  2019-01-02  Katrin
222  2019-01-02  Ivan
333  2019-01-02  Leo

df_B

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam

df_C

ID   date        Name
111  2019-01-09  Sam_1
222  2019-01-09  Leo_1
333  2019-01-09  Marcel

I want to append rows to df1 based on ID and date from this list of dataframes.

The conditions are:

If the max date for ID 111 in df1 is equal to the max date for ID 111 in one of the dataframes from the list, then do nothing.
If the max date for ID 222 in df1 is less than the max date for ID 222 in one of the dataframes from the list, then append that dataframe's rows to df1.

What the output should look like:

df1

ID   date         Name
111  2019-01-01   John
222  2019-01-01   Smith
333  2019-01-01   Sam
111  2019-01-02  Katrin
222  2019-01-02  Ivan
333  2019-01-02  Leo
111  2019-01-09  Sam_1
222  2019-01-09  Leo_1
333  2019-01-09  Marcel

The date in df_B is equal to the date in df1, so we don't update df1, but for the other two dataframes we need to append their values to df1.

Answer 1

Don't use list as a variable name, as it shadows Python's built-in list type. I'd also concatenate df_A, df_B, and df_C into a single dataframe for easier manipulation:

import pandas as pd

# Concatenate df_A, df_B, df_C into a single frame, called df2
df2 = pd.concat([df_A, df_B, df_C], ignore_index=True)

# Line up df1 and df2 by ID so we can compare their dates
compare = df1[['ID', 'date']].merge(df2, on='ID', suffixes=('1', '2'))

# For cases where date1 < date2, append them to df1
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
new_df = compare.query('date1 < date2').rename(columns={'date2': 'date'})[['ID', 'date', 'Name']]
df1 = pd.concat([df1, new_df], ignore_index=True)
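
One caveat with the merge above: it pairs every df1 row with every df2 row that shares an ID. If df1 holds several dates for one ID, a df2 row can match more than once, or be appended even though it is older than df1's latest date. Below is a minimal sketch of a variant that compares against the per-ID maximum instead, as the question describes. The sample rows (John_2 and the extra dates) are invented for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: df1 has two dates for ID 111
df1 = pd.DataFrame({
    "ID": [111, 111, 222],
    "date": pd.to_datetime(["2019-01-01", "2019-01-05", "2019-01-01"]),
    "Name": ["John", "John_2", "Smith"],
})
df2 = pd.DataFrame({
    "ID": [111, 111, 222],
    "date": pd.to_datetime(["2019-01-03", "2019-01-09", "2019-01-01"]),
    "Name": ["Katrin", "Sam_1", "Smith"],
})

# Latest date already present in df1 for each ID
latest = df1.groupby("ID")["date"].max()

# Look up df1's latest date for the ID of every df2 row
cutoff = df2["ID"].map(latest)

# Keep only the df2 rows that are strictly newer than that cutoff
new_rows = df2[df2["date"] > cutoff]

result = pd.concat([df1, new_rows], ignore_index=True)
print(result)
```

Here only the 2019-01-09 row for ID 111 is appended; the 2019-01-03 row is older than df1's latest date for that ID, and the ID 222 row has an equal date.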

Answer 2

I would use pandas.DataFrame.groupby to get the max date per ID and then append the newer rows to df1 (assuming your date column is in datetime format), with something like:

# I don't think you should use list as a name, so I renamed it to my_list
my_list = [df_A, df_B, df_C]

for cdf in my_list:
    # in the original dataframe, group by ID and get the max date
    group_df1 = df1.groupby('ID')['date'].max()
    # in the other dataframe, group by ID and get the max date
    group_cdf = cdf.groupby('ID')['date'].max()
    # keep the IDs whose max date in cdf is later than in df1
    res = group_cdf > group_df1
    group_cdf = group_cdf.loc[res[res].index]
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    new_rows = cdf.loc[cdf['ID'].isin(group_cdf.index) & cdf['date'].isin(group_cdf)]
    df1 = pd.concat([df1, new_rows])
print(df1)

Here is the full code with your example:

import pandas as pd

df1 = pd.DataFrame(
    [[111, '2019-01-01', 'John'],
     [222, '2019-01-01', 'Smith'],
     [333, '2019-01-01', 'Sam']],
    columns=['ID', 'date', 'Name'])
df1['date'] = pd.to_datetime(df1['date'])

df_A = pd.DataFrame(
    [[111, '2019-01-02', 'Katrin'],
     [222, '2019-01-02', 'Ivan'],
     [333, '2019-01-02', 'Leo']],
    columns=['ID', 'date', 'Name'])
df_A['date'] = pd.to_datetime(df_A['date'])

df_B = pd.DataFrame(
    [[111, '2019-01-01', 'John'],
     [222, '2019-01-01', 'Smith'],
     [333, '2019-01-01', 'Sam']],
    columns=['ID', 'date', 'Name'])
df_B['date'] = pd.to_datetime(df_B['date'])

df_C = pd.DataFrame(
    [[111, '2019-01-09', 'Sam_1'],
     [222, '2019-01-09', 'Leo_1'],
     [333, '2019-01-09', 'Marcel']],
    columns=['ID', 'date', 'Name'])
df_C['date'] = pd.to_datetime(df_C['date'])

my_list = [df_A, df_B, df_C]

for cdf in my_list:
    group_df1 = df1.groupby('ID')['date'].max()
    group_cdf = cdf.groupby('ID')['date'].max()
    res = group_cdf > group_df1
    group_cdf = group_cdf.loc[res[res].index]
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    new_rows = cdf.loc[cdf['ID'].isin(group_cdf.index) & cdf['date'].isin(group_cdf)]
    df1 = pd.concat([df1, new_rows])
print(df1)

I get the following result:

   ID  date       Name   
0  111 2019-01-01    John
1  222 2019-01-01   Smith
2  333 2019-01-01     Sam
0  111 2019-01-02  Katrin
1  222 2019-01-02    Ivan
2  333 2019-01-02     Leo
0  111 2019-01-09   Sam_1
1  222 2019-01-09   Leo_1
2  333 2019-01-09  Marcel
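
The comparison group_cdf > group_df1 relies on both Series carrying the same ID labels, which holds for the example data. If a dataframe in the list contains an ID that df1 lacks, pandas raises a ValueError because the two Series are not identically labeled. One possible way around this, sketched under the assumption that rows with unknown IDs should simply be skipped, is to reindex df1's maxima onto the other frame's IDs first. The ID 444 and the name Eve are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [111, 222],
    "date": pd.to_datetime(["2019-01-01", "2019-01-01"]),
    "Name": ["John", "Smith"],
})
# Hypothetical frame: ID 444 does not exist in df1
cdf = pd.DataFrame({
    "ID": [111, 444],
    "date": pd.to_datetime(["2019-01-02", "2019-01-02"]),
    "Name": ["Katrin", "Eve"],
})

max_df1 = df1.groupby("ID")["date"].max()
max_cdf = cdf.groupby("ID")["date"].max()

# Align df1's per-ID maxima to the IDs present in cdf;
# IDs missing from df1 become NaT
aligned = max_df1.reindex(max_cdf.index)

# Element-wise comparison; any comparison against NaT is False
newer = max_cdf > aligned
ids_to_add = newer[newer].index
print(list(ids_to_add))
```

With this alignment the loop body can select rows by ids_to_add exactly as before. Whether unknown IDs should be skipped or appended is a design decision the question does not settle.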

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
