Python DataFrame Logical Operations on Multiple Columns Using Multiple If Statements
Multiple If Statements in Python DataFrame
I have a dataframe called df1 and a list of dataframes called list.
Each of them has a date column (e.g. 2019-01-01), an ID column (not unique), and some other columns.
Example:
df1
ID date Name
111 2019-01-01 John
222 2019-01-01 Smith
333 2019-01-01 Sam
list = [df_A, df_B, df_C]
# Examples from the list:
df_A
ID date Name
111 2019-01-02 Katrin
222 2019-01-02 Ivan
333 2019-01-02 Leo
df_B
ID date Name
111 2019-01-01 John
222 2019-01-01 Smith
333 2019-01-01 Sam
df_C
ID date Name
111 2019-01-09 Sam_1
222 2019-01-09 Leo_1
333 2019-01-09 Marcel
I want to append rows from this list of dataframes to df1, based on ID and date.
The condition is:
The output should look like this:
df1
ID date Name
111 2019-01-01 John
222 2019-01-01 Smith
333 2019-01-01 Sam
111 2019-01-02 Katrin
222 2019-01-02 Ivan
333 2019-01-02 Leo
111 2019-01-09 Sam_1
222 2019-01-09 Leo_1
333 2019-01-09 Marcel
df_B has the same dates as df1, so df1 is not updated from it; for the other two dataframes, their rows need to be appended to df1.
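One possible reading of the condition is "append every row whose (ID, date) pair does not already appear in df1". Under that assumption, an anti-join with `DataFrame.merge(..., indicator=True)` handles the whole list without a loop. This is a sketch of an alternative technique, not the asker's method; it rebuilds the sample data with parsed dates.

```python
import pandas as pd

# Sample data from the question; dates parsed to datetime64 for safe comparison
df1 = pd.DataFrame({"ID": [111, 222, 333],
                    "date": pd.to_datetime(["2019-01-01"] * 3),
                    "Name": ["John", "Smith", "Sam"]})
df_A = pd.DataFrame({"ID": [111, 222, 333],
                     "date": pd.to_datetime(["2019-01-02"] * 3),
                     "Name": ["Katrin", "Ivan", "Leo"]})
df_B = df1.copy()  # identical to df1, so nothing from it should be appended
df_C = pd.DataFrame({"ID": [111, 222, 333],
                     "date": pd.to_datetime(["2019-01-09"] * 3),
                     "Name": ["Sam_1", "Leo_1", "Marcel"]})

candidates = pd.concat([df_A, df_B, df_C], ignore_index=True)

# Left-join the candidates against the (ID, date) pairs already in df1;
# indicator=True adds a "_merge" column saying where each row was found
marked = candidates.merge(df1[["ID", "date"]].drop_duplicates(),
                          on=["ID", "date"], how="left", indicator=True)

# Keep only rows whose (ID, date) pair is absent from df1 (the anti-join)
new_rows = marked.loc[marked["_merge"] == "left_only"].drop(columns="_merge")

result = pd.concat([df1, new_rows], ignore_index=True)
print(result)
```

Note that this version also appends rows with dates older than df1's; the answers on this page only append newer dates, which is the safer choice if "newer" is what the condition means.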
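Don't use list as a variable name, because it shadows Python's built-in list type. I also concatenated df_A, df_B and df_C into a single dataframe for easier manipulation: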
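import pandas as pd

# Concatenate df_A, df_B, df_C into a single frame, called df2
df2 = pd.concat([df_A, df_B, df_C], ignore_index=True)
# Line up df1 and df2 by ID so we can compare their dates
# (the clashing "date" column becomes date1 from df1 and date2 from df2)
compare = df1[['ID', 'date']].merge(df2, on='ID', suffixes=('1', '2'))
# For cases where date1 < date2, append them to df1;
# drop_duplicates guards against repeats when an ID occurs several times in df1
new_df = compare.query('date1 < date2').rename(columns={'date2': 'date'})[['ID', 'date', 'Name']].drop_duplicates()
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df1 = pd.concat([df1, new_df], ignore_index=True)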
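A self-contained run of the merge/query approach above on the question's sample data might look like this. It assumes the dates are parsed with `pd.to_datetime`; `make_df` is a hypothetical helper, and the final `sort_values` is an addition so that the appended rows come out in the same order as the expected output (the merge itself groups rows by ID).

```python
import pandas as pd


def make_df(rows):
    # Hypothetical helper: build an ID/date/Name frame and parse the dates
    df = pd.DataFrame(rows, columns=["ID", "date", "Name"])
    df["date"] = pd.to_datetime(df["date"])
    return df


df1 = make_df([[111, "2019-01-01", "John"], [222, "2019-01-01", "Smith"],
               [333, "2019-01-01", "Sam"]])
df_A = make_df([[111, "2019-01-02", "Katrin"], [222, "2019-01-02", "Ivan"],
                [333, "2019-01-02", "Leo"]])
df_B = make_df([[111, "2019-01-01", "John"], [222, "2019-01-01", "Smith"],
                [333, "2019-01-01", "Sam"]])
df_C = make_df([[111, "2019-01-09", "Sam_1"], [222, "2019-01-09", "Leo_1"],
                [333, "2019-01-09", "Marcel"]])

df2 = pd.concat([df_A, df_B, df_C], ignore_index=True)

# Pair each df1 row with every df2 row sharing its ID; the clashing "date"
# column becomes date1 (from df1) and date2 (from df2)
compare = df1[["ID", "date"]].merge(df2, on="ID", suffixes=("1", "2"))

new_df = (
    compare.query("date1 < date2")
    .rename(columns={"date2": "date"})[["ID", "date", "Name"]]
    .drop_duplicates()
    # The merge orders rows by ID; sorting restores date-then-ID order
    .sort_values(["date", "ID"])
)
df1 = pd.concat([df1, new_df], ignore_index=True)
print(df1)
```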
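I would use pandas.DataFrame.groupby plus row appending (pandas.DataFrame.append in older pandas; it was removed in pandas 2.0, where pd.concat does the same job), assuming your date column is already in datetime format, and do the following: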
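import pandas as pd

# list shadows the built-in type, so it is renamed to my_list
my_list = [df_A, df_B, df_C]
for cdf in my_list:
    # in the original dataframe, group by ID and get the max date
    group_df1 = df1.groupby('ID')['date'].max()
    # in the other dataframe, group by ID and get the max date
    group_cdf = cdf.groupby('ID')['date'].max()
    # keep the IDs whose max date is newer than df1's
    # (reindex aligns the labels; IDs missing from df1 compare as False)
    res = group_cdf > group_df1.reindex(group_cdf.index)
    group_cdf = group_cdf[res]
    # select the rows carrying their own ID's newer max date
    rows = cdf[cdf['date'] == cdf['ID'].map(group_cdf)]
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df1 = pd.concat([df1, rows])
print(df1)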
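The loop compares two per-ID Series of max dates with `>`. pandas only allows that comparison between Series with identical index labels, which holds in this example but not when a dataframe in my_list contains an ID that df1 lacks. A small sketch of the failure and of aligning with `reindex` first; the ID 444 and its date are hypothetical values added for illustration.

```python
import pandas as pd

# Per-ID latest dates, shaped like the result of groupby('ID')['date'].max()
latest_df1 = pd.Series(pd.to_datetime(["2019-01-01"] * 3),
                       index=pd.Index([111, 222, 333], name="ID"))
# Hypothetical: this frame contains an extra ID (444) that df1 lacks
latest_cdf = pd.Series(pd.to_datetime(["2019-01-02"] * 3 + ["2019-01-05"]),
                       index=pd.Index([111, 222, 333, 444], name="ID"))

# Comparing Series with different index labels raises ValueError
try:
    latest_cdf > latest_df1
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Aligning first makes the comparison safe: IDs missing from df1 become NaT,
# and any comparison against NaT is False, so ID 444 is not flagged
aligned = latest_df1.reindex(latest_cdf.index)
newer = latest_cdf > aligned
print(mismatch_raised)
print(newer)
```

Whether brand-new IDs should be appended is not stated in the question; if they should, `newer | aligned.isna()` would flag them as well.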
Here is the full code for your example:
import pandas as pd

df1 = pd.DataFrame([[111, '2019-01-01', 'John'],
                    [222, '2019-01-01', 'Smith'],
                    [333, '2019-01-01', 'Sam']],
                   columns=['ID', 'date', 'Name'])
df1['date'] = pd.to_datetime(df1['date'])
df_A = pd.DataFrame([[111, '2019-01-02', 'Katrin'],
                     [222, '2019-01-02', 'Ivan'],
                     [333, '2019-01-02', 'Leo']],
                    columns=['ID', 'date', 'Name'])
df_A['date'] = pd.to_datetime(df_A['date'])
df_B = pd.DataFrame([[111, '2019-01-01', 'John'],
                     [222, '2019-01-01', 'Smith'],
                     [333, '2019-01-01', 'Sam']],
                    columns=['ID', 'date', 'Name'])
df_B['date'] = pd.to_datetime(df_B['date'])
df_C = pd.DataFrame([[111, '2019-01-09', 'Sam_1'],
                     [222, '2019-01-09', 'Leo_1'],
                     [333, '2019-01-09', 'Marcel']],
                    columns=['ID', 'date', 'Name'])
df_C['date'] = pd.to_datetime(df_C['date'])
my_list = [df_A, df_B, df_C]
for cdf in my_list:
    # per-ID max date in df1 and in the current frame
    group_df1 = df1.groupby('ID')['date'].max()
    group_cdf = cdf.groupby('ID')['date'].max()
    # keep only IDs whose max date is newer than df1's
    res = group_cdf > group_df1.reindex(group_cdf.index)
    group_cdf = group_cdf[res]
    # append the rows carrying their own ID's newer max date
    df1 = pd.concat([df1, cdf[cdf['date'] == cdf['ID'].map(group_cdf)]])
print(df1)
I get the following result:
ID date Name
0 111 2019-01-01 John
1 222 2019-01-01 Smith
2 333 2019-01-01 Sam
0 111 2019-01-02 Katrin
1 222 2019-01-02 Ivan
2 333 2019-01-02 Leo
0 111 2019-01-09 Sam_1
1 222 2019-01-09 Leo_1
2 333 2019-01-09 Marcel
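The repeated index labels (0, 1, 2 three times) appear because each appended frame keeps its own index. If a continuous index is preferred, passing `ignore_index=True` to `pd.concat` renumbers the rows. A minimal sketch with two hypothetical two-row frames:

```python
import pandas as pd

# Hypothetical small frames, each with its own default index 0, 1
a = pd.DataFrame({"ID": [111, 222], "Name": ["John", "Smith"]})
b = pd.DataFrame({"ID": [111, 222], "Name": ["Katrin", "Ivan"]})

kept = pd.concat([a, b])                            # index labels: 0, 1, 0, 1
renumbered = pd.concat([a, b], ignore_index=True)   # index labels: 0, 1, 2, 3
print(kept.index.tolist(), renumbered.index.tolist())
```

Calling `kept.reset_index(drop=True)` after the fact gives the same renumbered index.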
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html