
Combine two pandas dataframes with two conditionals

I have two pandas dataframes that I would like to combine while checking two conditionals.

Dataframe 1:

import pandas as pd 
data = [['Z085', '2020-08', 1.33], ['Z086', '2020-08', 1.83], ['Z086', '2020-09', 1.39]] 
df1 = pd.DataFrame(data, columns = ['SN', 'Date', 'Value']) 


Dataframe 2:

data = [['Z085', '2020-08', 0.34], ['Z085', '2020-09', 0.83], ['Z086', '2020-09', 0.29]] 
df2 = pd.DataFrame(data, columns = ['SN', 'Date', 'ValueX']) 
df2 


I would like to merge, append, or join them in order to get the following dataframe. The values ("Value" and "ValueX") should be placed in the same row if both "SN" and "Date" are equal.

[Image: the desired combined dataframe]

I am not sure whether a new dataframe is required or whether df2 should be mapped onto df1.

This is what I have tried:

df1['ValueX'] = df1[('Date', 'SN')].map(df2_mean.set_index('Date', 'SN')['ValueX'])

With one conditional (for example, Date) it works OK, but I am not able to set up two conditionals.

This is simply a merge() operation. Don't call the columns "conditionals"; just say "merge on the columns SN, Date".

However, with pandas (v1.1.4) you can't rely on the default row order of the result. Note that the output below is not sorted by 'SN' then 'Date': the ('Z085', '2020-09') row, which exists only in df2, comes last:

>>> dfnew_bad = df1.merge(df2, on=['SN','Date'], how='outer')

     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29
3  Z085  2020-09    NaN    0.83

So in your case, to get the correct order by SN then Date:

dfnew_good = df1.merge(df2, on=['SN','Date'], how='outer', sort=False).sort_values(['SN', 'Date'])
     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
3  Z085  2020-09    NaN    0.83
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29

Note that .sort_values() has an ascending flag (ascending=True), but pd.merge() does not. You could also work around this by doing pd.merge(..., sort=False) and then dfnew_workaround.sort_index(..., inplace=True).
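The sort_index workaround mentioned above might look like the sketch below. The arguments are elided in the note, so this is only one possible reading: it assumes the key columns are first moved into the index with set_index (a step not spelled out in the note), so that sort_index orders the rows by SN and then Date. The name dfnew_workaround is taken from the note; df1 and df2 are rebuilt from the question's data.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['Z085', '2020-08', 1.33], ['Z086', '2020-08', 1.83], ['Z086', '2020-09', 1.39]],
    columns=['SN', 'Date', 'Value'],
)
df2 = pd.DataFrame(
    [['Z085', '2020-08', 0.34], ['Z085', '2020-09', 0.83], ['Z086', '2020-09', 0.29]],
    columns=['SN', 'Date', 'ValueX'],
)

# Outer merge on both key columns; the row order of this result is not relied upon
dfnew_workaround = df1.merge(df2, on=['SN', 'Date'], how='outer', sort=False)

# Assumption: put the keys into the index so that sort_index sorts by SN, then Date
dfnew_workaround = dfnew_workaround.set_index(['SN', 'Date'])
dfnew_workaround.sort_index(inplace=True)

# Turn the keys back into ordinary columns
dfnew_workaround = dfnew_workaround.reset_index()
print(dfnew_workaround)
```

Compared with sort_values(['SN', 'Date']), this version ends with a fresh 0..3 index because of the final reset_index.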

Method 1: merge:

df_new = df1.merge(df2, on=['SN','Date'],how='outer', sort=True)
print(df_new)

Method 2: join:

df_new = df1.join(df2.set_index(['SN','Date']), on=['SN','Date'],how='outer', sort=True)
print(df_new)

In this case, one more possible way would be to use pd.concat:

df_new = pd.concat([df1.set_index(['SN','Date']),df2.set_index(['SN','Date'])],axis=1).reset_index()

Output in either case:

     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
3  Z085  2020-09    NaN    0.83
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29
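As a rough check that the three approaches agree, the sketch below runs merge, join, and concat on the question's data and compares the results. The helper normalize and the variable names are illustrative only and not from the answers. Because the row order and index can differ between methods and pandas versions, each result is explicitly sorted by the keys and re-indexed before comparing.

```python
import pandas as pd

keys = ['SN', 'Date']

df1 = pd.DataFrame(
    [['Z085', '2020-08', 1.33], ['Z086', '2020-08', 1.83], ['Z086', '2020-09', 1.39]],
    columns=['SN', 'Date', 'Value'],
)
df2 = pd.DataFrame(
    [['Z085', '2020-08', 0.34], ['Z085', '2020-09', 0.83], ['Z086', '2020-09', 0.29]],
    columns=['SN', 'Date', 'ValueX'],
)


def normalize(df):
    # Hypothetical helper: sort by the keys and discard the old index,
    # so that only the content of the frames is compared
    ordered = df.sort_values(keys).reset_index(drop=True)
    return ordered[keys + ['Value', 'ValueX']]


via_merge = normalize(df1.merge(df2, on=keys, how='outer'))
via_join = normalize(df1.join(df2.set_index(keys), on=keys, how='outer'))
via_concat = normalize(
    pd.concat([df1.set_index(keys), df2.set_index(keys)], axis=1).reset_index()
)

print(via_merge)
print(via_merge.equals(via_join), via_merge.equals(via_concat))
```

Note that the concat variant relies on each (SN, Date) pair being unique within its dataframe, which holds for this data; merge and join do not need that assumption.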
