Combine two pandas dataframes with two conditionals
I have two pandas dataframes that I would like to combine, checking two conditions.
Dataframe1:
import pandas as pd
data = [['Z085', '2020-08', 1.33], ['Z086', '2020-08', 1.83], ['Z086', '2020-09', 1.39]]
df1 = pd.DataFrame(data, columns = ['SN', 'Date', 'Value'])
Dataframe2:
data = [['Z085', '2020-08', 0.34], ['Z085', '2020-09', 0.83], ['Z086', '2020-09', 0.29]]
df2 = pd.DataFrame(data, columns = ['SN', 'Date', 'ValueX'])
df2
I would like to merge, append, or join them to get the following dataframe: the values ("Value" and "ValueX") are added to the same row if both "SN" and "Date" are equal.
I am not sure whether a new dataframe is required or whether df2 should be mapped onto df1.
This is what I have tried:
df1['ValueX'] = df1[('Date', 'SN')].map(df2_mean.set_index('Date', 'SN')['ValueX'])
With one conditional (for example, Date) it works fine, but I am not able to set up two conditionals.
This is simply a merge() operation. Don't call the columns "conditionals"; just say "merge on the columns SN, Date".
However, in pandas v1.1.4 the default row order of the result is not sorted by SN then Date, so you can't rely on it; note that in the output below the row that exists only in df2 ends up last, i.e. the order is the wrong way around:
>>> dfnew_bad = df1.merge(df2, on=['SN','Date'], how='outer')
     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29
3  Z085  2020-09    NaN    0.83
So in your case, to get the correct order by SN then Date:
dfnew_good = df1.merge(df2, on=['SN','Date'], how='outer', sort=False).sort_values(['SN', 'Date'])
     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
3  Z085  2020-09    NaN    0.83
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29
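For reference, the merge-then-sort steps can be put together as one self-contained script. This is a minimal sketch using the sample data from the question; the final reset_index(drop=True) is an optional addition that is not part of the answer, and the names merged and result are chosen only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['Z085', '2020-08', 1.33], ['Z086', '2020-08', 1.83], ['Z086', '2020-09', 1.39]],
    columns=['SN', 'Date', 'Value'],
)
df2 = pd.DataFrame(
    [['Z085', '2020-08', 0.34], ['Z085', '2020-09', 0.83], ['Z086', '2020-09', 0.29]],
    columns=['SN', 'Date', 'ValueX'],
)

# Outer merge keeps every (SN, Date) pair found in either frame.
merged = df1.merge(df2, on=['SN', 'Date'], how='outer')

# Sort explicitly instead of relying on the order merge() returns,
# then renumber the rows (optional, not in the original answer).
result = merged.sort_values(['SN', 'Date']).reset_index(drop=True)
print(result)
```

Sorting after the merge keeps the row order independent of whatever order a given pandas version happens to return.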
Note that .sort_values() takes an ascending flag (.sort_values(ascending=True)), but pd.merge() does not.
You could also work around this by doing pd.merge(..., sort=False) and then dfnew_workaround.sort_index(..., inplace=True).
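The question also asks whether df2 could simply be mapped onto df1 instead of building a combined frame. One possible way to do that, sketched here under the assumption that only the rows of df1 should be kept, is a left merge. The name mapped is illustrative, and validate='many_to_one' is an optional safeguard added in this sketch that raises an error if df2 contains the same (SN, Date) pair more than once.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['Z085', '2020-08', 1.33], ['Z086', '2020-08', 1.83], ['Z086', '2020-09', 1.39]],
    columns=['SN', 'Date', 'Value'],
)
df2 = pd.DataFrame(
    [['Z085', '2020-08', 0.34], ['Z085', '2020-09', 0.83], ['Z086', '2020-09', 0.29]],
    columns=['SN', 'Date', 'ValueX'],
)

# A left merge keeps exactly the rows of df1, in their original order,
# and fills ValueX with NaN where df2 has no matching (SN, Date) pair.
mapped = df1.merge(df2, on=['SN', 'Date'], how='left', validate='many_to_one')
print(mapped)
```

Unlike the outer merge, this drops the ('Z085', '2020-09') row, because that pair appears only in df2.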
df_new = df1.merge(df2, on=['SN','Date'],how='outer', sort=True)
print(df_new)
df_new = df1.join(df2.set_index(['SN','Date']), on=['SN','Date'],how='outer', sort=True)
print(df_new)
In this case, one more possible way would be to use pd.concat:
df_new = pd.concat([df1.set_index(['SN','Date']),df2.set_index(['SN','Date'])],axis=1).reset_index()
Output in either case:
     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
3  Z085  2020-09    NaN    0.83
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29
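The pd.concat approach aligns rows on the (SN, Date) index, so it works best when each key pair appears at most once per frame. The sketch below adds two things that are not in the answer: a duplicate check before concatenating, and an explicit sort_index() so the row order does not depend on how concat orders the combined index. The names keys and combined are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['Z085', '2020-08', 1.33], ['Z086', '2020-08', 1.83], ['Z086', '2020-09', 1.39]],
    columns=['SN', 'Date', 'Value'],
)
df2 = pd.DataFrame(
    [['Z085', '2020-08', 0.34], ['Z085', '2020-09', 0.83], ['Z086', '2020-09', 0.29]],
    columns=['SN', 'Date', 'ValueX'],
)
keys = ['SN', 'Date']

# Column-wise concat aligns on the index, so check that each
# (SN, Date) pair is unique within each frame before concatenating.
for name, frame in [('df1', df1), ('df2', df2)]:
    if frame.duplicated(keys).any():
        raise ValueError(f'{name} has duplicate {keys} pairs')

combined = (
    pd.concat([df1.set_index(keys), df2.set_index(keys)], axis=1)
    .sort_index()    # make the row order explicit: SN, then Date
    .reset_index()   # turn the keys back into ordinary columns
)
print(combined)
```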