Use values from df2 where the row value matches a df1 column name
I have two dataframes like these:
df1:
id | 2019-03-01 | 2019-04-01 | 2019-05-01 | sum
id1 | 42 | 69 | 96 | 868
id2 | 15 | 21 | 76 | 321
id3 | 34 | 45 | 35 | 675
df2:
id | month| avail
id1 | 3 | 10
id2 | 4 | 54
id2 | 5 | 34
id3 | 5 | 33
I need to add the value from the avail column to every column where df2.month == df1.columns.values[n].month, and if there is no corresponding record, add 0.
This was my attempt with np.where, but I did not succeed:
df1.columns.values[:-1] = pd.to_datetime(df1.columns.values[:-1])
for c in np.arange(start = 0, stop = len(df1.columns[:-1]), step = 1):
    df1['h'+str(c+1)] = df1.iloc[: , -1].add(np.where((df2.id.isin(df1.index))&
                                                      (df1.columns.values[c].month == df2.month),
                                                      df2.avail, 0)).sub(df1.iloc[:, c])
df1 = df1.filter(like = 'h').reset_index()
The expected output is:
id | h1 | h2 | h3
id1 | 836| 767 | 671
id2 | 306| 339 | 297
id3 | 641| 596 | 594
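Judging from the expected output, each hN is a running balance rather than a single addition: start from sum, then for each month add that month's avail (0 when df2 has no matching row) and subtract that month's value from df1. For id1 that is 868 + 10 - 42 = 836, then 836 + 0 - 69 = 767, then 767 + 0 - 96 = 671. The np.where attempt cannot express this because its mask has one entry per row of df2 (four) while df1 has three rows, so nothing lines up by id. A minimal loop-based sketch of the same idea, assuming id is df1's index (as the attempt assumes) and that the month columns are the ones before sum:

```python
import pandas as pd

# The question's data, with 'id' as df1's index (as the attempt assumes)
df1 = pd.DataFrame(
    {"2019-03-01": [42, 15, 34],
     "2019-04-01": [69, 21, 45],
     "2019-05-01": [96, 76, 35],
     "sum": [868, 321, 675]},
    index=pd.Index(["id1", "id2", "id3"], name="id"),
)
df2 = pd.DataFrame({"id": ["id1", "id2", "id2", "id3"],
                    "month": [3, 4, 5, 5],
                    "avail": [10, 54, 34, 33]})

balance = df1["sum"]                      # starting balance before the first month
out = pd.DataFrame(index=df1.index)
for n, col in enumerate(df1.columns[:-1], start=1):
    month = pd.to_datetime(col).month
    # avail for this month as a Series indexed by id; ids with no row -> 0
    avail = (df2[df2["month"] == month]
             .groupby("id")["avail"].sum()
             .reindex(df1.index, fill_value=0))
    # hN = h(N-1) + avail - this month's value, aligned on id labels
    balance = balance + avail - df1[col]
    out["h" + str(n)] = balance

out = out.reset_index()
print(out)
```

Building avail with groupby(...).sum() means duplicate (id, month) rows in df2 would be added together rather than raising; whether that is the right behaviour depends on the data.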
You can do it with set_index and unstack on df2, and set_index and drop on df1; then take the cumsum over the columns of the difference between the two results, add the sum column reshaped with [:, None], and finish with some rename and rename_axis.
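The [:, None] step matters because df1['sum'].values has shape (3,); indexing with [:, None] turns it into a (3, 1) column, so NumPy broadcasting adds one sum per row. Without it, a (3,) array is broadcast along the last axis and each sum is added to a column instead, which with three ids and three months happens to raise no error. A small NumPy illustration, using the cumulative avail-minus-value differences for the question's data, worked out by hand:

```python
import numpy as np

sums = np.array([868, 321, 675])           # df1['sum'] for id1, id2, id3
# cumsum over months of (avail - value), computed by hand from the question
diffs = np.array([[-32, -101, -197],
                  [-15,   18,  -24],
                  [-34,  -79,  -81]])

column = sums[:, None]                     # shape (3, 1): one value per row
print(column.shape)                        # (3, 1)
print(column + diffs)                      # each row gets its own sum

wrong = sums + diffs                       # (3,) broadcasts across columns
print(wrong[0])                            # row id1 got 868, 321, 675 added
```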
import pandas as pd

df_f = (df1['sum'].values[:, None]
        + (df2.set_index(['id','month'])['avail'].unstack().fillna(0)
           - df1.set_index('id').drop('sum', axis=1)
             .rename(columns=lambda x: pd.to_datetime(x).month)).cumsum(axis=1))\
       .rename_axis(columns=None)\
       .reset_index()
print(df_f)
id 3 4 5
0 id1 836.0 767.0 671.0
1 id2 306.0 339.0 297.0
2 id3 641.0 596.0 594.0
You may want to rename the columns (3, 4, 5) to match your exact expected output (h1, h2, h3).
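The code above adds df1['sum'].values positionally, which relies on df1's rows already being in the same (sorted) order as the label-aligned result, and fillna(0) only covers (id, month) pairs missing inside df2's own grid; an id or a month absent from df2 altogether would come through as NaN. A variant that stays label-aligned throughout and also produces the h1, h2, h3 names, sketched under the assumption that df1's month columns are in chronological order:

```python
import pandas as pd

df1 = pd.DataFrame({"id": ["id1", "id2", "id3"],
                    "2019-03-01": [42, 15, 34],
                    "2019-04-01": [69, 21, 45],
                    "2019-05-01": [96, 76, 35],
                    "sum": [868, 321, 675]})
df2 = pd.DataFrame({"id": ["id1", "id2", "id2", "id3"],
                    "month": [3, 4, 5, 5],
                    "avail": [10, 54, 34, 33]})

# df1's monthly values, with columns renamed to month numbers
values = (df1.set_index("id").drop(columns="sum")
          .rename(columns=lambda c: pd.to_datetime(c).month))

# df2 as an id x month grid, forced onto df1's ids and months; gaps -> 0
avail = (df2.set_index(["id", "month"])["avail"].unstack(fill_value=0)
         .reindex(index=values.index, columns=values.columns, fill_value=0))

# running balance: sum + cumulative (avail - value), aligned on the id labels
result = (avail - values).cumsum(axis=1).add(df1.set_index("id")["sum"], axis=0)
result.columns = ["h" + str(i) for i in range(1, result.shape[1] + 1)]
result = result.reset_index()
print(result)
```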
Here is another approach:
import pandas as pd
from io import StringIO

# sample data
s1 = """id|2019-03-01|2019-04-01|2019-05-01|sum
id1|42|69|96|868
id2|15|21|76|321
id3|34|45|35|675"""
df1 = pd.read_csv(StringIO(s1), sep='|')
s2 = """id|month|avail
id1|3|10
id2|4|54
id2|5|34
id3|5|33"""
df2 = pd.read_csv(StringIO(s2), sep='|')
# end sample data
# convert columns to datetime and get the month
new_col = [pd.to_datetime(x).month for x in df1.columns[1:-1]]
df1 = df1.rename(columns=dict(zip(df1.columns[1:-1], new_col)))
# set index
df1 = df1.set_index('id')
# drop the last column
df3 = df1[df1.columns[:-1]]
# pivot df2 so months are the columns
p = df2.pivot(index='id', columns='month', values='avail').fillna(0)
# concat and sum
con = pd.concat([-df3,p]).groupby(level=0).sum()
# add df1['sum'] to the first column
con[con.columns[0]] = con[con.columns[0]] + df1['sum']
# cumsum across columns
print(con.cumsum(axis=1))
3 4 5
id
id1 836.0 767.0 671.0
id2 306.0 339.0 297.0
id3 641.0 596.0 594.0
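DataFrame.pivot raises a ValueError when df2 contains the same (id, month) pair more than once. If that can happen and such duplicates should be added together (an assumption about the data, not something the question states), pivot_table with aggfunc='sum' and fill_value=0 does the reshaping and the zero-filling in one step and could stand in for the pivot(...).fillna(0) line. A sketch with a hypothetical duplicate row:

```python
import pandas as pd

# Hypothetical df2 in which (id3, 5) appears twice
df2 = pd.DataFrame({"id": ["id1", "id2", "id2", "id3", "id3"],
                    "month": [3, 4, 5, 5, 5],
                    "avail": [10, 54, 34, 33, 7]})

# pivot refuses duplicate (index, column) pairs
try:
    df2.pivot(index="id", columns="month", values="avail")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table sums the duplicates and fills missing pairs with 0
p = df2.pivot_table(index="id", columns="month", values="avail",
                    aggfunc="sum", fill_value=0)
print(pivot_failed)
print(p)
```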