Calculations using two pandas dataframes
I have the following two (simplified) dataframes:
df1 =
   origin destination  val1  val2
0       1           A   0.8   0.9
1       1           B   0.3   0.5
2       1           c   0.4   0.2
3       2           A   0.4   0.7
4       2           B   0.2   0.1
5       2           c   0.5   0.1

df2 =
   org  price
0    1     50
1    2     45
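To make the examples reproducible, here is a minimal sketch that rebuilds the two frames above with pd.DataFrame. The values are copied from the tables; nothing else is assumed.

```python
import pandas as pd

# Rebuild the example frames from the question so the snippets can be run as-is.
df1 = pd.DataFrame({
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": ["A", "B", "c", "A", "B", "c"],
    "val1": [0.8, 0.3, 0.4, 0.4, 0.2, 0.5],
    "val2": [0.9, 0.5, 0.2, 0.7, 0.1, 0.1],
})
df2 = pd.DataFrame({"org": [1, 2], "price": [50, 45]})

print(df1)
print(df2)
```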
What I need to do is take the price for each origin from df2, multiply it by val1 + val2 of the matching rows in df1, add up the results for each destination, and write them to a CSV file.
The calculation for A is as follows:
A => (0.8+0.9)*50 + (0.4+0.7)*45 = 134.5
Here the values 0.8, 0.9, 0.4 and 0.7 come from df1 (they are val1 and val2 of the rows with destination A), whereas the values 50 and 45 come from df2 and are the prices for origins 1 and 2 respectively. For B the calculation would be:
B => (0.3+0.5)*50 + (0.2+0.1)*45 = 53.5
For C the calculation would be:
C => (0.4+0.2)*50 + (0.5+0.1)*45 = 57
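Stated generally (the symbols here are introduced only for illustration), with $o$ ranging over origins and $d$ over destinations, each of the three calculations has the form:

$$
\mathrm{total}(d) = \sum_{o} \bigl(\mathrm{val1}_{o,d} + \mathrm{val2}_{o,d}\bigr)\,\mathrm{price}_{o}
$$

In other words, it is a per-row product of (val1 + val2) with the price of that row's origin, followed by a sum grouped by destination.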
The final CSV file should look like:
A,134.5
B,53.5
C,57
I've written the following Python code for that:
# First convert the second table into a Python dictionary so that I can look up the price for each origin.
df2_dictionary = {}
for ind in df2.index:
    df2_dictionary[df2['org'][ind]] = float(df2['price'][ind])
# Now go through df1, add up val1 and val2, multiply by the price, and accumulate the result per destination.
result = {}
for ind in df1.index:
    origin = df1['origin'][ind]
    price = df2_dictionary[origin]  # look up the price for this origin
    r = (df1['val1'][ind] + df1['val2'][ind]) * price  # this is the needed calculation
    destination = df1['destination'][ind]  # accumulate the result under its destination
    if destination in result:
        result[destination] = result[destination] + r
    else:
        result[destination] = r
with open("result.csv", "w") as f:
    for key in result:
        f.write(key + "," + str(result[key]) + "\n")
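Even without switching to vectorized pandas, the loop above can be shortened. The sketch below is an illustration, not the asker's code: it builds the price lookup with set_index(...).to_dict(), accumulates with collections.defaultdict so the "key already present?" branch disappears, and writes rows with the standard csv module. The name price_by_origin is made up for the example.

```python
import csv
from collections import defaultdict

import pandas as pd

# Example data from the question (assumed here so the snippet runs on its own).
df1 = pd.DataFrame({
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": ["A", "B", "c", "A", "B", "c"],
    "val1": [0.8, 0.3, 0.4, 0.4, 0.2, 0.5],
    "val2": [0.9, 0.5, 0.2, 0.7, 0.1, 0.1],
})
df2 = pd.DataFrame({"org": [1, 2], "price": [50, 45]})

# {org: price} built in one call instead of an explicit loop.
price_by_origin = df2.set_index("org")["price"].to_dict()

# defaultdict(float) starts every new destination at 0.0.
result = defaultdict(float)
for row in df1.itertuples(index=False):
    result[row.destination] += (row.val1 + row.val2) * price_by_origin[row.origin]

# csv.writer takes care of separators and line endings.
with open("result.csv", "w", newline="") as f:
    csv.writer(f).writerows(result.items())
```

The sums may carry small floating-point noise (val1 + val2 is not always exact in binary), so round the values before writing if the file must match the expected text exactly.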
This is a lot of work and doesn't use the built-in pandas functions. How do I simplify this? I'm not that worried about efficiency.
Your problem can be solved with map and then groupby:
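# val1 + val2 for each row, multiplied by the price of that row's origin (looked up in df2)
df1['total'] = (df1[['val1', 'val2']].sum(axis=1)
                .mul(df1['origin'].map(df2.set_index('org')['price'])))
# add up the per-row totals for each destination
summary = df1.groupby('destination')['total'].sum()
# save to csv; header=False gives the "A,134.5" layout without a header row
summary.to_csv('/path/to/file.csv', header=False)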
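For a quick end-to-end check, here is a self-contained sketch of the same map + groupby approach with the example data filled in. It writes to an in-memory io.StringIO buffer instead of a file path so the CSV text can be inspected; float_format="%g" is an optional extra, assumed here to trim any floating-point noise from the sums.

```python
import io

import pandas as pd

# Example data from the question.
df1 = pd.DataFrame({
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": ["A", "B", "c", "A", "B", "c"],
    "val1": [0.8, 0.3, 0.4, 0.4, 0.2, 0.5],
    "val2": [0.9, 0.5, 0.2, 0.7, 0.1, 0.1],
})
df2 = pd.DataFrame({"org": [1, 2], "price": [50, 45]})

# Series.map with a Series looks each origin up in that Series' index (org).
price = df1["origin"].map(df2.set_index("org")["price"])
df1["total"] = df1[["val1", "val2"]].sum(axis=1).mul(price)

# One total per destination (groupby sorts the keys).
summary = df1.groupby("destination")["total"].sum()
print(summary)

# No header row; '%g' formats each float with up to 6 significant digits.
buf = io.StringIO()
summary.to_csv(buf, header=False, float_format="%g")
csv_text = buf.getvalue()
print(csv_text)
```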
Output (summary):
destination
A 134.5
B 53.5
c 57.0
Name: total, dtype: float64
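A swapped-in alternative to map, if you prefer an explicit join, is DataFrame.merge: attach the price column to df1 by matching origin to org, then weight and group as before. This is a sketch of that variant, not part of the original answer.

```python
import pandas as pd

# Example data from the question.
df1 = pd.DataFrame({
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": ["A", "B", "c", "A", "B", "c"],
    "val1": [0.8, 0.3, 0.4, 0.4, 0.2, 0.5],
    "val2": [0.9, 0.5, 0.2, 0.7, 0.1, 0.1],
})
df2 = pd.DataFrame({"org": [1, 2], "price": [50, 45]})

# Join each df1 row to its price; how="left" keeps rows whose origin has no price (price becomes NaN).
merged = df1.merge(df2, left_on="origin", right_on="org", how="left")

# Weight val1 + val2 by price, then total per destination.
summary = (
    (merged["val1"] + merged["val2"])
    .mul(merged["price"])
    .groupby(merged["destination"])
    .sum()
)
print(summary)
```

Both versions give the same totals here; map is shorter when only one column is needed from df2, while merge is handy if you later need several columns from it. With how="left", rows without a matching price yield NaN, which groupby's sum skips by default.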