Python, Pandas - Calculations with references to other rows
Rather than build a large CSV file completely by hand, I'm trying to build it with some pandas magic. My current problem is calculating the cost of creating 'x', which requires 'n' units of 'y', where 'y' costs 'z'.
The current dataframe is structured as Item (x), Price (z), Material (y), MaterialSum (n).
Each Material also appears in the data frame as its own Item.
df['Cost'] = (df[df.Item == df['Material']].iloc[0])['Price'] * df['MaterialSum']
I've devised this code to build the Cost column. However, it uses the first row's material for every row in the data frame, rather than each row's own Material. Any tips on how to overcome this?
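To see why this happens: the mask `df.Item == df['Material']` compares the two columns within each row, and `.iloc[0]` then keeps only the first matching row. Indexing that row with `['Price']` yields a single scalar, and multiplying a scalar by a Series broadcasts it to every row. A minimal sketch reproducing the behaviour, assuming (hypothetically) that raw materials list themselves as their own Material; the item names and prices are made up for illustration:

```python
import pandas as pd

# Hypothetical data: raw materials (Wood, Iron) list themselves as their Material.
df = pd.DataFrame({
    'Item': ['Wood', 'Iron', 'Table', 'Sword'],
    'Price': [2.0, 5.0, None, None],
    'Material': ['Wood', 'Iron', 'Wood', 'Iron'],
    'MaterialSum': [1, 1, 4, 3],
})

# The mask keeps the rows where Item == Material (Wood and Iron),
# and .iloc[0] keeps only the first of them, so this is one scalar.
first_price = (df[df.Item == df['Material']].iloc[0])['Price']
print(first_price)  # Wood's price

# A scalar times a Series broadcasts that one price to every row,
# so Sword is costed with Wood's price instead of Iron's.
df['Cost'] = first_price * df['MaterialSum']
print(df)
```

Here Sword ends up costing 3 × 2.0 instead of 3 × 5.0, which matches the symptom described.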
If I understand your problem correctly, this is one solution:
import pandas as pd, numpy as np
df = pd.DataFrame([[1, np.nan, 'A', 10],
[2, np.nan, 'B', 20],
[3, np.nan, 'C', 30],
[np.nan, 15, 'A', np.nan],
[np.nan, 10, 'B', np.nan],
[np.nan, 20, 'C', np.nan]],
columns = ['Item', 'Price', 'Material', 'MaterialSum'])
# Item Price Material MaterialSum
# 0 1.0 NaN A 10.0
# 1 2.0 NaN B 20.0
# 2 3.0 NaN C 30.0
# 3 NaN 15.0 A NaN
# 4 NaN 10.0 B NaN
# 5 NaN 20.0 C NaN
prices = df[df['Item'].isnull()].set_index('Material')['Price']
df['Cost'] = df['Material'].map(prices) * df['MaterialSum']
# Item Price Material MaterialSum Cost
# 0 1.0 NaN A 10.0 150.0
# 1 2.0 NaN B 20.0 200.0
# 2 3.0 NaN C 30.0 600.0
# 3 NaN 15.0 A NaN NaN
# 4 NaN 10.0 B NaN NaN
# 5 NaN 20.0 C NaN NaN
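The answer above models raw materials as rows with an empty Item. If, as the question describes, each material instead appears as its own Item row with a Price, the same `map` idea can key the lookup on Item. A minimal sketch with hypothetical item names and prices, assuming the Item values are unique (`Series.map` with a Series raises an error if the lookup index has duplicates):

```python
import pandas as pd

# Hypothetical data in the question's layout: each material is also an Item with a Price.
df = pd.DataFrame({
    'Item': ['Wood', 'Iron', 'Table', 'Sword'],
    'Price': [2.0, 5.0, None, None],
    'Material': [None, None, 'Wood', 'Iron'],
    'MaterialSum': [None, None, 4, 3],
})

# Item -> Price lookup table; assumes every Item name is unique.
price_lookup = df.set_index('Item')['Price']

# Look up each row's own material price, then scale by the quantity needed.
# Rows without a Material map to NaN, so their Cost is NaN as well.
df['Cost'] = df['Material'].map(price_lookup) * df['MaterialSum']
print(df)
```

Table gets 4 × 2.0 and Sword gets 3 × 5.0, because each row now looks up its own material rather than reusing a single scalar.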