Calculate values between two pandas DataFrames based on a column value

EDITED: here is the whole data set.

df is the store sales/inventory data:

    branch     daqu  store  store_name       style  color  size  stocked  sold  in_stock  balance
0  huadong  wenning   C301        EE …  EEBW52301M     39   160        7     4         3       -5
1  huadong  wenning   C301        EE …  EEBW52301M     39   165        1     0         1        1
2  huadong  wenning   C301        EE …  EEBW52301M     39   170        6     3         3       -3

dh is the transaction data (each row moves 'amount' units from store 'from' to store 'to'):

    branch      daqu  from    to       style  color  size  amount  box_sum
8   huadong  shanghai  C306  C30C  EEOM52301M     59   160       1      162
18  huadong  shanghai  C306  C30C  EEOM52301M     39   160       1      162
25  huadong  shanghai  C306  C30C  EETJ52301M     52   160       9      162
26  huadong  shanghai  C306  C30C  EETJ52301M     52   155       1      162
32  huadong  shanghai  C306  C30C  EEOW52352M     19   160       2      162

What I want is the store inventory data after the transactions. It should have exactly the same format as df; only the 'in_stock' numbers should change from the original df, according to the amounts in dh.

Below is what I tried:

import pandas as pd

# Build one string key per store + item (style, color, size)
df['full_code'] = df['store'] + df['style'] + df['color'].astype(str) + df['size'].astype(str)
dh['from_code'] = dh['from'] + dh['style'] + dh['color'].astype(str) + dh['size'].astype(str)
dh['to_code'] = dh['to'] + dh['style'] + dh['color'].astype(str) + dh['size'].astype(str)

# subtract from 'from' store
dh_from = pd.DataFrame(dh.groupby('from_code')['amount'].sum())

for code, stock in dh_from.iterrows():
    df.loc[df['full_code'] == code, 'in_stock'] = df.loc[df['full_code'] == code, 'in_stock'] - stock

# add to 'to' store
dh_to = pd.DataFrame(dh.groupby('to_code')['amount'].sum())

for code, stock in dh_to.iterrows():
    df.loc[df['full_code'] == code, 'in_stock'] = df.loc[df['full_code'] == code, 'in_stock'] + stock

df.to_csv('d:/after_dh.csv')

But when I open the CSV file, the 'in_stock' values for the rows where a transaction occurred are all blank. I think the problem is in this line: df.loc[df['full_code'] == code, 'in_stock'] = df.loc[df['full_code'] == code, 'in_stock'] + stock. What's the correct way of updating the value?
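The blanks most likely come from how iterrows() works: it yields (index, row) pairs where row is a Series indexed by the column names, so stock is a one-element Series labelled 'amount', not a number. Arithmetic between two Series aligns on index labels, and df's row labels (0, 1, 2, …) never match 'amount', so the result is NaN, which to_csv writes as an empty field by default. Below is a minimal reproduction on hypothetical toy data (the codes "A", "B", "C" are made up), followed by one possible fix: iterating the summed Series with .items() so that stock is a plain scalar.

```python
import pandas as pd

# Hypothetical toy data; in_stock is float so the NaN can be stored without a dtype change
df = pd.DataFrame({"full_code": ["A", "B", "C"], "in_stock": [3.0, 1.0, 3.0]})
dh = pd.DataFrame({"from_code": ["A", "A"], "amount": [1, 1]})

# Same construction as in the question: a one-column DataFrame indexed by from_code
dh_from = pd.DataFrame(dh.groupby("from_code")["amount"].sum())

buggy = df.copy()
for code, stock in dh_from.iterrows():
    # 'stock' is the whole row: a Series with index ['amount'], not a number
    mask = buggy["full_code"] == code
    # Series - Series aligns on labels; row label 0 never matches 'amount', so the
    # difference is NaN there, and that NaN is what gets written back
    buggy.loc[mask, "in_stock"] = buggy.loc[mask, "in_stock"] - stock

fixed = df.copy()
# Iterating the summed Series with .items() yields (code, scalar total) pairs
for code, stock in dh.groupby("from_code")["amount"].sum().items():
    fixed.loc[fixed["full_code"] == code, "in_stock"] -= stock

print(buggy["in_stock"].tolist())  # [nan, 1.0, 3.0]
print(fixed["in_stock"].tolist())  # [1.0, 1.0, 3.0]
```

The 'to' loop would change the same way, using += instead of -=.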


ORIGINAL: I have two pandas DataFrames: df1 is the inventory and df2 is the transactions.

df1 looks something like this:

   full_code in_stock
1  AAA       200
2  BBB       150
3  CCC       150

df2 looks something like this:

   from   to   full_code  amount
1  XX     XY   AAA        30
2  XX     XZ   AAA        35
3  ZY     OI   BBB        50
4  AQ     TR   AAA        15

What I want is the inventory after all transactions are done. In this case:

   full_code in_stock
1  AAA       120
2  BBB       100
3  CCC       150

Note that full_code is unique in df1 but not in df2. Is there a pandas way of doing this? I got confused between the original DataFrame and a view of it, and solved the problem by turning both into NumPy arrays and finding matching full_codes. But the resulting code is also a mess, and I wonder whether there is a simpler way that doesn't turn everything into a NumPy array.

What I would do is set the index of df1 to the 'full_code' column and then call sub to subtract the transaction totals from df2.

What we pass as the values is the result of grouping df2 on 'full_code' and summing the 'amount' column.

An additional parameter for sub is fill_value=0. It is needed because product 'CCC' does not exist on the right-hand side; without it, CCC's value would become NaN instead of being preserved:

In [25]:

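total = (df1.set_index('full_code')['in_stock']
         .sub(df2.groupby('full_code')['amount'].sum(), fill_value=0)
         .astype(int)          # 'CCC' is missing from df2, so alignment upcasts the result to float
         .rename('in_stock'))  # the operands' names differ, so sub returns an unnamed Series
total.reset_index()
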
Out[25]:
  full_code  in_stock
0       AAA       120
1       BBB       100
2       CCC       150
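The same idea (index the inventory by its key, then subtract grouped transaction totals) can be extended to the edited data set, where each move has to be subtracted from the 'from' store and added to the 'to' store. Below is a minimal sketch on a few hypothetical rows, keeping only the key columns and in_stock. It assumes each (store, style, color, size) combination appears once in df, keys on those columns directly instead of a concatenated full_code string, and uses reindex(..., fill_value=0) in place of sub's fill_value so that items with no moves keep their stock.

```python
import pandas as pd

# Hypothetical rows in the edited question's layout (only key columns and in_stock kept)
df = pd.DataFrame({
    "store": ["C306", "C306", "C30C", "C30C", "C306"],
    "style": ["EEOM52301M", "EETJ52301M", "EEOM52301M", "EETJ52301M", "EEOW52352M"],
    "color": [59, 52, 59, 52, 19],
    "size": [160, 160, 160, 160, 160],
    "in_stock": [5, 12, 0, 1, 4],
})
dh = pd.DataFrame({
    "from": ["C306", "C306", "C306"],
    "to": ["C30C", "C30C", "C30C"],
    "style": ["EEOM52301M", "EETJ52301M", "EETJ52301M"],
    "color": [59, 52, 52],
    "size": [160, 160, 160],
    "amount": [1, 9, 2],
})

item = ["style", "color", "size"]
key = ["store"] + item

# Current stock indexed by (store, style, color, size); assumes each key occurs once in df
stock = df.set_index(key)["in_stock"]

# Total amount leaving / arriving per key, with the store level renamed to 'store'
out = dh.groupby(["from"] + item)["amount"].sum().rename_axis(key)
inc = dh.groupby(["to"] + item)["amount"].sum().rename_axis(key)

# Align both totals to df's keys; fill_value=0 keeps items with no moves unchanged
# (moves whose key has no row in df are dropped by the reindex)
net = stock - out.reindex(stock.index, fill_value=0) + inc.reindex(stock.index, fill_value=0)

result = df.copy()
result["in_stock"] = net.to_numpy()  # net is in df's row order, so assign positionally
print(result)
```

Because no NaN is ever introduced, net stays integer and no cast is needed. With the full df, the other columns (stocked, sold, balance and so on) pass through untouched, since only in_stock is reassigned.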
