Subtract values from a column in a DataFrame when another column matches a value in a second DataFrame, using pandas
Say I have two DataFrames, an original and a reference:
import pandas as pd
# Create the original DataFrame
oldcols = {'col1': ['a', 'a', 'b', 'b'], 'col2': ['c', 'd', 'c', 'd'], 'col3': [1, 2, 3, 4]}
a = pd.DataFrame(oldcols)
print("Original Table:")
print(a)
# Create the reference DataFrame
b = pd.DataFrame({'col1': ['x', 'x'], 'col2': ['c', 'd'], 'col3': [10, 20]})
print("Reference Table:")
print(b)
Now I want to subtract from the third column (col3) of the original table (a) the col3 value of the row in the reference table (b) whose second column (col2) matches. So the first row of table a should have the value 10 added to col3, because the row of table b whose col2 is 'c' has a value of 10 in col3. Make sense? Here's my attempt:
col3 = []
for ix, row in a.iterrows():
    # The lookup on b returns a one-row Series, not a plain number
    col3 += [row['col3'] + b[b['col2'] == row['col2']]['col3']]
a['col3'] = col3
print("Output Table:")
print(a)
I want the output to look like this:
Output Table:
col1 col2 col3
0 a c 11
1 a d 22
2 b c 13
3 b d 24
The problem is that each element of col3 ends up as a whole Series, with its own index, Name and dtype, instead of a plain number:
>>> print(col3)
[0 11
Name: col3, dtype: int64, 1 22
Name: col3, dtype: int64, 0 13
Name: col3, dtype: int64, 1 24
Name: col3, dtype: int64]
Can you please help?
This should work:
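The loop itself can also be repaired directly. The boolean-mask lookup `b.loc[b['col2'] == ..., 'col3']` always returns a Series, so the sketch below pulls out the first match as a scalar with `.iloc[0]`. It assumes every col2 value in a has at least one match in b (`.iloc[0]` raises IndexError otherwise); the vectorized versions are usually preferable for larger frames.

```python
import pandas as pd

a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd'],
                  'col3': [1, 2, 3, 4]})
b = pd.DataFrame({'col1': ['x', 'x'], 'col2': ['c', 'd'], 'col3': [10, 20]})

col3 = []
for _, row in a.iterrows():
    # All rows of b whose col2 equals this row's col2 (a Series)
    match = b.loc[b['col2'] == row['col2'], 'col3']
    # Assumes at least one match exists; take the first as a scalar
    col3.append(row['col3'] + match.iloc[0])
a['col3'] = col3
print(a)
```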
a['col3'] + a['col2'].map(b.set_index('col2')['col3'])
Out[94]:
0 11
1 22
2 13
3 24
dtype: int64
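In the `map` version, `b.set_index('col2')['col3']` is a lookup table indexed by b's col2, and `Series.map` replaces each value of `a['col2']` with the matching entry; this assumes the col2 values in b are unique. A key with no match becomes NaN, which turns the sum into NaN as well. The sketch below adds a hypothetical unmatched row ('e') to show this, and treats a missing match as an offset of 0 via `fillna(0)`; whether 0 is the right default is an assumption about the desired behavior.

```python
import pandas as pd

# Hypothetical extra row with col2 == 'e', which has no match in b
a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd', 'e'],
                  'col3': [1, 2, 3, 4, 5]})
b = pd.DataFrame({'col1': ['x', 'x'], 'col2': ['c', 'd'], 'col3': [10, 20]})

# Lookup table: index is b's col2, values are b's col3
lookup = b.set_index('col2')['col3']

# Each col2 value in a is replaced by the matching col3 from b;
# 'e' is not in the lookup index and becomes NaN
offsets = a['col2'].map(lookup)

# Assumed policy: a missing match adds 0; cast back to int after the NaN is gone
result = (a['col3'] + offsets.fillna(0)).astype('int64')
print(result)
```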
Or this:
a.merge(b, on='col2', how='left')[['col3_x', 'col3_y']].sum(axis=1)
Out[110]:
0 11
1 22
2 13
3 24
dtype: int64
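The `col3_x` and `col3_y` names in the `merge` version come from `merge`'s default `suffixes=('_x', '_y')`, which are applied because both frames have a `col3` column. A sketch with explicit (hypothetical) suffixes `_orig` and `_ref`, merging only the key and value columns of b, makes the intent easier to read. `sum(axis=1)` skips NaN by default, so with `how='left'` a row of a that has no match in b keeps its original col3, unlike the `map` version.

```python
import pandas as pd

a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd'],
                  'col3': [1, 2, 3, 4]})
b = pd.DataFrame({'col1': ['x', 'x'], 'col2': ['c', 'd'], 'col3': [10, 20]})

# Bring over only the key and value columns from b, and name the
# overlapping col3 columns explicitly instead of the default _x/_y
merged = a.merge(b[['col2', 'col3']], on='col2', how='left',
                 suffixes=('_orig', '_ref'))
print(merged.columns.tolist())

# Row-wise sum of the two value columns (NaN is skipped by default)
total = merged[['col3_orig', 'col3_ref']].sum(axis=1)
print(total)
```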
To store the result back in the original DataFrame, as requested:
a['col3'] = a.merge(b, on='col2', how='left')[['col3_x', 'col3_y']].sum(axis=1)
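The title asks for subtraction, while the worked example adds. Assuming subtraction is what is actually wanted, only the operator changes. The sketch below uses the `map` form for the write-back: `map` returns a Series on a's own index, so the assignment lines up row by row even when a's index is not the default 0..n-1, whereas the merged frame gets a fresh 0..n-1 index. The non-default index here is hypothetical, chosen only to illustrate the alignment.

```python
import pandas as pd

# Hypothetical non-default index to show that alignment is preserved
a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd'],
                  'col3': [1, 2, 3, 4]},
                 index=[10, 11, 12, 13])
b = pd.DataFrame({'col1': ['x', 'x'], 'col2': ['c', 'd'], 'col3': [10, 20]})

# map() keeps a's index, so subtracting and assigning back aligns correctly
a['col3'] = a['col3'] - a['col2'].map(b.set_index('col2')['col3'])
print(a)
```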