Subtract a batch of columns in pandas
I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively an element-wise matrix subtraction.

Just for reference, this used to be my numpy solution for this:
import numpy

def subtract_baseline(data, baseline_columns, features_columns):
    """Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
    assert len(features_columns) % len(baseline_columns) == 0, "The number of feature columns is not divisible by baseline columns"
    num_blocks = len(features_columns) // len(baseline_columns)  # integer division so range() works on Python 3
    block_size = len(baseline_columns)
    for i in range(num_blocks):
        # Grab each feature block and subtract the baseline
        init_col = block_size * i + features_columns[0]
        final_col = init_col + block_size
        data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:, baseline_columns])
    return data
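For what it's worth, the loop's logic can be exercised standalone on a small array with the same layout as the toy dataset below (two feature blocks of two columns each, followed by two baseline columns):

```python
import numpy as np

# Same column layout as the toy DataFrame: L1P1 L1P2 L2P1 L2P2 BP1 BP2
data = np.array([[10, 11, 12, 13, 1, 10],
                 [20, 21, 22, 23, 1, 10]], dtype=float)

baseline_columns = [4, 5]
features_columns = [0, 1, 2, 3]
block_size = len(baseline_columns)
num_blocks = len(features_columns) // block_size  # integer division for range()

# Subtract the baseline block from each feature block in place
for i in range(num_blocks):
    init_col = block_size * i + features_columns[0]
    final_col = init_col + block_size
    data[:, init_col:final_col] -= data[:, baseline_columns]

print(data[0].tolist())  # [9.0, 1.0, 11.0, 3.0, 1.0, 10.0]
```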
To illustrate better, we can create the following toy dataset:
data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 10.0 11.0 12.0 13.0 1.0 10.0
1 20.0 21.0 22.0 23.0 1.0 10.0
2 30.0 31.0 32.0 33.0 1.0 10.0
3 40.0 41.0 42.0 43.0 1.0 10.0
4 50.0 51.0 52.0 53.0 1.0 10.0
5 60.0 61.0 62.0 63.0 1.0 10.0
The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting BP1 & BP2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function):
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 11.0 3.0 1.0 10.0
1 19.0 11.0 21.0 13.0 1.0 10.0
2 29.0 21.0 31.0 23.0 1.0 10.0
3 39.0 31.0 41.0 33.0 1.0 10.0
4 49.0 41.0 51.0 43.0 1.0 10.0
5 59.0 51.0 61.0 53.0 1.0 10.0
Note that the labels for the dataframe should not change, and ideally I'd want a method that relies on the column indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original numpy function worked: the parameters baseline_columns and features_columns were just lists of column indexes.

After this the baseline columns would be deleted altogether from the dataframe, as their function has already been fulfilled.
I tried doing this for just 1 batch using iloc, but I get NaN values:
df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 NaN NaN 12.0 13.0 1.0 10.0
1 NaN NaN 22.0 23.0 1.0 10.0
2 NaN NaN 32.0 33.0 1.0 10.0
3 NaN NaN 42.0 43.0 1.0 10.0
4 NaN NaN 52.0 53.0 1.0 10.0
5 NaN NaN 62.0 63.0 1.0 10.0
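(For what it's worth, the cause is visible if the raw, un-assigned subtraction is inspected standalone; this minimal sketch recreates the toy frame and shows pandas aligning on column labels, producing the union of the two label sets with every cell missing:)

```python
import pandas as pd

data = [[10, 11, 12, 13, 1, 10],
        [20, 21, 22, 23, 1, 10]]
df = pd.DataFrame(data, columns=['L1P1', 'L1P2', 'L2P1', 'L2P2', 'BP1', 'BP2'],
                  dtype=float)

# {'L1P1','L1P2'} and {'BP1','BP2'} share no labels, so the aligned result
# spans the union of the labels and is entirely NaN.
raw = df.iloc[:, [0, 1]] - df.iloc[:, [4, 5]]
print(sorted(raw.columns))     # ['BP1', 'BP2', 'L1P1', 'L1P2']
print(raw.isna().all().all())  # True
```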
Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:
df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]
These two lines achieve what I think is your intent.
Add .values at the end. Without it, pandas matches the two operands on column labels (and index) before subtracting, and since the labels at positions 0, 1 do not match those at 4, 5, it returns NaN:
df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
df
Out[176]:
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 12.0 13.0 1.0 10.0
1 19.0 11.0 22.0 23.0 1.0 10.0
2 29.0 21.0 32.0 33.0 1.0 10.0
3 39.0 31.0 42.0 43.0 1.0 10.0
4 49.0 41.0 52.0 53.0 1.0 10.0
5 59.0 51.0 62.0 63.0 1.0 10.0
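Generalising that trick, here is a sketch of the original numpy routine done in pandas with column indexes rather than labels: subtract each feature block through `.values` to sidestep label alignment, then drop the baseline columns once they have been used. The block recreates the toy frame so it runs standalone; variable names are illustrative:

```python
import pandas as pd

data = [[10, 11, 12, 13, 1, 10],
        [20, 21, 22, 23, 1, 10]]
df = pd.DataFrame(data, columns=['L1P1', 'L1P2', 'L2P1', 'L2P2', 'BP1', 'BP2'],
                  dtype=float)

baseline_columns = [4, 5]
features_columns = [0, 1, 2, 3]
block_size = len(baseline_columns)

# Plain ndarray copy of the baseline, so no label alignment happens
baseline = df.iloc[:, baseline_columns].values
for start in range(0, len(features_columns), block_size):
    cols = features_columns[start:start + block_size]
    df.iloc[:, cols] = df.iloc[:, cols].values - baseline

# The baseline columns have served their purpose; drop them by position
df = df.drop(columns=df.columns[baseline_columns])
print(df.iloc[0].tolist())  # [9.0, 1.0, 11.0, 3.0]
```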