Python pandas dataframe: filter columns using a list?
I have a large dataframe: 100,000 rows × 10,000 columns.
Now I'm given a list of labels (call it list1) that don't exactly match the column labels of this dataframe but match part of them. For example, a column label in the dataframe might be "string1,D111", while the corresponding label in list1 is just "D111".
So basically I want to use list1 to find all the corresponding columns and then sum those columns. What is the most efficient way to do this?
Dataframe:
string1,D111 string2,D222 string3,D333 ...... stringn,Dnnn
1 .. .. .. ..
2
3
4
5
6
...
My list1: D111, D333,...Dxxx
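For the label format described above, one approach is to pull the code out of each column label and test it against list1 with `isin`. This is a minimal sketch, assuming every label has the form `<string>,<code>` with the code after the last comma, and assuming "sum all these columns" means a row-wise sum. The small dataframe and `list1` are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical miniature version of the dataframe: labels are "<string>,<code>".
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["string1,D111", "string2,D222", "string3,D333", "string4,D444"],
)
list1 = ["D111", "D333"]

# Extract the part after the last comma of each column label.
codes = df.columns.map(lambda label: label.rsplit(",", 1)[-1])

# Boolean mask over the columns: True where the code is in list1.
mask = codes.isin(list1)

# Select the matching columns and sum them row by row.
row_sums = df.loc[:, mask].sum(axis=1)
print(row_sums.tolist())  # [4, 12]
```

Because this compares whole codes rather than substrings, "D11" in list1 would not accidentally pick up a "D111" column. The work is done once over the 10,000 column labels, not over the 100,000 rows.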
In [1]: import numpy as np, pandas as pd
In [2]: df = pd.DataFrame(np.random.randn(10, 10), columns=['c_%s' % i for i in range(3)] + ['d_%s' % i for i in range(3)] + ['e_%s' % i for i in range(4)])
In [3]: df.filter(regex='d_|e_')
Out[3]:
d_0 d_1 d_2 e_0 e_1 e_2 e_3
0 -0.022661 -0.504317 0.279227 0.286951 -0.126999 -1.658422 1.577863
1 0.501654 0.145550 -0.864171 -0.374261 -0.399360 1.217679 1.357648
2 -0.608580 1.138143 1.228663 0.427360 0.256808 0.105568 -0.037422
3 -0.993896 -0.581638 -0.937488 0.038593 -2.012554 -0.182407 0.689899
4 0.424005 -0.913518 0.405155 -1.111424 -0.180506 1.211730 0.118168
5 0.701127 0.644692 -0.188302 -0.561400 0.748692 -0.585822 1.578240
6 0.475958 -0.901369 -0.734969 1.090093 1.297208 1.140128 0.173941
7 -0.679514 -0.790529 -2.057733 0.420175 1.766671 -0.797129 -0.825583
8 -0.918645 0.916237 0.992001 -0.440573 -1.875960 -1.223502 0.084821
9 1.096687 -1.414057 -0.268211 0.253461 -0.175931 1.481261 -0.200600
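The `filter(regex=...)` call above can also be driven directly by list1 by joining the escaped labels into one alternation. `DataFrame.filter` keeps a column when the regex matches anywhere in its label (search semantics), so this sketch anchors the pattern with a leading comma and a trailing `$`. That anchoring is an assumption about the label format (`<string>,<code>`), added so that "D111" does not also match a hypothetical "D1111" column. The data below is made up for illustration.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical data in the "<string>,<code>" label format.
df = pd.DataFrame(
    np.arange(8).reshape(2, 4),
    columns=["string1,D111", "string2,D222", "string3,D333", "string4,D1111"],
)
list1 = ["D111", "D333"]

# Escape each code so regex metacharacters are matched literally, then require
# the code to follow a comma and end the label.
pattern = ",(?:" + "|".join(map(re.escape, list1)) + ")$"

selected = df.filter(regex=pattern)
print(list(selected.columns))  # ['string1,D111', 'string3,D333']

# Row-wise sum of the selected columns.
print(selected.sum(axis=1).tolist())  # [2, 10]
```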