
Populating a Pandas DataFrame from another DataFrame based on column names

I have a DataFrame of the following form:

    a b c
0   1 4 6
1   3 2 4
2   4 1 5

I also have a list of column names, and I need to create a new DataFrame from it, where each label in the list pulls in the corresponding column of the first DataFrame. For example, if my list of columns is ['a', 'b', 'b', 'a', 'c'], the resulting DataFrame should be:

    a b b a c
0   1 4 4 1 6   
1   3 2 2 3 4
2   4 1 1 4 5

I've been trying to figure out a fast way of performing this operation because I'm dealing with extremely large DataFrames, and I don't think looping is a reasonable option.

You can just use the list to select them:

In [44]:

cols = ['a', 'b', 'b', 'a', 'c']
df[cols]
Out[44]:
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5

[3 rows x 5 columns]

So there is no need for a loop. Once you have created your DataFrame df, indexing it with a list of column names selects those columns, repeats included, and gives you the DataFrame you want.
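To try this end to end, here is a minimal sketch that rebuilds the example frame from the question and selects by list. The variable name `result` is only illustrative; the data and labels are the ones shown above.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"a": [1, 3, 4], "b": [4, 2, 1], "c": [6, 4, 5]})

# Labels may repeat; each one pulls the matching column from df
cols = ["a", "b", "b", "a", "c"]
result = df[cols]

print(result)
```

The selection returns a new DataFrame with five columns in the order given by `cols`, without any explicit Python-level loop.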

You can do that directly:

>>> df
   a  b  c
0  1  4  6
1  3  2  4
2  4  1  5

>>> column_names
['a', 'b', 'b', 'a', 'c']

>>> df[column_names]
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5

[3 rows x 5 columns]

From pandas 0.17 onwards you can also use reindex, like this:

In [795]: cols = ['a', 'b', 'b', 'a', 'c']

In [796]: df.reindex(columns=cols)
Out[796]:
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5
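One practical difference between the two approaches shows up when the list contains a label that is not in the frame. The sketch below assumes a made-up label "z" for illustration: reindex adds it as a column of NaN, while plain list selection raises a KeyError.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 4], "b": [4, 2, 1], "c": [6, 4, 5]})
cols_with_missing = ["a", "z", "b"]  # "z" is a hypothetical label not in df

# reindex tolerates the unknown label and fills that column with NaN
reindexed = df.reindex(columns=cols_with_missing)

# plain list selection refuses unknown labels
try:
    df[cols_with_missing]
    raised = False
except KeyError:
    raised = True

print(reindexed)
print("KeyError raised:", raised)
```

If every label is guaranteed to exist, both forms give the same result; if a typo in the list should be caught early, the stricter `df[cols]` form is probably the safer choice.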

Note: Ideally, you don't want to have duplicate column names.
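To illustrate why duplicates can be awkward, here is a small sketch. Selecting a repeated label returns a DataFrame rather than a Series, which can surprise downstream code. The renaming scheme at the end (appending the column position) is only one hypothetical way to make the names unique, not something the answers above prescribe.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 4], "b": [4, 2, 1], "c": [6, 4, 5]})
dup = df[["a", "b", "b", "a", "c"]]

# A repeated label selects every matching column, so the result is 2-D
print(type(dup["a"]).__name__)  # DataFrame
print(type(dup["c"]).__name__)  # Series

# Hypothetical fix: suffix each label with its position to get unique names
unique = dup.copy()
unique.columns = [f"{name}_{i}" for i, name in enumerate(dup.columns)]
print(list(unique.columns))
```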
