Populating a Pandas DataFrame from another DataFrame based on column names

Question

I have a DataFrame of the following form:

    a b c
0   1 4 6
1   3 2 4
2   4 1 5

I also have a list of column names, and I need to create a new DataFrame whose columns are the columns of the first DataFrame that correspond to each label, in list order. For example, if my list of columns is ['a', 'b', 'b', 'a', 'c'], the resulting DataFrame should be:

    a b b a c
0   1 4 4 1 6   
1   3 2 2 3 4
2   4 1 1 4 5

I've been trying to figure out a fast way of performing this operation because I'm dealing with extremely large DataFrames and I don't think looping is a reasonable option.

Answer 1

You can just use the list to select them:

In [44]:

cols = ['a', 'b', 'b', 'a', 'c']
df[cols]
Out[44]:
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5

[3 rows x 5 columns]

So there is no need for a loop: once you have created your DataFrame df, indexing it with a list of column names selects those columns, in that order, and creates the df you want.
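
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample values are taken from the output shown in this thread; the variable name `result` is just for illustration.

```python
import pandas as pd

# Sample frame using the values from this thread.
df = pd.DataFrame({"a": [1, 3, 4], "b": [4, 2, 1], "c": [6, 4, 5]})

cols = ["a", "b", "b", "a", "c"]

# Indexing with a list returns a new DataFrame whose columns follow
# the order of the list, repeated labels included.
result = df[cols]
print(result)
```

The selection is a single vectorized call, so no Python-level loop over the labels is involved.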

Answer 2

You can do that directly:

>>> df
   a  b  c
0  1  4  6
1  3  2  4
2  4  1  5

>>> column_names
['a', 'b', 'b', 'a', 'c']

>>> df[column_names]
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5

[3 rows x 5 columns]

Answer 3

From pandas 0.17 onwards, you can also use reindex:

In [795]: cols = ['a', 'b', 'b', 'a', 'c']

In [796]: df.reindex(columns=cols)
Out[796]:
   a  b  b  a  c
0  1  4  4  1  6
1  3  2  2  3  4
2  4  1  1  4  5
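
One practical difference between reindex and plain list indexing shows up when the list contains a label that is not in the DataFrame. The sketch below assumes a reasonably recent pandas; the label "z" is a hypothetical missing column added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 4], "b": [4, 2, 1], "c": [6, 4, 5]})

cols = ["a", "b", "z"]  # "z" is a hypothetical label that df does not have

# reindex keeps the missing label and fills that column with NaN.
reindexed = df.reindex(columns=cols)
print(reindexed)

# Plain list indexing raises KeyError for the missing label instead.
try:
    df[cols]
    raised = False
except KeyError:
    raised = True
print(raised)
```

If every label is guaranteed to exist, both approaches give the same result; reindex is the more forgiving choice when some labels may be absent.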

Note: Ideally, you don't want to have duplicate column names.
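
To illustrate why duplicate names can be awkward, here is a small sketch (the variable name `result` is illustrative). Once a name is repeated, looking it up with a single label no longer returns a single column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 4], "b": [4, 2, 1], "c": [6, 4, 5]})
result = df[["a", "b", "b", "a", "c"]]

# With unique names, a single label returns a Series.
print(type(df["a"]).__name__)

# With a duplicated name, the same lookup returns a DataFrame
# holding every column that carries that name.
print(type(result["a"]).__name__)

# The column index reports that its labels are not unique.
print(result.columns.is_unique)
```

Code that expects `result["a"]` to be a Series may break on such a frame, so renaming the repeated columns afterwards is often worth considering.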

3 answers

solution1
7 ACCEPTED 2014-04-07 13:53:28

solution2
3 2014-04-07 13:53:54

solution3
0 2017-10-12 07:54:39
