
Python: Select multiple columns in a dataframe from another dataframe without loop

I have a DataFrame (df1) that has 3000 columns. Each column corresponds to a stock ticker. Using pd.read_csv, I load a CSV file of 500 stock tickers into a second DataFrame (df2), which has 1 column and 500 rows, excluding the index. How can I extract from df1, into a new DataFrame, the 500 columns that match the stock tickers in df2?

I can write a loop that iterates over each row in df2 and extracts one column at a time from df1, but this is slow and probably not the most efficient approach.

You can use loc directly to select the matching columns from your DataFrame (using @waitingkuo's example):

In [11]: df1.loc[:, df2.stock]  # equivalent to df1[df2.stock]
Out[11]: 
   s1  s3
0   1   3
1   4   6
2   7   9
3  10  12
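One caveat worth sketching: label-based selection raises a KeyError if any ticker in df2 has no matching column in df1. The example below rebuilds the small frames from @waitingkuo's example and adds a hypothetical ticker "s9" (not in the original) to show one possible way to filter the ticker list first. The names wanted, present, missing and selected are illustrative only.

```python
import pandas as pd

# Small stand-ins for the real 3000-column and 500-row frames.
df1 = pd.DataFrame(
    {"s1": [1, 4, 7, 10], "s2": [2, 5, 8, 11], "s3": [3, 6, 9, 12]}
)
# "s9" is a hypothetical ticker with no column in df1.
df2 = pd.DataFrame({"stock": ["s1", "s3", "s9"]})

wanted = df2["stock"]

# Boolean mask: True where the ticker exists as a column of df1.
in_df1 = wanted.isin(df1.columns)

present = wanted[in_df1]            # tickers that can be selected, in df2's order
missing = wanted[~in_df1].tolist()  # tickers that would raise a KeyError

# Select all matching columns in one step, with no Python-level loop.
selected = df1.loc[:, present]

print(selected)
print("not found in df1:", missing)
```

If every ticker is known to be present, the filtering step is unnecessary and df1.loc[:, df2.stock] alone is enough.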

You can use join.

To simplify your question, say we have three stocks, s1, s2, and s3, in df1, and only s1 and s3 in df2:

In [35]: df1
Out[35]: 
   s1  s2  s3
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12

[4 rows x 3 columns]

In [36]: df2
Out[36]: 
  stock
0    s1
1    s3

[2 rows x 1 columns]

To join df2 and df1, we need to specify the column to join on and transpose df1 so that the stock names become its index:

In [37]: df2.join(df1.T, on='stock')
Out[37]: 
  stock  0  1  2   3
0    s1  1  4  7  10
1    s3  3  6  9  12

[2 rows x 5 columns]

If you're familiar with SQL, think of it as

SELECT * FROM df2 JOIN df1.T ON df2.stock = df1.T.index
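The join above leaves one row per ticker. If the original layout (one column per ticker) is wanted, the result can be transposed back. The following is a minimal sketch using the same example frames; the names joined and result are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"s1": [1, 4, 7, 10], "s2": [2, 5, 8, 11], "s3": [3, 6, 9, 12]}
)
df2 = pd.DataFrame({"stock": ["s1", "s3"]})

# Join on the ticker: df1.T has the stock names as its index.
joined = df2.join(df1.T, on="stock")

# Make the ticker the index, then transpose so each ticker is a column again.
result = joined.set_index("stock").T

print(result)
```

Note that join defaults to a left join, so a ticker in df2 with no column in df1 would produce a row of NaN values rather than an error.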


 