I have a DataFrame (df1) with 3000 columns, each corresponding to a stock ticker. Using pd.read_csv, I read a csv file of 500 stock tickers (1 column and 500 rows, excluding the index) into a second DataFrame (df2). How can I extract from df1, into a new DataFrame, the 500 columns that match the stock tickers in df2?
I can write a loop that iterates over each row in df2 and extracts one column at a time from df1, but this is slow and probably not the most efficient way.
You can use loc directly to select some columns from your DataFrame (to use @waitingkuo's example):
In [11]: df1.loc[:, df2.stock] # equivalent to df1[df2.stock]
Out[11]:
s1 s3
0 1 3
1 4 6
2 7 9
3 10 12
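This can be reproduced as a self-contained sketch; the toy df1/df2 below stand in for the real 3000-column and 500-ticker frames, and assume the ticker column in df2 is named stock:

```python
import pandas as pd

# Toy stand-ins for the real data: df1 has one column per ticker,
# df2 has a "stock" column listing the tickers to keep.
df1 = pd.DataFrame({'s1': [1, 4, 7, 10],
                    's2': [2, 5, 8, 11],
                    's3': [3, 6, 9, 12]})
df2 = pd.DataFrame({'stock': ['s1', 's3']})

# Select every column of df1 named in df2.stock in one vectorized step,
# avoiding the row-by-row loop described in the question.
subset = df1.loc[:, df2.stock]
print(subset)
```

The result is a new DataFrame keeping only the matching columns, in the order they appear in df2.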
You can use join.
To simplify your question, say we have three stocks, s1, s2, and s3, in df1, and only s1 and s3 in df2:
In [35]: df1
Out[35]:
s1 s2 s3
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
[4 rows x 3 columns]
In [36]: df2
Out[36]:
stock
0 s1
1 s3
[2 rows x 1 columns]
To join df2 and df1, we need to set the column to join on and transpose df1 so that the stock names become the index:
In [37]: df2.join(df1.T, on='stock')
Out[37]:
stock 0 1 2 3
0 s1 1 4 7 10
1 s3 3 6 9 12
[2 rows x 5 columns]
If you're familiar with SQL, just think of it as
SELECT * FROM df2 JOIN df1.T ON df2.stock = df1.T.index
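The join approach above can be run end to end as a small sketch (toy df1/df2 again stand in for the real frames; the ticker column in df2 is assumed to be named stock):

```python
import pandas as pd

# Toy data matching the transcript above.
df1 = pd.DataFrame({'s1': [1, 4, 7, 10],
                    's2': [2, 5, 8, 11],
                    's3': [3, 6, 9, 12]})
df2 = pd.DataFrame({'stock': ['s1', 's3']})

# Transpose df1 so ticker names become the index, then join each row of
# df2 to the matching row of df1.T via the "stock" column.
joined = df2.join(df1.T, on='stock')
print(joined)
```

Note that the result keeps df1's data transposed: one row per ticker, with df1's original row labels (0..3) as columns alongside the stock column.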