I have two dataframes, one of users and their chosen stocks and one of stock prices, and I'm trying to cross-match the data between them.
df1 = database of users who have each chosen a number of stocks:
Username Stock 1 Stock 2
0 JB3004 TSLA MSFT
1 JM3009 SHOP SPOT
2 DB0208 TWTR MSFT
3 AB3011 TWTR PTON
4 CB3004 MSFT TSLA
df2 = today's close price for each of the stocks:
TWTR SPOT PTON SHOP MSFT TSLA
Date Adj Close Adj Close Adj Close Adj Close Adj Close Adj Close
2020-12-11 51.44 341.22 117.1 1057.87 213.26 609.99
I'm trying to match the relevant stocks for each user in df1 to the Adj Close price in df2 so that I can print a df3 with the correct closing price for the stocks each user has chosen.
How would I do this? Nothing I've tried so far comes close, so I need some help!
I have faced a similar problem, and this is the solution I arrived at. I hope it helps you get your answer. The full solution is also available on GitHub.
Create df1
import pandas as pd

data1 = {"Username": ["JB3004", "JM3009", "DB0208", "AB3011", "CB3004"],
         "Stock_1": ["TSLA", "SHOP", "TWTR", "TWTR", "MSFT"],
         "Stock_2": ["MSFT", "SPOT", "MSFT", "PTON", "TSLA"]}
df1 = pd.DataFrame(data=data1)
df1.head()
Username Stock_1 Stock_2
0 JB3004 TSLA MSFT
1 JM3009 SHOP SPOT
2 DB0208 TWTR MSFT
3 AB3011 TWTR PTON
4 CB3004 MSFT TSLA
Convert wide format to long format data
df1_1 = pd.wide_to_long(df1, stubnames='Stock_', i='Username', j='Stock_num')
df1_1.reset_index(inplace=True)
df1_1
Username Stock_num Stock_
0 JB3004 1 TSLA
1 JM3009 1 SHOP
2 DB0208 1 TWTR
3 AB3011 1 TWTR
4 CB3004 1 MSFT
5 JB3004 2 MSFT
6 JM3009 2 SPOT
7 DB0208 2 MSFT
8 AB3011 2 PTON
9 CB3004 2 TSLA
Rename the column Stock_ to Stocks
df1_1.rename(columns={"Stock_": "Stocks"}, inplace=True)
df1_1
Create df2 to match your df2
The file closing_price.csv contains the closing price data:
# closing_price.csv
,TWTR,SPOT,PTON,SHOP,MSFT,TSLA
Date,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close
2020-12-11,51.44,341.22,117.1,1057.87,213.26,609.99
Load df2
df2 = pd.read_csv("closing_price.csv", index_col=None)
df2.head()
Unnamed: 0 TWTR SPOT PTON SHOP MSFT TSLA
0 Date Adj Close Adj Close Adj Close Adj Close Adj Close Adj Close
1 2020-12-11 51.44 341.22 117.1 1057.87 213.26 609.99
Data cleaning and transformation
# Name the first column "Date" and drop the leftover "Adj Close" header row (index label 0)
df2 = df2.rename(columns={"Unnamed: 0": "Date"})
df2 = df2.drop(index=0)
df2.head()
Date TWTR SPOT PTON SHOP MSFT TSLA
1 2020-12-11 51.44 341.22 117.1 1057.87 213.26 609.99
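One thing to watch in the steps above: because the "Date, Adj Close, ..." line is read as a data row, the price columns end up holding strings rather than numbers. A minimal sketch of one way around that, assuming the file layout shown above, is to skip that line while loading. The names csv_text and prices are only illustrative, and the file content is inlined with StringIO so the example runs on its own.

```python
from io import StringIO

import pandas as pd

# Same content as closing_price.csv, inlined so the sketch is self-contained.
csv_text = """,TWTR,SPOT,PTON,SHOP,MSFT,TSLA
Date,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close
2020-12-11,51.44,341.22,117.1,1057.87,213.26,609.99
"""

# skiprows=[1] drops the "Date,Adj Close,..." line (0-based line index 1),
# so only the tickers remain as column names and the prices parse as floats.
prices = pd.read_csv(StringIO(csv_text), skiprows=[1], index_col=0)

# The first column has no header, so name it explicitly and turn it
# back into an ordinary column.
prices.index.name = "Date"
prices = prices.reset_index()

print(prices.dtypes)
```

With numeric columns, later steps such as sorting by price or summing a user's holdings should behave as expected. With a real file, pass its path to read_csv in place of the StringIO object.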
Convert wide format to long format data
# Convert wide format to long format data
df2_1 = pd.melt(df2, id_vars=['Date'], value_vars=["TWTR", "SPOT", "PTON", "SHOP", "MSFT", "TSLA"], var_name="Stocks", value_name="Adj Close")
df2_1
Date Stocks Adj Close
0 2020-12-11 TWTR 51.44
1 2020-12-11 SPOT 341.22
2 2020-12-11 PTON 117.1
3 2020-12-11 SHOP 1057.87
4 2020-12-11 MSFT 213.26
5 2020-12-11 TSLA 609.99
Now, df1_1 and df2_1 are as below:
df1_1
Username Stock_num Stocks
0 JB3004 1 TSLA
1 JM3009 1 SHOP
2 DB0208 1 TWTR
3 AB3011 1 TWTR
4 CB3004 1 MSFT
5 JB3004 2 MSFT
6 JM3009 2 SPOT
7 DB0208 2 MSFT
8 AB3011 2 PTON
9 CB3004 2 TSLA
df2_1
Date Stocks Adj Close
0 2020-12-11 TWTR 51.44
1 2020-12-11 SPOT 341.22
2 2020-12-11 PTON 117.1
3 2020-12-11 SHOP 1057.87
4 2020-12-11 MSFT 213.26
5 2020-12-11 TSLA 609.99
Merge df1_1 and df2_1 on column "Stocks"
# Merge df1_1 and df2_1 on column "Stocks"
df3 = pd.merge(df1_1, df2_1, on='Stocks')
df3
Username Stock_num Stocks Date Adj Close
0 JB3004 1 TSLA 2020-12-11 609.99
1 CB3004 2 TSLA 2020-12-11 609.99
2 JM3009 1 SHOP 2020-12-11 1057.87
3 DB0208 1 TWTR 2020-12-11 51.44
4 AB3011 1 TWTR 2020-12-11 51.44
5 CB3004 1 MSFT 2020-12-11 213.26
6 JB3004 2 MSFT 2020-12-11 213.26
7 DB0208 2 MSFT 2020-12-11 213.26
8 JM3009 2 SPOT 2020-12-11 341.22
9 AB3011 2 PTON 2020-12-11 117.1
Rearrange columns
# Rearrange columns
df3 = df3[["Date", "Username", "Stock_num", "Stocks", "Adj Close"]]
df3
Date Username Stock_num Stocks Adj Close
0 2020-12-11 JB3004 1 TSLA 609.99
1 2020-12-11 CB3004 2 TSLA 609.99
2 2020-12-11 JM3009 1 SHOP 1057.87
3 2020-12-11 DB0208 1 TWTR 51.44
4 2020-12-11 AB3011 1 TWTR 51.44
5 2020-12-11 CB3004 1 MSFT 213.26
6 2020-12-11 JB3004 2 MSFT 213.26
7 2020-12-11 DB0208 2 MSFT 213.26
8 2020-12-11 JM3009 2 SPOT 341.22
9 2020-12-11 AB3011 2 PTON 117.1
# Reshaping or pivoting data based on column values
df = df3.pivot(index="Username", columns="Stock_num", values=["Stocks", "Adj Close"])
df
Stocks Adj Close
Stock_num 1 2 1 2
Username
AB3011 TWTR PTON 51.44 117.1
CB3004 MSFT TSLA 213.26 609.99
DB0208 TWTR MSFT 51.44 213.26
JB3004 TSLA MSFT 609.99 213.26
JM3009 SHOP SPOT 1057.87 341.22
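If only a single day of prices is involved, the merge above can probably be shortened. The following is a rough sketch, not a replacement for the steps above: it assumes the prices are already available as a Series keyed by ticker (called close here, a name I made up), melts df1 into long format, and looks up each ticker with Series.map.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Username": ["JB3004", "JM3009", "DB0208", "AB3011", "CB3004"],
    "Stock_1": ["TSLA", "SHOP", "TWTR", "TWTR", "MSFT"],
    "Stock_2": ["MSFT", "SPOT", "MSFT", "PTON", "TSLA"],
})

# Assumption: one closing price per ticker, taken from the question's df2.
close = pd.Series(
    {"TWTR": 51.44, "SPOT": 341.22, "PTON": 117.1,
     "SHOP": 1057.87, "MSFT": 213.26, "TSLA": 609.99},
    name="Adj Close",
)

# Wide -> long: one row per (user, chosen stock).
long_df = df1.melt(id_vars="Username", var_name="Stock_num", value_name="Stocks")

# Look up each ticker's price; tickers missing from `close` would become NaN.
long_df["Adj Close"] = long_df["Stocks"].map(close)

print(long_df)
```

The result has the same rows as df3 above, minus the Date column. A merge is still the better fit once df2 holds more than one date.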
Just saw this and I thought I'd give it a whirl.
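Use pandas.DataFrame.stack() on df2 to align everything with df1. This assumes df2 was loaded with its first column as the index, so the leftover "Date" header row shows up as an index label and can be filtered out afterwards. Rename some fields, if you want.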
df2t = df2.stack().reset_index().rename(
columns={
"level_0":"date",
"level_1":"stock",
0:"closing_price",
},
)
df2t = df2t.loc[df2t["date"] != "Date", :]
Data -
date stock closing_price
6 2020-12-11 TWTR 51.44
7 2020-12-11 SPOT 341.22
8 2020-12-11 PTON 117.1
9 2020-12-11 SHOP 1057.87
10 2020-12-11 MSFT 213.26
11 2020-12-11 TSLA 609.99
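Next, apply pandas.melt() to df1 (here its columns are assumed to be named username, Stock 1, and Stock 2; adjust id_vars and value_vars if yours differ):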
df1m = pd.melt(df1, id_vars=["username"], value_vars=["Stock 1", "Stock 2"])
Data -
username variable value
0 JB3004 Stock 1 TSLA
1 JM3009 Stock 1 SHOP
2 DB0208 Stock 1 TWTR
3 AB3011 Stock 1 TWTR
4 CB3004 Stock 1 MSFT
5 JB3004 Stock 2 MSFT
6 JM3009 Stock 2 SPOT
7 DB0208 Stock 2 MSFT
8 AB3011 Stock 2 PTON
9 CB3004 Stock 2 TSLA
Merge the dataframes.
df = pd.merge(df1m, df2t, left_on="value", right_on="stock", sort=False)
Data -
username variable value date stock closing_price
0 JB3004 Stock 1 TSLA 2020-12-11 TSLA 609.99
1 CB3004 Stock 2 TSLA 2020-12-11 TSLA 609.99
2 JM3009 Stock 1 SHOP 2020-12-11 SHOP 1057.87
3 DB0208 Stock 1 TWTR 2020-12-11 TWTR 51.44
4 AB3011 Stock 1 TWTR 2020-12-11 TWTR 51.44
5 CB3004 Stock 1 MSFT 2020-12-11 MSFT 213.26
6 JB3004 Stock 2 MSFT 2020-12-11 MSFT 213.26
7 DB0208 Stock 2 MSFT 2020-12-11 MSFT 213.26
8 JM3009 Stock 2 SPOT 2020-12-11 SPOT 341.22
9 AB3011 Stock 2 PTON 2020-12-11 PTON 117.1
Do some cleanup and then pivot for usable results
df = df.drop("value", axis=1).rename(columns={"variable": "holding_id"})
df = df.pivot(index="username", columns="holding_id", values=["stock", "closing_price"]).rename(columns=lambda x: x.strip())
Data -
stock closing_price
holding_id Stock 1 Stock 2 Stock 1 Stock 2
username
AB3011 TWTR PTON 51.44 117.1
CB3004 MSFT TSLA 213.26 609.99
DB0208 TWTR MSFT 51.44 213.26
JB3004 TSLA MSFT 609.99 213.26
JM3009 SHOP SPOT 1057.87 341.22
Selecting data is pretty simple with multiindexing
df.loc[:,"stock"]["Stock 1"]
Data -
username
AB3011 TWTR
CB3004 MSFT
DB0208 TWTR
JB3004 TSLA
JM3009 SHOP
Name: Stock 1, dtype: object
Or, include username for targeted selections:
df.loc["AB3011","stock"]["Stock 1"]
Data -
'TWTR'
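As a side note, the same selections can be written as a single .loc lookup by passing the column as a tuple, which avoids the chained [...] step. This is a small sketch on a cut-down copy of the pivoted data; long_df and wide are just illustrative names.

```python
import pandas as pd

# Cut-down long-format data with the same column names as the merged frame above.
long_df = pd.DataFrame({
    "username": ["AB3011", "AB3011", "CB3004", "CB3004"],
    "holding_id": ["Stock 1", "Stock 2", "Stock 1", "Stock 2"],
    "stock": ["TWTR", "PTON", "MSFT", "TSLA"],
    "closing_price": [51.44, 117.1, 213.26, 609.99],
})

# Pivoting with a list of values yields two-level (MultiIndex) columns.
wide = long_df.pivot(
    index="username", columns="holding_id", values=["stock", "closing_price"]
)

# A (top level, second level) tuple addresses exactly one column.
one_cell = wide.loc["AB3011", ("stock", "Stock 1")]
one_column = wide[("closing_price", "Stock 2")]

print(one_cell)
print(one_column)
```

The tuple form also works on the left-hand side of an assignment, which is where chained indexing tends to cause trouble.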