I have two CSV files. File 1 looks like:
Ticker | Date | Marketcap
A | 2002-03-14 | 600000
A | 2002-06-18 | 520000
.
.
ABB | 2004-03-16 | 400000
ABB | 2005-07-11 | 800000
.
.
AD | 2004-03-16 | 680000
.
.
File 2 looks like:
Ticker | Date | Open | Close |
A | 2002-03-14 | 580000 | 500000 |
ABB | 2002-03-14 | 500000 | 420000 |
AD | 2002-03-16 | 700000 | 670000 |
.
.
.
.
The periods indicate that both File 1 and File 2 continue for a large number of entries per ticker. The first file lists the values for every date and every ticker in one continuous run, whereas the second file lists the values for every year and ticker one by one.
What I want to do is merge files 1 and 2 on both "Ticker" and "Date" so that the result looks like:
Ticker | Date | Marketcap | Open | Close |
A | 2002-03-14 | 600000 | 580000 | 500000 |
ABB | 2002-03-14 | 520000 | 500000 | 420000 |
.
.
I've tried merging files using something like:
a = pd.read_csv("File1.csv")
b = pd.read_csv("File2.csv")
merged = a.merge(b, on='Date')
But I don't think this accounts for both Date and Ticker at once.
I believe you need to use ['Date', 'Ticker'] instead of just 'Date'. You might also need to specify the how argument, depending on what you want.
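To illustrate why the key list matters, here is a minimal sketch. The two small DataFrames are hypothetical in-memory stand-ins for File1.csv and File2.csv, not the asker's real data. Merging on 'Date' alone pairs every ticker with every other ticker that shares the date, while merging on both columns keeps one row per (Ticker, Date) pair.

```python
import pandas as pd

# Hypothetical stand-ins for File1.csv and File2.csv (assumed sample rows).
a = pd.DataFrame({
    "Ticker": ["A", "ABB"],
    "Date": ["2002-03-14", "2002-03-14"],
    "Marketcap": [600000, 520000],
})
b = pd.DataFrame({
    "Ticker": ["A", "ABB"],
    "Date": ["2002-03-14", "2002-03-14"],
    "Open": [580000, 500000],
    "Close": [500000, 420000],
})

# Date only: each row of a matches every row of b with the same date,
# and the clashing Ticker columns become Ticker_x / Ticker_y.
by_date = a.merge(b, on="Date")

# Ticker and Date together: one row per matching (Ticker, Date) pair.
by_both = a.merge(b, on=["Ticker", "Date"])

print(len(by_date), list(by_date.columns))
print(len(by_both), list(by_both.columns))
```

With these two rows per file, the Date-only merge yields four rows and the two-key merge yields two.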
Try this:
merged=a.merge(b, how='left',on=['Ticker', 'Date'])
You can try the following code:
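import pandas as pd

# The sample files are tab-separated; pass the separator by keyword
a = pd.read_csv("File1.csv", sep="\t")
b = pd.read_csv("File2.csv", sep="\t")
# Keep only rows whose (Ticker, Date) pair appears in both files
merged = pd.merge(a, b, how='inner', on=['Ticker', 'Date'])
print(merged)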
If File1.csv is:
Ticker Date Marketcap
A 2002-03-14 600000
A 2002-06-18 520000
ABB 2004-03-16 400000
ABB 2005-07-11 800000
AD 2004-03-16 680000
And File2.csv is:
Ticker Date Open Close
A 2002-03-14 580000 500000
ABB 2004-03-16 500000 420000
AD 2004-03-16 700000 670000
Then the output of the above code will be:
Ticker Date Marketcap Open Close
0 A 2002-03-14 600000 580000 500000
1 ABB 2004-03-16 400000 500000 420000
2 AD 2004-03-16 680000 700000 670000
If you want to keep all rows from File1.csv and only matching rows from File2.csv, you can use this instead:
merged = pd.merge(a, b, how='left', on=['Ticker', 'Date'])
This will produce:
Ticker Date Marketcap Open Close
0 A 2002-03-14 600000 580000.0 500000.0
1 A 2002-06-18 520000 NaN NaN
2 ABB 2004-03-16 400000 500000.0 420000.0
3 ABB 2005-07-11 800000 NaN NaN
4 AD 2004-03-16 680000 700000.0 670000.0
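If it helps to see which File1.csv rows found no partner in File2.csv, one option is the indicator parameter of pd.merge, which adds a _merge column to the result. The sketch below rebuilds the sample rows above in memory (an assumption made so it runs without the CSV files) and is only one possible way to inspect a left merge.

```python
import pandas as pd

# Sample rows copied from the File1.csv / File2.csv example above.
a = pd.DataFrame({
    "Ticker": ["A", "A", "ABB", "ABB", "AD"],
    "Date": ["2002-03-14", "2002-06-18", "2004-03-16",
             "2005-07-11", "2004-03-16"],
    "Marketcap": [600000, 520000, 400000, 800000, 680000],
})
b = pd.DataFrame({
    "Ticker": ["A", "ABB", "AD"],
    "Date": ["2002-03-14", "2004-03-16", "2004-03-16"],
    "Open": [580000, 500000, 700000],
    "Close": [500000, 420000, 670000],
})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left merge.
merged = pd.merge(a, b, how="left", on=["Ticker", "Date"], indicator=True)

# Rows of a with no matching (Ticker, Date) in b; their Open/Close are NaN.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Ticker", "Date"]])
```

Here the unmatched rows are the two with NaN in the output above: A on 2002-06-18 and ABB on 2005-07-11.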