
How to merge 2 CSV files together by multiple columns in Python

I have two CSV files. File 1 that looks like:

Ticker  |    Date     |   Marketcap 
  A     |  2002-03-14 |    600000
  A     |  2002-06-18 |    520000
                   .
                   .
  ABB   |  2004-03-16 |    400000
  ABB   |  2005-07-11 |    800000
                   .
                   .
  AD    |  2004-03-16 |    680000
                   .
                   .

File 2 like:

Ticker  |    Date     |     Open    |    Close   |
  A     |  2002-03-14 |    580000   |    500000  |
  ABB   |  2002-03-14 |    500000   |    420000  |
  AD    |  2002-03-16 |    700000   |    670000  |
                          .
                          .
                          .
                          .

The periods indicate that both File 1 and File 2 continue for a large number of entries per ticker. In File 1, all dates for a given ticker are listed consecutively before the next ticker starts, whereas File 2 lists the tickers one by one for each date.

What I want to do is merge files 1 and 2 on both "Ticker" and "Date" so that the result looks like:

Ticker  |    Date     |   Marketcap |    Open     |    Close   |
  A     |  2002-03-14 |    600000   |    580000   |    500000  |
  ABB   |  2002-03-14 |    520000   |    500000   |    420000  |
                                 .
                                 .

I've tried merging files using something like:

import pandas as pd

a = pd.read_csv("File1.csv")
b = pd.read_csv("File2.csv")
merged = a.merge(b, on='Date')  # joins on Date only

But I don't think this accounts for both Date and Ticker at once.
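To see why merging on 'Date' alone goes wrong, here is a small sketch with made-up in-memory frames (the rows are illustrative, not the real files). With Date as the only key, every row of a that shares a date with any row of b gets paired, regardless of ticker, and pandas keeps both ticker columns with the suffixes _x and _y.

```python
import pandas as pd

# Illustrative data only: two different tickers that share the same date
a = pd.DataFrame({
    "Ticker": ["A", "ABB"],
    "Date": ["2002-03-14", "2002-03-14"],
    "Marketcap": [600000, 400000],
})
b = pd.DataFrame({
    "Ticker": ["A", "ABB"],
    "Date": ["2002-03-14", "2002-03-14"],
    "Open": [580000, 500000],
    "Close": [500000, 420000],
})

# Key on Date only: each row of a matches both rows of b (2 x 2 = 4 rows)
date_only = a.merge(b, on="Date")
print(date_only[["Ticker_x", "Ticker_y", "Date"]])

# Key on Ticker and Date together: each row matches only its own ticker
both_keys = a.merge(b, on=["Ticker", "Date"])
print(both_keys)
```

The second merge returns one row per (Ticker, Date) pair, which is the layout the question asks for.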

I believe you need to use ['Ticker', 'Date'] instead of just 'Date', so that rows are matched on both columns at once. You may also need to specify the how argument, depending on which unmatched rows you want to keep.

Try this:

merged = a.merge(b, how='left', on=['Ticker', 'Date'])
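The how argument decides what happens to a (Ticker, Date) pair that appears in only one of the files. A minimal sketch comparing the four options on toy frames (the values are illustrative, not taken from real data):

```python
import pandas as pd

# Toy frames: ("A", "2002-06-18") exists only in a; ("AD", "2004-03-16") only in b
a = pd.DataFrame({
    "Ticker": ["A", "A"],
    "Date": ["2002-03-14", "2002-06-18"],
    "Marketcap": [600000, 520000],
})
b = pd.DataFrame({
    "Ticker": ["A", "AD"],
    "Date": ["2002-03-14", "2004-03-16"],
    "Open": [580000, 700000],
})

keys = ["Ticker", "Date"]
row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = a.merge(b, how=how, on=keys)
    row_counts[how] = len(merged)
    print(how, len(merged))
```

inner keeps only pairs present in both frames, left keeps every row of a, right keeps every row of b, and outer keeps everything, filling the gaps with NaN.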

You can try the following code:

import pandas as pd

# Both sample files are tab-separated
a = pd.read_csv("File1.csv", sep="\t")
b = pd.read_csv("File2.csv", sep="\t")
merged = pd.merge(a, b, how='inner', on=['Ticker', 'Date'])
print(merged)

If File1.csv is:

Ticker  Date    Marketcap 
A   2002-03-14  600000
A   2002-06-18  520000
ABB 2004-03-16  400000
ABB 2005-07-11  800000
AD  2004-03-16  680000

And File2.csv is:

Ticker  Date    Open    Close
A   2002-03-14  580000  500000
ABB 2004-03-16  500000  420000
AD  2004-03-16  700000  670000

Then the output of the above code will be:

  Ticker        Date  Marketcap     Open   Close
0      A  2002-03-14      600000  580000  500000
1    ABB  2004-03-16      400000  500000  420000
2     AD  2004-03-16      680000  700000  670000
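To reproduce this result without creating files on disk, the same tab-separated sample data can be read from in-memory strings with io.StringIO. This is only a sketch for experimenting; it assumes the sample files are tab-separated, as the sep="\t" in the code above implies.

```python
import io
import pandas as pd

# Tab-separated copies of the sample files above, held in memory
file1 = io.StringIO(
    "Ticker\tDate\tMarketcap\n"
    "A\t2002-03-14\t600000\n"
    "A\t2002-06-18\t520000\n"
    "ABB\t2004-03-16\t400000\n"
    "ABB\t2005-07-11\t800000\n"
    "AD\t2004-03-16\t680000\n"
)
file2 = io.StringIO(
    "Ticker\tDate\tOpen\tClose\n"
    "A\t2002-03-14\t580000\t500000\n"
    "ABB\t2004-03-16\t500000\t420000\n"
    "AD\t2004-03-16\t700000\t670000\n"
)

a = pd.read_csv(file1, sep="\t")
b = pd.read_csv(file2, sep="\t")

# Inner join: keep only (Ticker, Date) pairs present in both files
merged = pd.merge(a, b, how="inner", on=["Ticker", "Date"])
print(merged)
```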


If you want all rows from File1.csv and only the matching rows from File2.csv, use a left join instead:

merged = pd.merge(a, b, how='left', on=['Ticker', 'Date'])

This will produce:

  Ticker        Date  Marketcap       Open     Close
0      A  2002-03-14      600000  580000.0  500000.0
1      A  2002-06-18      520000       NaN       NaN
2    ABB  2004-03-16      400000  500000.0  420000.0
3    ABB  2005-07-11      800000       NaN       NaN
4     AD  2004-03-16      680000  700000.0  670000.0
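The left join fills unmatched rows with NaN, which is also why Open and Close become floats. To list exactly which File1 rows found no partner, merge's indicator parameter adds a _merge column. In the sketch below the sample rows are rebuilt in memory; validate="one_to_one" is an optional extra check that raises an error if either frame repeats a (Ticker, Date) pair.

```python
import pandas as pd

# Same sample rows as File1.csv and File2.csv above
a = pd.DataFrame({
    "Ticker": ["A", "A", "ABB", "ABB", "AD"],
    "Date": ["2002-03-14", "2002-06-18", "2004-03-16", "2005-07-11", "2004-03-16"],
    "Marketcap": [600000, 520000, 400000, 800000, 680000],
})
b = pd.DataFrame({
    "Ticker": ["A", "ABB", "AD"],
    "Date": ["2002-03-14", "2004-03-16", "2004-03-16"],
    "Open": [580000, 500000, 700000],
    "Close": [500000, 420000, 670000],
})

# indicator=True adds "_merge" ("both" or "left_only" for a left join);
# validate raises MergeError if a (Ticker, Date) key is duplicated
merged = pd.merge(a, b, how="left", on=["Ticker", "Date"],
                  indicator=True, validate="one_to_one")

# Rows of File1 with no matching (Ticker, Date) in File2
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Ticker", "Date"]])
```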
