How to merge 2 CSV files together by multiple columns in Python

I have two CSV files. File 1 looks like:

Ticker  |    Date     |   Marketcap 
  A     |  2002-03-14 |    600000
  A     |  2002-06-18 |    520000
                   .
                   .
  ABB   |  2004-03-16 |    400000
  ABB   |  2005-07-11 |    800000
                   .
                   .
  AD    |  2004-03-16 |    680000
                   .
                   .

File 2 looks like:

Ticker  |    Date     |     Open    |    Close   |
  A     |  2002-03-14 |    580000   |    500000  |
  ABB   |  2002-03-14 |    500000   |    420000  |
  AD    |  2002-03-16 |    700000   |    670000  |
                          .
                          .
                          .
                          .

The periods indicate that the values continue for a large number of entries for each ticker in both File 1 and File 2. The first file lists the values for every date of each ticker consecutively, whereas the second file lists the values for every year and ticker one by one.

What I want to do is merge files 1 and 2 on both "Ticker" and "Date" so that the result looks like:

Ticker  |    Date     |   Marketcap |    Open     |    Close   |
  A     |  2002-03-14 |    600000   |    580000   |    500000  |
  ABB   |  2002-03-14 |    520000   |    500000   |    420000  |
                                 .
                                 .

I've tried merging the files using something like:

import pandas as pd

a = pd.read_csv("File1.csv")
b = pd.read_csv("File2.csv")
merged = a.merge(b, on='Date')

But I don't think this accounts for both Date and Ticker at once.

I believe you need to use ['Date', 'Ticker'] instead of just 'Date'. Also, you might need to specify the how argument depending on what you want.

Try this:

merged = a.merge(b, how='left', on=['Ticker', 'Date'])
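
To see why the choice of keys matters, here is a minimal sketch with made-up rows (the frames a and b below are illustrative, not the asker's real files). Merging on 'Date' alone pairs rows from different tickers that happen to share a date, and pandas adds suffixes to the clashing Ticker columns. Merging on both keys keeps only rows where Ticker and Date both agree.

```python
import pandas as pd

# Illustrative data only; the values are made up for this sketch.
a = pd.DataFrame({
    "Ticker": ["A", "ABB"],
    "Date": ["2002-03-14", "2002-03-14"],
    "Marketcap": [600000, 520000],
})
b = pd.DataFrame({
    "Ticker": ["A", "ABB"],
    "Date": ["2002-03-14", "2002-03-14"],
    "Open": [580000, 500000],
    "Close": [500000, 420000],
})

# Date-only merge: every row of `a` pairs with every row of `b` on that date,
# so tickers get mixed and the Ticker columns become Ticker_x / Ticker_y.
by_date = a.merge(b, on="Date")

# Two-key merge: a row matches only when Ticker and Date both agree.
by_both = a.merge(b, on=["Ticker", "Date"])

print(by_date.columns.tolist(), len(by_date))
print(by_both.columns.tolist(), len(by_both))
```

With two tickers on the same date, the Date-only merge returns four rows instead of two, which is the symptom to look for in the real data.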

You can try the following code:

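import pandas as pd

# Read both tab-separated files
a = pd.read_csv("File1.csv", sep="\t")
b = pd.read_csv("File2.csv", sep="\t")
# Keep only rows whose Ticker and Date appear in both files
merged = pd.merge(a, b, how='inner', on=['Ticker', 'Date'])
print(merged)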

If File1.csv is:

Ticker  Date    Marketcap 
A   2002-03-14  600000
A   2002-06-18  520000
ABB 2004-03-16  400000
ABB 2005-07-11  800000
AD  2004-03-16  680000

And File2.csv is:

Ticker  Date    Open    Close
A   2002-03-14  580000  500000
ABB 2004-03-16  500000  420000
AD  2004-03-16  700000  670000

Then the output of the above code will be:

  Ticker        Date  Marketcap     Open   Close
0      A  2002-03-14      600000  580000  500000
1    ABB  2004-03-16      400000  500000  420000
2     AD  2004-03-16      680000  700000  670000
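
For anyone who wants to reproduce this without creating files, the sketch below feeds the same sample rows through io.StringIO. Passing parse_dates for the Date column in both reads is an addition here, not part of the code above; it assumes you want the key column to have the same datetime dtype in both frames.

```python
import io

import pandas as pd

# The same sample rows as above, held in memory as tab-separated text.
file1 = io.StringIO(
    "Ticker\tDate\tMarketcap\n"
    "A\t2002-03-14\t600000\n"
    "A\t2002-06-18\t520000\n"
    "ABB\t2004-03-16\t400000\n"
    "ABB\t2005-07-11\t800000\n"
    "AD\t2004-03-16\t680000\n"
)
file2 = io.StringIO(
    "Ticker\tDate\tOpen\tClose\n"
    "A\t2002-03-14\t580000\t500000\n"
    "ABB\t2004-03-16\t500000\t420000\n"
    "AD\t2004-03-16\t700000\t670000\n"
)

# Assumption: parse Date in both frames so the merge keys share one dtype.
a = pd.read_csv(file1, sep="\t", parse_dates=["Date"])
b = pd.read_csv(file2, sep="\t", parse_dates=["Date"])

# Inner merge on both keys keeps only (Ticker, Date) pairs found in both.
merged = pd.merge(a, b, how="inner", on=["Ticker", "Date"])
print(merged)
```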


If you want all rows from File1.csv and only the matching rows from File2.csv, you can use this instead:

merged = pd.merge(a, b, how='left', on=['Ticker', 'Date'])

This will produce:

  Ticker        Date  Marketcap       Open     Close
0      A  2002-03-14      600000  580000.0  500000.0
1      A  2002-06-18      520000       NaN       NaN
2    ABB  2004-03-16      400000  500000.0  420000.0
3    ABB  2005-07-11      800000       NaN       NaN
4     AD  2004-03-16      680000  700000.0  670000.0
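
If it helps to check which rows of File1.csv found no partner, pandas' merge accepts indicator=True, which adds a _merge column marking each row as 'both' or 'left_only'. The sketch below is one possible way to use it, with the sample rows rebuilt as in-memory frames rather than read from disk.

```python
import pandas as pd

# Sample rows from File1.csv and File2.csv, rebuilt in memory.
a = pd.DataFrame({
    "Ticker": ["A", "A", "ABB", "ABB", "AD"],
    "Date": ["2002-03-14", "2002-06-18", "2004-03-16", "2005-07-11", "2004-03-16"],
    "Marketcap": [600000, 520000, 400000, 800000, 680000],
})
b = pd.DataFrame({
    "Ticker": ["A", "ABB", "AD"],
    "Date": ["2002-03-14", "2004-03-16", "2004-03-16"],
    "Open": [580000, 500000, 700000],
    "Close": [500000, 420000, 670000],
})

# indicator=True adds a `_merge` column describing where each row came from.
merged = pd.merge(a, b, how="left", on=["Ticker", "Date"], indicator=True)

# Rows present in File 1 with no matching (Ticker, Date) in File 2.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Ticker", "Date"]])
```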
