How to merge 2 CSV files together by multiple columns in Python
I have two CSV files. File 1 looks like:
Ticker | Date | Marketcap
A | 2002-03-14 | 600000
A | 2002-06-18 | 520000
.
.
ABB | 2004-03-16 | 400000
ABB | 2005-07-11 | 800000
.
.
AD | 2004-03-16 | 680000
.
.
File 2 looks like:
Ticker | Date | Open | Close |
A | 2002-03-14 | 580000 | 500000 |
ABB | 2002-03-14 | 500000 | 420000 |
AD | 2002-03-16 | 700000 | 670000 |
.
.
.
.
The periods indicate that the values continue for a large number of entries for each ticker in both File 1 and File 2. The first file lists values for every date and every ticker continuously, whereas the second file lists values for every year and ticker one by one.
What I want to do is merge files 1 and 2 on both "Ticker" and "Date" so the result looks like:
Ticker | Date | Marketcap | Open | Close |
A | 2002-03-14 | 600000 | 580000 | 500000 |
ABB | 2002-03-14 | 520000 | 500000 | 420000 |
.
.
I've tried merging the files using something like:
import pandas as pd

a = pd.read_csv("File1.csv")
b = pd.read_csv("File2.csv")
merged = a.merge(b, on='Date')
But I don't think this accounts for both Date and Ticker at once.
I believe you need to use ['Date', 'Ticker'] instead of just 'Date'. Also, you might need to specify the how argument, depending on what you want.
Try this:
merged = a.merge(b, how='left', on=['Ticker', 'Date'])
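To see why both keys matter, here is a minimal sketch that compares the two calls. The two small DataFrames are hypothetical stand-ins for File1.csv and File2.csv, built in memory so the snippet runs without any files on disk.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for File1.csv and File2.csv.
a = pd.DataFrame({
    "Ticker": ["A", "ABB"],
    "Date": ["2002-03-14", "2002-03-14"],
    "Marketcap": [600000, 520000],
})
b = pd.DataFrame({
    "Ticker": ["A", "ABB"],
    "Date": ["2002-03-14", "2002-03-14"],
    "Open": [580000, 500000],
    "Close": [500000, 420000],
})

# Merging on Date alone pairs every row of a with every row of b that
# shares the date, so tickers get mixed; Ticker is split into
# Ticker_x and Ticker_y.
by_date = a.merge(b, on="Date")
print(by_date)

# Merging on both keys pairs each (Ticker, Date) only with its own row.
by_both = a.merge(b, how="left", on=["Ticker", "Date"])
print(by_both)
```

With two tickers sharing one date, the Date-only merge returns four rows instead of two, which is the problem the question suspects.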
You can try the following code:
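import pandas as pd

# The sample files are tab-separated, so pass the separator by keyword
a = pd.read_csv("File1.csv", sep="\t")
b = pd.read_csv("File2.csv", sep="\t")
# Keep only rows whose (Ticker, Date) pair appears in both files
merged = pd.merge(a, b, how='inner', on=['Ticker', 'Date'])
print(merged)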
If File1.csv is:
Ticker Date Marketcap
A 2002-03-14 600000
A 2002-06-18 520000
ABB 2004-03-16 400000
ABB 2005-07-11 800000
AD 2004-03-16 680000
And File2.csv is:
Ticker Date Open Close
A 2002-03-14 580000 500000
ABB 2004-03-16 500000 420000
AD 2004-03-16 700000 670000
Then the output of the above code will be:
Ticker Date Marketcap Open Close
0 A 2002-03-14 600000 580000 500000
1 ABB 2004-03-16 400000 500000 420000
2 AD 2004-03-16 680000 700000 670000
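If you want to reproduce this result without creating the files, the following sketch reads the same sample rows from in-memory strings. Using io.StringIO in place of real file paths is an assumption made only for illustration, and the names file1_text and file2_text are hypothetical.

```python
import io
import pandas as pd

# Same sample rows as File1.csv and File2.csv above, tab-separated.
file1_text = (
    "Ticker\tDate\tMarketcap\n"
    "A\t2002-03-14\t600000\n"
    "A\t2002-06-18\t520000\n"
    "ABB\t2004-03-16\t400000\n"
    "ABB\t2005-07-11\t800000\n"
    "AD\t2004-03-16\t680000\n"
)
file2_text = (
    "Ticker\tDate\tOpen\tClose\n"
    "A\t2002-03-14\t580000\t500000\n"
    "ABB\t2004-03-16\t500000\t420000\n"
    "AD\t2004-03-16\t700000\t670000\n"
)

# StringIO lets read_csv treat a string as if it were a file.
a = pd.read_csv(io.StringIO(file1_text), sep="\t")
b = pd.read_csv(io.StringIO(file2_text), sep="\t")

# Inner merge: only (Ticker, Date) pairs present in both inputs survive.
merged = pd.merge(a, b, how="inner", on=["Ticker", "Date"])
print(merged)
```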
To keep all rows from File1.csv and only matching rows from File2.csv, you can use this instead:
merged = pd.merge(a, b, how='left', on=['Ticker', 'Date'])
This will produce:
Ticker Date Marketcap Open Close
0 A 2002-03-14 600000 580000.0 500000.0
1 A 2002-06-18 520000 NaN NaN
2 ABB 2004-03-16 400000 500000.0 420000.0
3 ABB 2005-07-11 800000 NaN NaN
4 AD 2004-03-16 680000 700000.0 670000.0
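Open and Close appear as floats in that output because NaN cannot be stored in a plain integer column. As an optional follow-up, the sketch below flags the unmatched rows with the indicator parameter of merge and converts the two columns to the nullable Int64 dtype. Neither step is part of the original answer, and the DataFrames are in-memory copies of the sample data.

```python
import pandas as pd

# In-memory copies of the sample File1.csv and File2.csv rows.
a = pd.DataFrame({
    "Ticker": ["A", "A", "ABB", "ABB", "AD"],
    "Date": ["2002-03-14", "2002-06-18", "2004-03-16",
             "2005-07-11", "2004-03-16"],
    "Marketcap": [600000, 520000, 400000, 800000, 680000],
})
b = pd.DataFrame({
    "Ticker": ["A", "ABB", "AD"],
    "Date": ["2002-03-14", "2004-03-16", "2004-03-16"],
    "Open": [580000, 500000, 700000],
    "Close": [500000, 420000, 670000],
})

# indicator=True adds a _merge column saying where each row came from.
merged = pd.merge(a, b, how="left", on=["Ticker", "Date"], indicator=True)

# Rows of File 1 that found no partner in File 2.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Ticker", "Date"]])

# Nullable integers keep whole numbers and show missing values as <NA>.
merged = merged.astype({"Open": "Int64", "Close": "Int64"})
print(merged)
```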