简体   繁体   中英

Merging data from python list into one dataframe

I have the following files in AAMC_K.txt, AAU.txt, ACU.txt, ACY.txt in a folder called AMEX. I am trying to merge these text files into one dataframe. I have tried to do so with pd.merge() but I get an error that the merge function needs a right and left parameter and my data is in a python list. How can I merge the data in the data_list into one pandas dataframe.

import pandas as pd
import os

textfile_names = os.listdir("AMEX")
textfile_names.sort()
data_list = []

for i in range(len(textfile_names)):
   data = pd.read_csv("AMEX/"+textfile_names[i], index_col=None, header=0)
   data_list.append(data)

frame = pd.merge(data_list, on='<DTYYYYMMDD>', how='outer')

"AE.txt"
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
AE,D,19970102,000000,12.6250,12.6250,11.7500,11.7500,144,0
AE,D,19970103,000000,11.8750,12.1250,11.8750,12.1250,25,0

AAU.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
AAU,D,20020513,000000,0.4220,0.4220,0.4220,0.4220,0,0
AAU,D,20020514,000000,0.4177,0.4177,0.4177,0.4177,0,0

ACU.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACU,D,19970102,000000,5.2500,5.3750,5.1250,5.1250,52,0
ACU,D,19970103,000000,5.1250,5.2500,5.0625,5.2500,12,0

ACY.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACY,D,19980116,000000,9.7500,9.7500,8.8125,8.8125,289,0
ACY,D,19980120,000000,8.7500,8.7500,8.1250,8.1250,151,0

I want the output to be filtered with the DTYYYYMMDD and put into one dataframe frame.

OUTPUT
<TICKER>,<PER>,<DTYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>,<TICKER>,<PER>,<DTYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACU,D,19970102,000000,5.2500,5.3750,5.1250,5.1250,52,0,AE,D,19970102,000000,12.6250,12.6250,11.7500,11.7500,144,0
ACU,D,19970103,000000,5.1250,5.2500,5.0625,5.2500,12,0,AE,D,19970103,000000,11.8750,12.1250,11.8750,12.1250,25,0

As @busybear says, pd.concat is the right tool for this job: frame = pd.concat(data_list) .

merge is for when you're joining two dataframes which usually have some of the same columns and some different ones. You choose a column (or index or multiple) which identifies which rows in the two dataframes correspond to each other, and pandas handles making a dataframe whose rows are combinations of the corresponding rows in the two original dataframes. This function only works on 2 dataframes at a time; you'd have to do a loop to merge more in (it's uncommon to need to merge many dataframes this way).

concat is for when you have multiple dataframes and want to just append all of their rows or columns into one large dataframe. (Let's assume you're concatenating rows, as you want here.) It doesn't use an identifier to determine which rows correspond. All it does is create a new dataframe which has each row from each of the concat ed dataframes (all the rows from the first, then all from the second, etc.).

I think the above is a decent TLDR on merge vs concat but see here for a lengthy but much more comprehensive guide on using merge / join / concat with dataframes.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM