
Merging data from python list into one dataframe

I have the files AAMC_K.txt, AAU.txt, ACU.txt, ACY.txt in a folder called AMEX. I am trying to merge these text files into one dataframe. I tried to do so with pd.merge(), but I get an error saying that the merge function needs a left and a right parameter, and my data is in a python list. How can I merge the data in data_list into one pandas dataframe?

import pandas as pd
import os

textfile_names = os.listdir("AMEX")
textfile_names.sort()
data_list = []

for i in range(len(textfile_names)):
   data = pd.read_csv("AMEX/"+textfile_names[i], index_col=None, header=0)
   data_list.append(data)

frame = pd.merge(data_list, on='<DTYYYYMMDD>', how='outer')

"AE.txt"
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
AE,D,19970102,000000,12.6250,12.6250,11.7500,11.7500,144,0
AE,D,19970103,000000,11.8750,12.1250,11.8750,12.1250,25,0

AAU.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
AAU,D,20020513,000000,0.4220,0.4220,0.4220,0.4220,0,0
AAU,D,20020514,000000,0.4177,0.4177,0.4177,0.4177,0,0

ACU.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACU,D,19970102,000000,5.2500,5.3750,5.1250,5.1250,52,0
ACU,D,19970103,000000,5.1250,5.2500,5.0625,5.2500,12,0

ACY.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACY,D,19980116,000000,9.7500,9.7500,8.8125,8.8125,289,0
ACY,D,19980120,000000,8.7500,8.7500,8.1250,8.1250,151,0

I want the output to be filtered by the <DTYYYYMMDD> column and put into one dataframe, frame.

OUTPUT
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>,<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACU,D,19970102,000000,5.2500,5.3750,5.1250,5.1250,52,0,AE,D,19970102,000000,12.6250,12.6250,11.7500,11.7500,144,0
ACU,D,19970103,000000,5.1250,5.2500,5.0625,5.2500,12,0,AE,D,19970103,000000,11.8750,12.1250,11.8750,12.1250,25,0

As @busybear says, pd.concat is the right tool for this job: frame = pd.concat(data_list).
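A minimal sketch of that call, with two of the sample files from the question inlined as strings so it runs without the AMEX folder. The names `ae_text` and `acu_text` are illustrative and not from the question, and `ignore_index=True` is an optional extra that renumbers the rows.

```python
import io

import pandas as pd

# Sample rows copied from the question, inlined so the sketch is self-contained.
ae_text = """<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
AE,D,19970102,000000,12.6250,12.6250,11.7500,11.7500,144,0
AE,D,19970103,000000,11.8750,12.1250,11.8750,12.1250,25,0
"""
acu_text = """<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACU,D,19970102,000000,5.2500,5.3750,5.1250,5.1250,52,0
ACU,D,19970103,000000,5.1250,5.2500,5.0625,5.2500,12,0
"""

# Same shape as data_list in the question: one dataframe per file.
data_list = [pd.read_csv(io.StringIO(text)) for text in (ae_text, acu_text)]

# Stack all rows into one dataframe; ignore_index=True renumbers rows from 0.
frame = pd.concat(data_list, ignore_index=True)
print(frame)
```

With the real files, the list comprehension would be replaced by the read_csv loop from the question.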

merge is for joining two dataframes that usually share some columns and differ in others. You choose a column (or the index, or several columns) that identifies which rows in the two dataframes correspond to each other, and pandas builds a dataframe whose rows are combinations of the corresponding rows in the two original dataframes. This function only works on 2 dataframes at a time; you'd have to write a loop to merge more in (it's uncommon to need to merge many dataframes this way).
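If you do need the loop, here is one possible sketch that folds pd.merge over a list with functools.reduce. The small frames reuse <CLOSE> values from the question, and the `tag` helper is hypothetical: it renames the non-key columns per ticker so that repeated merges do not produce clashing column names.

```python
from functools import reduce

import pandas as pd

KEY = "<DTYYYYMMDD>"

# Small frames built from values in the question (only <CLOSE> kept for brevity).
ae = pd.DataFrame({KEY: [19970102, 19970103], "<CLOSE>": [11.75, 12.125]})
acu = pd.DataFrame({KEY: [19970102, 19970103], "<CLOSE>": [5.125, 5.25]})
acy = pd.DataFrame({KEY: [19980116, 19980120], "<CLOSE>": [8.8125, 8.125]})


def tag(df, ticker):
    """Hypothetical helper: suffix every non-key column with the ticker."""
    return df.rename(columns=lambda c: c if c == KEY else f"{c}_{ticker}")


frames = [tag(ae, "AE"), tag(acu, "ACU"), tag(acy, "ACY")]

# pd.merge takes exactly two frames, so apply it pairwise across the list.
merged = reduce(
    lambda left, right: pd.merge(left, right, on=KEY, how="outer"), frames
)
print(merged)
```

Because the join is an outer one, dates that appear in only some files are kept, with NaN in the columns of the tickers that have no row for that date.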

concat is for when you have multiple dataframes and just want to append all of their rows or columns into one large dataframe. (Let's assume you're concatenating rows, as you want here.) It doesn't use an identifier to determine which rows correspond. All it does is create a new dataframe which has each row from each of the concatenated dataframes (all the rows from the first, then all from the second, etc.).
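To make the rows-versus-columns distinction concrete, here is a small sketch using <CLOSE> values from the question. The column-wise variant is an assumption about how the side-by-side layout in the question could be produced, not something the answer prescribes: with axis=1, concat lines frames up on their index, so the date is set as the index first.

```python
import pandas as pd

KEY = "<DTYYYYMMDD>"
ae = pd.DataFrame({KEY: [19970102, 19970103], "<CLOSE>": [11.75, 12.125]})
acu = pd.DataFrame({KEY: [19970102, 19970103], "<CLOSE>": [5.125, 5.25]})

# Default axis=0: all rows of ae, then all rows of acu; dates are not compared.
stacked = pd.concat([ae, acu], ignore_index=True)

# axis=1 places the frames side by side, aligned on the index (here the date).
# keys= labels each frame's block of columns with its ticker.
side_by_side = pd.concat(
    [ae.set_index(KEY), acu.set_index(KEY)], axis=1, keys=["AE", "ACU"]
)

print(stacked)
print(side_by_side)
```

The column-wise result has a two-level column index (ticker, then field), which keeps the repeated field names apart.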

I think the above is a decent TLDR on merge vs concat, but see here for a lengthy but much more comprehensive guide on using merge / join / concat with dataframes.
