
How to read multiple csv files and convert them to a 3D dataframe

I have multiple .csv files. They have the same number of columns but different numbers of rows. I want to build a dataframe whose third dimension indexes the files. I tried reading each file into a dataframe and appending it to a list, but when I convert the list to a dataframe the output is two-dimensional (with 5 files, the output is a (5, 1) dataframe).

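import os
import pandas as pd

path = "Something"
filelist = os.listdir(path)
print(filelist)
all_csv_files = []
for x in filelist:
    # read each file into its own dataframe
    df = pd.read_csv(os.path.join(path, x))
    all_csv_files.append(df)

# converting the list of dataframes does not give a 3D result
dataset = pd.DataFrame(all_csv_files)
dataset.shape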

I also tried reading each file into a NumPy array and stacking the arrays with np.stack, but the arrays are not the same size. Also, pandas.Panel is deprecated.

For example, if we have 2 csv files, where the first one is:

a,b,c,d
a,b,d,c
b,x,y,z

and the second one is:

1,2,3,4
2,3,5,4

I want the output to be like:

[
  [[a,b,c,d],[a,b,d,c],[b,x,y,z]],
  [[1,2,3,4],[2,3,5,4],[NaN,NaN,NaN,NaN]]
]

which has shape (2, 3, 4).

I would prefer not to fill with NaN, but if there is no other way, that is also OK.

If all your csv files have the same columns, you can try the code below. I have added header=0 so that, when each csv is read, its first row is used as the column names.

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
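The concatenation above merges all rows into one flat frame, so the per-file grouping is lost. A possible variation, sketched here under assumptions, is to pass `keys=` to `pd.concat` so that each file becomes the outer level of a MultiIndex; this acts as a "third dimension" without any NaN padding. The file names, column names (`c1` to `c4`) and in-memory CSV text below are hypothetical stand-ins for real files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two CSV files: same columns, different row counts.
csv_texts = {
    "first.csv": "c1,c2,c3,c4\na,b,c,d\na,b,d,c\nb,x,y,z\n",
    "second.csv": "c1,c2,c3,c4\n1,2,3,4\n2,3,5,4\n",
}

names = list(csv_texts)
frames = [pd.read_csv(io.StringIO(csv_texts[name]), header=0) for name in names]

# keys= labels each frame; the result has a two-level (file, row) index.
stacked = pd.concat(frames, keys=names, names=["file", "row"])

# Select all rows belonging to one file through the outer index level.
second_rows = stacked.loc["second.csv"]
print(stacked.shape)
print(second_rows)
```

With real files, `names` could be the list returned by `glob.glob` and each frame read with `pd.read_csv(filename, header=0)` as in the answer.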

You can read this Stack Overflow question (Import multiple csv files into pandas and concatenate into one DataFrame); after that you can easily deal with your scenario.

You can use asyncio to speed up reading all the .csv files.
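The answer gives no code, so the following is only one way this might look. Because `pd.read_csv` is a blocking call, this sketch hands each read to asyncio's default thread pool with `asyncio.to_thread` (Python 3.9 or later) and gathers the results. The helper name `read_all_csv` is hypothetical, and whether this is actually faster depends on the files and the machine.

```python
import asyncio
import glob
import os

import pandas as pd


async def read_all_csv(folder):
    """Read every .csv file in `folder` concurrently and return a list of DataFrames."""
    # Sort the paths so the order of the returned frames is predictable.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # pd.read_csv blocks, so each call runs in asyncio's default thread pool.
    tasks = [asyncio.to_thread(pd.read_csv, p, header=0) for p in paths]
    # gather returns the results in the same order as `paths`.
    return await asyncio.gather(*tasks)
```

It could be called as `frames = asyncio.run(read_all_csv(path))`, after which the list of frames can be combined in the same way as in the other answers.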

You can use np.stack for that:

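import os
import numpy as np
import pandas as pd

path = "Something"
filelist = os.listdir(path)
print(filelist)
all_csv_files = []
for x in filelist:
    df = pd.read_csv(os.path.join(path, x))
    all_csv_files.append(df.to_numpy())
# np.stack requires every array to have the same shape
dataset = np.stack(all_csv_files)
dataset.shape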
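`np.stack` only accepts arrays of identical shape, which is the obstacle described in the question. A minimal sketch of one workaround is to allocate a NaN-filled array sized for the longest file and copy each file's rows into it. The helper name `stack_padded` is hypothetical, and the two frames below are built in memory from the question's sample data rather than read from disk.

```python
import numpy as np
import pandas as pd


def stack_padded(frames):
    """Stack DataFrames with equal column counts into a 3D array,
    padding the shorter ones with NaN rows."""
    max_rows = max(len(df) for df in frames)
    n_cols = frames[0].shape[1]
    # object dtype lets strings and numbers share one array; NaN marks padding.
    out = np.full((len(frames), max_rows, n_cols), np.nan, dtype=object)
    for i, df in enumerate(frames):
        out[i, :len(df), :] = df.to_numpy()
    return out


# Sample data from the question: 3 rows in the first file, 2 in the second.
first = pd.DataFrame([["a", "b", "c", "d"], ["a", "b", "d", "c"], ["b", "x", "y", "z"]])
second = pd.DataFrame([[1, 2, 3, 4], [2, 3, 5, 4]])

dataset = stack_padded([first, second])
print(dataset.shape)  # (2, 3, 4)
```

If every file is purely numeric, a float dtype could be used instead of `object`, which keeps the array suitable for numerical operations.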

