How to read multiple CSV files and convert them to a 3D dataframe
I have multiple .csv files. They have the same number of columns but different numbers of rows. I want to build a dataframe whose third dimension indexes the files. I tried reading each file into a dataframe and appending the dataframes to a list, but when I convert the list to a dataframe the output is two-dimensional (with 5 files, the output is a (5, 1) dataframe).
import os
import pandas as pd

path = "Something"
filelist = os.listdir(path)
print(filelist)
all_csv_files = []
for x in filelist:
    df = pd.read_csv(os.path.join(path, x))
    all_csv_files.append(df)
dataset = pd.DataFrame(all_csv_files)
dataset.shape
I also tried reading each file into a NumPy array and stacking the arrays with np.stack, but the arrays are not the same size. pandas.Panel is not an option either, since it is deprecated.
For example, suppose we have 2 CSV files, where the first one is:
a,b,c,d
a,b,d,c
b,x,y,z
and the second one is:
1,2,3,4
2,3,5,4
I want the output to look like:
[
[[a,b,c,d],[a,b,d,c],[b,x,y,z]],
[[1,2,3,4],[2,3,5,4],[NaN,NaN,NaN,NaN]]
]
which has shape (2, 3, 4).
I would prefer not to fill with NaN, but if there is no other way, that is also fine.
If all your CSV files have the same columns, you can try the code below. I have added header=0 so that the first row of each file is used as the column names.
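import glob
import os
import pandas as pd

path = r'C:\DRO\DCL_rawdata_files'  # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))
li = []
for filename in all_files:
    # header=0 uses the first row of each file as the column names
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
# Stack the rows of all files into a single two-dimensional DataFrame
frame = pd.concat(li, axis=0, ignore_index=True)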
You can read this Stack Overflow question (Import multiple csv files into pandas and concatenate into one DataFrame) and then easily deal with your scenario.
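A plain concatenation loses track of which rows came from which file. One way to keep that information without padding, sketched here under some assumptions, is to pass a dict to pd.concat so that the file name becomes the outer level of a MultiIndex. The in-memory sources and the names "first.csv" and "second.csv" are hypothetical stand-ins for real files, and header=None is assumed because the question's sample data has no header row.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for two CSV files with the same number
# of columns but different row counts; real code would use file paths.
csv_sources = {
    "first.csv": io.StringIO("a,b,c,d\na,b,d,c\nb,x,y,z\n"),
    "second.csv": io.StringIO("1,2,3,4\n2,3,5,4\n"),
}

# header=None treats every line as data, as in the question's example.
frames = {name: pd.read_csv(src, header=None) for name, src in csv_sources.items()}

# The dict keys become the outer index level, so each file stays addressable.
combined = pd.concat(frames, names=["file", "row"])

print(combined.shape)              # (5, 4): 3 rows + 2 rows, 4 columns
print(combined.loc["second.csv"])  # only the rows that came from second.csv
```

The result is still a two-dimensional DataFrame, but the "file" level plays the role of the third dimension and no NaN filling is needed.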
You can also use asyncio to speed up reading all the .csv files.
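As a rough sketch of that idea: pd.read_csv is a blocking call, so asyncio on its own does not overlap the reads. The version below assumes Python 3.9+ and uses asyncio.to_thread to run each read in a worker thread. The helper name read_all and the in-memory sources are hypothetical, and whether this is actually faster depends on the files and the storage they sit on.

```python
import asyncio
import io
import pandas as pd

async def read_all(sources):
    """Read every source with pd.read_csv in worker threads, keeping order."""
    # asyncio.to_thread (Python 3.9+) runs a blocking call in a thread.
    tasks = [asyncio.to_thread(pd.read_csv, src) for src in sources]
    return await asyncio.gather(*tasks)

# Hypothetical in-memory stand-ins; real code would pass file paths instead.
sources = [
    io.StringIO("w,x\n1,2\n"),
    io.StringIO("w,x\n3,4\n5,6\n"),
]
frames = asyncio.run(read_all(sources))
print([df.shape for df in frames])
```

asyncio.gather returns the results in the same order as the inputs, so frames[i] corresponds to sources[i].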
You can use np.stack for that:
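import os
import numpy as np
import pandas as pd

path = "Something"
filelist = os.listdir(path)
print(filelist)
all_csv_files = []
for x in filelist:
    df = pd.read_csv(os.path.join(path, x))
    all_csv_files.append(df.to_numpy())
# np.stack only works when every array has the same shape
dataset = np.stack(all_csv_files)
dataset.shape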
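Since np.stack needs arrays of equal shape and the files in the question have different row counts, one possible workaround is to pad the shorter frames with NaN rows first. The sketch below reproduces the (2, 3, 4) example from the question. The in-memory sources are hypothetical stand-ins for the real files, and header=None is assumed because the sample data has no header row.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical in-memory stand-ins for the two example files.
sources = [
    io.StringIO("a,b,c,d\na,b,d,c\nb,x,y,z\n"),
    io.StringIO("1,2,3,4\n2,3,5,4\n"),
]
frames = [pd.read_csv(src, header=None) for src in sources]

# Pad every frame to the longest row count; reindex fills new rows with NaN.
max_rows = max(len(df) for df in frames)
padded = [df.reindex(range(max_rows)).to_numpy(dtype=object) for df in frames]

# All arrays now share the shape (max_rows, n_columns), so they can be stacked.
cube = np.stack(padded)
print(cube.shape)  # (2, 3, 4)
```

Note that padding an integer column with NaN converts it to float, so the numeric values of the second file come back as 1.0, 2.0 and so on. If padding is not acceptable, keeping the frames in a list or dict avoids it.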