How to build a dataframe row by row, where each row comes from a different csv?

I have searched through perhaps a dozen variations of the question "How to build a dataframe row by row", but none of the solutions have worked for me. So although this is a frequently asked question, I believe my case is different enough to be a valid question. I think the problem might be that I am grabbing each row from a different csv. This code demonstrates that I am successfully making dataframes in the loop:

onlyfiles = list_of_csvs 
for idx, f in enumerate(onlyfiles):
    row = pd.read_csv(mypath + f,sep="|").iloc[0:1]

But the rows are individual dataframes and cannot be combined (so far). I have attempted the following:

df = pd.DataFrame()
for idx, f in enumerate(onlyfiles):
    row = pd.read_csv(path + f,sep="|").iloc[0:1]
    df.iloc(idx) = row

Which returns:

    df.loc(idx) = row
    ^
SyntaxError: can't assign to function call

I think the problem is that each row, or dataframe, has its own headers. I've also tried df.loc(idx) = row[1] but that doesn't work either (where we grab row[:] when idx = 0). Neither iloc(idx) nor loc(idx) works.

In the end, I want one dataframe that has the header (column names) from the first dataframe, and then n rows, where n is the number of files.

Try pd.concat().
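On the SyntaxError itself: .loc and .iloc are indexers, so they take square brackets, and Python will not allow assignment to a call such as df.loc(idx). A minimal sketch of the bracket form follows; the two-column frame and its values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical one-row frame standing in for the first row of one csv
row = pd.DataFrame({"a": [1], "b": [2]})

# Empty frame that reuses the column names of the first row
df = pd.DataFrame(columns=row.columns)

# Square brackets, and a Series (row.iloc[0]) rather than a one-row DataFrame
df.loc[0] = row.iloc[0]
```

Growing a frame one label at a time like this gets slow as the number of files increases, which is one reason to prefer collecting the pieces and concatenating them.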

Note that you can read just the first row from the file directly, instead of reading in the whole file and then limiting to the first row: pass the parameter nrows=1 to pd.read_csv.

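import pandas as pd

onlyfiles = list_of_csvs
frames = []
for f in onlyfiles:
    # nrows=1 reads only the first data row of each file
    frames.append(pd.read_csv(mypath + f, sep="|", nrows=1))
# Stack the one-row frames once; ignore_index renumbers the rows 0..n-1
df_joint = pd.concat(frames, ignore_index=True)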
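To check the approach end to end without the original files, here is a minimal self-contained sketch. It writes three small pipe-delimited files to a temporary directory and then combines their first rows. The file names, column names, and values are assumptions made up for the example.

```python
import os
import tempfile

import pandas as pd

# Create three small pipe-delimited files in a temporary directory
# (file names and contents are made up for illustration)
tmpdir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmpdir, f"file{i}.csv"), "w") as fh:
        fh.write("name|value\n")
        fh.write(f"first{i}|{i}\n")
        fh.write(f"second{i}|{i + 10}\n")

onlyfiles = sorted(os.listdir(tmpdir))

# Read only the first data row of each file, then stack the pieces once
frames = [
    pd.read_csv(os.path.join(tmpdir, f), sep="|", nrows=1) for f in onlyfiles
]
df_joint = pd.concat(frames, ignore_index=True)
```

Each file contributes exactly one row, and the column names come from the files' shared header. If the files had different headers, pd.concat would take the union of the columns and fill the gaps with NaN.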
