Importing CSV that has an extra row represented as new columns

As the title says, I'm attempting to import about 30 CSV files and combine them into one. Each file has 15 of the 'correct' columns, plus an additional 15 columns that hold only one row's worth of data. So each file appears to have 30 columns, but in reality there should only be 15, and the second set of columns should just be appended to the bottom of my data as an extra row. For a smaller example, it looks something like this:

Col1 Col2 Col3 Col4 Col5 5.87 6.12 5.50 4.98 2.87
2.50 3.50 5.66 5.23 2.11
1.12 9.99 1.15 5.44 3.12

I'm looking to take those 5 'extra' columns (which are really just one extra row) and move them to the bottom of my data as another row, instead of having them appear as 5 more columns.

The problem is a botched header in each of the CSVs. You can set up CSV readers and a writer and, at the start of each new input file, check whether the header is too long. Suppress the header after the first CSV and insert the errant row as you go.

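import csv

# placeholder list: put the paths of your input files here
input_csvs = ['file1.csv', 'file2.csv']
output_csv = 'out.csv'
have_header = False

# open the output in write mode; newline='' lets the csv module manage line endings
with open(output_csv, 'w', newline='', encoding='utf-8') as out_fp:
    writer = csv.writer(out_fp)
    for input_csv in input_csvs:
        with open(input_csv, newline='', encoding='utf-8') as in_fp:
            reader = csv.reader(in_fp)
            header = next(reader)
            # check for botched first line where some upstream
            # program can't get its newlines right
            if len(header) > 15:
                # grab the stray row before truncating the header
                first_row = header[15:]
                header = header[:15]
            else:
                first_row = None
            # write the header only once, for the first file
            if not have_header:
                writer.writerow(header)
                have_header = True
            if first_row:
                writer.writerow(first_row)
            # copy the remaining rows unchanged
            writer.writerows(reader)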
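
The approach above inserts the stray row ahead of the rest of each file's data. If it should land at the bottom instead, as the question asks, one option is to hold it back until the reader is exhausted. The following is a minimal sketch under that assumption; `split_botched_header`, `combine`, and the in-memory sample are illustrative names and data, not part of the answer, and it assumes every input has at least a header line.

```python
import csv
import io


def split_botched_header(header, n_cols):
    """Return (real_header, extra_row); extra_row is None if the header is fine."""
    if len(header) > n_cols:
        return header[:n_cols], header[n_cols:]
    return header, None


def combine(sources, out_fp, n_cols):
    """Merge CSV text streams into out_fp, writing each stray row after its file's data."""
    writer = csv.writer(out_fp)
    have_header = False
    for in_fp in sources:
        reader = csv.reader(in_fp)
        header, extra_row = split_botched_header(next(reader), n_cols)
        if not have_header:
            writer.writerow(header)
            have_header = True
        for row in reader:
            # keep only the real columns in case rows carry empty padding cells
            writer.writerow(row[:n_cols])
        if extra_row:
            writer.writerow(extra_row)


# Sample shaped like the question's small example, comma-separated.
sample = (
    "Col1,Col2,Col3,Col4,Col5,5.87,6.12,5.50,4.98,2.87\n"
    "2.50,3.50,5.66,5.23,2.11\n"
    "1.12,9.99,1.15,5.44,3.12\n"
)
out = io.StringIO(newline="")
combine([io.StringIO(sample), io.StringIO(sample)], out, n_cols=5)
result = out.getvalue().splitlines()
print("\n".join(result))
```

With real files, the same `combine` function could be given open file objects (opened with `newline=''`) in place of the `io.StringIO` streams.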

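This would help.

import pandas as pd

data = pd.read_csv("text.csv")
# the extra row was parsed as the last 15 column labels, so take it from there
extra = pd.DataFrame([data.columns[15:].tolist()], columns=data.columns[:15])
data = pd.concat([data.iloc[:, :15], extra], ignore_index=True)

This cuts off the second set of 15 columns and appends the extra row at the bottom.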

You can simply build a one-row dataframe from the last part of the columns index, append it to the dataframe, and keep only the relevant columns:

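# df is the frame as read by pd.read_csv, with twice as many columns as it should have
n = len(df.columns) // 2  # integer division, so n can be used as a slice bound
extra = pd.DataFrame([df.columns[n:].tolist()], columns=df.columns[:n])
df = pd.concat([df[df.columns[:n]], extra], ignore_index=True)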
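
The snippet above takes the stray values from the column labels, which has two side effects: `read_csv` renames repeated labels (for example `5.5` and `5.5.1`), and the appended values are strings. One possible way around both, sketched here as an assumption rather than as part of the answer, is to read the file with `header=None` so the first line arrives as ordinary data. The helper name `fix_frame` and the in-memory sample are hypothetical.

```python
import io

import pandas as pd


def fix_frame(raw, n_cols):
    """Rebuild a frame from `raw`, which was read with header=None and dtype=str.

    Row 0 of `raw` holds the real header followed by the stray row's values.
    """
    first = raw.iloc[0]
    names = first.iloc[:n_cols].tolist()   # the real column names
    extra = first.iloc[n_cols:].tolist()   # the stray row's values
    body = raw.iloc[1:, :n_cols]           # the ordinary data rows
    extra_df = pd.DataFrame([extra], columns=body.columns)
    fixed = pd.concat([body, extra_df], ignore_index=True)
    fixed.columns = names
    # everything was read as text, so convert each column to numbers
    return fixed.apply(pd.to_numeric)


# Sample shaped like the question's small example, comma-separated.
sample = (
    "Col1,Col2,Col3,Col4,Col5,5.87,6.12,5.50,4.98,2.87\n"
    "2.50,3.50,5.66,5.23,2.11\n"
    "1.12,9.99,1.15,5.44,3.12\n"
)
raw = pd.read_csv(io.StringIO(sample), header=None, dtype=str)
fixed = fix_frame(raw, n_cols=5)
print(fixed)
```

For the 30 real files, the per-file results could then be combined with `pd.concat`, using `n_cols=15`.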
