
Importing CSVs that have an extra row represented as new columns

As the title says, I'm trying to import about 30 CSV files and combine them into one. Each file has the 15 'correct' columns plus an additional 15 columns that hold only one row's worth of data. So each file appears to have 30 columns, but there should really be only 15, and the second set of columns should simply be appended to the bottom of my data as an extra row. A smaller example looks something like this:

Col1 Col2 Col3 Col4 Col5 5.87 6.12 5.50 4.98 2.87
2.50 3.50 5.66 5.23 2.11
1.12 9.99 1.15 5.44 3.12

I'm looking to take those 5 'extra' columns (which are really just one extra row) and move them to the bottom of my data as another row, instead of having them appear as 5 more columns.
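To pin down the target layout, here is a minimal sketch that works on rows already parsed with Python's csv module. It assumes the files are comma-separated (the example above is shown with spaces) and that the number of real columns, n_cols, is known (5 in the small example, 15 in the real files); fix_rows is a hypothetical helper name.

```python
import csv
import io

# The question's small example, assumed to be comma-separated on disk.
SAMPLE = (
    "Col1,Col2,Col3,Col4,Col5,5.87,6.12,5.50,4.98,2.87\n"
    "2.50,3.50,5.66,5.23,2.11\n"
    "1.12,9.99,1.15,5.44,3.12\n"
)


def fix_rows(rows, n_cols):
    """Split the over-long first row into the real header and the misplaced
    data row, then move that data row to the bottom."""
    first, *body = rows
    header, extra = first[:n_cols], first[n_cols:]
    return [header] + body + ([extra] if extra else [])


for row in fix_rows(list(csv.reader(io.StringIO(SAMPLE))), n_cols=5):
    print(row)
# last line printed: ['5.87', '6.12', '5.50', '4.98', '2.87']
```

A file whose first line is not too long passes through unchanged, because extra is then an empty list and nothing is appended.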

The problem is a botched header in each of the CSVs. You can set up a csv reader for each input file plus a single writer and, at the start of each new input file, check whether the header is too long. Suppress the header after the first CSV and write out the errant row as you go (ahead of that file's own data rows).

import csv

input_csvs = ['file1.csv', 'file2.csv']  # hypothetical: list your ~30 input files here
output_csv = 'out.csv'
n_cols = 15  # number of real columns
have_header = False

with open(output_csv, 'w', newline='', encoding='utf-8') as out_fp:
    writer = csv.writer(out_fp)
    for input_csv in input_csvs:
        with open(input_csv, newline='', encoding='utf-8') as in_fp:
            reader = csv.reader(in_fp)
            header = next(reader)
            # check for botched first line where some upstream
            # program can't get its newlines right
            if len(header) > n_cols:
                first_row = header[n_cols:]  # split off the errant row first...
                header = header[:n_cols]     # ...then trim the header
            else:
                first_row = None
            if not have_header:
                writer.writerow(header)  # write the header only once
                have_header = True
            if first_row:
                writer.writerow(first_row)
            writer.writerows(reader)  # copy the remaining data rows
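The answer above writes each file's errant row ahead of that file's data. If you want it at the bottom of each file's data instead, as the question asks, a minimal sketch using the same csv-module approach could look like this. merge_botched_csvs and its n_cols parameter are hypothetical names, and the sketch assumes every file has the same real columns.

```python
import csv


def merge_botched_csvs(input_paths, output_path, n_cols=15):
    """Merge CSVs whose first line is the real header plus one misplaced row.

    The header is written once; each file's misplaced row is written after
    that file's own data rows instead of before them.
    """
    wrote_header = False
    with open(output_path, "w", newline="", encoding="utf-8") as out_fp:
        writer = csv.writer(out_fp)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as in_fp:
                reader = csv.reader(in_fp)
                first = next(reader, None)
                if first is None:  # empty file: nothing to copy
                    continue
                header, extra = first[:n_cols], first[n_cols:]
                if not wrote_header:
                    writer.writerow(header)
                    wrote_header = True
                writer.writerows(reader)  # the file's ordinary data rows
                if extra:  # only present when the first line was too long
                    writer.writerow(extra)
```

For example, merge_botched_csvs(sorted(glob.glob("data/*.csv")), "out.csv") would merge every CSV in a hypothetical data/ folder; keeping out.csv outside that folder stops it from being picked up as an input on a later run.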

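With pandas, this should help:

import pandas as pd
raw = pd.read_csv("text.csv", header=None)  # no header: the botched first line becomes row 0
rows = raw.iloc[1:, :15].values.tolist() + [raw.iloc[0, 15:].tolist()]  # real rows, then the misplaced row at the bottom
data = pd.DataFrame(rows, columns=raw.iloc[0, :15].tolist()).apply(pd.to_numeric)  # row 0's first 15 fields are the header

This cuts off the second set of 15 columns and appends the misplaced values from the first line as a new row at the bottom. Because the header text sits in row 0, pandas reads those columns as strings, so pd.to_numeric converts them back (this assumes every column is numeric).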
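The snippet above handles one file. For all ~30 files, a minimal sketch could apply the same per-file fix and stack the results with pd.concat. load_botched_csv, load_all and the "data/*.csv" pattern are hypothetical names, and the sketch assumes every file has the same 15 real columns and purely numeric data. Reading with header=None keeps the misplaced values as ordinary cells rather than column labels, which sidesteps pandas renaming duplicate labels.

```python
import glob

import pandas as pd


def load_botched_csv(path, n_cols=15):
    """Read one file and move the misplaced values on its first line to the bottom."""
    raw = pd.read_csv(path, header=None)  # botched first line becomes row 0
    header = raw.iloc[0, :n_cols].tolist()
    extra = raw.iloc[0, n_cols:].tolist()  # empty list if the line was not botched
    rows = raw.iloc[1:, :n_cols].values.tolist()
    if extra:
        rows.append(extra)
    # Row 0 held the header text, so these columns were read as strings;
    # convert them back to numbers (assumes all data is numeric).
    return pd.DataFrame(rows, columns=header).apply(pd.to_numeric)


def load_all(pattern, n_cols=15):
    """Load every file matching a glob pattern and stack them into one frame."""
    paths = sorted(glob.glob(pattern))
    return pd.concat([load_botched_csv(p, n_cols) for p in paths], ignore_index=True)


# Hypothetical location of the ~30 input files:
# combined = load_all("data/*.csv")
```

Each file's misplaced row ends up directly after that file's own rows, so rows from the same file stay together in the combined frame. Note that pd.concat raises a ValueError if the pattern matches no files.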

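You can simply build a one-row DataFrame from the second half of the column index, append it to the DataFrame with pd.concat, and keep only the relevant columns:

import pandas as pd
n = len(df.columns) // 2  # integer division, so n can be used for slicing
df = pd.concat([df, pd.DataFrame([df.columns[n:].tolist()], columns=df.columns[:n])], ignore_index=True)[df.columns[:n]]

Note that the values in the new row come from the column labels, so they are strings; convert them (for example with pd.to_numeric) if you need numbers.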
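The column-label trick can be wrapped in a small function and checked against the question's 5-column example. In this sketch fix_from_labels and its n_cols argument are hypothetical; the width check leaves files that are not botched untouched, and pd.to_numeric turns the string labels back into numbers. One caveat: pandas renames duplicate column labels (a second 5.87 would become 5.87.1), so if the misplaced row can contain repeated values, reading with header=None is the safer route.

```python
import io

import pandas as pd


def fix_from_labels(df, n_cols):
    """If df has exactly 2 * n_cols columns, turn the second half of the
    column labels into a new bottom row and keep the first n_cols columns."""
    if len(df.columns) != 2 * n_cols:  # not the botched layout: leave as-is
        return df
    extra = pd.DataFrame([df.columns[n_cols:].tolist()], columns=df.columns[:n_cols])
    fixed = pd.concat([df.iloc[:, :n_cols], extra], ignore_index=True)
    return fixed.apply(pd.to_numeric)  # labels are strings; assumes numeric data


SAMPLE = (
    "Col1,Col2,Col3,Col4,Col5,5.87,6.12,5.50,4.98,2.87\n"
    "2.50,3.50,5.66,5.23,2.11\n"
    "1.12,9.99,1.15,5.44,3.12\n"
)
print(fix_from_labels(pd.read_csv(io.StringIO(SAMPLE)), n_cols=5))
```

Passing n_cols explicitly, rather than always halving the width, avoids silently reshaping a file that happens to have an even number of legitimate columns.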

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM