How to delete the first few rows and the last row from a CSV file using Python

I have CSV files that I can't edit in Excel. I want dynamic code that deletes the top few rows (everything before the header row) and the last row without my having to enter row numbers. The code I am using right now is:

FIRST_ROW_NUM = 1
# Lines 1-27 come before the header; line 5421344 is the Grand Total row
ROWS_TO_DELETE = set(range(1, 28)) | {5421344}
# Write to a separate (hypothetical) output file: opening the input path with 'w' truncates it before it is read
with open('filename', 'r') as infile, open('filename_out', 'w') as outfile:
    outfile.writelines(row for row_num, row in enumerate(infile, FIRST_ROW_NUM)
                       if row_num not in ROWS_TO_DELETE)
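Opening one path with both 'r' and 'w' truncates the file before anything is read, so the input and output must be separate files. A minimal demonstration, using a hypothetical throwaway file in a temporary directory:

```python
import os
import tempfile

# Create a small throwaway file to demonstrate the problem.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(path, 'w') as f:
    f.write('junk line\nheader\n1,2\nGrand Total: - -\n')

# Opening the same path with 'w' truncates it to zero length immediately,
# so the reader on that path has nothing left to read.
with open(path, 'r') as infile, open(path, 'w') as outfile:
    rows_read = infile.readlines()

print(rows_read)              # []
print(os.path.getsize(path))  # 0
```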

The problem with this code is that I have to enter the row numbers to delete by hand.

Another issue is that the number of rows to delete is not constant; it changes from file to file.

The sample CSV is attached here

I want code that deletes those rows without any input from me.

Note: the sample CSV doesn't show the last row, but it looks something like this:

Grand Total: - -  - - - - - - - - - - - - - - -  - - - -  - -  - - - 

Open your input and output files, and then:

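# Hypothetical prefixes: replace them with text that identifies your rows
HEADER_PREFIX = 'Date,'
TOTAL_PREFIX = 'Grand Total'

with open('input.csv') as infile, open('output.csv', 'w') as outfile:
    # Skip lines until the header row
    for line in infile:
        if line.startswith(HEADER_PREFIX):
            break
    outfile.write(line)  # write the header row itself
    # Copy data rows until the grand-total line
    for line in infile:
        if line.startswith(TOTAL_PREFIX):
            break
        outfile.write(line)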
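The same two-loop idea can be wrapped in a function that takes the header and total tests as parameters, which makes it easy to check on an in-memory sample. This is a sketch: the `trim_lines` name, the sample data, and the `'Date,'` header test are hypothetical stand-ins for your real file.

```python
import io

def trim_lines(infile, outfile, is_header, is_total):
    """Copy the header row and the data rows after it, stopping at the total row."""
    for line in infile:
        if is_header(line):
            outfile.write(line)  # keep the header row itself
            break
    # The file object is its own iterator, so this loop resumes after the header.
    for line in infile:
        if is_total(line):
            break
        outfile.write(line)

# Hypothetical sample: two metadata lines, a header, two data rows, a total row.
sample = io.StringIO(
    'Report Name,Demo\n'
    'Generated,today\n'
    'Date,Clicks\n'
    '2020-01-01,5\n'
    '2020-01-02,7\n'
    'Grand Total: - -\n'
)
result = io.StringIO()
trim_lines(sample, result,
           is_header=lambda line: line.startswith('Date,'),
           is_total=lambda line: line.startswith('Grand Total'))
print(result.getvalue())
```

Writing the header inside the first loop means nothing is written when no header row is found, rather than writing whatever line the loop ended on.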

I'd first read the entire file into a string and split it on 'Report Fields', which seems to mark the start of the table you are trying to read. Then split the remaining text into lines and slice it with [1:-1], which drops the rest of the marker line and the last (Grand Total) row.

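with open('infile.csv', 'r') as infile, open('outfile.csv', 'w') as outfile:
    # Keep only the text after the first 'Report Fields' marker
    txt = infile.read().split('Report Fields', 1)[1]
    # splitlines() leaves no empty string after a trailing newline, so [:-1]
    # really drops the Grand Total row; [1:] drops the rest of the marker line
    outfile.write('\n'.join(txt.splitlines()[1:-1]) + '\n')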
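The choice between `split('\n')` and `splitlines()` matters for the `[:-1]` slice: when a file ends with a newline, `split('\n')` produces a trailing empty string, and slicing removes that instead of the total row. A small sketch with a hypothetical `extract_report` helper and made-up sample text:

```python
def extract_report(text):
    """Return the lines between the 'Report Fields' line and the final row."""
    # Text after the first marker; maxsplit=1 ignores any later occurrences.
    after_marker = text.split('Report Fields', 1)[1]
    # [1:] drops the remainder of the marker line, [:-1] drops the total row.
    return after_marker.splitlines()[1:-1]

sample = 'meta\nReport Fields\nDate,Clicks\n2020-01-01,5\nGrand Total: - -\n'

# split('\n') leaves an empty string after the trailing newline,
# so [:-1] would remove that empty string and keep the total row.
print(sample.split('\n')[-2:])  # ['Grand Total: - -', '']
print(extract_report(sample))   # ['Date,Clicks', '2020-01-01,5']
```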
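
import pandas as pd
# Skip the 27 lines before the header row; line 28 becomes the header
df = pd.read_csv('file_name.csv', skiprows=27)
# 1-based file line L sits at row position L - 29 (27 skipped + 1 header, 0-based)
df = df.drop(df.index[5421344 - 29])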

You can do this with pandas' read_csv function. skiprows takes either a list of line numbers to skip (0-indexed) or the number of lines to skip (int) at the start of the file. Then drop the row that came from file line 5421344 (the Grand Total row).
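Converting the file line number to a DataFrame position needs two offsets, not one: `skiprows=27` removes lines 1 to 27, line 28 becomes the header, and line 29 is row position 0. So line 5421344 is position 5421344 - 29 = 5421315 (subtracting only 27 would give 5421317). A small hypothetical helper makes the arithmetic explicit, assuming there are no blank lines in the data (pandas skips those by default, which would shift positions):

```python
def file_line_to_position(line_no, skiprows):
    """Map a 1-based file line number to a 0-based DataFrame row position.

    Assumes skiprows is an int, the line right after the skipped lines is the
    header, and there are no blank lines between the header and line_no.
    """
    header_line = skiprows + 1         # 1-based line read as the header
    first_data_line = header_line + 1  # this line becomes position 0
    return line_no - first_data_line

print(file_line_to_position(5421344, 27))  # 5421315
```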

That works only for static row numbers. For a dynamic version, if the rows before the header come through as NaN, you can use this:

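import pandas as pd

df = pd.read_csv('file_name', skiprows=1)
# Drop every row containing a NaN (this also drops data rows with missing values)
df = df.dropna(axis=0)
# Drop the last row, assuming the Grand Total row survived dropna; drop() expects index labels
df = df.drop(df.index[-1])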
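A more targeted way to make the pandas version dynamic is to scan for the header row first and pass the count to read_csv as skiprows, with skipfooter=1 to drop the final row (skipfooter is only supported by the Python parser engine, hence engine='python'). A sketch, where `count_lines_before` and the `'Date,'` header prefix are hypothetical:

```python
import io

def count_lines_before(lines, marker):
    """Return how many lines precede the first line that starts with marker."""
    for count, line in enumerate(lines):
        if line.startswith(marker):
            return count
    raise ValueError(f'no line starts with {marker!r}')

# Hypothetical file contents: two metadata lines, then the header.
sample = io.StringIO(
    'Report Name,Demo\nGenerated,today\nDate,Clicks\n2020-01-01,5\nGrand Total: - -\n'
)
n = count_lines_before(sample, 'Date,')
print(n)  # 2
```

With a real file you would count first and let pandas reopen it, e.g. `with open(path) as f: n = count_lines_before(f, 'Date,')` followed by `pd.read_csv(path, skiprows=n, skipfooter=1, engine='python')`. This reads the file twice, which is the price of not hard-coding the row count.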

This can be done with Python's csv library to parse the file, together with the itertools dropwhile and takewhile functions to pick out the rows you want:

import itertools
import csv

with open('Test.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    # Skip over initial lines until the 'Report Fields' row, then consume that row too
    # (blank lines come through as empty lists, so check `not row` before indexing)
    next(itertools.dropwhile(lambda row: not row or row[0] != "Report Fields", csv_input))

    # Write rows until the Grand Total row is found
    csv_output.writerows(itertools.takewhile(lambda row: not row or "Grand Total" not in row[0], csv_input))

This reads rows from the CSV file until it finds one whose first column is Report Fields, and skips that row as well. It then writes the remaining rows to the output CSV file, stopping at the first row whose first column contains the words Grand Total.
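To see the dropwhile/takewhile pipeline in isolation, here it is as a function over any file-like object, run against a made-up in-memory sample. The `report_rows` name and the sample rows are hypothetical; the `not row` checks keep blank lines (which csv.reader returns as empty lists) from raising IndexError:

```python
import csv
import io
import itertools

def report_rows(f_input):
    """Return an iterator over rows after 'Report Fields', up to the Grand Total row."""
    rows = csv.reader(f_input)
    # Skip rows until the 'Report Fields' row, then consume that row too.
    next(itertools.dropwhile(lambda row: not row or row[0] != 'Report Fields', rows))
    # Stop at the first row whose first column mentions Grand Total.
    return itertools.takewhile(lambda row: not row or 'Grand Total' not in row[0], rows)

# Hypothetical sample file contents.
sample = io.StringIO(
    'Report Name,Demo\n'
    '\n'
    'Report Fields\n'
    'Date,Clicks\n'
    '2020-01-01,5\n'
    'Grand Total: - -,12\n'
)
print(list(report_rows(sample)))
# [['Date', 'Clicks'], ['2020-01-01', '5']]
```

In the real script you would pass the opened input file and hand the result to csv_output.writerows().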
