How to delete a few top rows and the last row from a CSV file using Python

I have CSV files that I can't edit using Excel. I want to write dynamic code that deletes the top few rows (those before the header row) and the last row without my having to enter row numbers. The code I am using right now is:

FIRST_ROW_NUM = 1  
ROWS_TO_DELETE = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 5421344}
with open('input.csv', 'r') as infile, open('output.csv', 'w') as outfile:
    # Write to a separate file: opening the same path with 'w' would truncate it before it is read
    outfile.writelines(row for row_num, row in enumerate(infile, FIRST_ROW_NUM)
                       if row_num not in ROWS_TO_DELETE)

The problem with this code is that I have to enter the row numbers manually to delete them.

Another issue is that the number of rows to delete is not constant and changes from file to file.

A sample CSV is attached here.

I want code that can somehow delete those rows without any input from me.

Note: There is no info about the last row in the CSV, but it looks something like this:

Grand Total: - -  - - - - - - - - - - - - - - -  - - - -  - -  - - - 

Open your input and output files, and then:

for line in infile:
    if <line matches header row>:
        break
outfile.write(line)
for line in infile:
    if <line matches grand total line>:
        break
    outfile.write(line)
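
The two loops above can be sketched as a runnable generator. This is only a minimal sketch: the function name `trim_report` and both marker strings are hypothetical, and it assumes the header row and the total row can each be recognised by how the line starts.

```python
def trim_report(lines, header_marker, total_marker):
    """Yield lines from the header row up to, but not including, the total row.

    header_marker and total_marker are assumed prefixes; adjust them to the file.
    """
    it = iter(lines)
    # First loop: discard everything before the header row, then emit the header.
    for line in it:
        if line.startswith(header_marker):
            yield line
            break
    # Second loop: continue on the same iterator until the total row appears.
    for line in it:
        if line.startswith(total_marker):
            return
        yield line


# Small in-memory demonstration with made-up data.
sample = [
    "Report generated by some tool\n",
    "Date range: all\n",
    "id,name,value\n",
    "1,a,10\n",
    "2,b,20\n",
    "Grand Total:,-,-\n",
]
trimmed = list(trim_report(sample, "id,", "Grand Total"))
```

With real files, the same generator could be passed an open file object and its output handed to `outfile.writelines(...)`, so the whole file never has to be held in memory.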

I'd first read in the entire file as a string and split it on what seems to be the marker for the data you are trying to read, 'Report Fields'. Then you can eliminate the last row by splitting on newlines and slicing the list with [:-1] to include all but the last entry.

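with open('infile.csv', 'r') as infile, open('outfile.csv', 'w') as outfile:
    # Keep only the text after the first 'Report Fields' marker
    txt = infile.read().split('Report Fields', 1)[1]
    # Drop the rest of the marker line ([1:]) and the final row ([:-1]);
    # rstrip first so a trailing newline is not mistaken for the last row
    outfile.write('\n'.join(txt.rstrip('\n').split('\n')[1:-1]))

import pandas as pd
df = pd.read_csv('file_name.csv', skiprows=27)
# Drop the last row (line 5421344 of the file) by position
df = df.drop(df.index[-1])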

You can use pandas and its read_csv function to do it. skiprows defines the line numbers to skip (0-indexed) or the number of lines to skip (int) at the start of the file. At the end, drop the row that comes from line 5421344 of the file, which is the last row.

That works for static values. For a dynamic version, if the rows before the header are empty or NaN, you can use this:

import pandas as pd
df = pd.read_csv('file_name', skiprows=1)
# Remove rows that contain NaN values
df.dropna(axis=0, inplace=True)
# Drop the last row by position
df = df.drop(df.index[-1])
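
If the number of leading rows varies from file to file, one way to avoid hard-coding skiprows is to look for a marker line first. The helper below is a hypothetical sketch, not part of pandas: `find_marker_index` and the sample data are made up, and it assumes a line such as 'Report Fields' sits immediately before the header row.

```python
def find_marker_index(lines, marker):
    """Return the 0-based index of the first line starting with marker, or None."""
    for index, line in enumerate(lines):
        if line.startswith(marker):
            return index
    return None


# Made-up sample: two preamble lines, the marker line, the header, data, a total row.
sample_lines = [
    "Some report title\n",
    "\n",
    "Report Fields\n",
    "id,name\n",
    "1,a\n",
    "Grand Total:,-\n",
]
marker_index = find_marker_index(sample_lines, "Report Fields")
# Skip the preamble and the marker line itself so the header row comes first.
rows_to_skip = marker_index + 1
```

The resulting count could then be passed to pandas, for example `pd.read_csv(path, skiprows=rows_to_skip, skipfooter=1, engine='python')`. skipfooter drops lines from the end of the file and is not supported by the default C parser, so the slower Python engine is needed; on a file with millions of rows, dropping the last row after loading may be the faster choice.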

This could be done using Python's csv library to help parse the file, together with the itertools dropwhile and takewhile functions to pick out the rows you want:

import itertools    
import csv

with open('Test.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    # Skip over initial lines until the header row
    next(itertools.dropwhile(lambda x: x[0] != "Report Fields", csv_input))

    # Write rows until the total row is found
    csv_output.writerows(itertools.takewhile(lambda x: "Grand Total" not in x[0], csv_input))   

This reads each row of the CSV file until it finds a row whose first column contains Report Fields. It then skips this row. After that, it writes all the remaining rows to an output CSV file until the first column entry contains the words Grand Total, and then stops.
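
One caveat: csv.reader returns an empty list for a blank line, so indexing `x[0]` would raise an IndexError if the file contains one. The variant below is a sketch that guards against that; the function name `extract_rows` and the sample text are hypothetical, and it assumes the same 'Report Fields' and 'Grand Total' markers.

```python
import csv
import io
import itertools


def extract_rows(rows, start_marker="Report Fields", stop_marker="Grand Total"):
    """Return an iterator over the rows after start_marker and before stop_marker."""
    rows = iter(rows)
    # Blank lines arrive as empty lists, so test "not x" before touching x[0].
    remaining = itertools.dropwhile(lambda x: not x or x[0] != start_marker, rows)
    # Consume the marker row itself; the default avoids StopIteration if it is absent.
    next(remaining, None)
    return itertools.takewhile(lambda x: not x or stop_marker not in x[0], remaining)


# In-memory demonstration with made-up data, including a blank preamble line.
sample = io.StringIO(
    "My report\n"
    "\n"
    "Report Fields\n"
    "id,name\n"
    "1,a\n"
    "2,b\n"
    "Grand Total:,-\n"
)
result = list(extract_rows(csv.reader(sample)))
```

With real files, the returned iterator can be passed straight to `csv_output.writerows(...)` as in the code above.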
