How to skip certain rows of numerous CSV files with Python pandas and csv?

I have put numerous CSV files in a folder and would like to skip a certain row (e.g. the 10th row) first, and then take one row out of every five.
I can do the first step but have no idea how to do the second.

Thanks.

import os
import pandas as pd

folder = 'path'  # placeholder for the folder that holds the CSV files

# Loop through every file in that folder.
for csvFilename in os.listdir(folder):
    if not csvFilename.endswith('.csv'):
        continue
    file_path = os.path.join(folder, csvFilename)
    # Count the total number of lines in the file.
    with open(file_path) as f:
        total_line = sum(1 for _ in f)
    # Line indices to skip (intended: the first and the last row).
    line_list = [total_line] + [1]
    df = pd.read_csv(file_path, skiprows=line_list)
    new_file_name = csvFilename

    # Write the result back under the same name (overwrites the original).
    df.to_csv(os.path.join(folder, new_file_name), index=False)

The code that works is shown below.

import os
import numpy as np
import pandas as pd

folder = 'path'  # placeholder for the folder that holds the CSV files

# Loop through every file in that folder.
for csvFilename in os.listdir(folder):
    if not csvFilename.endswith('.csv'):
        continue
    file_path = os.path.join(folder, csvFilename)
    # Count the total number of lines in the file.
    with open(file_path) as f:
        total_line = sum(1 for _ in f)
    # Start with every line index (0-based) in the skip list.
    skip = np.arange(total_line)
    # Keep one line in five: remove indices 0, 5, 10, ... from the skip list.
    skip = np.delete(skip, np.arange(0, total_line, 5))
    # Put back the specific line you want to drop, e.g. index 10.
    skip = np.append(skip, 10)
    df = pd.read_csv(file_path, skiprows=skip)

    # Prefix the name with '2' so the original file is not overwritten.
    new_file_name = '2' + csvFilename
    df.to_csv(os.path.join(folder, new_file_name), index=False)
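
To see which lines this selection keeps, here is a small worked example of the same index arithmetic in plain Python. The line count of 21 is an assumed value for illustration, and `keep` and `skip` are hypothetical names; line index 0 is taken to be the header.

```python
# Assumed for illustration: a file with 21 lines (indices 0..20, index 0 is the header).
total_line = 21

# Keep every fifth line index, except the one we want to drop (index 10).
keep = [i for i in range(total_line) if i % 5 == 0 and i != 10]

# Every other index ends up in the list passed to skiprows.
skip = [i for i in range(total_line) if i not in keep]

print(keep)       # [0, 5, 15, 20]
print(len(skip))  # 17
```

Because the header sits at index 0, it is among the kept lines, so pandas still reads the column names correctly.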

You can pass a function to skiprows.

I edited your code below:

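    import os
    import pandas as pd

    folder = 'path'  # placeholder for the folder that holds the CSV files

    # Loop through every file in that folder.
    for csvFilename in os.listdir(folder):
        if not csvFilename.endswith('.csv'):
            continue
        file_path = os.path.join(folder, csvFilename)
        # Count the total number of lines in the file.
        with open(file_path) as f:
            total_line = sum(1 for _ in f)

        # Line indices 1, 6, 11, ... (never the last line) are skipped.
        skip_rows = range(total_line)[1:-1:5]
        df = pd.read_csv(file_path, skiprows=lambda x: x in skip_rows)

        new_file_name = csvFilename
        # Write the result back under the same name (overwrites the original).
        df.to_csv(os.path.join(folder, new_file_name), index=False)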
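
A variant of the same idea, sketched under a few assumptions: the callable can decide from the line index alone, so the file does not need to be counted first. Here it keeps one line in five and drops line 10, as the question describes. The helper `make_skip`, its parameters `step` and `extra_skip`, and the sample data are all hypothetical; the header is assumed to be on line 0.

```python
import io

import pandas as pd


def make_skip(step=5, extra_skip=(10,)):
    """Build a predicate for skiprows: True means 'drop this line'."""
    extra = set(extra_skip)

    def skip(i):
        # Line 0 is assumed to be the header, so it is always kept.
        if i == 0:
            return False
        # Drop the extra lines and every line that is not a multiple of step.
        return i in extra or i % step != 0

    return skip


# Made-up sample: a header plus 20 data lines whose value equals the line index.
sample = "value\n" + "\n".join(str(n) for n in range(1, 21)) + "\n"

df = pd.read_csv(io.StringIO(sample), skiprows=make_skip())
print(df["value"].tolist())  # [5, 15, 20]
```

In the loop above, the same predicate could be passed as `skiprows=make_skip()` in place of the lambda, which makes the line-counting step unnecessary.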
