
Deleting rows from several CSV files using Python

I want to delete specific rows (rows 0 to 33) from every CSV file in my directory, but there are 224 separate CSV files that need to be processed. How can I do this for all of them with a single piece of code?

I think you can do this quite easily with glob and pandas. I'm not sure whether you want to overwrite your original files, which is something I never recommend, so be careful: this code will do exactly that.

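import os
import glob
import pandas as pd

os.chdir(r'yourdir')
all_files = glob.glob("*.csv")  # match your CSVs
for file in all_files:
    df = pd.read_csv(file)  # the first line is read as the header, not as row 0
    df = df.iloc[34:]  # drop data rows 0-33, keep row 34 onwards
    df.to_csv(file, index=False)  # overwrites the original; index=False avoids adding an index column
    print(f"Removed rows 0-33 from {file}")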

or something along those lines.
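Since the loop above overwrites the originals, one cautious option is to copy the files somewhere else first. This is only a minimal sketch using the standard library; the function name `backup_csv_files` and the folder names in the usage line are made up for illustration and are not part of the answer above.

```python
import shutil
from pathlib import Path


def backup_csv_files(source_dir, backup_dir):
    """Copy every .csv file in source_dir into backup_dir and return the copies."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)  # create the backup folder if needed
    copies = []
    for src in sorted(source_dir.glob("*.csv")):
        # copy2 also preserves file metadata such as modification times
        copies.append(Path(shutil.copy2(src, backup_dir / src.name)))
    return copies
```

Calling something like `backup_csv_files('yourdir', 'yourdir_backup')` before running the loop would leave an untouched copy of each file to fall back on.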

This is a simple combination of two separate tasks.

First, you need to loop through all the CSV files in a folder. See this StackOverflow answer for how to do that.
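As a rough illustration of that first task (the linked answer may do it differently), looping over the CSV files in a folder can be sketched with pathlib. The helper name `list_csv_files` is hypothetical, and `"."` simply stands for whatever folder holds your files.

```python
from pathlib import Path


def list_csv_files(folder):
    """Return the .csv files directly inside folder, sorted by name."""
    return sorted(Path(folder).glob("*.csv"))


# Example: print the name of every CSV file in the current directory
for path in list_csv_files("."):
    print(path.name)
```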

Next, within that loop, you need to modify each CSV file by removing rows. See this answer for how to read a CSV, write a CSV, and omit certain rows based on a condition.

One final aspect is that you want to omit certain line numbers. A good way to do this is with the enumerate function.

So code such as this gives you the line numbers.

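import csv

# newline='' lets the csv module handle line endings itself
with open('first.csv', 'r', newline='') as infile, \
     open('first_edit.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)  # parse each line into a list of fields
    writer = csv.writer(outfile)
    for i, row in enumerate(reader):
        if i > 33:  # skip rows 0-33
            writer.writerow(row)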
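Putting the two tasks together might look something like the sketch below. It uses only the standard library and writes the trimmed files to a separate folder rather than overwriting them. The function names, the `n_rows` parameter, and the folder arguments are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def drop_leading_rows(src, dst, n_rows=34):
    """Copy src to dst, skipping rows 0 .. n_rows-1 (the header line counts as row 0)."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout)
        for i, row in enumerate(csv.reader(fin)):
            if i >= n_rows:
                writer.writerow(row)


def process_folder(source_dir, output_dir, n_rows=34):
    """Trim every .csv in source_dir and write the results to output_dir."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    if source_dir.resolve() == output_dir.resolve():
        # Writing onto the file being read would truncate it before it is copied
        raise ValueError("output_dir must differ from source_dir")
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(source_dir.glob("*.csv")):
        dst = output_dir / src.name
        drop_leading_rows(src, dst, n_rows)
        written.append(dst)
    return written
```

Note that, unlike pandas, the csv module does not treat the first line specially, so a header line (if there is one) is counted as row 0 and removed along with the rest.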

Iterate over the CSV files and use pandas to remove the top 34 rows of each file, then save the result to an output directory.

Try this code after installing pandas:

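from pathlib import Path
import pandas as pd

source_dir = Path('path/to/source/directory')
output_dir = Path('path/to/output/directory')
output_dir.mkdir(parents=True, exist_ok=True)  # to_csv does not create missing directories

for file in source_dir.glob('*.csv'):
    df = pd.read_csv(file)  # the first line becomes the column header
    df.drop(df.head(34).index, inplace=True)  # drop the first 34 data rows
    df.to_csv(output_dir.joinpath(file.name), index=False)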
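One thing worth checking is whether your files have a header line, because `pd.read_csv` treats the first line as column names by default, so "the first 34 rows" then means the 34 lines after the header. A possible way to make that explicit is to skip the lines while reading, using the `skiprows` and `header` parameters of `pd.read_csv`. The helper name `read_without_leading_lines` and its `has_header` flag are hypothetical.

```python
import pandas as pd


def read_without_leading_lines(path, n_lines=34, has_header=True):
    """Read a CSV while skipping n_lines lines at the top of the file.

    has_header=True:  keep line 0 as column names and skip the next n_lines lines.
    has_header=False: skip the first n_lines lines; columns are numbered 0, 1, ...
    """
    if has_header:
        # Line 0 is the header, so skip file lines 1 .. n_lines
        return pd.read_csv(path, skiprows=range(1, n_lines + 1))
    return pd.read_csv(path, skiprows=n_lines, header=None)
```

The returned DataFrame can then be written out with `df.to_csv(..., index=False)` as in the loop above (adding `header=False` if the file had no header line to begin with).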
