
How to delete lines from csv file using python?

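I have a CSV file that contains class names and code smell types, with the number of each code smell counted per class. The final count for a class is in its last row, so many class names are repeated. I only need the last row for each class name.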

Here is part of my CSV file, since the whole file is too long to post:

NameOfClass,LazyClass,ComplexClass,LongParameterList,FeatureEnvy,LongMethod,BlobClass,MessageChain,RefusedBequest,SpaghettiCode,SpeculativeGenerality
com.nirhart.shortrain.MainActivity,NaN,NaN,NaN,NaN,NaN,NaN,1,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,1,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,1,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.TrainPath,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,1,NaN,NaN,NaN,NaN,NaN

To get only the unique class names (ignoring the duplicate rows rather than deleting them), you can do the following:

import csv

with open('my_file.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row so 'NameOfClass' is not counted as a class
    classNames = set(row[0] for row in reader)
print(classNames)
# {'com.nirhart.shortrain.MainActivity', 'com.nirhart.shortrain.path.PathParser', 'com.nirhart.shortrain.path.PathPoint', ...}

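This simply opens the file with the csv module, takes the first value of each row, and keeps only the unique values. You can then work with the resulting set of strings as needed (for example, convert it back to a list with list(classNames)).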
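A set does not preserve the order in which class names appear in the file. A minimal sketch that skips the header and keeps first-seen order instead, relying on dict.fromkeys (dicts preserve insertion order in Python 3.7+); the function name unique_class_names is hypothetical, not part of the original answer:

```python
import csv
import io

def unique_class_names(lines):
    """Return first-column values in first-seen order, skipping the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row (NameOfClass,...); safe on an empty file
    # dict.fromkeys drops duplicates but, unlike set(), keeps insertion order
    return list(dict.fromkeys(row[0] for row in reader if row))

# In-memory sample standing in for my_file.csv
sample = io.StringIO(
    "NameOfClass,LazyClass\n"
    "com.nirhart.shortrain.path.PathParser,NaN\n"
    "com.nirhart.shortrain.path.PathParser,1\n"
    "com.nirhart.shortrain.path.PathPoint,1\n"
)
print(unique_class_names(sample))
```

With a real file, pass the open file object instead, e.g. `with open('my_file.csv', newline='') as f: names = unique_class_names(f)`.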

If you plan to process the data in pandas later, filtering out the duplicates is straightforward:

import pandas as pd

df = pd.read_csv('file.csv')
df = df.loc[~df.NameOfClass.duplicated(keep='last')]
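`duplicated(keep='last')` flags every row except the last occurrence of each NameOfClass, so this filter also works when a class's rows are not adjacent, unlike the csv-module approaches that compare neighbouring rows. For readers who want the same semantics without pandas, here is a minimal stdlib sketch (the function name keep_last_rows is hypothetical). It remembers the latest row per class name and re-inserts the key on every occurrence, so the output order follows the position of each name's last row, which is the order the pandas filter keeps:

```python
import csv
import io

def keep_last_rows(lines):
    """Mimic df.loc[~df.NameOfClass.duplicated(keep='last')] with the csv module."""
    reader = csv.reader(lines)
    header = next(reader, None)  # None for an empty file
    last = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # pop + re-insert moves the key to the end, so the final dict order
        # follows each name's *last* occurrence, as the pandas filter does
        last.pop(row[0], None)
        last[row[0]] = row
    return header, list(last.values())

data = io.StringIO("NameOfClass,N\nA,1\nB,1\nA,2\n")
header, rows = keep_last_rows(data)
print(header, rows)  # ['NameOfClass', 'N'] [['B', '1'], ['A', '2']]
```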

If you only want to build a new CSV file containing the rows you need, pandas is overkill and the csv module is enough:

import csv

with open('file.csv', newline='') as fdin, open('new_file.csv', 'w', newline='') as fdout:
    rd = csv.reader(fdin)
    wr = csv.writer(fdout)
    wr.writerow(next(rd))    # copy the header line
    old = None
    for row in rd:
        # a new class name means `old` was the last row of the previous class
        if old is not None and old[0] != row[0]:
            wr.writerow(old)
        old = row
    if old is not None:      # write the last row of the final class
        wr.writerow(old)
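The loop above is a one-row lookbehind: a row is written only once the next row shows that the class name has changed, and a final write flushes the last group. Like the `groupby()` approach, it assumes all rows for a class are adjacent. A minimal sketch of the same logic as a generator (the name last_of_each_run is hypothetical), which makes it easy to test against an in-memory file:

```python
import csv
import io

def last_of_each_run(rows):
    """Yield the last row of each run of consecutive rows sharing the same first column."""
    old = None
    for row in rows:
        if old is not None and old[0] != row[0]:
            yield old  # the name changed, so `old` was the last row of its run
        old = row
    if old is not None:
        yield old  # flush the final run; nothing is yielded for empty input

src = io.StringIO("NameOfClass,N\nA,1\nA,2\nB,1\n")
reader = csv.reader(src)
header = next(reader)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)                       # copy the header unchanged
writer.writerows(last_of_each_run(reader))    # one row per class
print(out.getvalue())
```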

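To keep only the last entry of each NameOfClass group, you can use Python's itertools.groupby() function, which returns the rows sharing the same NameOfClass as one group. The last entry of each group can then be written to the output file.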

from itertools import groupby
import csv

with open('data_in.csv', newline='') as f_input, open('data_out.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    for key, rows in groupby(csv_input, key=lambda x: x[0]):
        csv_output.writerow(list(rows)[-1])
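`groupby()` only groups *consecutive* rows with the same key, so this relies on each class's rows being adjacent, as they are in the sample. Also, `list(rows)[-1]` builds the whole group in memory just to take its last element. A minimal sketch, assuming the same data layout, that retains only one row per group with `collections.deque(maxlen=1)` (the helper name last_per_group is hypothetical):

```python
import csv
import io
from collections import deque
from itertools import groupby

def last_per_group(reader):
    """Yield the last row of each group of consecutive rows with the same first column."""
    for _, rows in groupby(reader, key=lambda row: row[0]):
        # a deque with maxlen=1 consumes the group but keeps only its final row
        yield deque(rows, maxlen=1)[0]

src = io.StringIO("NameOfClass,N\nA,1\nA,2\nB,1\n")
out = io.StringIO()
# the header forms its own one-row group, so it is copied through unchanged
csv.writer(out, lineterminator="\n").writerows(last_per_group(csv.reader(src)))
print(out.getvalue())
```

If a class's rows might be scattered through the file, sorting the data rows with `sorted(rows, key=lambda row: row[0])` before grouping would make `groupby()` work; Python's sort is stable, so each class's rows keep their original relative order. The header would need to be read off before sorting.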

For the data you provided, this produces the following output:

NameOfClass,LazyClass,ComplexClass,LongParameterList,FeatureEnvy,LongMethod,BlobClass,MessageChain,RefusedBequest,SpaghettiCode,SpeculativeGenerality
com.nirhart.shortrain.MainActivity,NaN,NaN,NaN,NaN,NaN,NaN,1,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,1,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.TrainPath,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,1,NaN,NaN,NaN,NaN,NaN


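Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.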

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM