How to delete all the rows of a CSV that have the word "class" in the first column, except the first row that has it
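Here is a small script that filters out those rows. It does not load the whole file into memory. Instead it reads and writes one row at a time, skipping every row whose first column is "class" (apart from the header, which is copied first):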
import csv

with open('coords_filtered.csv', 'w', newline='') as out_f:
    writer = csv.writer(out_f)
    with open('coords.csv', newline='') as in_f:
        reader = csv.reader(in_f)
        # Transfer header
        writer.writerow(next(reader))
        for row in reader:
            if row[0] == 'class':
                continue  # skip row / don't write
            writer.writerow(row)
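The same row-by-row idea can be wrapped in a function so it can be tried on in-memory data before touching real files. This is only a sketch: the name `drop_repeated_headers` and its `marker` parameter are made up for illustration, and the sample data is invented. Unlike the script above, it also tolerates an empty input and blank rows.

```python
import csv
import io


def drop_repeated_headers(in_f, out_f, marker="class"):
    """Copy CSV rows from in_f to out_f, keeping the first row and skipping
    every later row whose first column equals marker.

    Hypothetical helper for illustration. Returns the number of rows skipped.
    """
    reader = csv.reader(in_f)
    writer = csv.writer(out_f)

    header = next(reader, None)  # None if the input is empty
    if header is None:
        return 0
    writer.writerow(header)

    skipped = 0
    for row in reader:
        # `row` is an empty list for blank lines, so check it before indexing.
        if row and row[0] == marker:
            skipped += 1
            continue
        writer.writerow(row)
    return skipped


# Invented sample data with one repeated header row.
src = io.StringIO("class,x,y\na,1,2\nclass,x,y\nb,3,4\n", newline="")
dst = io.StringIO(newline="")
skipped = drop_repeated_headers(src, dst)
print(skipped)
print(dst.getvalue())
```

With real files, pass the two handles opened with `newline=''`, as in the script above.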
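If I understand correctly, you need to remove all the repeated header rows that appear in the data. If that is the case and the file is not too large, you can filter the DataFrame after reading it with read_csv: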
import pandas as pd
df = pd.read_csv('coords.csv',sep=',',header=0)
df = df[df['class'] != 'class']
Edit: for this to work, the first row (index 0) must be read as the header, so that the DataFrame has a 'class' column to filter on.
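One side effect worth knowing about: because the repeated header rows put text into every column, pandas reads the numeric columns as strings, and they stay that way after filtering. A minimal sketch of the filter plus a conversion back to numbers follows. The column names `x` and `y` and the sample data are assumptions, since the original post does not show the contents of coords.csv.

```python
import io

import pandas as pd

# Invented sample data standing in for coords.csv, with one repeated header row.
raw = io.StringIO(
    "class,x,y\n"
    "a,1,2\n"
    "class,x,y\n"
    "b,3,4\n"
)

# The first row becomes the header, so the column 'class' exists.
df = pd.read_csv(raw, header=0)

# Drop every remaining row that is a repeated header, then renumber the index.
df = df[df["class"] != "class"].reset_index(drop=True)

# The repeated headers forced these columns to be read as text; convert them back.
df[["x", "y"]] = df[["x", "y"]].apply(pd.to_numeric)

print(df)
```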