
Delete specific rows and columns from csv using Python in one step

I have a csv file where I need to delete the second and third rows and the 3rd to 18th columns. I was able to get it to work in two steps, which produced an interim file. I think there must be a better, more compact way to do this. Any suggestions would be really appreciated.

Also, if I want to remove multiple ranges of columns, how do I specify that in this code? For example, if I want to remove columns 25 to 29 in addition to columns 3 to 18 already specified, how would I add that to the code? Thanks.

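import csv

remove_from = 2   # zero-based index of the first column to delete (the 3rd column)
remove_to = 18    # slice end is exclusive, so this deletes through the 18th column

# Step 1: drop the unwanted columns and write an interim file
with open('file_a.csv', newline='') as infile, open('interim.csv', 'w', newline='') as outfile:

    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    for row in reader:
        del row[remove_from:remove_to]
        writer.writerow(row)

# Step 2: keep the first row, skip the second and third, copy the rest
with open('interim.csv', newline='') as infile, open('file_b.csv', 'w', newline='') as outfile:

    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    writer.writerow(next(reader))  # keep the first row

    next(reader)  # skip the second row
    next(reader)  # skip the third row

    for row in reader:
        writer.writerow(row)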
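
For comparison, the two passes can be folded into one using only the csv module, which removes the need for the interim file. The following is a minimal sketch, not a definitive implementation: the function name `filter_csv` and its parameters `skip_rows` and `drop_col_ranges` are hypothetical names chosen for illustration, and it assumes rows and columns are counted from 1, with column ranges given as inclusive pairs.

```python
import csv


def filter_csv(infile, outfile, skip_rows, drop_col_ranges):
    """Copy infile to outfile in one pass, omitting some rows and columns.

    skip_rows: set of 1-based row numbers to drop (row 1 is the first line).
    drop_col_ranges: list of (first, last) column ranges, 1-based and inclusive.
    Both parameter conventions are assumptions made for this sketch.
    """
    # Expand the ranges into a set of zero-based column indices to drop.
    drop_cols = set()
    for first, last in drop_col_ranges:
        drop_cols.update(range(first - 1, last))

    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row_number, row in enumerate(reader, start=1):
        if row_number in skip_rows:
            continue  # drop this row entirely
        # Keep only the values whose column index is not marked for removal.
        writer.writerow([v for i, v in enumerate(row) if i not in drop_cols])


# Example call, using the file names from the question:
# with open('file_a.csv', newline='') as src, \
#         open('file_b.csv', 'w', newline='') as dst:
#     filter_csv(src, dst, skip_rows={2, 3},
#                drop_col_ranges=[(3, 18), (25, 29)])
```

Adding another range of columns is then a matter of appending one more pair to `drop_col_ranges`.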

Here is a pandas approach:

Step 1, creating a sample dataframe

import numpy as np
import pandas as pd

# Create a sample CSV file (100 rows x 100 columns, plus a header row)
df = pd.DataFrame(np.arange(10000).reshape(100, 100))
df.to_csv('test.csv', index=False)

Step 2, doing the magic

import pandas as pd
import numpy as np

# Read only the header row to determine the number of columns
size = pd.read_csv('test.csv', nrows=0).shape[1]

# Zero-based positions of the columns to drop: 3..18 and 25..29.
# (Use np.r_[2:18, 24:29] instead if you count columns starting from 1.)
ranges = np.r_[3:19, 25:30]
# Positions of the columns to keep: all positions minus the dropped ones
ar = np.delete(np.arange(size), ranges)

# Read the file, keeping only those columns.
# skiprows takes zero-based line numbers, with the header as line 0,
# so [2, 3] skips the second and third data rows.
df = pd.read_csv('test.csv', skiprows=[2, 3], usecols=ar)

# Write the result
df.to_csv('output.csv', index=False)
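
The index arithmetic in Step 2 is easy to get wrong by one, so here is a small sketch of the same "all positions minus the dropped ranges" idea in plain Python. The helper name `columns_to_keep` is hypothetical, and it assumes the ranges are written the way the question states them: counted from 1 and inclusive at both ends.

```python
def columns_to_keep(n_columns, drop_ranges):
    """Return the zero-based positions of the columns that survive.

    drop_ranges holds (first, last) pairs, 1-based and inclusive
    (an assumption of this sketch, matching how the question counts columns).
    """
    drop = set()
    for first, last in drop_ranges:
        # Convert a 1-based inclusive range to zero-based positions.
        drop.update(range(first - 1, last))
    return [i for i in range(n_columns) if i not in drop]


# Small example: 10 columns, dropping columns 3-5 and 8-9.
keep = columns_to_keep(10, [(3, 5), (8, 9)])
print(keep)  # [0, 1, 5, 6, 9]
```

Since `usecols` in `pd.read_csv` accepts a list of integer positions, a list built this way could be passed in place of `ar`.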

And the proof:

[image]

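Note: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.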

 