
How to delete columns in a CSV file?

I have been able to create a CSV with Python using the input from several users on this site, and I wish to express my gratitude for your posts. I am now stumped and will post my first question.

My input.csv looks like this:

day,month,year,lat,long
01,04,2001,45.00,120.00
02,04,2003,44.00,118.00

I am trying to delete the "year" column and all of its entries. In total there are 40+ entries, with years ranging from 1960 to 2010.

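import csv
# Python 3: open CSV files in text mode with newline="" (the "rb"/"wb" modes are Python 2 only)
with open("source", newline="") as source:
    rdr = csv.reader(source)
    with open("result", "w", newline="") as result:
        wtr = csv.writer(result)
        for r in rdr:
            # keep day, month, lat, long; skip index 2 (year)
            wtr.writerow((r[0], r[1], r[3], r[4]))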

BTW, the for loop can be removed, but not really simplified.

        in_iter= ( (r[0], r[1], r[3], r[4]) for r in rdr )
        wtr.writerows( in_iter )

You can also satisfy the requirement hyper-literally and actually delete the column from each row. I find this to be a bad policy in general because it doesn't extend well to removing more than one column: when you try to remove a second one, you discover that the positions have all shifted and the resulting row isn't obvious. But for one column only, this works.

            del r[2]
            wtr.writerow( r )
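
One way around the index-shifting problem described above is to filter each row by its original positions instead of deleting in place. The following is only a sketch: the helper name drop_columns is hypothetical, and the sample data from the question is held in memory rather than read from a file.

```python
import csv
import io

def drop_columns(rows, drop_indices):
    """Yield each row without the columns whose positions are in drop_indices."""
    drop = set(drop_indices)
    for row in rows:
        # Filter by original position, so one removal never shifts another.
        yield [value for i, value in enumerate(row) if i not in drop]

# Sample data from the question, held in memory instead of a file.
sample = (
    "day,month,year,lat,long\n"
    "01,04,2001,45.00,120.00\n"
    "02,04,2003,44.00,118.00\n"
)

reader = csv.reader(io.StringIO(sample))
out = io.StringIO()
# Drop "year" (index 2) and "long" (index 4) in a single pass.
csv.writer(out, lineterminator="\n").writerows(drop_columns(reader, [2, 4]))
result_text = out.getvalue()
print(result_text)
```

Because the indices always refer to the original row, the order in which they are listed does not matter.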

Using the Pandas module will be much easier.

import pandas as pd
f=pd.read_csv("test.csv")
keep_col = ['day','month','lat','long']
new_f = f[keep_col]
new_f.to_csv("newFile.csv", index=False)

And here is a short explanation:

>>> f = pd.read_csv("test.csv")
>>> f
   day  month  year  lat  long
0    1      4  2001   45   120
1    2      4  2003   44   118
>>> keep_col = ['day','month','lat','long'] 
>>> f[keep_col]
    day  month  lat  long
0    1      4   45   120
1    2      4   44   118
>>>
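
Note that in the session above the day "01" is displayed as 1: pandas infers numeric types by default, so leading and trailing zeros are not preserved when the frame is written back out. If the output should keep the original text, one option is to read every column as a string. This is a sketch under that assumption, using the question's sample data in memory; the variable names are illustrative.

```python
import io
import pandas as pd

# Sample data from the question, held in memory instead of a file.
sample = (
    "day,month,year,lat,long\n"
    "01,04,2001,45.00,120.00\n"
    "02,04,2003,44.00,118.00\n"
)

# dtype=str keeps every field as the original text ("01" stays "01").
df = pd.read_csv(io.StringIO(sample), dtype=str)

# Drop the column by name instead of listing the ones to keep.
trimmed = df.drop(columns=["year"])

# With no path argument, to_csv returns the CSV text as a string.
csv_text = trimmed.to_csv(index=False)
print(csv_text)
```

Dropping by name with drop(columns=...) is an alternative to the keep_col list when only a few columns have to go.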

Using a dict to grab the headings and then looping through the rows gets you what you need cleanly.

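import csv
# maps each wanted heading to its column index (filled in from the header row)
cols_i_want = {'cost': -1, 'date': -1}
# Python 3: open CSV files in text mode with newline=""
with open("file1.csv", newline="") as source:
    rdr = csv.reader(source)
    with open("result", "w", newline="") as result:
        wtr = csv.writer(result)
        for ct, row in enumerate(rdr):
            if ct == 0:
                # first row is the header: record where each wanted column sits
                for cc, col in enumerate(row):
                    if col in cols_i_want:
                        cols_i_want[col] = cc
            wtr.writerow((row[cols_i_want['cost']], row[cols_i_want['date']]))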
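
The same select-by-heading idea can also be sketched with csv.DictReader and csv.DictWriter from the standard library, which do the heading lookup themselves. The function name keep_columns is hypothetical, and the example uses the question's sample data (not the 'cost' and 'date' columns above) held in memory.

```python
import csv
import io

def keep_columns(src, dst, wanted):
    """Copy CSV data from src to dst, keeping only the named columns."""
    reader = csv.DictReader(src)
    # extrasaction="ignore" makes the writer skip keys not listed in fieldnames.
    writer = csv.DictWriter(dst, fieldnames=wanted, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Sample data from the question, held in memory instead of a file.
sample = (
    "day,month,year,lat,long\n"
    "01,04,2001,45.00,120.00\n"
    "02,04,2003,44.00,118.00\n"
)

out = io.StringIO()
# The output follows the order of the wanted list, not the input order.
keep_columns(io.StringIO(sample), out, ["lat", "day"])
kept_text = out.getvalue()
print(kept_text)
```

Since the columns are addressed by name, the output order is whatever order the wanted list uses.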

You can delete the column directly:

del variable_name['year']

I would use Pandas with column numbers:

import pandas as pd
f = pd.read_csv("test.csv", usecols=[0,1,3,4])
f.to_csv("test.csv", index=False)

I will add yet another answer to this question. Since the OP did not say they needed to do it with Python, the fastest way to delete the column (especially when the input file has hundreds of thousands of lines) is to use awk.

This is the type of problem where awk shines:

$ awk -F, 'BEGIN {OFS=","} {print $1,$2,$4,$5}' input.csv

(Feel free to append > output.csv to the command above if you need the output to be saved to a file.)

Credit goes 100% to @eric-wilson, who provided this awesome answer as a comment on the original question 10 years ago, almost without any credit.

You can use the csv package to iterate over your CSV file and output the columns that you want to another CSV file.

The example below is not tested, but it should illustrate a solution:

import csv

# raw strings keep the backslashes literal (otherwise "\n" in "\new_file" becomes a newline)
file_name = r'C:\Temp\my_file.csv'
output_file = r'C:\Temp\new_file.csv'
## note that the index of the year column is excluded
column_indices = [0,1,3,4]
with open(file_name, 'r', newline='') as csv_file, open(output_file, 'w', newline='') as fh:
    reader = csv.reader(csv_file, delimiter=',')
    # csv.writer terminates each row and quotes fields when needed
    writer = csv.writer(fh, delimiter=',')
    for row in reader:
        writer.writerow([row[col_inx] for col_inx in column_indices])

Off the top of my head, this will do it, without any error checking or any ability to configure anything. That is "left to the reader".

with open('oldFile') as inFile, open('newFile', 'w') as outFile:
    for line in inFile:
        items = line.split(',')
        # drop the third field (index 2, the year); the last item keeps its newline
        outFile.write(','.join(items[:2] + items[3:]))
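
Splitting each line on commas works for the sample data in the question, but it assumes that no field contains a quoted comma. The sketch below uses a hypothetical line with a place-name field (not part of the question's data) to show where the plain split and the csv module diverge.

```python
import csv
import io

# Hypothetical line: the fourth field contains a comma inside quotes.
line = '01,04,2001,"Portland, OR",45.00\n'

# A plain split breaks the quoted field into two pieces.
naive = line.split(",")

# The csv module treats the quoted text as a single field.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), len(parsed))

# Dropping index 2 and writing back through csv.writer restores the quoting.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(parsed[:2] + parsed[3:])
rewritten = out.getvalue()
print(rewritten)
```

If the input is known to contain only simple unquoted fields, the plain split is adequate.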

Try:

result = data.drop('year', axis=1)
result.head(5)

Try Python with pandas and exclude the column you don't want:

import pandas as pd

# the ',' is the default separator, but if your file has another one, you have to define it with sep= parameter
df = pd.read_csv("input.csv", sep=',')
exclude_column = "year"
new_df = df.loc[:, df.columns != exclude_column]
# you can even save the result to the same file
new_df.to_csv("input.csv", index=False, sep=',')

It depends on how you store the parsed CSV, but generally you want the del statement.

If you have an array of dicts:

input = [ {'day':1, 'month':4, 'year':2001, ...}, ... ]
for E in input: del E['year']

If you have an array of arrays:

input = [ [1, 4, 2001, ...],
          [...],
          ...
        ]
for E in input: del E[2]
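
As a runnable sketch of both cases, assuming the rows were parsed with the standard csv module (so every value is a string) and using the question's sample data in memory; the variable names are illustrative:

```python
import csv
import io

# Sample data from the question, held in memory instead of a file.
sample = (
    "day,month,year,lat,long\n"
    "01,04,2001,45.00,120.00\n"
    "02,04,2003,44.00,118.00\n"
)

# Case 1: a list of dicts, one per data row, keyed by heading.
dict_rows = list(csv.DictReader(io.StringIO(sample)))
for entry in dict_rows:
    del entry["year"]

# Case 2: a list of lists, header row included.
list_rows = list(csv.reader(io.StringIO(sample)))
for entry in list_rows:
    del entry[2]

print(dict_rows[0])
print(list_rows[0])
```

Both loops modify the parsed rows in place, so the data still has to be written back out afterwards.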


 