How to remove a line from a CSV if it contains a certain word?

I have a CSV file that looks something like this:

    2014-6-06 08:03:19, 439105, 1053224, Front Entrance
    2014-6-06 09:43:21, 439105, 1696241, Main Exit
    2014-6-06 10:01:54, 1836139, 1593258, Back Archway
    2014-6-06 11:34:26, 845646, external, Exit 
    2014-6-06 04:45:13, 1464748, 439105, Side Exit

I was wondering how to delete a line if it contains the word "external".

I saw another post on SO that addressed a very similar issue, but I don't completely understand it...

I tried something like this (as explained in the linked post):

TXT_file = 'whatYouWantRemoved.txt'
CSV_file = 'comm-data-Fri.csv'
OUT_file = 'OUTPUT.csv'

## From the TXT, create a list of domains you do not want to include in output
with open(TXT_file, 'r') as txt:
    domain_to_be_removed_list = []

    ## for each domain in the TXT
    ## remove the return character at the end of the line
    ## and add the domain to the domains-to-be-removed list
    for domain in txt:
        domain = domain.rstrip()
        domain_to_be_removed_list.append(domain)


with open(OUT_file, 'w') as outfile:
    with open(CSV_file, 'r') as csv:

        ## for each line in csv
        ## extract the csv domain
        for line in csv:
            csv_domain = line.split(',')[0]

            ## if csv domain is not in domains-to-be-removed list,
            ## then write that to outfile
            if (csv_domain not in domain_to_be_removed_list):
                outfile.write(line)

The text file held just the one word "external", but it didn't work, and I don't understand why.

What happens is that the program runs and the output file is generated, but nothing changes: no lines containing "external" are removed.

I'm using Windows and Python 3.4, if it makes a difference.

Sorry if this seems like a really simple question, but I'm new to Python, and any help in this area would be greatly appreciated. Thanks!

It looks like you are grabbing the first element after you split the line. According to your example CSV file, that gives you the date and time, which will never match "external".

What you probably want instead (again, assuming the example reflects how your data will always look) is the third element, so something like this:

csv_domain = line.split(',')[2].strip()  # strip() drops the space that follows each comma
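To see why index 0 returns the date, and why index 2 still carries a leading space until it is stripped, it can help to look at what `split(',')` produces for one of the sample rows. A minimal sketch, assuming the rows look exactly like the example above (a space after each comma, and a trailing newline as when read from a file):

```python
# One row copied from the sample data, with the newline a file read would include
line = "2014-6-06 11:34:26, 845646, external, Exit\n"

fields = line.split(',')
print(fields)
# ['2014-6-06 11:34:26', ' 845646', ' external', ' Exit\n']

print(fields[0])                        # the date/time, never "external"
print(repr(fields[2]))                  # ' external' (note the leading space)
print(fields[2].strip() == "external")  # True once the space is removed
```

Without the `strip()`, `' external'` is compared against `'external'` (which `rstrip()` produced from the text file) and the membership test never matches.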

But, as one of the comments said, this isn't necessarily foolproof. You are assuming none of the individual cells contain commas. Based on your example that might be a safe assumption, but in general, when working with CSV files I recommend using the Python csv module.
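Following that recommendation, here is a minimal sketch using the standard `csv` module. It assumes, as in the example, that the value to test is in the third column (index 2) and that each comma is followed by a space. `filter_rows` is a hypothetical helper name, and the demo reads the sample rows from memory instead of a file so that it runs on its own.

```python
import csv
import io

def filter_rows(infile, outfile, unwanted, column=2):
    """Copy CSV rows from infile to outfile, skipping any row whose
    value in `column` (0-based) is one of the `unwanted` strings."""
    # skipinitialspace=True ignores the space after each comma,
    # so the third field reads "external" rather than " external"
    reader = csv.reader(infile, skipinitialspace=True)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        if len(row) > column and row[column].strip() in unwanted:
            continue  # drop this row
        writer.writerow(row)

# Demo on the sample data from the question (in memory instead of real files)
sample = """2014-6-06 08:03:19, 439105, 1053224, Front Entrance
2014-6-06 09:43:21, 439105, 1696241, Main Exit
2014-6-06 10:01:54, 1836139, 1593258, Back Archway
2014-6-06 11:34:26, 845646, external, Exit
2014-6-06 04:45:13, 1464748, 439105, Side Exit
"""
out = io.StringIO()
filter_rows(io.StringIO(sample), out, {"external"})
print(out.getvalue())
```

To run it on the files from the question, open them with `newline=''` as the `csv` documentation recommends, e.g. `with open('comm-data-Fri.csv', newline='') as fin, open('OUTPUT.csv', 'w', newline='') as fout: filter_rows(fin, fout, {'external'})`. The `unwanted` set could equally be built from the lines of `whatYouWantRemoved.txt`. Because `csv.writer` rewrites each row, the spaces after the commas are not preserved in the output.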

Redirect the output to a new file. This gives you every line except those that contain "external":

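import sys
import re

# Compile the pattern once; it matches "external" anywhere in a line
p = re.compile('external')

# Read sum.csv line by line; the with block closes the file afterwards
with open('sum.csv', 'r') as f:
    for line in f:
        if p.search(line):
            continue  # skip any line that contains "external"
        else:
            sys.stdout.write(line)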
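If you would rather not rely on shell redirection, the same filter can write to the output file directly. The sketch below is a variation, not the answer's exact code: `drop_matching`, `filter_file` and the output name `filtered.csv` are hypothetical, and it uses `\bexternal\b` (a regex word boundary) so that a word such as `externally` would not also cause a line to be dropped.

```python
import re

# \b marks a word boundary: "external" matches, "externally" does not
PATTERN = re.compile(r"\bexternal\b")

def drop_matching(lines, pattern=PATTERN):
    """Yield only the lines in which `pattern` does not occur."""
    for line in lines:
        if not pattern.search(line):
            yield line

def filter_file(in_path, out_path):
    # Stream the input line by line and write the surviving lines
    # straight to out_path instead of printing them to stdout
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        dst.writelines(drop_matching(src))

# Example (file names are placeholders):
# filter_file("sum.csv", "filtered.csv")
```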

If you can use something other than Python, grep would work like this:

grep "some regex" file.csv > newfile.csv

gives you ONLY the lines that match the regex, while:

grep -v "some regex" file.csv > newfile.csv

gives you everything BUT the lines that match the regex.

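Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.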

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM