
python remove lines from text that contain NA

I have data at http://people.stern.nyu.edu/ja1517/data/marketing.data, which I've saved as marketing.data.txt.

I want to remove lines that contain NA. I wrote a 6-line Python script, but it isn't working.

Can anyone point out my mistake?

import re
f = open('marketing.data.txt')
g = open('marketing_complete.txt', 'w')
for line in f:
    if re.search('NA', line) is None:
        g.write(line)

I know this hasn't worked because I tried the following at the command line:

grep 'NA' marketing_complete.txt | wc -l

which returns 3... :(

You don't need re to do this:

f = open('marketing.data.txt')
g = open('marketing_complete.txt', 'w')
for line in f:
    if 'NA' not in line:
        g.write(line)

It is good practice to open files with context managers:

with open('marketing.data.txt') as f:
    with open('marketing_complete.txt', 'w') as g:
        for line in f:
            if 'NA' not in line:
                g.write(line)
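
The same pattern can be wrapped in a small helper. This is a minimal sketch, not code from the thread: the function name `drop_lines_containing` and its parameters are made up for illustration. Because the `with` block closes both files on exit, everything written is flushed to disk before anything else (such as a later grep) reads the output. Whether an unflushed buffer explains the count of 3 in the question is not established in the thread.

```python
def drop_lines_containing(src_path, dst_path, marker="NA"):
    """Copy src_path to dst_path, skipping every line that contains marker.

    Returns the number of lines written. The name and signature are
    hypothetical, chosen only for this illustration.
    """
    kept = 0
    # Both files are closed (and the output flushed) when this block exits,
    # even if an exception is raised part-way through the loop.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if marker not in line:
                dst.write(line)
                kept += 1
    return kept
```

With the file names from the question, the call would be drop_lines_containing('marketing.data.txt', 'marketing_complete.txt').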

You can use grep:

grep -v NA marketing.data.txt > marketing_complete.txt

If you are already grepping, just do:

grep -v NA marketing.data.txt > marketing_complete.txt

The '-v' option inverts the search so only lines that don't match are printed.
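
For comparison, the inverted match can be mirrored in Python. The sketch below is an illustration only: `grep_lines` and its `invert` flag are hypothetical names, and it does plain substring matching rather than grep's regular expressions (for the literal string NA the two behave the same).

```python
def grep_lines(lines, needle, invert=False):
    """Yield lines that contain needle, or that do NOT contain it if invert is True.

    A hypothetical helper imitating `grep` (invert=False) and `grep -v`
    (invert=True) for a plain substring.
    """
    for line in lines:
        matched = needle in line
        # Keep the line when the match result differs from the invert flag:
        # matched and not inverted, or unmatched and inverted.
        if matched != invert:
            yield line


sample = ["1 2 3\n", "4 NA 6\n", "7 8 9\n"]
kept = list(grep_lines(sample, "NA", invert=True))
print(kept)  # the two lines without NA
```

Any iterable of lines works as input, including an open file object.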

For your if statement, try:

if 'NA' not in line:
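
One caveat with a substring test is that it also drops lines where NA only appears inside a longer token. If the fields are separated by whitespace (an assumption about the data file, not something stated in the thread), a whole-field check can be sketched as follows; `has_na_field` is a made-up name.

```python
def has_na_field(line, marker="NA"):
    """Return True if one whitespace-separated field of line equals marker.

    Hypothetical helper; assumes fields are separated by spaces or tabs.
    """
    # str.split() with no arguments splits on any run of whitespace and
    # discards the trailing newline, so fields compare cleanly.
    return marker in line.split()


print(has_na_field("1 NA 3\n"))    # True: NA is a field of its own
print(has_na_field("1 NATO 3\n"))  # False: NA is only part of a longer token
```

The loop condition would then read: if not has_na_field(line):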
