Python: remove lines from text that contain NA
I have data at http://people.stern.nyu.edu/ja1517/data/marketing.data, which I've saved as marketing.data.txt.
I want to remove lines that contain NA. I wrote a 6-line Python script, but it isn't working.
Can anyone point out my mistake?
import re
f = open('marketing.data.txt')
g = open('marketing_complete.txt', 'w')
for line in f:
    if re.search('NA', line) is None:
        g.write(line)
I know this hasn't worked because I tried the following at the command line:
grep 'NA' marketing_complete.txt | wc -l
which returns 3... :(
You don't need re to do this:
f = open('marketing.data.txt')
g = open('marketing_complete.txt', 'w')
for line in f:
    if 'NA' not in line:
        g.write(line)
It is good practice to open files with context managers, so that both files are closed (and the output flushed to disk) when the block ends:
with open('marketing.data.txt') as f:
    with open('marketing_complete.txt', 'w') as g:
        for line in f:
            if 'NA' not in line:
                g.write(line)
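One caveat with `'NA' not in line` is that it is a substring test, so it would also drop a line containing, say, `NAME`. The sketch below is one possible variant that only drops a line when one of its fields is exactly `NA`. It assumes the fields are separated by whitespace, which the question does not confirm, and the function name `drop_na_lines` is made up for illustration.

```python
def drop_na_lines(src_path, dst_path, marker="NA"):
    """Copy src_path to dst_path, skipping lines that have a field equal to marker.

    Assumes whitespace-separated fields (an assumption, not stated in the question).
    Returns the number of lines skipped.
    """
    skipped = 0
    # One with statement manages both files; leaving it closes them,
    # which also flushes the buffered output to disk.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # split() with no arguments splits on any run of whitespace
            if marker in line.split():
                skipped += 1
            else:
                dst.write(line)
    return skipped
```

It could be called as `drop_na_lines('marketing.data.txt', 'marketing_complete.txt')`. If the file turns out to be comma-separated, `line.split()` would need to be replaced accordingly.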
You can use grep:
grep -v NA marketing.data.txt > marketing_complete.txt
If you are already using grep, just do:
grep -v NA marketing.data.txt > marketing_complete.txt
The '-v' option inverts the match, so only lines that don't match are printed.
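To check the result without leaving Python, the `grep 'NA' ... | wc -l` test from the question can be approximated with a small helper. This is only a sketch: `count_lines_containing` is a hypothetical name, and it does a plain substring match like `grep -c NA`, not a regular-expression match.

```python
def count_lines_containing(path, needle="NA"):
    """Count the lines in path that contain needle as a substring."""
    with open(path) as f:
        # Each line is counted at most once, even if needle occurs several times in it
        return sum(1 for line in f if needle in line)
```

After filtering, `count_lines_containing('marketing_complete.txt')` should return 0. A non-zero count suggests the output file was not fully written or is left over from an earlier run.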
For your if statement, try:
if 'NA' not in line: