Remove every word that starts with a certain string
I'm trying to remove every word that starts with a certain string in a text file. I'm stuck on how to write to the output file.
Input file:
Lorem ipsum applePEAR
dolor appleBANANA sit
appleORANGE amet, consectetur
Desired output file:
Lorem ipsum
dolor sit
amet, consectetur
My approach so far:
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()
        for word in ls():
            if word.startswith("apple"):
                line.replace(word, "")
        fout.write(line)
I think the problem with this approach is that I'm replacing words in the list produced by splitting the line, not in the line itself.
Checking Stack Overflow, I see this problem is similar to "using Python for deleting a specific line in a file", except that "nickname_to_delete" is a word that starts with a given string.
I've updated your code as little as I could:
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.rstrip("\n").split(" ")  # Strip the newline so it isn't dropped with the last word
        newline = []
        for word in ls:  # Don't call () on the list
            if not word.startswith("apple"):
                newline.append(word)  # Keep every word that doesn't start with "apple"
        fout.write(" ".join(newline) + "\n")  # Rebuild the line
Keep in mind a regex replacement would be better and could take care of "newword,appleshake":
import re
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        fout.write(re.sub(r"\bapple\w+", "", line))
Punctuation will still be a problem with \w, which matches only letters, digits, and underscores, but you need to choose how to deal with it.
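One possible way to deal with it is to match whole whitespace-delimited tokens instead of \w runs, and to swallow the blanks that the removed word would otherwise leave behind. This is only a sketch under that assumption; the helper name remove_prefixed is made up for illustration. Note the trade-off: a token such as "newword,appleshake" is left alone here, because "appleshake" does not start a token.

```python
import re

def remove_prefixed(line, prefix="apple"):
    """Drop whitespace-delimited tokens that start with prefix (illustrative helper)."""
    # (?<!\S) requires the token to begin at the line start or right after whitespace;
    # \S* consumes the rest of the token, punctuation included;
    # [ \t]* also removes the blanks that follow the token.
    pattern = r"(?<!\S)" + re.escape(prefix) + r"\S*[ \t]*"
    cleaned = re.sub(pattern, "", line)
    # A token removed at the end of a line leaves the blank before it; trim that.
    return re.sub(r"[ \t]+(?=\n|$)", "", cleaned)
```

It can be dropped into the loop above as fout.write(remove_prefixed(line)).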
There are a few problems.
- ls() should be just ls
- line.replace() (aside from the typo) does not modify the contents of line; it simply returns a new string, which you are then discarding
Here is an alternative (note limitation: the amount of whitespace between words is not preserved).
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()
        words = [word for word in ls if not word.startswith('apple')]
        line_out = ' '.join(words)
        fout.write(line_out + '\n')
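The point about line.replace() can be checked in isolation. Python strings are immutable, so str.replace() never changes the string it is called on; the result has to be assigned to something. A small illustration using one line from the question's sample input (the variable name unchanged is just for the demo):

```python
line = "dolor appleBANANA sit\n"

# The return value is thrown away, so line keeps its old contents.
line.replace("appleBANANA ", "")
unchanged = line

# Rebinding the name to the returned string is what actually keeps the result.
line = line.replace("appleBANANA ", "")
```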
filter can also be used:
word = "apple"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        # Keep only the words that don't start with word
        string_iterable = filter(lambda x: not x.startswith(word), line.strip().split())
        fout.write(" ".join(string_iterable) + "\n")  # Restore the newline removed by strip()
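To check the result against the sample from the question, the same filter logic can be wrapped in a function and run on a temporary file. This is a rough sketch: the names strip_words_file and demo, and the use of tempfile, are illustrative choices and not part of the answer above.

```python
import os
import tempfile

def strip_words_file(infile, outfile, prefix="apple"):
    """Copy infile to outfile, dropping words that start with prefix."""
    with open(infile) as fin, open(outfile, "w") as fout:
        for line in fin:
            kept = filter(lambda w: not w.startswith(prefix), line.split())
            fout.write(" ".join(kept) + "\n")

def demo():
    """Run strip_words_file on the question's sample input and return the output text."""
    sample = (
        "Lorem ipsum applePEAR\n"
        "dolor appleBANANA sit\n"
        "appleORANGE amet, consectetur\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w") as f:
            f.write(sample)
        strip_words_file(src, dst)
        with open(dst) as f:
            return f.read()
```

Calling demo() returns the contents of the output file, which can be compared with the desired output from the question.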