Remove every word that starts with a specific string

Question

I'm trying to remove every word that starts with a certain string from a text file. I'm stuck on how to write the result to the output file.

Input file:

Lorem ipsum applePEAR
dolor appleBANANA sit 
appleORANGE amet, consectetur

Desired output file:

Lorem ipsum 
dolor sit
amet, consectetur

My approach so far:

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()
        for word in ls():
            if word.startswith("apple"):
                line.replace(word, "")
        fout.write(line)

I think the problem with this approach is that it replaces words in the list produced by splitting the line, not in the line itself.

Checking Stack Overflow, I see this problem is similar to "using Python for deleting a specific line in a file", except that here the "nickname_to_delete" is a word that starts with a given string.

Answer 1

I've updated your code as little as I could:

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()  # Split on any whitespace; this also drops the trailing newline
        newline = []
        for word in ls:  # Don't call() the list
            if not word.startswith("apple"):
                newline.append(word)  # Append all words that don't start with apple.
        fout.write(" ".join(newline) + "\n")  # Rebuild the line and restore the newline

Keep in mind that a regex replacement would be better and could also take care of cases like "newword,appleshake":

import re

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        fout.write(re.sub(r"\bapple\w+", "", line))

Punctuation will still be a problem with \w, which matches only word characters (letters, digits, and underscores), so you need to decide how to deal with it.
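One possible way to deal with punctuation is to treat a "word" as any run of non-whitespace characters. The sketch below is not part of the original answer: the helper name strip_prefixed_words is hypothetical, and it assumes that punctuation attached to a matching word should be removed along with it. Unlike the \b version above, it only matches at the start of a whitespace-delimited word, so it would leave "newword,appleshake" untouched.

```python
import re

def strip_prefixed_words(line, prefix="apple"):
    # (?<!\S)   : the match must start at the beginning of a whitespace-delimited word
    # \S*       : consume the rest of the word, including attached punctuation
    # [^\S\n]*  : also consume trailing spaces/tabs, but never the newline
    pattern = r"(?<!\S)" + re.escape(prefix) + r"\S*[^\S\n]*"
    return re.sub(pattern, "", line)
```

It could be dropped into the loop as fout.write(strip_prefixed_words(line)). Because \S* allows zero characters, the bare word "apple" is removed as well, which the \w+ pattern would not do.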

Answer 2

There are a few problems.

You are calling ls() - it should be just ls.
Calling line.replace() (aside from the typo) does not modify the contents of line - it simply returns a new string, which you are then discarding.
There is a risk in principle that by doing the replace in this way, you will also delete parts of other words unintentionally - in the line "I like pineapples and apples", the "apples" in "pineapples" would also get deleted ("I like pine and ").
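The last two points can be checked quickly in isolation. This is only a small illustration using the example sentence from above; the variable names are arbitrary.

```python
line = "I like pineapples and apples"

# str.replace returns a new string; it never changes the original
result = line.replace("apples", "")

print(repr(line))    # the original line is unchanged
print(repr(result))  # every occurrence is removed, including the one inside "pineapples"
```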

Here is an alternative (note one limitation: the amount of whitespace between words is not preserved).

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()
        words = [word for word in ls if not word.startswith('apple')]
        line_out = ' '.join(words)
        fout.write(line_out + '\n')
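If the original spacing matters, one option is to split on whitespace while keeping the whitespace runs, then blank out only the matching words. This is a sketch rather than part of the answer above; the helper name drop_words_keep_spacing is hypothetical, and it assumes that the whitespace on both sides of a removed word should stay in place.

```python
import re

def drop_words_keep_spacing(line, prefix="apple"):
    # A capturing group in re.split keeps the whitespace runs in the result,
    # so the parts alternate between words and the whitespace separating them.
    parts = re.split(r"(\s+)", line)
    # Blank out words that start with the prefix; whitespace parts never match.
    kept = ["" if part.startswith(prefix) else part for part in parts]
    return "".join(kept)
```

Since the newline is one of the preserved whitespace parts, the result can be written directly with fout.write(drop_words_keep_spacing(line)).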

Answer 3

filter can also be used:

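word = "apple"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        # Keep only the words that do not start with the given prefix
        string_iterable = filter(lambda x: not x.startswith(word), line.split())
        # split() drops the newline, so add it back when writing
        fout.write(" ".join(string_iterable) + "\n")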
