
Remove every word that starts with a certain string

I'm trying to remove every word that starts with a certain string in a text file. I'm stuck on how to write to the output file.

Input file:

Lorem ipsum applePEAR
dolor appleBANANA sit 
appleORANGE amet, consectetur

Desired output file:

Lorem ipsum 
dolor sit
amet, consectetur

My approach so far:

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()
        for word in ls():
            if word.startswith("apple"):
                line.replace(word, "")
        fout.write(line)

I think the problem with this approach is that I'm replacing words in the list produced by splitting the line, not in the line itself.

Checking Stack Overflow, I see this problem is similar to "using Python for deleting a specific line in a file", except that the "nickname_to_delete" is a word that starts with a given string.

I've updated your code as little as I could:

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.rstrip("\n").split(" ")  # Strip the newline so it isn't lost with the last word
        newline = []
        for word in ls:  # Don't call() the list
            if not word.startswith("apple"):
                newline.append(word)  # Append all words that don't start with apple.
        fout.write(" ".join(newline) + "\n")  # Remake new line

Keep in mind that a regex replacement would be better and could take care of "newword,appleshake":

import re

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        fout.write(re.sub(r"\bapple\w+", "", line))

Punctuation is still not handled by \w, which matches only letters, digits, and underscores, so you need to decide how to deal with it.
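One possible way to settle the punctuation question is to treat every whitespace-delimited token as a word. The sketch below is only an illustration under that assumption: the helper name remove_prefixed_words and its pattern are not part of the answer above. It matches on \S* instead of \w+, so a token such as "appleBANANA," is removed together with its trailing comma, and the spaces that follow the removed token are removed too. Unlike the \b version, it leaves "newword,appleshake" untouched, because "appleshake" does not start a whitespace-delimited token there.

```python
import re

def remove_prefixed_words(line, prefix="apple"):
    # (?<!\S) requires the token to start at the beginning of the line
    # or right after whitespace, so "pineapple" is never matched.
    # \S* consumes the rest of the token, punctuation included.
    # [ \t]* also removes the spaces/tabs that followed the token.
    pattern = re.compile(r"(?<!\S)" + re.escape(prefix) + r"\S*[ \t]*")
    return pattern.sub("", line)

if __name__ == "__main__":
    for sample in ["Lorem ipsum applePEAR\n",
                   "dolor appleBANANA sit \n",
                   "appleORANGE amet, consectetur\n"]:
        print(repr(remove_prefixed_words(sample)))
```

Inside the file loop this would be used as fout.write(remove_prefixed_words(line)).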

There are a few problems.

  • You are calling ls() - it should be just ls
  • Calling line.replace() (aside from the typo) does not modify the contents of line - it simply returns a new string, which you are then discarding
  • There is a risk in principle that by doing the replace in this way, you will also delete parts of other words unintentionally - in the line "I like pineapples and apples", the "apples" in "pineapples" would also get deleted ("I like pine and ").
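The last two points can be checked directly in an interpreter. The snippet below is only an illustration that reuses the example sentence from the list; the variable names are arbitrary.

```python
line = "I like pineapples and apples"

# str.replace() returns a new string; the original is left untouched.
result = line.replace("apples", "")

print(repr(line))    # 'I like pineapples and apples'
print(repr(result))  # 'I like pine and '
```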

Here is an alternative (note the limitation: the amount of whitespace between words is not preserved).

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()
        words = [word for word in ls if not word.startswith('apple')]
        line_out = ' '.join(words)
        fout.write(line_out + '\n')
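If the original spacing matters, one way around the limitation noted above is to split on whitespace while keeping the whitespace runs in the result. This is a rough sketch rather than part of the answer: the helper name drop_prefixed_words is hypothetical, and it relies on re.split returning the separators when the pattern contains a capturing group. Note that the whitespace on both sides of a removed word is kept, so the gap where the word used to be becomes wider.

```python
import re

def drop_prefixed_words(line, prefix="apple"):
    # The capturing group makes re.split keep each whitespace run
    # (including the trailing newline) as its own list element.
    parts = re.split(r"(\s+)", line)
    # Whitespace runs never start with the prefix, so they all survive.
    kept = [part for part in parts if not part.startswith(prefix)]
    return "".join(kept)

if __name__ == "__main__":
    print(repr(drop_prefixed_words("dolor  appleBANANA   sit\n")))
```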

filter can also be used:

word = "apple"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        string_iterable = filter(lambda x: not x.startswith(word), line.strip().split())
        fout.write(" ".join(string_iterable) + "\n")  # Add back the newline that strip()/split() drop


