
Python removing substrings from strings

I'm trying to remove some substrings from a string in a csv file.

    import csv
    import string

    input_file = open('in.csv', 'r')
    output_file = open('out.csv', 'w')
    data = csv.reader(input_file)
    writer = csv.writer(output_file, quoting=csv.QUOTE_ALL)  # dialect='excel')
    specials = ("i'm", "hello", "bye")

    for line in data:
        line = str(line)
        new_line = str.replace(line, specials, '')
        writer.writerow(new_line.split(','))

    input_file.close()
    output_file.close()

So for this example:

 hello. I'm obviously over the moon. If I am being honest I didn't think I'd get picked, so to get picked is obviously a big thing.  bye.

I'd want the output to be:

obviously over the moon. If I am being honest I didn't think I'd get picked, so to get picked is obviously a big thing.

This, however, only works when I'm searching for a single word, for example specials = "I'm". Do I need to add my words to a list or an array?

It seems like you're already splitting the input via the csv.reader, but then you're throwing away all that goodness by turning the split line back into a string. It's best not to do this, but to keep working with the lists that are yielded from the csv reader. So, it becomes something like this:

for row in data:
    new_row = []  # A place to hold the processed row data.

    # look at each field in the row.
    for field in row:

        # remove all the special words.
        new_field = field
        for s in specials:
            new_field = new_field.replace(s, '')

        # add the sanitized field to the new "processed" row.
        new_row.append(new_field)

    # after all fields are processed, write it with the csv writer.
    writer.writerow(new_row)
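
As a self-contained sketch of the same per-field idea, the snippet below wraps the loop in two helper functions and runs it on an in-memory CSV instead of real files. The names strip_specials, clean_csv and the ignore_case flag are hypothetical and not part of the original answer. The optional case-insensitive path is an assumption added because the sample text contains "I'm" while specials contains "i'm".

```python
import csv
import io
import re

SPECIALS = ("i'm", "hello", "bye")


def strip_specials(field, specials=SPECIALS, ignore_case=False):
    """Return field with every substring listed in specials removed."""
    if ignore_case:
        # One alternation pattern; re.escape keeps characters such as "'" literal.
        pattern = re.compile("|".join(re.escape(s) for s in specials), re.IGNORECASE)
        return pattern.sub("", field)
    for s in specials:
        field = field.replace(s, "")
    return field


def clean_csv(src, dst, specials=SPECIALS, ignore_case=False):
    """Copy CSV rows from src to dst, cleaning each field on the way."""
    reader = csv.reader(src)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow([strip_specials(f, specials, ignore_case) for f in row])


# In-memory stand-ins for in.csv and out.csv.
source = io.StringIO('"hello. I\'m over the moon. bye.","second field"\r\n')
result = io.StringIO()
clean_csv(source, result, ignore_case=True)
print(result.getvalue())
```

Note that only the listed substrings are removed, so the punctuation and spaces around them stay in the field. Getting exactly the output asked for in the question would need extra cleanup, such as stripping leftover periods and whitespace.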

It looks like you aren't iterating through specials. str.replace takes a single substring to search for, not a tuple or list of them, so each word has to be replaced in turn. Try this:

specials = ["i'm", "hello", "bye"]

for line in data:
    # str(line) is the repr of the row list that csv.reader yields
    new_line = str(line)
    for word in specials:
        new_line = str.replace(new_line, word, '')
    writer.writerow(new_line.split(','))
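
To see why the loop is needed, here is a minimal sketch that can be pasted into an interpreter. The sample text is made up for illustration. It shows that str.replace rejects a tuple of search strings with a TypeError, whether or not a list is used instead, and that replacing one word at a time works.

```python
specials = ("i'm", "hello", "bye")
text = "hello. i'm here. bye."

error = None
try:
    # str.replace expects one search string, so a tuple is rejected.
    text.replace(specials, "")
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Replacing one word per iteration works for a tuple or a list alike.
cleaned = text
for word in specials:
    cleaned = cleaned.replace(word, "")
print(repr(cleaned))
```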


 