Using strip() in Python

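Write a function list_of_words that takes a list of strings as described above and returns a list of the individual words, with all whitespace and punctuation removed (except apostrophes/single quotes).

My code removes the periods and spaces, but not the commas or exclamation marks.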

def list_of_words(list_str):
    m = []
    for i in list_str:
        i.strip('.')
        i.strip(',')
        i.strip('!')
        m = m+i.split()
    return m

print(list_of_words(["Four score and seven years ago, our fathers brought forth on",
  "this continent a new nation, conceived in liberty and dedicated",
  "to the proposition that all men are created equal.  Now we are",
  "   engaged in a great        civil war, testing whether that nation, or any",
  "nation so conceived and so dedicated, can long endure!"]))

One of the simplest ways to clean out specific punctuation marks and runs of multiple spaces is the re.sub function.

import re

sentence_list = ["Four score and seven years ago, our fathers brought forth on",
                 "this continent a new nation, conceived in liberty and dedicated",
                 "to the proposition that all men are created equal.  Now we are",
                 "   engaged in a great        civil war, testing whether that nation, or any",
                 "nation so conceived and so dedicated, can long endure!"]

sentences = [re.sub('([,.!]){1,}', '', sentence).strip() for sentence in sentence_list]
words = ' '.join([re.sub('([" "]){2,}', ' ', sentence).strip() for sentence in sentences])

print(words)
# Four score and seven years ago our fathers brought forth on this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal Now we are engaged in a great civil war testing whether that nation or any nation so conceived and so dedicated can long endure

strip returns a new string, which you need to capture before applying the remaining strips. So your code should be changed to

for i in list_str:
    i = i.strip('.')
    i = i.strip(',')
    i = i.strip('!')
    ....

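On a second note, strip only removes the given characters from the beginning and end of the string. If you want to remove characters from the middle of the string, consider replace instead.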
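A small sketch of the difference between the two methods. The sample string and the variable names here are made up for illustration and are not from the question:

```python
s = ",hello, world!,"

# strip only trims the listed characters from both ends of the string
stripped = s.strip(",!")
print(stripped)   # hello, world

# replace removes every occurrence, including those in the middle
replaced = s.replace(",", "").replace("!", "")
print(replaced)   # hello world
```

Note that the comma after "hello" survives strip, because it is neither at the start nor at the end of the string.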

You can use a regular expression, as described in this question. Essentially:

import re

i = re.sub('[.,!]', '', i)
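Put into the function from the question, that one-liner might look like the sketch below. It assumes only '.', ',' and '!' need to be removed, as in the original attempt; the shortened input is just for illustration:

```python
import re

def list_of_words(list_str):
    """Return the individual words from a list of strings, without '.', ',' or '!'."""
    words = []
    for line in list_str:
        # drop the three punctuation marks wherever they occur in the line
        cleaned = re.sub(r"[.,!]", "", line)
        # split() with no argument also collapses runs of whitespace
        words.extend(cleaned.split())
    return words

print(list_of_words(["equal.  Now we are", "   engaged in a great        civil war, testing"]))
# ['equal', 'Now', 'we', 'are', 'engaged', 'in', 'a', 'great', 'civil', 'war', 'testing']
```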

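As mentioned before, you need to assign the result of i.strip() back to i. And, as also mentioned, the replace method is a better fit. Here is an example using replace: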

def list_of_words(list_str:list)->list:
    m=[]
    for i in list_str:
        i = i.replace('.','')
        i = i.replace(',','')
        i = i.replace('!','')
        m.extend(i.split())
    return m

print(list_of_words([ "Four score and seven years ago, our fathers brought forth on",
  "this continent a new nation, conceived in liberty and dedicated",
  "to the proposition that all men are created equal.  Now we are",
  "   engaged in a great        civil war, testing whether that nation, or any",
  "nation so conceived and so dedicated, can long endure!"]))

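As you can see, I also replaced m=m+i.split() with m.extend(i.split()) to make it easier to read. (Note that m.append(i.split()) would not work here, because it would add each list of words as a single nested element.)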

It is better not to rely on your own list of punctuation but to use Python's, and, as others have pointed out, to use a regex to remove the characters:

import re
import string
punctuations = re.sub("[`']", "", string.punctuation)
i = re.sub("[" + re.escape(punctuations) + "]", "", i)  # re.escape keeps ], \, ^ and - literal inside the class

There is also string.whitespace, although split already takes care of whitespace for you.
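As a rough sketch, the string.punctuation approach could be wrapped into the function from the question as follows. The name _PUNCT and the sample input are illustrative assumptions, and this version keeps only the apostrophe (it does not keep the backtick):

```python
import re
import string

# Every ASCII punctuation character except the apostrophe,
# escaped so that it is safe inside a regex character class
_PUNCT = re.compile("[" + re.escape(string.punctuation.replace("'", "")) + "]")

def list_of_words(list_str):
    words = []
    for line in list_str:
        # remove punctuation first, then let split() handle all whitespace
        words.extend(_PUNCT.sub("", line).split())
    return words

print(list_of_words(["Don't panic; it's fine!", "  (really)  "]))
# ["Don't", 'panic', "it's", 'fine', 'really']
```

Compiling the pattern once at module level avoids rebuilding it for every line.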
