Best way to .clean and .strip a long string?

Question

Desired outcome is ["This", "is", "a", "random", "sentence"]

text = "Th,is is a? random!! sentence..."  # Edited: added a comma inside a word

clean_text = text.split()

for clean in clean_text:
    double_clean_text = clean.strip(",.!?")
    print(double_clean_text)

I managed to clean the words, but how do I get them all back into a list?

Is this an efficient way to do it?

Answer 1

You can do the following:

print(" ".join([clean.strip(",.!?") for clean in clean_text]))

Answer 2

You can use a list comprehension:

print([t.strip(",.!?") for t in text.split()])

Answer 3

Try this:

clean_text = text.split()
print([clean.strip(",.!?") for clean in clean_text])

Or:

clean_text = text.split()
res = []
for clean in clean_text:
    double_clean_text = clean.strip(",.!?")
    res.append(double_clean_text)
print(res)

Answer 4

I would recommend using the regular expression r"\w+" to find all words:

import re

# Raw string, so the backslash reaches the regex engine unchanged
result = re.findall(r"\w+", text)
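One thing worth checking is how this pattern treats the question's edited sample, where a comma sits inside a word. The sketch below only runs the same re.findall call on that string. Because \w matches letters, digits, and underscores, any punctuation character ends a match, so "Th,is" comes back as two separate words.

```python
import re

# The question's edited sample, with a comma inside the first word.
text = "Th,is is a? random!! sentence..."

# \w matches letters, digits and underscore, so every punctuation
# character (including the comma inside "Th,is") ends the current match.
words = re.findall(r"\w+", text)

print(words)  # ['Th', 'is', 'is', 'a', 'random', 'sentence']
```

If punctuation inside a word should be dropped rather than treated as a separator, this pattern alone is probably not enough.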

Answer 5

Instead of assigning to a new variable, assign the cleaned result back to the list.

text = "This, is a? random!! sentence..."

clean_text = text.split()

for i, clean in enumerate(clean_text):
    clean_text[i] = clean.strip(",.!?")

Then you can use ' '.join to (mostly) rebuild the original string from the list:

cleaned_text = ' '.join(clean_text)

I say "mostly" because split discards the information about how much whitespace separated the words in the original string. That may be fine, but it is worth being aware of.

The whole thing can be written using a single list comprehension.

clean_text = ' '.join([clean.strip(",.!?") for clean in text.split()])
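To make the "mostly" caveat concrete, here is a small sketch. The input string is a hypothetical variant of the sample with irregular whitespace, which is not in the original question. It shows that the split and join round trip collapses every run of whitespace into a single space.

```python
# Hypothetical input: same words as the sample, but with extra spaces and a tab.
text = "This,   is a?\trandom!!  sentence..."

# split() with no argument splits on any run of whitespace.
words = [w.strip(",.!?") for w in text.split()]

# Joining with a single space cannot recover the original spacing.
rejoined = " ".join(words)

print(words)     # ['This', 'is', 'a', 'random', 'sentence']
print(rejoined)  # This is a random sentence
```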

Answer 6

Either use re: the pattern r'\w+' greedily matches runs of word characters (letters, digits, and underscores).

>>> import re
>>> text = "This, is a? random!! sentence..."
>>> re.findall(r'\w+', text)
['This', 'is', 'a', 'random', 'sentence']

Or you could use str.split and str.strip. An easy way to supply all punctuation to strip is string.punctuation. This splits the text on whitespace and then strips punctuation from both ends of each substring.

>>> from string import punctuation
>>> text = "This, is a? random!! sentence..."
>>> [s.strip(punctuation) for s in text.split()]
['This', 'is', 'a', 'random', 'sentence']
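Note that str.strip only removes characters from the two ends of each substring, so a comma inside a word (as in the question's edited sample "Th,is") would survive. One possible variation, not taken from any of the answers here, is to delete punctuation everywhere with str.translate before splitting. This is a minimal sketch that assumes punctuation should simply be dropped and never treated as a word separator.

```python
from string import punctuation

# The question's edited sample, with a comma inside the first word.
text = "Th,is is a? random!! sentence..."

# Translation table that maps every ASCII punctuation character to None.
remove_punctuation = str.maketrans("", "", punctuation)

# Delete punctuation anywhere in the string, then split on whitespace.
words = text.translate(remove_punctuation).split()

print(words)  # ['This', 'is', 'a', 'random', 'sentence']
```

The trade-off is that strings such as "don't" or "e-mail" lose their internal punctuation as well and become "dont" and "email".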

Answer 7

Since you already got some pretty good answers, I'd like to introduce regular expressions:

import re
text = "This, is a? random!! sentence..."
clean_list = re.split('[.,?! ]+', text)

The characters inside the square brackets are the ones you want to split on and strip.
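One detail to be aware of with re.split: when the string ends with a separator, as the sample does with "...", the result ends with an empty string. A small sketch of the behaviour, with one possible way to drop the empty entries (the filtering step is an addition here, not part of the answer):

```python
import re

text = "This, is a? random!! sentence..."

# The trailing "..." matches the separator pattern, so re.split
# returns an empty string after the last word.
raw = re.split(r"[.,?! ]+", text)

# Keep only the non-empty pieces.
clean_list = [word for word in raw if word]

print(raw)         # ['This', 'is', 'a', 'random', 'sentence', '']
print(clean_list)  # ['This', 'is', 'a', 'random', 'sentence']
```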

7 answers

Answer 1  2  2020-02-06 13:46:13
Answer 2  2  2020-02-06 13:48:02
Answer 3  0  2020-02-06 13:45:42
Answer 4  0  2020-02-06 13:48:43
Answer 5  0  2020-02-06 13:48:53
Answer 6  0  2020-02-06 13:56:38
Answer 7  0  2020-02-06 13:59:48
