Count occurrences of elements of a list in a string?

Question

I'm trying to count the number of verbal contractions in some speeches I've collected. One particular speech looks like this:

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

So in this case, I'd like to count four (4) contractions. I have a dictionary of contractions; here are the first few entries:

contractions = {"ain't": "am not; are not; is not; has not; have not",
                "aren't": "are not; am not",
                "can't": "cannot",
                # ...
                }

My code looks like this:

count = 0
for word in speech:
    if word in contractions:
        count = count + 1
print(count)

However, I get nothing, because the code iterates over each letter rather than over whole words.
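To see the difference, compare what the loop receives when it iterates over the raw string versus over the result of `split()`. A minimal sketch, using a shortened sample sentence rather than the full speech:

```python
sample = "I've changed the path"

# Iterating over a string yields one character at a time
chars = [item for item in sample]
print(chars[:4])  # ['I', "'", 'v', 'e']

# Iterating over sample.split() yields whitespace-separated words
words = [item for item in sample.split()]
print(words)  # ["I've", 'changed', 'the', 'path']
```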

Answer 1

Use str.split() to split your string on whitespace:

for word in speech.split():

This splits on arbitrary whitespace: spaces, tabs, newlines and a few more exotic whitespace characters, with any number of them in a row.
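For example, on a made-up string with leading and trailing spaces, a tab and consecutive newlines, `split()` with no argument still returns just the words:

```python
text = "  I've changed\tthe\n\npath  "

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading/trailing whitespace, so no empty strings appear
print(text.split())  # ["I've", 'changed', 'the', 'path']
```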

You may need to lowercase your words with str.lower() (otherwise Ain't won't be found, for example) and strip punctuation:

from string import punctuation

count = 0
for word in speech.lower().split():
    word = word.strip(punctuation)
    if word in contractions:
        count += 1

I'm using the str.strip() method here; it removes every character found in the string.punctuation string from the start and end of the word.
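Putting these pieces together, here is a self-contained sketch of this approach applied to the speech from the question. The function name `count_contractions` is hypothetical, and because the question only shows the first few entries of its dictionary, the `contractions` dict below is an assumed subset that happens to include the contractions in this speech:

```python
from string import punctuation

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

# Assumed subset of the question's contractions dict (keys are lowercase)
contractions = {
    "ain't": "am not; are not; is not; has not; have not",
    "can't": "cannot",
    "i've": "i have",
    "we're": "we are",
    "you've": "you have",
}

def count_contractions(text, contraction_dict):
    """Count the words in text that are keys of contraction_dict (hypothetical helper)."""
    count = 0
    for word in text.lower().split():
        # Remove leading/trailing punctuation such as ',' or '.'
        word = word.strip(punctuation)
        if word in contraction_dict:
            count += 1
    return count

print(count_contractions(speech, contractions))  # 4
```

Note that `string.punctuation` includes the apostrophe, so a word ending in one (such as `goin'`) would lose it when stripped; that does not affect any of the contractions in this speech.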

Answer 2

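You are iterating over a string, so the items are characters. To get words from a string, you can use a naive method such as str.split(), which gives you a list of strings to iterate over (the words, split on the argument passed to str.split(); by default it splits on whitespace). There is also re.split(), which is more powerful, but I don't think you need regular expressions to split this text.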
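If you do want to try `re.split()`, one option is to split on runs of anything that is neither a word character nor an apostrophe, which drops punctuation while keeping contractions like `I've` in one piece. The pattern `[^\w']+` is an assumption chosen for this sketch, not the only reasonable choice:

```python
import re

sample = "I've changed the path of the economy, and I've increased jobs."

# Split on runs of characters that are not word characters or apostrophes.
# A separator at the end (the final '.') produces a trailing empty string,
# so empty items are filtered out.
words = [w for w in re.split(r"[^\w']+", sample) if w]
print(words)
```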

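At the very least, you should lowercase the string with str.lower(), or else put every possible variant (including capitalized ones) into your dictionary. I strongly recommend the first option; the latter isn't really practical. Removing punctuation is also needed here. But this is still naive. If you need a more sophisticated approach, you will have to split the text with a word tokenizer. NLTK is a good place to start; see the nltk tokenizers. But I strongly suspect this is not your main concern, nor that it would really affect your results. :)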

speech = """I've changed the path of the economy, and I've increased jobs in our own home state. We're headed in the right direction - you've all been a great help."""
# Maybe this dict makes more sense (list items as values). But for your question it doesn't matter.
contractions = {"ain't": ["am not", "are not", "is not", "has not", "have not"], "aren't": ["are not", "am not"], "i've": ["i have", ]} # ...

# With re you can define advanced regexes, but maybe
# `from string import punctuation` (the suggestion from Martijn Pieters'
# answer) is still enough for you.
import re

def abbreviation_counter(input_text, abbreviation_dict):
    count = 0
    # What you want is a list of words, and str.split() does this job for you.
    # With no argument it splits on any run of whitespace. If you really need
    # better methods (see the answer text above), you have to use a word
    # tokenizer tool or write your own.
    for word in input_text.split():
        # Also clean each word (remove ',', ';', ...). The advantage of using
        # re over `from string import punctuation` is that you have more
        # control over what you remove: you can easily add or remove any
        # punctuation mark. That can be very handy, but it can also be
        # overkill. If the latter is the case, just stick to Martijn Pieters'
        # solution.
        if re.sub(',|;', '', word).lower() in abbreviation_dict:
            count += 1

    return count

print(abbreviation_counter(speech, contractions))
# Output: 2 (yeah, it worked - I've included "i've" in your dict :))

It's a bit frustrating to answer at the same time as Martijn Pieters ;), but I hope I still added some value. That's why I edited my answer to give you some further advice.

Answer 3

A for loop in Python iterates over all the elements of an iterable. For a string, the elements are characters.

You need to split the string into a list (or tuple) of strings containing the words. You can use .split(delimiter) for this.

Your problem is common, so Python has a shortcut: speech.split() splits on any number of spaces/tabs/newlines, so you get only words in the list.

So your code should look like this:

count = 0
for word in speech.split():
    if word in contractions:
        count = count + 1
print(count)
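If you also want to know which contractions occur, not just how many, `collections.Counter` can tally them. A sketch, assuming the same kind of lowercase-keyed dict (here a hypothetical three-entry subset) and normalizing case and punctuation as in Answer 1:

```python
from collections import Counter
from string import punctuation

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

# Hypothetical subset of the question's contractions dict
contractions = {"i've": "i have", "we're": "we are", "you've": "you have"}

# Normalize each word (lowercase, strip surrounding punctuation) ...
normalized = (word.strip(punctuation) for word in speech.lower().split())
# ... then tally only the words that are contractions
found = Counter(word for word in normalized if word in contractions)

print(found)                # Counter({"i've": 2, "we're": 1, "you've": 1})
print(sum(found.values()))  # 4
```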

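speech.split(" ") also works, but it splits only on single spaces, not on tabs or newlines, and if there are double spaces, the resulting list will contain empty elements.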
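This is easy to check on a small made-up string containing a double space and a tab:

```python
text = "I've  changed\tthe path"

# split(" ") splits on every single space: the double space yields an
# empty string, and the tab is not treated as a separator
print(text.split(" "))  # ["I've", '', 'changed\tthe', 'path']

# split() with no argument splits on any run of whitespace
print(text.split())     # ["I've", 'changed', 'the', 'path']
```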

3 solutions

Solution 1
Score 5, accepted, 2015-10-06 20:28:23

Solution 2
Score 1, 2015-10-06 20:24:21

Solution 3
Score 0, 2015-10-06 20:38:18
