Count occurrences of elements from a list in a string?

Question

I'm trying to count the number of occurrences of verbal contractions in some speeches I've gathered. One particular speech looks like this:

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

So, in this case, I'd like to count four (4) contractions. I have a dictionary of contractions, and here are the first few entries:

contractions = {"ain't": "am not; are not; is not; has not; have not",
"aren't": "are not; am not",
"can't": "cannot",...}

My code looks something like this, to begin with:

count = 0
for word in speech:
    if word in contractions:
        count = count + 1
print count

I'm not getting anywhere with this, however, as the code iterates over every single letter instead of whole words.

Answer 1

Use str.split() to split your string on whitespace:

for word in speech.split():

This splits on arbitrary whitespace: spaces, tabs, newlines, and a few more exotic whitespace characters, in runs of any length.

You may need to lowercase your words using str.lower() (otherwise Ain't won't be found, for example), and strip punctuation:

from string import punctuation

count = 0
for word in speech.lower().split():
    word = word.strip(punctuation)
    if word in contractions:
        count += 1

I use the str.strip() method here; it removes every character found in the string.punctuation string from the start and end of a word.
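As a small illustration of that stripping step (the token list is made up for this sketch, not taken from the question), note that only the ends of each token are touched, so the apostrophe inside a contraction survives:

```python
from string import punctuation

# Hypothetical tokens, as they might come out of speech.lower().split()
tokens = ["i've", "economy,", "state.", "-", '"can\'t"']

# strip() removes leading and trailing punctuation characters only
cleaned = [token.strip(punctuation) for token in tokens]
print(cleaned)
```

A token that consists only of punctuation, such as the lone dash, becomes an empty string. That is harmless here, because an empty string is not a key in the contractions dictionary.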

Answer 2

You're iterating over a string, so the items are characters. To get the words from a string you can use a naive method like str.split(), which gives you a list of strings to iterate over (the words, split on the argument of str.split(); by default it splits on whitespace). There is also re.split(), which is more powerful, but I don't think you need regexes to split this text.

At the very least you have to lowercase your string with str.lower(), or put all possible occurrences (including capitalized ones) in the dictionary. I strongly recommend the first alternative; the latter isn't really practicable. Removing the punctuation is also necessary. But this is still naive. If you need a more sophisticated method, you have to split the text with a word tokenizer. NLTK is a good starting point for that; see the NLTK tokenizer. But I strongly feel that this is not your main problem and doesn't really affect solving your question. :)
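If plain splitting turns out to be too naive but a full tokenizer feels like overkill, a regex-based middle ground might look like the sketch below. This is not NLTK; the pattern, the function name count_contractions, and the three-entry dictionary are assumptions made for illustration, and the pattern only recognizes the plain ASCII apostrophe.

```python
import re

# Assumed pattern: runs of lowercase letters, optionally joined by apostrophes.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")

def count_contractions(text, known_contractions):
    """Count tokens in text that are keys of known_contractions (case-insensitive)."""
    tokens = WORD_PATTERN.findall(text.lower())
    return sum(1 for token in tokens if token in known_contractions)

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

# Illustrative subset only; the dictionary in the question is longer.
contractions = {"i've": "I have", "we're": "we are", "you've": "you have"}

print(count_contractions(speech, contractions))
```

Because the pattern extracts words directly, punctuation never ends up attached to a token and no separate stripping step is needed.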

speech = """I've changed the path of the economy, and I've increased jobs in our own home state. We're headed in the right direction - you've all been a great help."""
# Maybe this dict makes more sense (list items as values). But for your question it doesn't matter.
contractions = {"ain't": ["am not", "are not", "is not", "has not", "have not"], "aren't": ["are not", "am not"], "i've": ["i have", ]} # ...

# With re you can define advanced regexes, but maybe
# `from string import punctuation` (the suggestion from Martijn Pieters' answer)
# is still enough for you.
import re

def abbreviation_counter(input_text, abbreviation_dict):   
    count = 0
    # What you want is a list of words; str.split() does this job for you.
    # Note that split(" ") splits on single spaces only, while split() with no
    # argument splits on any run of whitespace. If you really need better
    # methods (see the answer text above), use a word tokenizer or write your own.
    for word in input_text.split(" "):
        # Also clean the word (remove ',', ';', ...) afterwards. The advantage of
        # using re over `from string import punctuation` is that you have more
        # control over what you remove: you can easily add or drop any
        # punctuation mark. That can be very handy, but it can also be
        # overkill. If it is, just stick to Martijn Pieters' solution.
        if re.sub(',|;', '', word).lower() in abbreviation_dict:
            count += 1

    return count

print(abbreviation_counter(speech, contractions))
# Output: 2 (yeah, it worked - I've included I've in your list :)

It's a little bit frustrating to give an answer at the same time as Martijn Pieters does ;), but I hope I have still provided some value for you. That's why I've edited my answer to add some hints for future work.

Answer 3

A for loop in Python iterates over all elements in an iterable. In the case of strings, the elements are the characters.
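A short sketch of the difference, using a shortened, made-up sample string rather than the full speech:

```python
speech = "I've increased jobs"

# Iterating over the string itself yields single characters.
first_chars = list(speech)[:4]
print(first_chars)

# Iterating over the result of split() yields whole words.
words = speech.split()
print(words)
```

None of the single characters can match a key such as "i've", which is why the original loop never counts anything.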

You need to split the string into a list (or tuple) of strings that contain the words. You can use .split(delimiter) for this.

Your problem is quite common, so Python has a shortcut: speech.split() splits at any number of spaces/tabs/newlines, so you get only your words in the list.

So your code should look like this:

count = 0
for word in speech.split():
    if word in contractions:
        count = count + 1
print(count)

speech.split(" ") works too, but it splits only on single space characters, not on tabs or newlines, and if there are double spaces you'd get empty elements in your resulting list.
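To make the contrast concrete, here is a minimal comparison on a made-up string that contains a double space, a tab, and a newline:

```python
text = "you've  all\tbeen a great\nhelp"

# No argument: split on any run of whitespace, no empty strings.
default_split = text.split()
print(default_split)

# Explicit " ": split on every single space; tab and newline stay inside tokens.
space_split = text.split(" ")
print(space_split)
```

With the explicit delimiter, the double space produces an empty element, and "all\tbeen" and "great\nhelp" each stay glued together as one token.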

Answer 1
5, accepted, 2015-10-06 20:28:23

Answer 2
1, 2015-10-06 20:24:21

Answer 3
0, 2015-10-06 20:38:18
