Finding the number of occurrences of connectives in a string using Python

Question

I have a text in which every word is tagged with a part-of-speech tag. Here is an example of the text:

What/NOUN could/VERB happen/VERB next/ADJ ?/PUNCT

I need to find every occurrence of a /PUNCT token followed by a NOUN, PRON, or PROPN token, and also count which of these occurs most often.

So one of the answers would look like this: ?/PUNCT What/NOUN or ./PUNCT What/NOUN

Further on, the word "Deal" appears 6 times, and I need to show this in code.

I am not allowed to use NLTK, only collections.

I have tried several different things, but I don't really know what to do here. I think I need to use defaultdict and then somehow run a while loop that gives me back a list with the right connectives.

Answer 1

Here is a test program that does what you want.

It first splits the long string on spaces (' '), which creates a list of word/class elements. The for loop then checks whether a PUNCT token is followed by a NOUN, PRON, or PROPN token and saves each such pair to a list.

The code is as follows:

from collections import Counter

# Sample tagged text: each token has the form word/TAG.
string = "What/NOUN could/VERB happen/VERB next/ADJ ?/PUNCT What/NOUN could/VERB happen/VERB next/ADJ ?/PUNCT"
words = string.split(' ')

found = []  # every "PUNCT token + following token" pair that matches

# Walk over adjacent token pairs; n is the index of the first token in each pair.
for n, (first, second) in enumerate(zip(words[:-1], words[1:])):
    first_class = first.split('/')[1]
    second_class = second.split('/')[1]
    if first_class == 'PUNCT' and second_class in ["NOUN", "PRON", "PROPN"]:
        print(f"Found occurrence at data list index {n} and {n+1} with {first_class}, {second_class}")
        found.append(f'{words[n]} {words[n+1]}')
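The question also asks which PUNCT-plus-word pair occurs most often. One way to get that, sketched here under the assumption that every token has the form word/TAG, is to tally the pairs in a Counter instead of a list. The function name `count_punct_pairs`, the constant `FOLLOWER_TAGS`, and the sample text are made up for illustration and are not part of the original answer.

```python
from collections import Counter

# Tags that may follow a PUNCT token, as listed in the question.
FOLLOWER_TAGS = {"NOUN", "PRON", "PROPN"}

def count_punct_pairs(text):
    """Count 'PUNCT token + following token' pairs in a word/TAG string."""
    tokens = text.split()
    pairs = Counter()
    for first, second in zip(tokens, tokens[1:]):
        # rpartition splits on the last '/', so the tag is always the final piece.
        first_tag = first.rpartition("/")[2]
        second_tag = second.rpartition("/")[2]
        if first_tag == "PUNCT" and second_tag in FOLLOWER_TAGS:
            pairs[f"{first} {second}"] += 1
    return pairs

# Hypothetical sample text, invented for this sketch.
sample = ("Deal/NOUN ?/PUNCT What/NOUN now/ADV ./PUNCT What/NOUN "
          "next/ADJ ?/PUNCT What/NOUN ./PUNCT She/PRON")
pair_counts = count_punct_pairs(sample)
print(pair_counts.most_common(1))  # the most frequent pair and its count
```

Calling `most_common()` without an argument would list every pair in descending order of frequency.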

To count the words:

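# Strip the tags, keeping only the word part of each token.
words_only = [i.split('/')[0] for i in words]
# most_common() returns (word, count) pairs sorted by descending count.
word_counts = Counter(words_only).most_common()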
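To show the count of one particular word, such as "Deal" in the question, it may be simpler to keep the Counter itself and index into it. The following is a minimal sketch, assuming every token carries a tag and that matching should be case-sensitive; `count_words` and the sample text are hypothetical.

```python
from collections import Counter

def count_words(text):
    """Return a Counter of the word parts of a word/TAG string."""
    # rpartition splits on the last '/', leaving the word in position 0.
    return Counter(token.rpartition("/")[0] for token in text.split())

# Hypothetical sample text, invented for this sketch.
sample = "Deal/NOUN or/CONJ no/DET deal/NOUN ?/PUNCT Deal/NOUN !/PUNCT"
counts = count_words(sample)
print(counts["Deal"])    # count of the exact, case-sensitive word
print(counts["absent"])  # a Counter returns 0 for keys it has never seen
```

If "Deal" and "deal" should be counted together, lowercasing each word before counting would be one option.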

Answer 1: 0, accepted, 2019-03-25 10:42:01
