Count word/POS pairs in a file

Question

I've almost solved an exercise from my Python lectures. I've been asked to write a program that counts how often each word occurs in a file, and how often it has been tagged with each POS (part-of-speech) tag. The counts should be written to a new file, whose name is also given on the command line.

For instance,

python3 wordcount-pos.py wsj00-pos.txt counts-wsj00-pos.txt

should produce output like this:

   Mortimer 1   NNP 1

   foul 1   JJ  1

   ...

   reported 16  VBN 7   VBD 9

   ...

   before   26  RB  6   IN  20

   ...

   allow    4   VB  2   VBP 2

My code produces output such as:

   Mortimer 1   {NNP:   1}

   foul 1   {JJ: 1}

   ...

   reported 2   {VBN:   7   VBD:    9}

   ...

   before   2   {RB:    6   IN: 20}

   ...

   allow    2   {VB:    2   VBP:    2}

It doesn't print the number of occurrences of each word in my dictionary.

Here is my code:

import sys
from collections import defaultdict


def main():
    if len(sys.argv) != 3:
        print('Usage: python poscount.py <input file>', file=sys.stderr)
        sys.exit(1)

    input_filename = sys.argv[1]
    output_filename = sys.argv[2]
    # your code
    freq = defaultdict(list)
    with open(input_filename) as f:
        for line in f:
            # skip empty lines
            if line.strip() != '':
                #  split a word/pos pair into two separate strings
                word, pos = line.strip().rsplit("/", 1)
                # add word and list of pos as k, v into "freq" dictionary
                freq[word].append(pos)

    for k, v in freq.items():
        D = defaultdict(list)
        for i, item in enumerate(v):
            D[item].append(i)
        D = {k: len(v) for k, v in D.items()}
        # Output file
        with open(output_filename, "a") as f:
            print(k + "\t" + str(len(D.items())) + "\t" + str(D), file=f)


if __name__ == '__main__':
    main()

File from which the data is extracted: https://paste.elnota.space/nezemivaku.sql

Partial content of the file:

Pierre/NNP

Vinken/NNP

,/,

61/CD

years/NNS

old/JJ

,/,

will/MD

join/VB

the/DT

board/NN

as/IN

a/DT

nonexecutive/JJ

director/NN

Nov./NNP

29/CD

./.

Mr./NNP Vinken/NNP

is/VBZ

chairman/NN

Answer 1

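I think this solves your problem. len(D.items()) counts the distinct POS tags of a word, whereas sum(D.values()) adds up the per-tag counts, which gives the word's total number of occurrences: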

print(k + "\t" + str(sum(D.values())) + "\t" + str(D), file=f)
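
The one-line change above fixes the total, but the tags are still printed as a dict rather than as the tab-separated columns shown in the expected output. A minimal sketch of one way to produce that layout follows. The function names count_word_pos, format_row and write_counts are made up for illustration, and the sketch assumes that every whitespace-separated token has the form word/POS (a token without a slash would raise a ValueError). It also opens the output file once in "w" mode instead of re-opening it in "a" mode for every word, so repeated runs do not append to earlier results.

```python
from collections import Counter, defaultdict


def count_word_pos(lines):
    """Map each word to a Counter of the POS tags it was seen with."""
    counts = defaultdict(Counter)
    for line in lines:
        # Assumption: a line may hold several word/POS tokens separated
        # by whitespace; empty lines yield no tokens and are skipped.
        for token in line.split():
            # Split on the last slash so words containing "/" stay intact.
            word, pos = token.rsplit("/", 1)
            counts[word][pos] += 1
    return counts


def format_row(word, tag_counts):
    """Build one tab-separated row: word, total, then tag/count pairs."""
    fields = [word, str(sum(tag_counts.values()))]
    for pos, n in tag_counts.items():
        fields += [pos, str(n)]
    return "\t".join(fields)


def write_counts(counts, output_filename):
    """Write one row per word to the output file."""
    # Open once in "w" mode so that reruns overwrite old results.
    with open(output_filename, "w") as out:
        for word, tag_counts in counts.items():
            print(format_row(word, tag_counts), file=out)
```

Inside main(), this could be used by passing the open input file to count_word_pos and then calling write_counts with the result and output_filename.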

Solution 1: score 2, accepted, 2019-07-31 19:08:19
