
How can I find relative word frequencies?

Here is my code:

for question in questions:
    print('Processing ' + str( question))
    counts = Counter(dataset_final[str(question)])
    print(counts)

It prints something like this:

Processing 1
Counter({'would': 18, 'think': 12, 'patient': 11, 'condition': 11, 'might': 10, 'increased': 1})

Processing 2
Counter({'cancer': 32, 'condition': 22, 'prostate': 20, 'educational': 1})

I want to get relative word frequencies, so I tried something like this:

for question in questions:
    print("Processing " + str(question))
    counts = Counter(dataset_final[str(question)])
    length = len(dataset_final[str(question)])
    print(counts/length)

But I get an error:

TypeError: unsupported operand type(s) for /: 'Counter' and 'int'

How can I do this?

Edit: I mean relative word frequencies, not normalized ones.

Given:

import collections
count = collections.Counter({'would': 18, 'think': 12, 'patient': 11, 'condition': 11, 'might': 10, 'increased': 1})

You can use a dict comprehension to normalize the values:

total = sum(count.values())  # compute the total once, not once per word
normalized_count = {w: c / total for w, c in count.items()}

Each word's count is divided by the total number of words.

Output:

{'would': 0.2857142857142857, 'think': 0.19047619047619047, 'patient': 0.1746031746031746, 'condition': 0.1746031746031746, 'might': 0.15873015873015872, 'increased': 0.015873015873015872}
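To tie this back to the loop in the question, here is a minimal sketch of the same idea wrapped in a function. The name `relative_frequencies` and the sample contents of `dataset_final` are made up for illustration; it assumes each entry of `dataset_final` is a list of word tokens, which the question does not state explicitly.

```python
from collections import Counter


def relative_frequencies(words):
    """Return {word: count / total} for an iterable of words."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid dividing by zero on an empty input
    return {word: count / total for word, count in counts.items()}


# Hypothetical stand-in for the question's data: one list of words per question.
dataset_final = {
    "1": ["would", "think", "would", "patient"],
    "2": ["cancer", "condition", "cancer", "cancer"],
}
questions = [1, 2]

for question in questions:
    print("Processing " + str(question))
    print(relative_frequencies(dataset_final[str(question)]))
```

The division has to happen per value because `Counter` does not define `/`, which is why `counts/length` raises the `TypeError`. Summing `counts.values()` gives the total number of counted words without relying on `len()` of the original container.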

