
Python count of words by word length

I was given a .txt file containing some text. I have already cleaned the text (removed punctuation, uppercase, symbols), and now I have a string with the words. I am now trying to get the character count, len(), of each word in the string, and then make a plot where the number of characters N is on the X-axis and the Y-axis is the number of words that have N characters.

So far I have:

text = "sample.txt"

def count_chars(txt):
    result = 0
    for char in txt:
        result += 1     # same as result = result + 1
    return result

print(count_chars(text))

So far this computes the total len() of the text instead of the length of each word.

I would like to get something like the Counter() function, which returns each word with the count of how many times it is repeated throughout the text.

from collections import Counter
word_count=Counter(text)

I want to get the number of characters per word. Once we have such a count, the plotting should be easier.

Thanks and anything helps!

Okay, first of all you need to open the sample.txt file.

with open('sample.txt', 'r') as text_file:
    text = text_file.read()

or

text = open('sample.txt', 'r').read()

Now we can compute the length of each word in the text and put it, for example, in a dict.

counter_dict = {}
for word in text.split():  # split on any whitespace, not only single spaces
    counter_dict[word] = len(word)
print(counter_dict)
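
The dict above maps each distinct word to its own length; it does not yet say how many words share a given length. A minimal sketch of that tally, assuming the text is already cleaned and whitespace-separated (the sample string and the name `length_counts` are made up for illustration):

```python
from collections import Counter

# Hypothetical cleaned text; in practice this would be the contents of sample.txt.
text = "the quick brown fox jumps over the lazy dog"

# split() with no argument splits on any run of whitespace, including newlines.
# Counter tallies how many words have each length.
length_counts = Counter(len(word) for word in text.split())

# Print the tally ordered by word length.
for length, count in sorted(length_counts.items()):
    print(length, count)
```

Here repeated words are counted every time they occur, which is probably what a length histogram should do.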

It looks like the accepted answer doesn't solve the problem as it was posed by the querent:

Then make a plot where N of characters is on the X-axis and the Y-axis is the number of words that have such N len() of characters

import matplotlib.pyplot as plt

# ch10 = ... the text of "Moby Dick"'s chapter 10, as found
# in https://www.gutenberg.org/files/2701/2701-h/2701-h.htm

# split ch10 into a list of words,
words = [w for w in ch10.split() if w]
# some words are joined by an em-dash
words = sum((w.split('—') for w in words), [])
# remove suffixes and one prefix (str.removesuffix/removeprefix need Python 3.9+)
for suffix in (',','.',':',';','!','?','"'):
    words = [w.removesuffix(suffix) for w in words]
words = [w.removeprefix('"') for w in words]

# count the different lengths using a dict
d = {}
for w in words:
    l = len(w)
    d[l] = d.get(l, 0) + 1

# retrieve the relevant info from the dict
lengths, counts = zip(*d.items())

# plot the relevant info
plt.bar(lengths, counts)
plt.xticks(range(1, max(lengths)+1))
plt.xlabel('Word lengths')
plt.ylabel('Word counts')
# what is the longest word?
plt.title(' '.join(w for w in words if len(w)==max(lengths)))

# T H E   E N D

plt.show()
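
As a possible alternative to stripping suffixes one by one, the words could be pulled out with a regular expression and tallied with collections.Counter. This is only a sketch: the sample sentence is invented, and the pattern assumes ASCII letters with optional inner apostrophes, which may not suit every text.

```python
import re
from collections import Counter

# Hypothetical sample containing punctuation, quotes and em-dashes.
sample = 'Queequeg—for so he was called—sat there; "quiet," I thought.'

# Each run of letters (optionally joined by apostrophes) counts as one word,
# so punctuation and em-dashes act as separators.
words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", sample)

# Tally words per length, then unpack into two parallel tuples sorted by length.
length_counts = Counter(len(w) for w in words)
lengths, counts = zip(*sorted(length_counts.items()))

print(lengths)
print(counts)
```

The two tuples could then be passed to plt.bar in the same way as the values unpacked from the dict.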

