
How to get the 50 most common words in song lyrics (Python)

I am new to Python and I am trying to return the 50 most common words in the lyrics of a set of songs, but I have run into a problem and I don't understand why it happens.

The name "lyrics" in the code is a string holding the lyrics of one song, read from a text file. Each iteration of the loop yields a different lyrics string, and every one of them needs to count toward the total number of times each word appears across the songs.

If anyone knows where the problem is and can help, I would appreciate it.

My output contains characters instead of words: "[(' ', 46), ('o', 24), ('e', 23), ('n', 15), ('t', 15), ('h', 12), ('a', 12), ('w', 8), ('r', 8), ('s', 8), ('\n', 7), ('f', 7), ('d', 6), ('u', 5), ('y', 5), ('m', 5), ('I', 4)...". I need to get something like ("the", 555), ("you", 365)..., without whitespace or \n.

    count = {}
    for songs in the_dict.values():
        songs = songs[0]
        for lyrics in songs.values():
            lyrics = lyrics[2]
            count = Counter(lyrics)
    return count.most_common(50)

Call the split() method on the lyrics before counting:

    split_lyrics = lyrics[2].split()
    count = Counter(split_lyrics)

See https://www.geeksforgeeks.org/find-k-frequent-words-data-set-python/
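To illustrate why the split matters, here is a minimal sketch. Iterating over a str yields single characters, so Counter counts characters unless it is given a list of words. The sample text is made up for the example and is not from the question.

```python
from collections import Counter

# Hypothetical sample text standing in for one song's lyrics.
lyrics = "la la la\nhey la"

# Iterating over a str yields single characters, so this counts characters
# (including the spaces and the newline).
char_counts = Counter(lyrics)

# str.split() with no arguments splits on any run of whitespace
# (spaces, tabs, newlines), so this counts whole words.
word_counts = Counter(lyrics.split())

print(char_counts.most_common(2))
print(word_counts.most_common(2))  # [('la', 4), ('hey', 1)]
```

Because split() without arguments treats any whitespace as a separator, the spaces and \n entries disappear from the result without extra filtering.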

You should split your lyrics on whitespace (spaces and newlines) so that you get a list of words, instead of passing the string straight to Counter as you do now.

So you should use:

    lyrics = lyrics[2].split()
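Putting it together, a possible complete version might look like the sketch below. It also addresses a second issue in the question's loop: count is reassigned on every iteration, so only the last song would be counted. Using one Counter and calling update() keeps a running total instead. The function name most_common_words, the loop variable names, and the sample data are assumptions for illustration; the data is only shaped to match the indexing used in the question (element 0 holds a dict of songs, element 2 of each song holds the lyrics string).

```python
from collections import Counter


def most_common_words(the_dict, n=50):
    """Return the n most frequent words across all lyrics in the_dict."""
    count = Counter()  # one running total shared by every song
    for album in the_dict.values():
        songs = album[0]  # assumed: element 0 is a dict of songs
        for song in songs.values():
            lyrics = song[2]  # assumed: element 2 is the lyrics string
            count.update(lyrics.split())  # add this song's words to the total
    return count.most_common(n)


# Hypothetical data shaped like the structure the question's loop implies.
the_dict = {
    "album_a": ({"song_1": ("writer", "3:10", "you and me\nyou and I")}, 1999),
    "album_b": ({"song_2": ("writer", "2:45", "you are the one")}, 2001),
}

print(most_common_words(the_dict, 2))  # [('you', 3), ('and', 2)]
```

Note that this counts "I" and "i" as different words; if that is not wanted, calling lyrics.lower() before split() is one option.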

