
Array of tuples necessary for generate_from_frequencies method in Python wordcloud

I am trying to make a word cloud in Python from strings and their corresponding data values (weights) stored in an Excel document. The generate_from_frequencies method takes a frequencies parameter, which the docs say should be an array of tuples.
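The starting point here is a set of (string, value) pairs from an Excel sheet. The sketch below assumes the rows have already been read into Python as (word, value) pairs. The rows are hypothetical, and the question does not show how the sheet is loaded. The code builds the list of (word, weight) tuples that the docstring describes. Summing the values of repeated words is an assumption of this sketch, not something the question specifies.

```python
from collections import defaultdict

# Hypothetical rows standing in for (string, value) pairs read from the Excel sheet
rows = [("hi", 6), ("seven", 17), ("hi", 2)]

def rows_to_frequency_tuples(rows):
    """Sum the values per word (an assumption) and return (word, weight) tuples."""
    totals = defaultdict(float)
    for word, value in rows:
        totals[word] += float(value)
    return list(totals.items())

pairs = rows_to_frequency_tuples(rows)
print(pairs)  # [('hi', 8.0), ('seven', 17.0)]
```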

Partial code from the wordcloud source code:

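# item1 is a module-level helper in wordcloud.py:
# from operator import itemgetter; item1 = itemgetter(1)
def generate_from_frequencies(self, frequencies):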
    """Create a word_cloud from words and frequencies.
    Parameters
    ----------
    frequencies : array of tuples
        A tuple contains the word and its frequency.
    Returns
    -------
    self
    """
    # make sure frequencies are sorted and normalized
    frequencies = sorted(frequencies, key=item1, reverse=True)
    frequencies = frequencies[:self.max_words]
    # largest entry will be 1
    max_frequency = float(frequencies[0][1])

    frequencies = [(word, freq / max_frequency) for word, freq in frequencies]
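To see what input shape the excerpt expects, here is a standalone sketch of its sort, truncate and normalize steps. The name normalize_frequencies and the default of 200 are hypothetical and are not part of wordcloud. The sort key is itemgetter(1), and each element is unpacked as word, freq. So every element has to be a two-item (word, frequency) pair, and either a list or a tuple of such pairs works.

```python
from operator import itemgetter

item1 = itemgetter(1)  # same helper name the excerpt uses as its sort key

def normalize_frequencies(frequencies, max_words=200):
    """Sketch of the excerpt's steps; max_words=200 is an arbitrary default here."""
    # Sort by the second item of each pair, largest first
    frequencies = sorted(frequencies, key=item1, reverse=True)
    frequencies = frequencies[:max_words]
    # The largest entry becomes 1.0
    max_frequency = float(frequencies[0][1])
    return [(word, freq / max_frequency) for word, freq in frequencies]

# "seven" comes first with 1.0; "hi" gets roughly 0.353
print(normalize_frequencies([("hi", 6), ("seven", 17)]))
```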

I tried using a regular list, then a numpy ndarray, but PyCharm warns that the parameter type should be array (array.py). According to the array.py docs, that type can only hold characters, integers, and floating-point numbers:

This module defines an object type which can compactly represent an array of basic values: characters, integers, floating point numbers.
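PyCharm's hint appears to read "array of tuples" as the standard-library array type. A quick check shows that this type cannot hold tuples at all, which suggests the hint is misleading here rather than pointing at the type the method really wants. This is a minimal sketch using the standard array module only; the 'd' type code stores C doubles.

```python
import array

# array.array stores only basic C-typed values; 'd' means double
weights = array.array("d", [6.0, 17.0])  # fine: plain floats
print(weights)

try:
    array.array("d", [("hi", 6), ("seven", 17)])  # tuples are not doubles
except TypeError as exc:
    print("TypeError:", exc)
```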

My test code:

import os
import numpy
import wordcloud

d = os.path.dirname(__file__)
cloud = wordcloud.WordCloud()
array = numpy.array([("hi", 6), ("seven"), 17])  # note: ("seven") is just a string, so 17 is a separate element
cloud.generate_from_frequencies(array)  # <= what should go in the parentheses

If I run the code above despite the PyCharm warning, I get the following error, which I suppose is another way of telling me that it can't accept the ndarray type:

  File "C:/Users/Caitlin/Documents/BioDataSorter/tag_cloud_test.py", line 8, in <module>
    cloud.generate_from_frequencies(array)  # <= what should go in the parentheses
  File "C:\Python34\lib\site-packages\wordcloud\wordcloud.py", line 263, in generate_from_frequencies
    frequencies = sorted(frequencies, key=item1, reverse=True)
TypeError: 'int' object is not subscriptable

Another potential problem could be that wordcloud was written for Python 2 while I am using Python 3.4, which may have broken some of the code. What type should I pass to this method?
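One detail in the test code may matter more than the ndarray type. ("seven") is not a tuple; it is just the string "seven" in parentheses, so 17 ends up as a separate element. A minimal reproduction with a plain list and operator.itemgetter(1) (the same key as the excerpt's item1) raises the same TypeError, while correctly paired tuples sort without trouble. This sketch only isolates the stray int element; it does not show whether an ndarray would otherwise be accepted.

```python
from operator import itemgetter

# ("seven") is just the string "seven"; ("seven", 17) was presumably intended
items = [("hi", 6), ("seven"), 17]
print(items)  # [('hi', 6), 'seven', 17]

try:
    sorted(items, key=itemgetter(1), reverse=True)
except TypeError as exc:
    print(exc)  # 'int' object is not subscriptable

fixed = [("hi", 6), ("seven", 17)]
print(sorted(fixed, key=itemgetter(1), reverse=True))  # [('seven', 17), ('hi', 6)]
```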

From your test code ... # <= what should go in these parentheses

I believe you should have a tuple: (("hi", float(6) / (6 + 17)), ("seven", float(17) / (6 + 17)))
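Dividing by the total is not strictly needed, at least for the version excerpted in the question. That method already divides every frequency by the largest one. Pre-dividing by the sum therefore only changes the overall scale, and after the method's own normalization the relative weights come out the same, up to float rounding. The helper scale_to_max below is hypothetical; it mirrors the excerpt's normalization step so the two inputs can be compared.

```python
# Weights divided by the total (float() on the numerator also keeps Python 2 correct)
total = 6 + 17
by_sum = (("hi", float(6) / total), ("seven", float(17) / total))

# The same data as raw counts
raw = (("hi", 6), ("seven", 17))

def scale_to_max(pairs):
    """Divide every weight by the largest one, like the excerpt's normalization."""
    largest = float(max(weight for _, weight in pairs))
    return {word: weight / largest for word, weight in pairs}

print(scale_to_max(by_sum))
print(scale_to_max(raw))
```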

Thanks to J Herron and selva for the answer suggesting tuples instead of a list object. I ended up with this:

cloud.generate_from_frequencies((("hi", 3),("seven", 7)))

[Image: resulting word cloud]

It was still flagged as an error in my IDE, which was misleading, but it worked as intended.

Building on CCCodes' answer, here is an updated version of that approach, with each word mapped to its weight in a dict (newer wordcloud releases document frequencies as a dict from string to float):

cloud.generate_from_frequencies({"hi": 3,"seven": 7})
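The answers above use two input forms: older releases took a sequence of (word, weight) tuples, and newer ones take a dict. Converting between them takes one line in either direction. This is a minimal sketch with the same example data. Which form an installed version accepts is best confirmed from its own generate_from_frequencies docstring.

```python
# Tuple form, as in the earlier answer
pairs = (("hi", 3), ("seven", 7))

# Dict form, mapping each word to its weight
weights = dict(pairs)
print(weights)  # {'hi': 3, 'seven': 7}

# And back to tuples, if an older release needs them
pairs_again = tuple(weights.items())
print(pairs_again)  # (('hi', 3), ('seven', 7))
```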
