Learn character sequences using hmmlearn in Python

Here is my problem: I'm trying to train a Hidden Markov Model using hmmlearn. I'm new to the language, and I have some difficulty understanding the differences between lists and arrays. Here is my code:

from hmmlearn import hmm
from babel import lists
import numpy as np
import unidecode as u
from numpy import char

l = []
data = []
gods_egypt = ["Amon","Anat","Anouket","Anubis","Apis","Atoum","Bastet","Bès","Gheb","Hâpy","Harmachis","Hathor","Heh","Héket","Horus","Isis","Ka","Khepri","Khonsou","Khnoum","Maât","Meresger","Mout","Nefertoum","Neith","Nekhbet","Nephtys","Nout","Onouris","Osiris","Ouadjet","Oupaout","Ptah","Rê","Rechef","Renenoutet","Satet","Sebek","Sekhmet","Selkis","Seth","Shou","Sokaris","Tatenen","Tefnout","Thot","Thouéris"]
for i in range(0, len(gods_egypt)):
    data.append([])
    for j in range(0, len(gods_egypt[i])):
        data[i].append([u.unidecode(gods_egypt[i][j].lower())])
    l.append(len(data[i]))
data = np.asarray(data).reshape(-1,1)
model = hmm.MultinomialHMM(20, verbose=True)
model = model.fit(data, l)

and the resulting output:

Traceback (most recent call last):
  File "~~~\HMM_test.py", line 17, in <module>
    model = model.fit(data, l)
  File "~~~\Python\Python36\site-packages\hmmlearn\base.py", line 420, in fit
    X = check_array(X)
  File "~~~\Python36-32\lib\site-packages\sklearn\utils\validation.py", line 402, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.

I have seen at "ValueError: setting an array element with a sequence" that it might be a problem of differing array lengths, but I can't figure out how to solve it.

Any suggestions?

The error itself comes from the fact that model.fit() expects an array of arrays of numerical values. Right now your input data is an array whose elements are lists of lists of strings. This provokes the error: where the function expects a single array element, it finds a sequence, i.e., a list (of lists of strings).
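To make the nesting problem concrete, here is a minimal pure-Python sketch. The three-word sample is a hypothetical stand-in for the full gods_egypt list. It only illustrates why the per-word lists cannot form a rectangular array; depending on the NumPy version, converting such ragged data either yields an object array or raises an error directly.

```python
# Hypothetical three-word sample standing in for the full gods_egypt list.
words = ["amon", "ka", "isis"]

# Same nesting as in the question: one list per word, one [char] list per letter.
nested = [[[ch] for ch in word] for word in words]

# The rows have different lengths, so they cannot form a rectangular array.
row_lengths = [len(row) for row in nested]
print(row_lengths)  # [4, 2, 4]

# Concatenating all words gives a regular (n_characters, 1) layout instead;
# the per-word lengths are what gets passed as the second argument to fit().
flat = [[ch] for word in words for ch in word]
print(len(flat), sum(row_lengths))  # 10 10
```

The flat layout is still made of strings, so it fixes the shape but not the type of the data.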

However, even if you fix the list issue, another issue will arise: learning an HMM means computing numerical quantities via a set of equations. The input data used to learn an HMM should therefore be numerical, not a set of letters. (Except if hmmlearn has a very special option for characters that I am not aware of.)

You need to first transform the letters into numbers if you want to work with HMMs.
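One possible way to do that transformation is sketched below; it is not the only option. The helper names strip_accents, encode_words and fit_hmm are hypothetical, and the standard-library unicodedata module is swapped in for the unidecode package used in the question. The sketch assumes each distinct character becomes one integer symbol, and that the installed hmmlearn exposes either CategoricalHMM (recent releases) or MultinomialHMM (older releases) for discrete symbols.

```python
import unicodedata


def strip_accents(word):
    """Lowercase a word and drop diacritics (stdlib stand-in for unidecode)."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def encode_words(words):
    """Map each character to an integer code.

    Returns (symbols, lengths, alphabet): symbols is one flat list of
    single-item rows covering all words back to back, lengths holds the
    number of characters per word, alphabet is the sorted character list.
    """
    cleaned = [strip_accents(w) for w in words]
    alphabet = sorted(set("".join(cleaned)))
    code = {ch: i for i, ch in enumerate(alphabet)}
    symbols = [[code[ch]] for w in cleaned for ch in w]
    lengths = [len(w) for w in cleaned]
    return symbols, lengths, alphabet


def fit_hmm(symbols, lengths, n_states=5):
    """Fit a discrete-emission HMM; needs numpy and hmmlearn installed."""
    import numpy as np
    from hmmlearn import hmm

    # Assumption: recent hmmlearn releases call the categorical model
    # CategoricalHMM; older ones use MultinomialHMM for the same purpose.
    model_cls = getattr(hmm, "CategoricalHMM", hmm.MultinomialHMM)
    X = np.asarray(symbols, dtype=int)  # shape (n_characters, 1)
    return model_cls(n_components=n_states).fit(X, lengths)


symbols, lengths, alphabet = encode_words(["Amon", "Bès", "Rê"])
print(alphabet)  # ['a', 'b', 'e', 'm', 'n', 'o', 'r', 's']
print(lengths)   # [4, 3, 2]
```

With hmmlearn installed, calling fit_hmm(symbols, lengths) on the encoded full list would then train the model; the number of hidden states is a free choice, and 5 here is only a placeholder.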

I do not know what your end goal is. HMMs are aimed at modeling data for generation or classification purposes (the latter if several HMMs are trained). What do you intend to do once you have a model trained on the letters composing the words?

As for the format in which the data should be provided to the different functions, I suggest that you take a look at the documentation. It includes tutorials on using the library.

