
seqmining: how to calculate frequency of a sequence on python

I'm trying to use pymining in Python to generate frequent sequences from my dataset. The code below appears to work well:

from pymining import seqmining
seqs = ( 'caabc', 'abcb', 'cabc', 'abbca')
freq_seqs = seqmining.freq_seq_enum(seqs, 2)
sorted(freq_seqs)

However, when I try to use it with my own dataset:

import numpy as np
import pandas as pd
from pymining import seqmining

def importdata():
    filename = pd.read_csv('C:/Users/asus/Desktop/memoire/sequences-code.csv', sep= ';', header = None)

data=importdata()
seqs = data
freq_seqs = seqmining.freq_seq_enum(seqs, 2)
sorted(freq_seqs)

I get this error:

TypeError: 'NoneType' object is not iterable

This is the full traceback:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-19e2af14465a> in <module>()
      8 data=importdata()
      9 seqs = data
---> 10 freq_seqs = seqmining.freq_seq_enum(seqs, 2)
     11 sorted(freq_seqs)
     12 

~\Anaconda3\lib\site-packages\pymining\seqmining.py in freq_seq_enum(sequences, min_support)
      9     '''
     10     freq_seqs = set()
---> 11     _freq_seq(sequences, tuple(), 0, min_support, freq_seqs)
     12     return freq_seqs
     13 

~\Anaconda3\lib\site-packages\pymining\seqmining.py in _freq_seq(sdb, prefix, prefix_support, min_support, freq_seqs)
     16     if prefix:
     17         freq_seqs.add((prefix, prefix_support))
---> 18     locally_frequents = _local_freq_items(sdb, prefix, min_support)
     19     if not locally_frequents:
     20         return

~\Anaconda3\lib\site-packages\pymining\seqmining.py in _local_freq_items(sdb, prefix, min_support)
     28     items = defaultdict(int)
     29     freq_items = []
---> 30     for entry in sdb:
     31         visited = set()
     32         for element in entry:

TypeError: 'NoneType' object is not iterable

The simplest change you can make to your code is to get rid of importdata, which is just a wrapper around pd.read_csv. As written, it never returns what it reads, so data ends up as None. Try:

filename = 'C:/Users/asus/Desktop/memoire/sequences-code.csv'
data = pd.read_csv(filename, sep=';', header=None)

Let me know if that helps.
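
If you would rather keep a loader function, it has to return what it reads. The traceback also shows that freq_seq_enum loops over each entry and then over each element of that entry, so it probably wants a list of sequences rather than a DataFrame. Here is a minimal sketch, assuming each row of the CSV holds one sequence with its items separated by ';' (the real layout of sequences-code.csv isn't shown, so that part is a guess). The name load_sequences and the in-memory sample are hypothetical, not part of pymining or your code:

```python
import io

import pandas as pd


def load_sequences(source, sep=';'):
    """Read one sequence per CSV row and return a list of tuples."""
    # dtype=str keeps every item as text; header=None treats row 0 as data.
    frame = pd.read_csv(source, sep=sep, header=None, dtype=str)
    # Rows shorter than the first row are padded with NaN; drop that padding.
    return [tuple(row.dropna()) for _, row in frame.iterrows()]


# Hypothetical stand-in for the real file (assumed layout, see above).
sample = io.StringIO("c;a;a;b;c\na;b;c;b\nc;a;b;c\na;b;b;c;a\n")
seqs = load_sequences(sample)
print(seqs[1])

# With pymining installed, the list can be passed on unchanged:
# from pymining import seqmining
# freq_seqs = seqmining.freq_seq_enum(seqs, 2)
```

To use it on the real data, pass the file path in place of sample. Note that pd.read_csv may reject a file in which a later row has more fields than the first one, so this only fits if the first row is at least as long as the others.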


 