
PowerIterationFailedConvergence: (PowerIterationFailedConvergence(…), 'power iteration failed to converge within 500 iterations')

I am trying to compute the TextRank score for each similarity matrix. A summarize function is defined to produce the summary. The function is called on each sentence list in result, but an error occurs when ranking the sentences with the PageRank algorithm. I tried to debug it by manually raising the max_iter value in the pagerank call, but the error stays the same.

get_score function

It is called inside the summarize function; the error originates inside this function.

def get_score(sim_mat):
    import networkx as nx
    # Build a weighted graph from the similarity matrix and rank its nodes.
    nx_graph = nx.from_numpy_array(sim_mat)
    score = nx.pagerank(nx_graph, max_iter=500)
    return score
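As a standalone check of what get_score does, here is a minimal sketch on a small hypothetical similarity matrix (the 4x4 values are made up for illustration). Raising max_iter and, if needed, loosening tol gives the power iteration more room to converge on dense, near-symmetric graphs like these:

```python
import networkx as nx
import numpy as np

# Hypothetical 4x4 similarity matrix standing in for sim_mat.
sim_mat = np.array([
    [0.0, 0.8, 0.1, 0.3],
    [0.8, 0.0, 0.5, 0.2],
    [0.1, 0.5, 0.0, 0.9],
    [0.3, 0.2, 0.9, 0.0],
])

# Edge weights come from the matrix entries.
nx_graph = nx.from_numpy_array(sim_mat)

# max_iter (default 100) and tol (default 1e-6) control the power iteration.
scores = nx.pagerank(nx_graph, max_iter=500, tol=1e-6)
print(scores)  # dict: node index -> PageRank score
```

PageRank scores form a probability distribution over the nodes, so they should sum to 1.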

The summarize function takes the raw text and returns the summary.

def summarize(text):

    sentences = sent_tokenize(text)

    # Clean each sentence (text_preprocessing is defined elsewhere).
    t_clean_sentences = []
    for i in range(len(sentences)):
        obj = text_preprocessing(sentences[i])
        j = obj.text_cleaner()
        t_clean_sentences.append(j)

    # Drop questions (gb and vectorizer are a classifier and a
    # vectorizer trained elsewhere).
    clean_sentences = []
    for i in range(len(t_clean_sentences)):
        a = gb.predict(vectorizer.transform([t_clean_sentences[i]]))
        if a[0] != 'whQuestion' and a[0] != 'ynQuestion':
            clean_sentences.append(t_clean_sentences[i])

    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = set(stopwords.words('english'))

    # Remove stop words.
    filtered_sentences = []
    for i in range(len(clean_sentences)):
        word_tokens = word_tokenize(clean_sentences[i])
        filtered_sentence = [w for w in word_tokens if w not in stop_words]
        filtered_sentences.append(" ".join(filtered_sentence))

    import numpy as np

    # Sentence vectors: average of 100-dim word embeddings
    # (word_embeddings is defined elsewhere).
    sentence_vectors = []
    for i in filtered_sentences:
        if len(i) != 0:
            v = sum([word_embeddings.get(w, np.zeros((100,))) for w in i.split()]) / (len(i.split()) + 0.001)
        else:
            v = np.zeros((100,))
        sentence_vectors.append(v)

    from sklearn.metrics.pairwise import cosine_similarity

    # Pairwise cosine-similarity matrix.
    sim_mat = np.zeros([len(clean_sentences), len(clean_sentences)])
    for i in range(len(clean_sentences)):
        for j in range(len(clean_sentences)):
            if i != j:
                sim_mat[i][j] = cosine_similarity(sentence_vectors[i].reshape(1, 100),
                                                  sentence_vectors[j].reshape(1, 100))[0, 0]

    # PageRank scores
    scores = get_score(sim_mat)
    ranked_sentences = sorted(((scores[i], s) for i, s in enumerate(clean_sentences)), reverse=True)

    # Generate summary
    summary = []
    for i in range(len(ranked_sentences)):
        summary.append(ranked_sentences[i][1].capitalize())
    return summary

Function call

The size of result is 100. When I tried it on the first 50 sentence lists in result, it worked fine. I then set up a loop that summarizes only 50 sentence lists at a time and continues until it reaches the size of result, but it still shows the same error.

#text is the raw text from the TXT file
 
result = list(filter(lambda x : x != '', text.split(':')))
compiled = []
for r in result:
  compiled.append(summarize(r))
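The batching described above (summarizing 50 items at a time until the whole of result is covered) can be sketched like this. The chunks helper and the stand-in result list are illustrative, not from the original code:

```python
def chunks(items, size=50):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Illustrative stand-in for `result`; in the question each element
# is a block of raw text that would be passed to summarize().
result = [f"text block {i}" for i in range(120)]

batches = list(chunks(result, 50))
print([len(b) for b in batches])  # [50, 50, 20]
```

Note that batching only shrinks each similarity matrix; a single ill-conditioned batch can still make the power iteration fail, which is why the error persisted.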

Error

---------------------------------------------------------------------------
PowerIterationFailedConvergence           Traceback (most recent call last)
<ipython-input-22-a04a4d4d0dfb> in <module>()
      1 compiled = []
      2 for r in range(len(result)):
----> 3   compiled.append(summarize(result[r]))

3 frames
<ipython-input-21-c7462482feb4> in summarize(text)
     45 
     46     #pagerank scores
---> 47     scores = get_score(sim_mat)
     48     ranked_sentences = sorted(((scores[i],s) for i,s in enumerate(clean_sentences)), reverse=True)
     49     # Specify number of sentences to form the summary

<ipython-input-10-798a017cf041> in get_score(sim_mat)
      2     import networkx as nx
      3     nx_graph = nx.from_numpy_array(sim_mat)
----> 4     score = nx.pagerank(nx_graph)
      5     return score

<decorator-gen-431> in pagerank(G, alpha, personalization, max_iter, tol, nstart, weight, dangling)

/usr/local/lib/python3.6/dist-packages/networkx/utils/decorators.py in _not_implemented_for(not_implement_for_func, *args, **kwargs)
     80             raise nx.NetworkXNotImplemented(msg)
     81         else:
---> 82             return not_implement_for_func(*args, **kwargs)
     83     return _not_implemented_for
     84 

/usr/local/lib/python3.6/dist-packages/networkx/algorithms/link_analysis/pagerank_alg.py in pagerank(G, alpha, personalization, max_iter, tol, nstart, weight, dangling)
    156         if err < N * tol:
    157             return x
--> 158     raise nx.PowerIterationFailedConvergence(max_iter)
    159 
    160 

PowerIterationFailedConvergence: (PowerIterationFailedConvergence(...), 'power iteration failed to converge within 100 iterations')

I found the solution. I simply used nx.pagerank_numpy(nx_graph) instead of nx.pagerank(nx_graph). This fixed the problem, because the graph and similarity matrix I use are built with nx_graph = nx.from_numpy_array(sim_mat).
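A sketch of the fix, with a small made-up matrix standing in for sim_mat. pagerank_numpy solves the problem exactly with dense linear algebra, so there is no power iteration to diverge; note, however, that it was deprecated and (to my knowledge) removed in networkx 3.0, so on newer versions the fallback below uses the scipy-backed nx.pagerank with a generous max_iter instead:

```python
import networkx as nx
import numpy as np

# Hypothetical similarity matrix standing in for sim_mat.
sim_mat = np.array([
    [0.0, 0.6, 0.2],
    [0.6, 0.0, 0.7],
    [0.2, 0.7, 0.0],
])
nx_graph = nx.from_numpy_array(sim_mat)

try:
    # Older networkx: exact dense solve, cannot fail to converge.
    scores = nx.pagerank_numpy(nx_graph)
except AttributeError:
    # networkx >= 3.0 removed pagerank_numpy; nx.pagerank is now
    # scipy-backed, so give the iteration plenty of headroom.
    scores = nx.pagerank(nx_graph, max_iter=1000)
```

Either path returns the same kind of dict (node index to score) that the ranking step in summarize expects.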
