How to access cached articles in newspaper3k

Question

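Newspaper is a great library for scraping web data, but I'm a little confused about article caching. It caches articles to speed things up, but how do I access those cached articles?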

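I have something like the code below. When I run it twice with the same set of articles, the second run returns None. How can I access the previously cached articles so I can process them?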

from newspaper import Article
newspaper_articles = [Article(url) for url in links]

Answer 1

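Looking at https://github.com/codelucas/newspaper/issues/481, it seems the caching method "cache_disk" in https://github.com/codelucas/newspaper/blob/master/newspaper/utils.py may have a bug. It does cache the results to disk (look for the folder ".newspaper_scraper"), but it does not load them back afterwards.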

One workaround is to set memoize_articles=False, either when building the newspaper or through the Config class:

import newspaper
newspaper.build(url, memoize_articles=False)

Answer 2

從源代碼檢查后，這取決於。

DATA_DIRECTORY = '.newspaper_scraper'

TOP_DIRECTORY = os.path.join(tempfile.gettempdir(), DATA_DIRECTORY)

So run this in your Python interpreter to find the temporary directory; the cache is the .newspaper_scraper folder inside it:

import tempfile
tempfile.gettempdir()
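As a minimal stdlib-only sketch, the snippet below rebuilds the cache path from the two constants quoted above and lists whatever files newspaper has written there, so you can inspect the cache directly. The helper name list_cache_entries is hypothetical (it is not part of newspaper), and the exact files you find depend on your newspaper version and what you have scraped.

```python
import os
import tempfile

# Mirrors the two constants quoted above from newspaper3k's source.
DATA_DIRECTORY = '.newspaper_scraper'
TOP_DIRECTORY = os.path.join(tempfile.gettempdir(), DATA_DIRECTORY)


def list_cache_entries(top_directory=TOP_DIRECTORY):
    """Hypothetical helper: return sorted (relative_path, size_in_bytes) pairs
    for every file under the cache folder.

    Returns an empty list if the folder does not exist yet (for example,
    before newspaper has been run on this machine).
    """
    entries = []
    if not os.path.isdir(top_directory):
        return entries
    # Walk every subfolder and record each file relative to the cache root.
    for root, _dirs, files in os.walk(top_directory):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, top_directory)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)


if __name__ == '__main__':
    print('Cache folder:', TOP_DIRECTORY)
    for rel_path, size in list_cache_entries():
        print(f'{size:>10}  {rel_path}')
```

Running it as a script prints the cache folder followed by one line per cached file; an empty listing means nothing has been written there yet.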