I'm using newspaper3k inside a Scrapy parse method. I want to extract links, but I don't want to fetch the website again.
Is it possible to use newspaper.build(..) with plain HTML, so that I can call .articles afterwards?
I found this solution:
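import httpx
from newspaper import Article

async def get_article(url):
    # Fetch the page once with httpx; AsyncClient must be entered with "async with".
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
    # Pass the already-downloaded HTML to newspaper instead of calling article.download().
    article = Article(url)
    article.set_html(response.text)
    article.parse()
    return article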
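The solution above handles a single Article; it does not show whether newspaper.build(..) can be fed plain HTML. For the link-extraction part, one possible workaround is to scan the HTML that is already in hand with the standard library. This is a minimal sketch, not newspaper's API: LinkCollector and extract_links are hypothetical names, and a plain scan of anchor tags returns every link on the page, not only the ones newspaper would treat as articles.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in HTML that was already downloaded."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Inside a Scrapy parse method this would be called as extract_links(response.text, response.url), so no second request is made. Scrapy's own response.css("a::attr(href)").getall() combined with response.urljoin(...) should give a similar result without the helper.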