Newspaper 0.0.6 for web scraping in Python
I used the Python Newspaper library to develop a web scraping script. I needed to extract the following: URL, title, summary, author, and date of publication. I got everything except the date of publication.
My question is: has anyone used the Newspaper library to capture the publication date?
# Assumed context: hn is an open output file, x is the loop index, article is a parsed Article
hn.write("***********Article no" + str(x + 1) + "************\r\n")
hn.write("URL: " + article.url + "\r\n")
hn.write("Title: " + article.title + "\r\n")
hn.write("Authors: " + ' '.join(map(str, article.authors)))
hn.write("\r\n")
hn.write("Summary: " + article.summary + "\r\n")
hn.write("Key words: ")
hn.write(str(article.keywords).strip('[]'))
Is there a way to get the date of publication using the Newspaper library?
Thanks,
Mukesh
Line 195 of newspaper/article.py contains:
# TODO self.publish_date = self.config.publishDateExtractor.extract(self.doc)
It seems this feature is not ready yet, but you can try uncommenting this line.
Source: https://github.com/codelucas/newspaper/blob/master/newspaper/article.py#L195
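If uncommenting that line does not work in your version, one possible workaround is to read the date from the page's own metadata. The sketch below is not part of Newspaper: `MetaDateParser`, `extract_publish_date`, and the `DATE_META_KEYS` list are hypothetical names chosen for illustration. It assumes the page exposes an ISO 8601 date in a `<meta>` tag such as `article:published_time`, which many news sites do but not all, and that you already have the page source as a string.

```python
from datetime import datetime
from html.parser import HTMLParser

# Assumed meta-tag keys that often carry a publication date.
# This list is illustrative and is not taken from Newspaper itself.
DATE_META_KEYS = ("article:published_time", "datePublished", "pubdate", "date")


class MetaDateParser(HTMLParser):
    """Collects the content of <meta> tags whose key is in DATE_META_KEYS."""

    def __init__(self):
        super().__init__()
        self.candidates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The key may live in property=, name= or itemprop=, depending on the site.
        key = attr.get("property") or attr.get("name") or attr.get("itemprop")
        content = attr.get("content")
        if key in DATE_META_KEYS and content:
            # Keep the first value seen for each key.
            self.candidates.setdefault(key, content)


def extract_publish_date(html):
    """Return a datetime from the first parseable date meta tag, or None."""
    parser = MetaDateParser()
    parser.feed(html)
    for key in DATE_META_KEYS:
        raw = parser.candidates.get(key)
        if raw is None:
            continue
        raw = raw.strip()
        # Older Python versions reject a trailing "Z", so normalise it to an offset.
        if raw.endswith("Z"):
            raw = raw[:-1] + "+00:00"
        try:
            return datetime.fromisoformat(raw)
        except ValueError:
            continue  # Not ISO 8601; try the next candidate key.
    return None
```

You could then call `extract_publish_date(page_source)` inside your loop and write the result with `hn.write(...)` alongside the other fields. Sites that publish the date in another format or only in the visible page text would need a different approach.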