Python Read URLs from File and print to file
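I have a list of URLs in a text file, and I want to get the article text, the author, and the article title for each of them. Once I have these three elements, I want to write them to a file. So far I can read the URLs from the text file, but Python only prints out the URLs and one article (the last one). How can I rewrite my script so that Python reads and writes each and every URL and its content?
I have the following Python script (version 2.7, Mac OS X Yosemite):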
from newspaper import Article

f = open('text.txt', 'r')  # text file containing the URLs
for line in f:
    print line

url = line
first_article = Article(url)
first_article.download()
first_article.parse()

# write/append to file
with open('anothertest.txt', 'a') as f:
    f.write(first_article.title)
    f.write(first_article.text)

print str(first_article.title)

for authors in first_article.authors:
    print authors

if not authors:
    print 'No author'

print str(first_article.text)
You're getting only the last article because you iterate over all the lines of the file first:
for line in f:
    print line
and once the loop is over, line contains only the last value, which is what gets assigned here:
url = line
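To see the effect in isolation, here is a minimal sketch that needs neither the file nor the network. The list and its example.com URLs are made up for illustration and stand in for the lines of text.txt. The loop variable is not reset when the loop finishes, so code placed after the loop only ever sees the final value.

```python
# Hypothetical stand-in for the lines read from text.txt.
lines = [
    "http://example.com/a\n",
    "http://example.com/b\n",
    "http://example.com/c\n",
]

seen = []
for line in lines:
    # Anything done here runs once per line.
    seen.append(line.strip())

# After the loop, `line` still refers to the last item only.
url = line.strip()

print(seen)
print(url)
```

Only the work done inside the loop body touches every URL, which is why the download and parse calls have to live there.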
If you move the rest of your code inside the loop, you get:
from newspaper import Article

with open('text.txt', 'r') as f:  # text file containing the URLs
    with open('anothertest.txt', 'a') as fout:
        for url in f:
            print("URL Line: {}".format(url))
            # you might want to remove endlines and whitespaces from
            # around the URL, which is what strip() does
            article = Article(url.strip())
            article.download()
            article.parse()
            # write/append to file; title and text are unicode, so encode
            # them explicitly before writing (Python 2.7)
            fout.write(article.title.encode('utf-8'))
            fout.write(article.text.encode('utf-8'))
            print(u"Title: {}".format(article.title).encode('utf-8'))
            # print authors only if there are authors to show.
            if len(article.authors) == 0:
                print('No author!')
            else:
                for author in article.authors:
                    print(u"Author: {}".format(author).encode('utf-8'))
            print("Text of the article:")
            print(article.text.encode('utf-8'))
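I also made a few changes to improve your code: I renamed the output file handle to fout, to avoid shadowing the first file; I open fout once, before entering the loop, to avoid opening and closing the file on every iteration; and I check the length of article.authors instead of checking whether authors exists, because authors will not exist when the loop body never ran, which is what happens when article.authors is empty. HTH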
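The point about authors can be reproduced without newspaper. In the sketch below a plain list stands in for article.authors, and author_lines and check_loop_variable are hypothetical helpers written for this illustration, not part of the library. When the list is empty the for loop body never runs, so the loop variable is never bound and reading it afterwards raises NameError (inside a function it is UnboundLocalError, a subclass of NameError). Testing the list itself avoids the problem.

```python
def author_lines(authors):
    """Return printable lines for a list of author names."""
    # An empty list is falsy, so this is equivalent to len(authors) == 0.
    if not authors:
        return ["No author!"]
    return ["Author: {}".format(name) for name in authors]


def check_loop_variable(authors):
    """Mimic the original check, which reads the loop variable after the loop."""
    for author in authors:
        pass
    # If `authors` was empty, `author` was never assigned, so this line
    # raises UnboundLocalError (a subclass of NameError).
    return not author


print(author_lines([]))
print(author_lines(["Jane Doe", "John Smith"]))

try:
    check_loop_variable([])
except NameError as exc:
    print("NameError: {}".format(exc))
```

Note that even when the list is not empty, the original check only inspects the last author's name, so it can never report a missing author.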