
Python Read URLs from File and print to file

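I have a list of URLs in a text file, and I want to fetch the article text, author, and article title from each of them. Once I have these three elements, I want to write them to a file. So far I can read the URLs from the text file, but Python only prints out the URLs and one article (the last one). How can I rewrite my script so that Python reads and writes every URL and its content?

I have the following Python script (version 2.7, Mac OS X Yosemite):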

from newspaper import Article

f = open('text.txt', 'r') #text file containing the URLS
for line in f:
    print line

url = line
first_article = Article(url)
first_article.download()

first_article.parse()

# write/append to file 
with open('anothertest.txt', 'a') as f:
    f.write(first_article.title)
    f.write(first_article.text)

print str(first_article.title)

for authors in first_article.authors:
    print authors
if not authors:
    print 'No author'

print str(first_article.text)

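You are getting only the last article because your loop iterates over every line of the file but does nothing except print it: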

for line in f:
    print line

Once the loop is over, line contains the last value, and that is the only one the rest of the script ever sees:

url = line
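
To make the effect concrete, here is a minimal sketch that uses a hard-coded list in place of the file. The example.com URLs are placeholders, not taken from the question:

```python
# Stand-in for the lines of text.txt (placeholder URLs, for illustration only)
lines = ['http://example.com/a\n', 'http://example.com/b\n', 'http://example.com/c\n']

seen = []
for line in lines:
    seen.append(line)  # every line passes through the loop body...

url = line  # ...but after the loop, 'line' still holds only the last one
```

All three lines are visited, yet url ends up as the third one, which is why only the last article was downloaded.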

If you move the rest of the code inside the loop, you get:

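from newspaper import Article

# Written for Python 2.7, as in the question: article.title, article.text and
# the author names are unicode, so they are encoded to UTF-8 bytes before
# being printed or written.
with open('text.txt', 'r') as f: # text file containing the URLs
    with open('anothertest.txt', 'a') as fout:
        for url in f:
            # you might want to remove newlines and whitespace from
            # around the URL, which is what strip() does
            url = url.strip()
            print("URL Line: {}".format(url))

            article = Article(url)
            article.download()
            article.parse()

            # write/append to file, one item per line
            fout.write(article.title.encode('utf-8') + '\n')
            fout.write(article.text.encode('utf-8') + '\n')

            # format first, then encode the whole string, so unicode and
            # byte strings are never mixed inside format()
            print(u"Title: {}".format(article.title).encode('utf-8'))

            # print authors only if there are authors to show.
            if len(article.authors) == 0:
                print('No author!')
            else:
                for author in article.authors:
                    print(u"Author: {}".format(author).encode('utf-8'))

            print("Text of the article:")
            print(article.text.encode('utf-8'))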
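
As a side note on strip(): each line read from a file keeps its trailing newline, and a file often ends with an empty line. A minimal sketch of the cleanup, using a hard-coded list with placeholder URLs instead of the real file (skipping blank lines is an extra step that the code above does not do):

```python
# Stand-in for the raw lines of text.txt (placeholder URLs, for illustration only)
raw_lines = ['http://example.com/a\n', '  http://example.com/b  \n', '\n']

# strip() drops surrounding whitespace and newlines; the filter skips blank lines
urls = [line.strip() for line in raw_lines if line.strip()]
```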

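I also made a few changes to improve your code:

  • use with open() for reading the file as well, so the file descriptor is properly released once you no longer need it;
  • name the output file fout to avoid shadowing the first file;
  • open fout once, before entering the loop, to avoid opening and closing the file on every iteration;
  • check the length of article.authors instead of checking whether authors exists, because authors is never assigned when article.authors is empty and the loop body never runs.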
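
The last point can be sketched as follows. Both function names are hypothetical and exist only for this illustration; the first mirrors the check from the question, the second the check used in the answer:

```python
def check_with_loop_variable(authors):
    # Mirrors the original check: 'author' is only bound if the loop body runs.
    for author in authors:
        pass
    try:
        return 'No author' if not author else 'has author'
    except NameError:  # UnboundLocalError is a subclass of NameError
        return 'NameError: loop variable was never assigned'


def check_with_length(authors):
    # Testing the list itself works whether or not the loop ever ran.
    if len(authors) == 0:
        return 'No author!'
    return 'has author'
```

With an empty list, check_with_loop_variable([]) hits the NameError branch, while check_with_length([]) returns 'No author!' as intended.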

HTH

