
Save body text to a CSV file | Python 3

I am trying to build a database of articles for text mining. I extract the body of each article via web scraping and then save the bodies to a CSV file. However, I couldn't manage to save all of the body texts. The code I came up with saves only the text of the last URL (article), whereas if I print what I am scraping (and what I am supposed to save) I get the bodies of all the articles.

I have included only a few of the URLs from the list (which contains many more) to give you an idea:

import requests
from bs4 import BeautifulSoup
import csv

r=["http://www.nytimes.com/2016/10/12/world/europe/germany-arrest-syrian-refugee.html",
"http://www.nytimes.com/2013/06/16/magazine/the-effort-to-stop-the-    attack.html",
"http://www.nytimes.com/2016/10/06/world/europe/police-brussels-knife-terrorism.html",
"http://www.nytimes.com/2016/08/23/world/europe/france-terrorist-attacks.html",
"http://www.nytimes.com/interactive/2016/09/09/us/document-Review-of-the-San-Bernardino-Terrorist-Shooting.html",
]

for url in r:
    t= requests.get(url)
    t.encoding = "ISO-8859-1"
    soup = BeautifulSoup(t.content, 'lxml')
    text = soup.find_all(("p",{"class": "story-body-text story-content"}))
    print(text)
with open('newdb30.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(text)

Try declaring a variable such as all_text = "" before the for loop and appending to it at the end of each iteration with all_text += text + "\n" (the \n starts a new line). Note that text, as returned by find_all, is a list of tags rather than a string, so it has to be converted first, for example by joining the get_text() of each tag.

Then, in the last line, write all_text instead of text. Since writerow treats a bare string as a sequence of characters, wrap it in a list: spamwriter.writerow([all_text]).
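An alternative to building one long string is to write a row per article while the loop is still running, so nothing gets overwritten. The following is only a minimal sketch under some assumptions: the helper names fetch_body and save_bodies and the two-column layout (url, body) are made up for illustration, and the CSS class is the one from the question, which may no longer match the live pages.

```python
import csv


def fetch_body(url):
    """Download one article and return its body text as a single string."""
    # Third-party imports are kept local so the CSV logic below can be
    # tried out without network access or these packages installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url)
    soup = BeautifulSoup(response.content, "lxml")
    # Selector copied from the question; that the pages still use this
    # class is an assumption.
    paragraphs = soup.find_all("p", {"class": "story-body-text story-content"})
    # Join the plain text of every matching paragraph, one per line.
    return "\n".join(p.get_text() for p in paragraphs)


def save_bodies(urls, path, fetch=fetch_body):
    """Write a header row, then one (url, body) row per article."""
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["url", "body"])
        for url in urls:
            # Writing inside the loop keeps every article, not just the
            # last one that happened to be assigned to a variable.
            writer.writerow([url, fetch(url)])
```

With the list from the question this would be called as save_bodies(r, 'newdb30.csv'). The default csv.writer settings quote any field that contains commas or line breaks, so multi-paragraph bodies stay in a single cell.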
