How to fix writing to csv file
I want my program to write article dates, titles, and body text to a csv file. When I print the body text in the console, it prints everything; in the csv file, however, only the last line of the article is written.
CSV Result:
Console Print:
I've tried writing the date, title, and body text into rows in separate lines of code, as opposed to as a list, and it had the same result.
from bs4 import BeautifulSoup
from urllib.request import urlopen
import csv

csvfile = "C:/Users/katew/Dropbox/granularitygrowth/Politico/pol.csv"
with open(csvfile, mode='w', newline='') as pol:
    csvwriter = csv.writer(pol, delimiter='|', quoting=csv.QUOTE_MINIMAL)
    csvwriter.writerow(["Date", "Title", "Article"])

    #for each page on Politico archive
    for p in range(0,1):
        url = urlopen("https://www.politico.com/newsletters/playbook/archive/%d" % p)
        content = url.read()

        #Parse article links from page
        soup = BeautifulSoup(content,"lxml")
        articleLinks = soup.findAll('article', attrs={'class':'story-frag format-l'})

        #Each article link on page
        for article in articleLinks:
            link = article.find('a', attrs={'target':'_top'}).get('href')

            #Open and read each article link
            articleURL = urlopen(link)
            articleContent = articleURL.read()

            #Parse body text from article page
            soupArticle = BeautifulSoup(articleContent, "lxml")

            #Limits to div class = story-text tag (where article text is)
            articleText = soupArticle.findAll('div', attrs={'class':'story-text'})
            for div in articleText:

                #Find date
                footer = div.find('footer', attrs={'class':'meta'})
                date = footer.find('time').get('datetime')
                print(date)

                #Find title
                headerSection = div.find('header')
                title = headerSection.find('h1').text
                print(title)

                bodyText = div.findAll('p')
                for p in bodyText:
                    p_string = str(p.text)
                    textContent = "" + p_string
                    print(textContent)

                #Adds data to csv file
                csvwriter.writerow([date, title, textContent])
I expect the csv file to include the date, title and full body text.
The problem is in your `for p in bodyText:` loop. Each iteration overwrites `textContent` instead of appending to it, so by the time `writerow` runs the variable holds only the text of the last `p`. Initialize `textContent` before the loop and accumulate into it. Try something like:
textContent = ""
bodyText = div.findAll('p')
for p in bodyText:
    p_string = str(p.text)
    textContent += p_string + ' '
print(textContent)
csvwriter.writerow([date, title, textContent])
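If it helps to see the idea in isolation, here is a minimal sketch that runs without any network access. It uses `" ".join(...)` in place of the `+=` loop, which is a swapped-in alternative and not what the snippet above does, although the intent is the same. The helper names `join_paragraphs` and `write_article_row` and the sample data are hypothetical. In the real script the list would presumably come from something like `[p.text for p in div.findAll('p')]`, and the writer would wrap the open file instead of an in-memory buffer.

```python
import csv
import io


def join_paragraphs(paragraphs):
    """Combine paragraph strings into one space-separated string,
    skipping paragraphs that are empty or whitespace-only."""
    return " ".join(text.strip() for text in paragraphs if text.strip())


def write_article_row(writer, date, title, paragraphs):
    """Write one CSV row containing the full article text (hypothetical helper)."""
    writer.writerow([date, title, join_paragraphs(paragraphs)])


# Hypothetical sample data standing in for the scraped paragraph texts.
sample_paragraphs = ["First paragraph.", "Second paragraph.", "Third paragraph."]

# An in-memory buffer stands in for the csv file opened in the question.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='|', quoting=csv.QUOTE_MINIMAL)
writer.writerow(["Date", "Title", "Article"])
write_article_row(writer, "2019-01-01", "Sample title", sample_paragraphs)

output = buffer.getvalue()
print(output)
```

Building the string once with `join` also avoids the trailing space that the `+=` version leaves at the end of `textContent`.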