Issues while writing special characters to a CSV file
I am writing the crawled output of a webpage to CSV files. However, a few special characters, such as the hyphen, are not coming out correctly.
Original text: Amazon Forecast - Now Generally Available
Result in CSV: Amazon Forecast – Now Generally Available
I tried the code below:
```python
from bs4 import BeautifulSoup
from datetime import date
import requests
import csv
source = requests.get('https://aws.amazon.com/blogs/aws/').text
soup = BeautifulSoup(source, 'lxml')
# csv_file = open('aitrendsresults.csv', 'w')
csv_file = open('aws_cloud_results.csv', 'w', encoding='utf8')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['title', 'img', 'src', 'summary'])
match = soup.find_all('div', class_='lb-row lb-snap')
for n in match:
    imgsrc = n.div.img.get('src')
    titlesrc = n.find('div', {'class': 'lb-col lb-mid-18 lb-tiny-24'})
    titletxt = titlesrc.h2.text
    anchortxt = titlesrc.a.get('href')
    sumtxt = titlesrc.section.p.text
    print(sumtxt)
    csv_writer.writerow([titletxt, imgsrc, anchortxt, sumtxt])
csv_file.close()
```
Can you please help me get the same text as in the original shown above?
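A frequent cause of this kind of symptom (an assumption here, since the post does not say which program was used to view the CSV) is that the file is written as UTF-8 but read back as Windows-1252, which Excel tends to do for a CSV without a byte order mark. The sketch below reproduces that mix-up in memory, assuming the page title really contains an en dash (U+2013), so no file or network access is needed:

```python
# Title as assumed to be published on the page, with an en dash (U+2013).
original = "Amazon Forecast \u2013 Now Generally Available"

# Bytes actually written to disk when the file is opened with encoding='utf8'.
utf8_bytes = original.encode("utf-8")

# What a reader that assumes Windows-1252 (cp1252) would display instead.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# Undoing the wrong decode step recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

If the garbled line matches what the spreadsheet shows, the data in the file is fine and only the reader's choice of encoding is wrong.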
Create a function that keeps only ASCII characters (e.g. the hyphen and semicolon) and discards everything else, then pass each string through it before writing:
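```python
def decode_ascii(string):
    # Encode to ASCII, silently dropping any non-ASCII character, then decode back to str
    return string.encode('ascii', 'ignore').decode('ascii')

input_text = 'Amazon Forecast - Now Generally Available'
output_text = decode_ascii(input_text)
print(output_text)
```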
The output in the CSV should be: Amazon Forecast - Now Generally Available
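Keep in mind that this function removes characters rather than converting them. If the page title contains an en dash (U+2013) instead of an ASCII hyphen, which is an assumption here, the dash disappears and leaves two adjacent spaces. The small check below reuses `decode_ascii` from the answer. It also shows one possible alternative, an explicit `str.replace` of the en dash with a hyphen, for cases where a hyphen is actually wanted in the output:

```python
def decode_ascii(string):
    # Keep ASCII characters; silently drop everything else.
    return string.encode('ascii', 'ignore').decode('ascii')

# Hypothetical input: the same title with an en dash instead of a hyphen.
title_with_en_dash = 'Amazon Forecast \u2013 Now Generally Available'

stripped = decode_ascii(title_with_en_dash)
print(repr(stripped))  # the dash is gone, leaving two spaces

# Alternative sketch: map the en dash to an ASCII hyphen explicitly.
hyphenated = title_with_en_dash.replace('\u2013', '-')
print(hyphenated)
```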
I've been working with BeautifulSoup as well, and I think you've only made a minor mistake. On line 8, where you open the CSV file, the encoding should be "UTF-8" instead of "utf8". See if that helps.
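Python normalizes encoding names before looking them up, so you can test this suggestion by asking the codec registry which codec each spelling resolves to. Here is a minimal sketch using the standard `codecs.lookup` function:

```python
import codecs

# Resolve several spellings of an encoding name to their canonical codec name.
resolved = {name: codecs.lookup(name).name for name in ("utf8", "UTF-8", "utf-8-sig")}

for name, canonical in resolved.items():
    print(f"{name!r:12} -> {canonical}")
```

Both `'utf8'` and `'UTF-8'` resolve to the same `utf-8` codec. `'utf-8-sig'` is a separate codec that also writes a byte order mark.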
Using the title as a test, the following works for me:
```python
from bs4 import BeautifulSoup
import requests, csv

source = requests.get('https://aws.amazon.com/blogs/aws/').text
soup = BeautifulSoup(source, 'lxml')

# 'utf-8-sig' writes a UTF-8 byte order mark at the start of the file;
# newline='' lets the csv module control line endings itself.
with open("aws_cloud_results.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    w.writerow(['title'])
    match = soup.find_all('div', class_='lb-row lb-snap')
    for n in match:
        titlesrc = n.find('div', {'class': 'lb-col lb-mid-18 lb-tiny-24'})
        titletxt = titlesrc.h2.text
        w.writerow([titletxt])
```
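The main difference from the question's code is `encoding="utf-8-sig"`. This writes a UTF-8 byte order mark (the bytes EF BB BF) at the start of the file. Spreadsheet programs such as Excel commonly use that mark to recognize a CSV as UTF-8 instead of falling back to a legacy code page. The minimal sketch below writes a one-row CSV with and without the BOM to a temporary directory and compares the first three bytes. The file names and the en-dash title are illustrative only and do not come from the original post.

```python
import csv
import os
import tempfile

title = "Amazon Forecast \u2013 Now Generally Available"
heads = {}

with tempfile.TemporaryDirectory() as tmp:
    for encoding in ("utf-8", "utf-8-sig"):
        path = os.path.join(tmp, f"{encoding}.csv")
        # Write the same single-row CSV with each encoding.
        with open(path, "w", encoding=encoding, newline="") as f:
            csv.writer(f).writerow([title])
        # Read back the raw leading bytes to see whether a BOM was written.
        with open(path, "rb") as f:
            heads[encoding] = f.read(3)

for encoding, head in heads.items():
    print(encoding, head)
```

Only the `utf-8-sig` file starts with the BOM. The rest of the bytes, including the en dash, are identical in both files.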