CSV creating new lines when it shouldn't

Question

I'm currently working on a personal project that includes scraping a specific website (www.kickante.com.br).

My code currently looks like this:

# Assumed imports: the post uses the aliases ureq and soup without showing where they come from.
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup

# The post does not show where f is opened; 'campanhas.csv' is a placeholder name.
with open('campanhas.csv', 'w', encoding='utf-8') as f:
    for i in range(0, 4):
        my_url = 'https://www.kickante.com.br/campanhas-crowdfunding?page=' + str(i)
        uclient = ureq(my_url)
        page_html = uclient.read()
        uclient.close()

        page_soup = soup(page_html, 'html.parser')

        containers = page_soup.find_all("div", {"class": "campaign-card-wrapper views-row"})
        for container in containers:
            # Finding the campaign titles
            titleCampaignBruto = container.div.div.a.img["title"].replace('Crowdfunding para: ', '')
            titleCampaignParsed = titleCampaignBruto.strip().replace(",", ";")
            # Finding the amount raised
            arrecadadoFind = container.div.find_all("div", {"class": "funding-raised"})
            arrecadado = arrecadadoFind[0].text.strip().replace(",", ".")

            # Number of donors
            doadoresBruto = container.div.find_all('span', {"class": "contributors-value"})
            doadoresParsed = doadoresBruto[0].text.strip().replace(",", ";")

            # Campaign target
            fundingGoal = container.div.find_all('div', {"class": "funding-progress"})
            quantoArrecadado = fundingGoal[0].text.strip().replace(",", ";")

            # Campaign description
            descricaoBruta = container.div.find_all('div', {"class": "field field-name-field-short-description field-type-text-long field-label-hidden"})
            descricaoParsed = descricaoBruta[0].text.strip().replace(",", ";")

            # Campaign link (note: find_all('href') searches for <href> tags, not href
            # attributes, so this is an empty list; it is not used below)
            linkCampanha = container.div.find_all('href')
            print("Título da campanha: " + titleCampaignParsed)
            print("Valor da campanha: " + arrecadado)
            print("Doadores: " + doadoresParsed)
            print("target: " + quantoArrecadado)
            print("descricao: " + descricaoParsed)

            f.write(titleCampaignParsed + "," + arrecadado + "," + doadoresParsed + "," + quantoArrecadado + "," + descricaoParsed.replace(",", ";") + "\n")

When I open the CSV file it generates, I see that some lines are broken where they shouldn't be (for example, line 31 of the CSV file). That line should be part of the previous line (line 30), as the body of the description.

Does anyone have an idea what could be causing this? Thanks in advance.

Answer 1

Some of the text you're writing to the CSV file might contain newlines. You can remove them like so:

csv_line_entries = [
    titleCampaignParsed, arrecadado,  doadoresParsed, 
    quantoArrecadado, descricaoParsed.replace("," ,";")
]
csv_line = ','.join([
    entry.replace('\n', ' ') for entry in csv_line_entries
])
f.write(csv_line + '\n')
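A different technique, swapped in here rather than part of this answer: Python's standard csv module quotes any field that contains the delimiter, the quote character, or a line break, so embedded commas and newlines no longer break the record, and the manual replace(",", ";") calls become unnecessary. This is a minimal sketch assuming the scraped values are already plain strings; the function name and the row values are hypothetical placeholders, not data from the site.

```python
import csv
import io


def write_campaign_rows(rows, out):
    """Write each row as one CSV record.

    csv.writer uses csv.QUOTE_MINIMAL by default: any field containing the
    delimiter, the quote character or a line break is wrapped in double
    quotes, so an embedded newline stays inside the field.
    """
    writer = csv.writer(out)
    writer.writerows(rows)


# Hypothetical row standing in for one scraped campaign (not real site data).
rows = [
    ["Campanha X", "R$ 1.000", "12", "50%", "Primeira linha\nsegunda linha, com vírgula"],
]

buffer = io.StringIO()
write_campaign_rows(rows, buffer)
csv_text = buffer.getvalue()

# Reading the text back gives exactly one record; the newline is kept inside the field.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(len(parsed))
print(parsed[0][4])
```

In a plain text editor the quoted record still spans two physical lines, but spreadsheet programs and csv.reader treat it as a single row. When writing to a real file, the csv documentation recommends opening it with newline='', e.g. open('campanhas.csv', 'w', newline='', encoding='utf-8') (the file name is only an example).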

Cause of the bug

The strip() method removes only leading and trailing newlines/whitespace.

>>> import bs4
>>> soup = bs4.BeautifulSoup('<p>Whatever\nelse\n</p>', 'html.parser')
>>> soup.find('p').text.strip()
'Whatever\nelse'

Notice that the inner \n is not removed.
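To see how this turns into a broken row, here is a small self-contained sketch (no scraping; the description string and the write_record helper are made up for illustration) that mimics the question's f.write(... + "\n") line. The inner newline survives strip() and becomes an extra physical line in the output, while replacing it first keeps the record on one line.

```python
import io

# Made-up description text as it might come out of .text (outer and inner newlines).
raw_description = "\nPrimeira parte da descrição\nsegunda parte\n"

stripped = raw_description.strip()     # removes only the outer newlines
cleaned = stripped.replace("\n", " ")  # also removes the inner newline


def write_record(out, fields):
    """Hypothetical helper: join fields with commas and end the record with a newline."""
    out.write(",".join(fields) + "\n")


broken = io.StringIO()
write_record(broken, ["Campanha X", stripped])

fixed = io.StringIO()
write_record(fixed, ["Campanha X", cleaned])

print(broken.getvalue().splitlines())  # two lines: the record was split
print(fixed.getvalue().splitlines())   # one line
```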

Answer 2

You have newlines in the middle of the text. strip() only removes whitespace at the start and end of a string, so you need to use replace('\n', '') instead. This replaces every newline \n with the empty string ''.
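One caveat worth checking against your data: replace('\n', '') glues together the words on either side of each newline, and it leaves any carriage returns (\r) from Windows-style line endings in place. A common variant, shown as an alternative rather than what this answer proposes, is " ".join(text.split()): str.split() with no argument splits on any run of whitespace (spaces, tabs, \r, \n) and drops empty pieces. The function name and the sample text below are illustrative only.

```python
def flatten_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, carriage returns, newlines) into one space.

    str.split() with no argument also discards leading and trailing
    whitespace, so no separate strip() call is needed.
    """
    return " ".join(text.split())


sample = "  Primeira linha\nsegunda\r\nterceira\t linha \n"

# replace() alone: "linha" and "segunda" run together and the '\r' survives.
print(repr(sample.replace("\n", "")))
# split/join: a single clean line with single spaces.
print(repr(flatten_whitespace(sample)))
```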

Answer 1: score 0, accepted, posted 2020-06-29 19:00:30.
Answer 2: score 0, posted 2020-06-29 19:05:37.
