Writing to a csv file after BeautifulSoup

Question

I am using BeautifulSoup to extract some text, and I want to save the entries to a csv file. My code is as follows:

for trTag in trTags:
    tdTags = trTag.find("td", class_="result-value")
    tdTags_string = tdTags.get_text(strip=True)
    saveFile = open("some.csv", "a")
    saveFile.write(str(tdTags_string) + ",")
    saveFile.close()

saveFile = open("some.csv", "a")
saveFile.write("\n")
saveFile.close()

It does what I want for the most part, EXCEPT that whenever there is a comma (",") within an entry, the comma is treated as a separator and the single entry is split into two different cells (which is not what I want). So I searched around the net, found people suggesting the csv module, and changed my code to:

for trTag in trTags:
    tdTags = trTag.find("td", class_="result-value")
    tdTags_string = tdTags.get_text(strip=True)
    print tdTags_string

    with open("some.csv", "a") as f:
        writeFile = csv.writer(f)
        writeFile.writerow(tdTags_string)

saveFile = open("some.csv", "a")
saveFile.write("\n")
saveFile.close()

This made it even worse: now each letter or digit of a word or number occupies its own cell in the csv file. For example, if the entry is "Cat", the "C" is in one cell, "a" is in the next cell, "t" is in the third cell, and so on.

Edited version:

import urllib2
import re
import csv
from bs4 import BeautifulSoup

SomeSiteURL = "https://SomeSite.org/xyz"
OpenSomeSiteURL = urllib2.urlopen(SomeSiteURL)
Soup_SomeSite = BeautifulSoup(OpenSomeSiteURL, "lxml")
OpenSomeSiteURL.close()

# finding name
NameParentTag = Soup_SomeSite.find("tr", class_="result-item highlight-person")
Name = NameParentTag.find("td", class_="result-value-bold").get_text(strip=True)
saveFile = open("SomeSite.csv", "a")
saveFile.write(str(Name) + ",")
saveFile.close()

# finding other info
# <tbody> -> many <tr> -> in each <tr>, extract second <td>
tbodyTags = Soup_SomeSite.find("tbody")
trTags = tbodyTags.find_all("tr", class_="result-item ")

for trTag in trTags:
    tdTags = trTag.find("td", class_="result-value")
    tdTags_string = tdTags.get_text(strip=True)

    with open("SomeSite.csv", "a") as f:
        writeFile = csv.writer(f)
        writeFile.writerow([tdTags_string])

Second edit:

placeHolder = []

for trTag in trTags:
    tdTags = trTag.find("td", class_="result-value")
    tdTags_string = tdTags.get_text(strip=True)
    placeHolder.append(tdTags_string)

with open("SomeSite.csv", "a") as f:
    writeFile = csv.writer(f)
    writeFile.writerow(placeHolder)

Updated output:

u'stuff1'
u'stuff2'
u'stuff3'

Output example:

u'record1'  u'31 Mar 1901'  u'California'

u'record1'  u'31 Mar 1901'  u'California'

record1     31-Mar-01       California

Another edited version of the code (one issue remains: it skips a line below each row written):

import urllib2
import re
import csv
from bs4 import BeautifulSoup

SomeSiteURL = "https://SomeSite.org/xyz"
OpenSomeSiteURL = urllib2.urlopen(SomeSiteURL)
Soup_SomeSite = BeautifulSoup(OpenSomeSiteURL, "lxml")
OpenSomeSiteURL.close()

# finding name
NameParentTag = Soup_SomeSite.find("tr", class_="result-item highlight-person")
Name = NameParentTag.find("td", class_="result-value-bold").get_text(strip=True)
saveFile = open("SomeSite.csv", "a")
saveFile.write(str(Name) + ",")
saveFile.close()

# finding other info
# <tbody> -> many <tr> -> in each <tr>, extract second <td>
tbodyTags = Soup_SomeSite.find("tbody")
trTags = tbodyTags.find_all("tr", class_="result-item ")

placeHolder = []

for trTag in trTags:
    tdTags = trTag.find("td", class_="result-value")
    tdTags_string = tdTags.get_text(strip=True)
    #print repr(tdTags_string)
    placeHolder.append(tdTags_string.rstrip('\n'))

with open("SomeSite.csv", "a") as f:
    writeFile = csv.writer(f)
    writeFile.writerow(placeHolder)

Answer 1

with open("some.csv", "a") as f:
    writeFile = csv.writer(f)
    writeFile.writerow([tdTags_string])  # put in a list

writeFile.writerow iterates over whatever you pass in, so the string "foo" becomes f,o,o: three separate values. Wrapping it in a list prevents this, because the writer then iterates over the list rather than the string.
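To make the difference concrete, here is a minimal sketch comparing a bare string with a one-element list. It assumes Python 3 and writes to an in-memory buffer rather than some.csv; the helper write_row is hypothetical and exists only for this illustration. The last call also shows that the writer quotes a field containing a comma, which is the separator problem from the question.

```python
import csv
import io


def write_row(row):
    """Write one row to an in-memory buffer and return the CSV text produced."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


# A bare string is iterated character by character.
print(repr(write_row("Cat")))      # 'C,a,t\r\n'

# A one-element list keeps the whole string in a single cell.
print(repr(write_row(["Cat"])))    # 'Cat\r\n'

# A comma inside a field is quoted instead of splitting the cell.
print(repr(write_row(["Los Angeles, California"])))
```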

You should open your file once, as opposed to every time through your loop:

with open("SomeSite.csv", "a") as f:
    writeFile = csv.writer(f)
    for trTag in trTags:
        tdTags = trTag.find("td", class_="result-value")
        tdTags_string = tdTags.get_text(strip=True)
        writeFile.writerow([tdTags_string])
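Note that the loop above writes one row per value. If all values belong on a single line, as the question's later edits suggest, the same open-once idea can be combined with a single writerow call. The following is only a sketch under assumptions: it targets Python 3, the values list stands in for the strings returned by get_text(strip=True), and save_row is a hypothetical helper, not part of the original code.

```python
import csv
import os
import tempfile


def save_row(path, name, values):
    """Append a single CSV row: the name followed by the scraped values."""
    # The file is opened once; newline="" is what the Python 3 csv docs recommend.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([name] + list(values))


# Stand-in data; in the question these come from the <td> tags.
values = ["31 Mar 1901", "Sacramento, California"]

path = os.path.join(tempfile.mkdtemp(), "SomeSite.csv")
save_row(path, "record1", values)

# Read the file back to check that the row has three cells.
with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Passing the name through the writer as well, instead of a separate saveFile.write call, means it gets the same quoting as the other fields.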

Answer 2

For the latest problem of the skipped line, I have found an answer. Instead of

with open("SomeSite.csv", "a") as f:
    writeFile = csv.writer(f)
    writeFile.writerow(placeHolder)

Use this:

with open("SomeSite.csv", "ab") as f:
    writeFile = csv.writer(f)
    writeFile.writerow(placeHolder)

Source: https://docs.python.org/3/library/functions.html#open. The "a" mode is append mode, whereas "ab" is append mode with the file opened as a binary file, which solves the problem of skipping one extra line.
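The "ab" fix applies to Python 2, which the question's code uses (urllib2 and the print statement). In Python 3, csv.writer needs a text-mode file, and the csv module documentation recommends passing newline="" to open instead. A small sketch of that variant, assuming Python 3 and a temporary file in place of SomeSite.csv:

```python
import csv
import os
import tempfile

placeHolder = ["record1", "31 Mar 1901", "California"]
path = os.path.join(tempfile.mkdtemp(), "SomeSite.csv")

# Text mode with newline="" stops Python from translating the "\r\n"
# that csv.writer emits, which is what produces the blank line on Windows.
with open(path, "a", newline="") as f:
    csv.writer(f).writerow(placeHolder)

# Inspect the raw bytes: exactly one line terminator follows the row.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```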

Answer 1
1 accepted 2014-08-12 22:19:43

Answer 2
1 2014-08-18 21:07:14
