
Python web-scraping into csv

I did some web-scraping and got a table that I want to write into a CSV file.

When I try it, I get this message:

Traceback (most recent call last):
  File "C:/Python27/megoldas3.py", line 27, in <module>
    file.write(bytes(header,encoding="ascii",errors="ignore"))
TypeError: str() takes at most 1 argument (3 given)

What's wrong with this code? I am using Python 2.7.13.
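The error comes from the Python 2/3 split: in Python 2, `bytes` is just an alias for `str`, which takes at most one argument, while in Python 3 the three-argument form is valid. A minimal, version-aware check of that difference:

```python
import sys

if sys.version_info[0] == 2:
    # Python 2: bytes is an alias for str, so a 3-argument
    # bytes(..., encoding=..., errors=...) call raises TypeError
    assert bytes is str
else:
    # Python 3: bytes() does accept encoding/errors keywords
    assert bytes(u"abc", encoding="ascii", errors="ignore") == b"abc"
```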

import urllib2
from bs4 import BeautifulSoup
import csv
import os

out=open("proba.csv","rb")
data=csv.reader(out)

def make_soup(url):
    thepage = urllib2.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

maindatatable=""
soup = make_soup("https://www.mnb.hu/arfolyamok")

for record in soup.findAll('tr'):
    datatable=""
    for data in record.findAll('td'):
        datatable=datatable+","+data.text
    maindatatable = maindatatable + "\n" + datatable[1:]

header = "Penznem,Devizanev,Egyseg,Penznemforintban"
print maindatatable

file = open(os.path.expanduser("proba.csv"),"wb")
file.write(bytes(header,encoding="ascii",errors="ignore"))
file.write(bytes(maindatatable,encoding="ascii",errors="ignore"))

In Python 2, bytes is just an alias for str, and str() takes at most one argument, so the encoding and errors keywords in bytes(header, encoding="ascii", errors="ignore") are what cause the TypeError. (file.write() does not accept those keywords either.)

How about encoding your strings before trying to write them?


utf8_str = maindatatable.encode('utf8')
file.write(utf8_str)

Also, don't forget to call file.close() when you are done.
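The encode-then-write pattern above can be sketched end to end; a with block also takes care of closing the file for you (the demo.csv filename and the header string here are just placeholders):

```python
import io

header = u"Penznem,Devizanev,Egyseg,Penznemforintban"

# A binary-mode file expects bytes, so encode the unicode string first.
# "with" closes the file automatically, even if an error occurs.
with io.open("demo.csv", "wb") as f:
    f.write(header.encode("utf-8"))
    f.write(b"\n")
```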

I think this will work for you. Just remove encoding="ascii", errors="ignore" from the bytes() call:



import urllib2
from bs4 import BeautifulSoup
import os

def make_soup(url):
    thepage = urllib2.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

maindatatable = ""
soup = make_soup("https://www.mnb.hu/arfolyamok")

# Build one comma-separated line per table row
for record in soup.findAll('tr'):
    datatable = ""
    for cell in record.findAll('td'):
        datatable = datatable + "," + cell.text
    maindatatable = maindatatable + "\n" + datatable[1:]

header = "Penznem,Devizanev,Egyseg,Penznemforintban"
print maindatatable

# Encode the unicode text to UTF-8 bytes before writing
file = open(os.path.expanduser("proba.csv"), "wb")
file.write(header.encode('utf-8').strip())
file.write(maindatatable.encode('utf-8').strip())
file.close()

This should work:

file.write(bytes(header.encode('ascii','ignore')))
file.write(bytes(maindatatable.encode('ascii','ignore')))
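Since the question already imports csv, a sketch using csv.writer sidesteps the manual comma-joining and quoting entirely (the rows below are hypothetical stand-ins for the scraped data; under Python 2, open the file with mode "wb"):

```python
import csv

rows = [["EUR", "euro", "1", "310.50"]]  # hypothetical scraped rows

with open("demo_rows.csv", "w") as f:  # Python 2: use mode "wb"
    writer = csv.writer(f)
    writer.writerow(["Penznem", "Devizanev", "Egyseg", "Penznemforintban"])
    writer.writerows(rows)
```

csv.writer also handles cells that themselves contain commas, which the string-concatenation approach silently gets wrong.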
