How can I read these cells from HTML with Python web scraping?
I want to scrape the exchange-rate information from this website and then load it into a database: https://www.mnb.hu/arfolyamok
I need this part of the HTML:
<tbody>
<tr>
<td class="valute"><b>CHF</b></td>
<td class="valutename">svájci frank</td>
<td class="unit">1</td>
<td class="value">284,38</td>
</tr>
<tr>
<td class="valute"><b>EUR</b></td>
<td class="valutename">euro</td>
<td class="unit">1</td>
<td class="value">308,54</td>
</tr>
<tr>
<td class="valute"><b>USD</b></td>
<td class="valutename">USA dollár</td>
<td class="unit">1</td>
<td class="value">273,94</td>
</tr>
</tbody>
That's why I wrote some code, but something is wrong with it. How can I fix it, and where do I have to change it? I only need the "valute", "valutename", "unit" and "value" data. I am working with Python 2.7.13 on Windows 7.
The error message is: "There's an error in your program: unindent does not match any outer indentation level"
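For reference, this error means a line dedents to an indentation level that matches no enclosing block. A minimal, self-contained reproduction (unrelated to the scraping code itself):

```python
# The third line dedents to 4 spaces, which matches neither the 8-space
# block level nor the 0-space outer level, so Python's tokenizer rejects it.
bad_source = (
    "for i in range(3):\n"
    "        x = i\n"
    "    print(x)\n"
)

try:
    compile(bad_source, "<demo>", "exec")
except IndentationError as err:
    print(err.msg)  # e.g. "unindent does not match any outer indentation level"
```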
The code is here:
```python
import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('tbody', attrs={'class': 'stripe'})
table = str(soup)
table = table.split("<tbody>")

list_of_rows = []
for row in table[1].findAll('tr')[1:]:
    list_of_cells = []
    for cell in row.findAll('td'):
            text = cell.text.replace(' ', '')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)
```
You have a whitespace problem in your code: from the line `for cell in row.findAll('td'):` down to `list_of_cells.append(text)`, the indentation is inconsistent. Here's the fixed code:
```python
import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, "html.parser")

# Work with the parsed <tbody> element directly; converting the soup back
# to a string and splitting it would throw away the parse tree.
table = soup.find('tbody', attrs={'class': 'stripe'})

list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace(' ', '')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)
```
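One extra caveat (this is a note from editing, not part of the original answer): in Python 2 the csv module writes byte strings, while cells such as "svájci frank" come back from BeautifulSoup as Unicode, so `writer.writerows(list_of_rows)` can raise a UnicodeEncodeError. A minimal sketch of encoding the cells to UTF-8 first, using hypothetical sample rows shaped like the scraped table:

```python
# -*- coding: utf-8 -*-
# Hypothetical sample rows shaped like the scraped table (assumption:
# the real cells come back from BeautifulSoup as Unicode strings).
rows = [[u'CHF', u'svájci frank', u'1', u'284,38'],
        [u'EUR', u'euro', u'1', u'308,54']]

# Encode every cell to UTF-8 bytes before handing the rows to csv.writer;
# Python 2's csv module only handles byte strings reliably.
encoded_rows = [[cell.encode('utf-8') for cell in row] for row in rows]

# writer.writerows(encoded_rows) can then write the accented characters safely.
```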
But after executing this code, you'll face another problem: a character-encoding error. It will read:

SyntaxError: Non-ASCII character '\xc3' in file testoasd.py on line 27, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
How to fix that? Simple enough: add the encoding declaration

# -*- coding: utf-8 -*-

at the very top of your file (line 1). (This is a PEP 263 encoding declaration, not a shebang.) That should fix it.
EDIT: I just noticed that you're importing and using BeautifulSoup the wrong way. I've fixed the import to `from bs4 import BeautifulSoup`, and when constructing the soup you should also specify a parser. So

soup = BeautifulSoup(html)

becomes

soup = BeautifulSoup(html, "html.parser")
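Putting those pieces together, here's a minimal sketch of extracting the four cell classes with an explicit parser, run against the sample markup from the question (only the CHF row is reproduced for brevity):

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (one row shown for brevity).
html = """
<tbody>
<tr>
<td class="valute"><b>CHF</b></td>
<td class="valutename">svájci frank</td>
<td class="unit">1</td>
<td class="value">284,38</td>
</tr>
</tbody>
"""

soup = BeautifulSoup(html, "html.parser")  # parser named explicitly

rows = []
for tr in soup.findAll('tr'):
    # Map each <td> class ("valute", "valutename", "unit", "value")
    # to its stripped text content.
    rows.append({td['class'][0]: td.get_text(strip=True)
                 for td in tr.findAll('td')})

print(rows)
```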