How can I split words and numbers after scraping a website with BeautifulSoup?
I find it difficult to scrape data from a website when the data is inside a table. I use BeautifulSoup and urllib2 from Python, and when I run the program, the output looks like this:
IndexAceh5.82Bali6.23Banten5.85Bengkulu4.81DKI6. .
How can I remove Index and split the words (like Aceh) from the numbers (like 5.82) into something like this:
prov = ['Aceh', 'Bali']
number = [5.82, 6.23]
This is my code and the website link:
import urllib2
from bs4 import BeautifulSoup
quote_page = "MY LINK"
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, "html.parser")
pemerintah = soup.find("table", attrs={"cellspacing": "0"}); #cellspacing="0"
name = pemerintah.text.strip()
print name
I found a similar case here, but when I tried it, it did not work because my data contains a decimal point (.). For example, if I use ade12.3, it gives me ade, 12 instead of ade, 12.3.
Use the th and td tags to search. For example:
import urllib2
from bs4 import BeautifulSoup
quote_page = "http://www.kemitraan.or.id/igi/index.php/index.php?option=com_content&view=article&id=235"
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, "html.parser")
pemerintah = soup.find("table", attrs={"cellspacing": "0"}); #cellspacing="0"
for i in pemerintah.find_all("tr"):
    if i.find("th"):
        print i.th.text, " = ", i.td.text
Output:
Aceh = 5.82
Bali = 6.23
Banten = 5.85
Bengkulu = 4.81
....
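To get the two lists asked for in the question rather than printed lines, the same th/td lookup can collect the values as it goes. This is only a sketch in Python 3: the inline HTML is a hypothetical stand-in for the real page (its actual markup is an assumption here), and rows missing either cell, such as the Index header, are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the actual markup may differ.
html = """
<table cellspacing="0">
  <tr><td colspan="2">Index</td></tr>
  <tr><th>Aceh</th><td>5.82</td></tr>
  <tr><th>Bali</th><td>6.23</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"cellspacing": "0"})

prov, number = [], []
for row in table.find_all("tr"):
    # Skip rows that lack a th or a td, such as the "Index" header row.
    if row.th is None or row.td is None:
        continue
    prov.append(row.th.get_text(strip=True))
    number.append(float(row.td.get_text(strip=True)))

print(prov)
print(number)
```

Reading each cell separately avoids having to split the concatenated text afterwards.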
There are easier ways to get the values you want with BS4. But if you want to work with strings, you can use re.
import re
y = 'IndexAceh5.82Bali6.23Banten5.85Bengkulu4.81'
# Split on "non-digits followed by a decimal number"; both groups are kept in the result.
k = re.split(r'(\D+)(\d+\.\d+)', y.replace('Index', ''))
k = [i for i in k if i]  # drop the empty strings left between matches
prov = k[0::2]  # names sit at even positions
num = [float(n) for n in k[1::2]]  # values sit at odd positions, converted to float
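The ade12.3 case from the question loses its decimal part when the pattern only matches a run of digits. One possible alternative, sketched in Python 3, is re.findall with an optional fractional part, which returns (name, number) pairs directly. The function name split_names_and_numbers and its header parameter are made up for this illustration, and it assumes names contain no digits or dots.

```python
import re

def split_names_and_numbers(text, header="Index"):
    """Split text like 'Aceh5.82Bali6.23' into (names, values)."""
    # Drop the leading header word if it is present.
    if text.startswith(header):
        text = text[len(header):]
    # A name is a run of characters that are neither digits nor dots;
    # a number is digits with an optional ".digits" fractional part.
    pairs = re.findall(r"([^\d.]+)(\d+(?:\.\d+)?)", text)
    names = [name.strip() for name, _ in pairs]
    values = [float(value) for _, value in pairs]
    return names, values

prov, number = split_names_and_numbers("IndexAceh5.82Bali6.23Banten5.85Bengkulu4.81")
print(prov)
print(number)
print(split_names_and_numbers("ade12.3"))
```

Because the fractional part is optional, whole numbers such as 7 are also picked up and converted to 7.0.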