Python BeautifulSoup soup.findAll(): how to make search results match
I recently learned BeautifulSoup and, as an exercise, I want to use it to read and extract company and location information from job postings. Here is my code:
import urllib
from BeautifulSoup import *
url="http://www.indeed.com/jobs?q=hadoop&start=50"
html=urllib.urlopen(url).read()
soup=BeautifulSoup(html)
company=soup.findAll("span",{"class":"company"})
location=soup.findAll("span",{"class":"location"})
# for c in company:
#     print c.text
# print
# for l in location:
#     print l.text
print len(company)
print len(location)
I found that the lengths of company and location are not the same, so I don't know which (company, location) pair is incomplete. How can I make them match?
You need to iterate over the search result blocks and get the company-location pair from each block:
for result in soup.find_all("div", {"class": "result"}):  # or soup.select("div.result")
    company = result.find("span", {"class": "company"}).get_text(strip=True)
    location = result.find("span", {"class": "location"}).get_text(strip=True)
    print(company, location)
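One caveat with the loop above: `find` returns `None` when a block has no matching span, so calling `.get_text()` on it raises `AttributeError`. Here is a minimal sketch of a more defensive version, assuming BeautifulSoup4 is installed. The `SAMPLE_HTML` markup and the helper names `text_or_none` and `extract_pairs` are made up for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: the second block has no location.
SAMPLE_HTML = """
<div class="result">
  <span class="company">Acme</span>
  <span class="location">Springfield, IL</span>
</div>
<div class="result">
  <span class="company">Globex</span>
</div>
"""


def text_or_none(parent, css_class):
    """Return the stripped text of the first matching span, or None if absent."""
    tag = parent.find("span", {"class": css_class})
    return tag.get_text(strip=True) if tag is not None else None


def extract_pairs(html):
    """Build one (company, location) tuple per result block."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (text_or_none(result, "company"), text_or_none(result, "location"))
        for result in soup.find_all("div", {"class": "result"})
    ]


pairs = extract_pairs(SAMPLE_HTML)
for company, location in pairs:
    print(company, location)
```

Because every tuple comes from a single block, a missing field shows up as `None` in the pair it belongs to instead of shifting the two lists out of step.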
You should also switch to BeautifulSoup4; the version you are using is quite old:
pip install beautifulsoup4
And replace:
from BeautifulSoup import *
with:
from bs4 import BeautifulSoup
Run against the page from the question, the loop over the result blocks prints:
(u'PsiNapse', u'San Mateo, CA')
(u'Videology', u'Baltimore, MD')
(u'Charles Schwab', u'Lone Tree, CO')
(u'Cognizant', u'Dover, NH')
...
(u'Concur', u'Bellevue, WA')
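If you move to Python 3 together with BeautifulSoup4, note that `urllib.urlopen` from the question is no longer available; `urllib.request.urlopen` takes its place. Below is a rough sketch of the question's counting code ported that way. The function names `fetch_html` and `count_spans` are hypothetical, and passing `"html.parser"` explicitly is my choice to avoid the warning bs4 emits when no parser is named.

```python
from urllib.request import urlopen  # Python 3 home of urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def count_spans(html):
    """Return (number of company spans, number of location spans)."""
    soup = BeautifulSoup(html, "html.parser")
    company = soup.find_all("span", {"class": "company"})
    location = soup.find_all("span", {"class": "location"})
    return len(company), len(location)
```

Calling `count_spans(fetch_html(url))` with the `url` from the question reproduces the two `len` checks. If the two numbers differ, some spans are missing or sit outside a result block, which is why pairing per block is the safer approach.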