
How do I use soup.find and soup.find_all?

Here is my code and its output:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.jobberman.com/jobs")
soup = BeautifulSoup(res.text, "html.parser")
job = soup.find("div", class_ = "relative inline-flex flex-col w-full text-sm font-normal pt-2")
company_name = job.find('a[href*="jobs"]')
print(company_name)

The output is None:

None
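Why the first snippet prints None, sketched on a small hypothetical HTML snippet instead of the live page (the markup and the name "Acme Ltd" are made up): find() and find_all() take a tag name plus optional attribute filters, not a CSS selector, so find('a[href*="jobs"]') looks for a tag literally named a[href*="jobs"]. CSS selectors belong to select() and select_one().

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one job card; the real page may differ.
html = """
<div class="job">
  <a href="/jobs?company=acme" rel="nofollow">Acme Ltd</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
job = soup.find("div", class_="job")

# find() treats its first argument as a tag *name*, so this searches for a
# tag literally called 'a[href*="jobs"]' and returns None.
by_name = job.find('a[href*="jobs"]')

# The find() equivalent: tag name plus an attribute filter. A callable filter
# receives the attribute value (None when the attribute is missing).
by_filter = job.find("a", href=lambda h: h is not None and "jobs" in h)

# select_one() understands CSS selectors and returns the first match (or None).
by_css = job.select_one('a[href*="jobs"]')

print(by_name)         # None
print(by_filter.text)  # Acme Ltd
print(by_css.text)     # Acme Ltd
```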

When I use the select method instead, I get the desired result, but I can't use .text on it:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.jobberman.com/jobs")
soup = BeautifulSoup(res.text, "html.parser")
job = soup.find("div", class_ = "relative inline-flex flex-col w-full text-sm font-normal pt-2")
company_name = job.select('a[href*="jobs"]').text
print(company_name)

Output:

AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
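select() always returns a ResultSet, which is a list of elements even when only one element matches, so .text has to be read from an element rather than from the list. A minimal sketch on made-up markup (not the real Jobberman page) of two common ways around the error:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching links; the live page may differ.
html = """
<div class="job">
  <a href="/jobs?c=1">Acme Ltd</a>
  <a href="/jobs?c=2">Globex Plc</a>
</div>
"""
job = BeautifulSoup(html, "html.parser").find("div", class_="job")

# A ResultSet (a list subclass) of Tag objects, not a single Tag.
links = job.select('a[href*="jobs"]')
print(len(links))  # 2

# Option 1: take only the first match (select_one returns None if nothing matches).
first = job.select_one('a[href*="jobs"]')
print(first.text)  # Acme Ltd

# Option 2: read the text of every element in the list.
names = [a.get_text(strip=True) for a in links]
print(names)  # ['Acme Ltd', 'Globex Plc']
```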

Change your selection strategy. The main issue here is that not all company names are linked, so select the element that holds the company name instead:

job.find('div',{'class':'search-result__job-meta'}).text.strip()

or

job.select_one('.search-result__job-meta').text.strip()
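A quick offline check, using hypothetical markup modeled on the class name above (the real page layout and the company name here are assumptions), that both lines find the same element and that an <a>-based selector would miss a company name that is not a link:

```python
from bs4 import BeautifulSoup

# Hypothetical job card: the company name is plain text, not a link.
html = """
<div class="search-result">
  <h3>Restaurant Manager</h3>
  <div class="search-result__job-meta">
      Example Company Ltd
  </div>
</div>
"""
job = BeautifulSoup(html, "html.parser").find("div", class_="search-result")

# find() with an attribute dict and select_one() with a CSS class selector
# locate the same element; .strip() removes surrounding whitespace/newlines.
via_find = job.find("div", {"class": "search-result__job-meta"}).text.strip()
via_css = job.select_one(".search-result__job-meta").text.strip()

print(via_find)             # Example Company Ltd
print(via_find == via_css)  # True
print(job.find("a"))        # None: there is no link to select
```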

Example

Also store your information in a structured way for post-processing:

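import requests
from bs4 import BeautifulSoup
from pprint import pprint

res = requests.get("https://www.jobberman.com/jobs")
soup = BeautifulSoup(res.text, "html.parser")
data = []
# Each job card is a div with a direct child of class "search-result__body"
for job in soup.select('div:has(>.search-result__body)'):
    data.append({
        'job': job.h3.text,
        'company': job.select_one('.search-result__job-meta').text.strip()
    })
# In a script (unlike the interactive shell) the list must be printed explicitly
pprint(data)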
Output
[{'job': 'Restaurant Manager', 'company': 'Balkaan Employments service'},
 {'job': 'Executive Assistant', 'company': 'Nolla Fresh & Frozen ltd'},
 {'job': 'Portfolio Manager/Instructor 1', 'company': 'Fun Science World'},
 {'job': 'Microbiologist', 'company': "NEIMETH INT'L PHARMACEUTICALS PLC"},
 {'job': 'Data Entry Officer', 'company': 'Nkoyo Pharmaceuticals Ltd.'},
 {'job': 'Chemical Analyst', 'company': "NEIMETH INT'L PHARMACEUTICALS PLC"},
 {'job': 'Senior Front-End Engineer', 'company': 'Salvo Agency'},...]
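The same loop can be tried offline. This sketch assumes hypothetical markup with the class names used above, and adds a small helper (text_or_none, made up for this example) so that a card missing its company element yields None instead of raising AttributeError. The resulting list of dicts can then be post-processed, for instance written out with the standard csv module:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical listing with two cards; the second has no company element.
html = """
<div class="card">
  <div class="search-result__body"><h3>Data Entry Officer</h3></div>
  <div class="search-result__job-meta"> Example Co </div>
</div>
<div class="card">
  <div class="search-result__body"><h3>Chemical Analyst</h3></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def text_or_none(tag):
    """Return the stripped text of a Tag, or None if the element was not found."""
    return tag.get_text(strip=True) if tag is not None else None


data = []
# ':has(> .x)' selects divs that have a *direct* child with class x.
for job in soup.select("div:has(> .search-result__body)"):
    data.append({
        "job": text_or_none(job.h3),
        "company": text_or_none(job.select_one(".search-result__job-meta")),
    })

print(data)
# [{'job': 'Data Entry Officer', 'company': 'Example Co'}, {'job': 'Chemical Analyst', 'company': None}]

# Structured rows are easy to post-process, e.g. write them out as CSV
# (csv writes None as an empty field).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["job", "company"])
writer.writeheader()
writer.writerows(data)
print(buf.getvalue())
```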

The problems with your search strategy have been covered by the comments and answers posted earlier. I am offering a solution that uses the re (regular expression) module together with find_all():

    import requests
    from bs4 import BeautifulSoup
    import re
    
    res = requests.get("https://www.jobberman.com/jobs")
    soup = BeautifulSoup(res.text, "html.parser")
    # Match <a> tags whose href contains "/jobs?" and that have rel="nofollow";
    # the raw string keeps the regex escape "\?" from being read as a string escape
    company_name = soup.find_all("a", href=re.compile(r"/jobs\?"), rel="nofollow")
    for link in company_name:
        print(link.text)

Output:

GRATIAS DEI NIGERIA LIMITED

Balkaan Employments service

Fun Science World

NEIMETH INT'L PHARMACEUTICALS PLC

Nkoyo Pharmaceuticals Ltd.

...
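How the find_all() filters combine, shown on hypothetical links rather than the live page: a compiled regex passed as an attribute value is matched with search() against that attribute, and every keyword filter must match for a tag to be returned.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical links; only the first two satisfy both filters.
html = """
<a href="/jobs?company=acme" rel="nofollow">Acme Ltd</a>
<a href="/jobs?company=globex" rel="nofollow">Globex Plc</a>
<a href="/jobs?page=2">Next page</a>
<a href="/jobs/restaurant-manager" rel="nofollow">Restaurant Manager</a>
"""
soup = BeautifulSoup(html, "html.parser")

# href must contain "/jobs?" (regex search) AND rel must be "nofollow".
# "Next page" fails the rel filter; "Restaurant Manager" fails the href filter.
links = soup.find_all("a", href=re.compile(r"/jobs\?"), rel="nofollow")
names = [a.get_text(strip=True) for a in links]
print(names)  # ['Acme Ltd', 'Globex Plc']
```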
