
How to extract the text from a div tag using BeautifulSoup and Python

I am trying to extract the text inside a div tag using the BeautifulSoup package in Python.

For example, I want to extract the text inside the <p></p> tags

and the text inside the <dt> and <dd> tags.

When I run the code, it crashes and displays the error below:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last) in
     60 # # # article_body = s.find('div', {'class':'card-content t-small bt p20'}).text
     61 # text_info = s.find_all("div",{"class":"card-content is-spaced"})
---> 62 text_desc = text_info.find('div', attrs={'class':'card-content t-small bt p20'}).getText(strip=True)
     63
     64 print(f"{date_published} {title}\n\n{text_desc}\n", "-" * 80)

f:\aienv\lib\site-packages\bs4\element.py in __getattr__(self, key)
   2172         """Raise a helpful exception to explain a common code fix."""
   2173         raise AttributeError(
-> 2174             "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2175         )

AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

HTML:

<div class="card-content t-small bt p20" style="max-height:50vh" data-viewsize='{"d":{"height": {"max": 1}}, "offset":"JobSearch.jobViewSize"}'>
<h2 class="h6">Job Description</h2>
<p>The Executive Chef has full knowledge and capability of managing the general operations of the kitchen, specialty outlets kitchen including Stewarding.</p>
<h2 class="h6 p10t">Skills</h2>
<p>•  Provide, develop, train and maintain a professional workforce• Excellent in English both in oral and written.• Computer knowledge is required and good in correspondences and reports writing.</p>
<h2 class="h6 p10t">Job Details</h2>
<dl class="dlist is-spaced is-fitted t-small m0">
<div>
<dt>Job Location</dt>
<dd> Al Olaya, Riyadh , Saudi Arabia </dd>
</div>
<div>
<dt>Company Industry</dt>
<dd>Food & Beverage Production; Entertainment; Catering, Food Service, & Restaurant</dd>
</div>
<div>
<dt>Company Type</dt>
<dd>Employer (Private Sector)</dd>
</div>
<div>
<dt>Job Role</dt>
<dd>Hospitality and Tourism</dd>
</div>
<div>
<dt>Employment Type</dt>
<dd>Unspecified</dd>
</div>
<div>
<dt>Monthly Salary Range</dt>
<dd>$4,000 - $5,000</dd>
</div>
<div>
<dt>Number of Vacancies</dt>
<dd>1</dd>
</div>
</dl>
<h2 class="h6 p10t">Preferred Candidate</h2>
<dl class="dlist is-spaced is-fitted t-small m0">
<div>
<dt>Career Level</dt>
<dd>Management</dd>
</div>
<div>
<dt>Years of Experience</dt>
<dd>Min: 10 Max: 20</dd>
</div>
<div>
<dt>Residence Location</dt>
<dd> Riyadh, Saudi Arabia ; Algeria; Bahrain; Comoros; Djibouti; Egypt; Iraq; Jordan; Kuwait; Lebanon; Libya; Mauritania; Morocco; Oman; Palestine; Qatar; Saudi Arabia; Somalia; Sudan; Syria; Tunisia; United Arab Emirates; Yemen</dd>
</div>
<div>
<dt>Gender</dt>
<dd>Male</dd>
</div>
<div>
<dt>Age</dt>
<dd>Min: 26 Max: 55</dd>
</div>
</dl>
</div>

================================================

Code:

import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)

links = []
for a in soup.select("h2.m0.t-regular a"):
    if a['href'] not in links:
        links.append("https://www.bayt.com"+ a['href'])

for link in links:
    s = BeautifulSoup(requests.get(link).content, "lxml")
    text_info = s.find_all("div",{"class":"card-content is-spaced"})
    text_desc = text_info.find('div', attrs={'class':'card-content t-small bt p20'}).getText(strip=True)
    
    print(f"{date_published} {title}\n\n{text_desc}\n", "-" * 80)

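You are calling find_all(), which returns a ResultSet (a list of elements), and then calling find() on that list as if it were a single element. Either loop over the result (for text in text_info:) and extract the information from each element inside the loop, or, if you only want the first div, use find() instead of find_all().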
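A minimal sketch of both options follows. The inline markup is a hypothetical, trimmed-down stand-in for the real page (the nesting of the two divs is an assumption), and html.parser is used only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page structure may differ.
html = """
<div class="card-content is-spaced">
  <div class="card-content t-small bt p20">
    <h2 class="h6">Job Description</h2>
    <p>The Executive Chef manages the kitchen.</p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: find_all() returns a ResultSet, so iterate before calling find().
outer_divs = soup.find_all("div", {"class": "card-content is-spaced"})
texts_from_loop = []
for outer in outer_divs:
    inner = outer.find("div", attrs={"class": "card-content t-small bt p20"})
    if inner is not None:  # find() returns None when nothing matches
        texts_from_loop.append(inner.get_text(" ", strip=True))

# Option 2: find() returns a single Tag (or None), so calls can be chained
# once the None case has been ruled out.
first_outer = soup.find("div", {"class": "card-content is-spaced"})
first_text = None
if first_outer is not None:
    inner = first_outer.find("div", attrs={"class": "card-content t-small bt p20"})
    if inner is not None:
        first_text = inner.get_text(" ", strip=True)

print(texts_from_loop)
print(first_text)
```

Checking for None before chaining avoids a second kind of AttributeError ('NoneType' object has no attribute ...) on pages where a div is missing.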

Hope that could help you!

To get the job description and the other details, use the following CSS selectors.

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content, "lxml")

# Collect the absolute URL of every job posting, skipping duplicates.
links = []
for a in soup.select("h2.m0.t-regular a"):
    url = "https://www.bayt.com" + a['href']
    if url not in links:
        links.append(url)

for link in links:
    print(link)
    s = BeautifulSoup(requests.get(link).content, "lxml")
    # The first <p> inside the details card holds the job description.
    jobdesc = s.select_one("div[class='card-content is-spaced'] p")
    if jobdesc is not None:
        print(jobdesc.text)
    # <dt> elements are the field labels, <dd> elements the matching values.
    alldt = [dt.text for dt in s.select("div[class='card-content is-spaced'] dt")]
    print(alldt)
    alldd = [dd.text for dd in s.select("div[class='card-content is-spaced'] dd")]
    print(alldd)
    print("-" * 80)

Console Output:

https://www.bayt.com/en/qatar/jobs/executive-chef-4276199/
The ideal candidate is a seasoned chef with a background in fine dining. You will run an efficient kitchen by consistently looking to improve the menu, producing quality food, and working closely with rthe other staffs in the overall food and beverage operations of the palace.

['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies', 'Career Level', 'Years of Experience', 'Residence Location', 'Gender', 'Nationality', 'Degree', 'Age']
[' Doha, Qatar ', 'Food & Beverage Production', 'Employer (Private Sector)', 'Management', 'Contractor', 'Unspecified', '2', 'Senior Executive', 'Min: 5', 'India; Lebanon', 'Male', 'Bahrain; Kuwait; Oman; Qatar; Saudi Arabia; United Arab Emirates', 'Certification / diploma', 'Min: 36']
--------------------------------------------------------------------------------
https://www.bayt.com/en/saudi-arabia/jobs/executive-chef-for-5-star-hotel-4274940/
The Executive Chef has full knowledge and capability of managing the general operations of the kitchen, specialty outlets kitchen including Stewarding. Responsibility includes food preparations that are used for banqueting, conferences, outside events, and catering. Basically ensures the culinary dishes are of high-quality prepared and served to enhance the guest experience. Monitors local competitors and compare their operations with the Food & Beverage Preparation enable to modify and develop a popular menu as needed so they remain effective for the purpose of the restaurants and other establishments. Also performs many administrative tasks including kitchen item requisition, ordering supplies, and maintain the highest professional food quality, hygiene, and sanitation standards.

['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies', 'Career Level', 'Years of Experience', 'Residence Location', 'Gender', 'Age']
[' Al Olaya, Riyadh , Saudi Arabia ', 'Food & Beverage Production; Entertainment; Catering, Food Service, & Restaurant', 'Employer (Private Sector)', 'Hospitality and Tourism', 'Unspecified', '$4,000 - $5,000', '1', 'Management', 'Min: 10 Max: 20', ' Riyadh,Saudi Arabia ; Algeria; Bahrain; Comoros; Djibouti; Egypt; Iraq; Jordan; Kuwait; Lebanon; Libya; Mauritania; Morocco; Oman; Palestine; Qatar; Saudi Arabia; Somalia; Sudan; Syria; Tunisia; United Arab Emirates; Yemen', 'Male', 'Min: 26 Max: 55']
--------------------------------------------------------------------------------
https://www.bayt.com/en/saudi-arabia/jobs/executive-chef-4273678/

['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies', 'Career Level', 'Residence Location']
[' Riyadh, Saudi Arabia ', 'Hospitality & Accomodation', 'Employer (Private Sector)', 'Hospitality and Tourism', 'Unspecified', 'Unspecified', 'Unspecified', 'Management', 'Saudi Arabia']
--------------------------------------------------------------------------------
https://www.bayt.com/en/other/jobs/executive-chef-4-58272955/
 Unit Description:  Artisan Restaurant Collection has a great Executive Chef 4 (resource lasting up-to6 months)opportunity in the Los Angeles area of California for a new piece of business.  The Artisan Restaurant Collection was imagined and created in California by a market need for local sustainable, chef driven, farm to fork food created with love.  The Executive Chef 4 will have total culinary responsibilities including the supervision ofhourly staff with a focus on amazing fresh food for this location.  The Ideal candidate must have 
['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies']
['Other', 'Other Business Support Services', 'Unspecified', 'Hospitality and Tourism', 'Full Time Employee', 'Unspecified', 'Unspecified']
--------------------------------------------------------------------------------
https://www.bayt.com/en/other/jobs/executive-chef-3-58273086/
 Unit Description:  Artisan Restaurant Collection has a great Executive Chef 3 opportunity in San Jose, California for a new business venture.  The Artisan Restaurant Collection was imagined and created in California by a market need for local sustainable, chef driven, farm to fork food created with love.  The Executive Chef 3 will have total culinary responsibilities including the supervision ofhourly staff with a focus on amazing Asian food for this location.  The Ideal candidate must have 
['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies']
['Other', 'Other Business Support Services', 'Unspecified', 'Hospitality and Tourism', 'Full Time Employee', 'Unspecified', 'Unspecified']
--------------------------------------------------------------------------------
and so on...
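The answer above prints the dt labels and the dd values as two separate lists. If a label-to-value mapping is more convenient, one possible approach is to walk from each dt to its dd sibling and build a dictionary. The sketch below uses a short inline excerpt of the question's HTML instead of a live request; the dl.dlist selector and the details name are choices made for this example, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Excerpt of the markup shown in the question, shortened for illustration.
html = """
<dl class="dlist is-spaced is-fitted t-small m0">
  <div><dt>Job Location</dt><dd> Al Olaya, Riyadh , Saudi Arabia </dd></div>
  <div><dt>Company Type</dt><dd>Employer (Private Sector)</dd></div>
  <div><dt>Monthly Salary Range</dt><dd>$4,000 - $5,000</dd></div>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")

details = {}
for dt in soup.select("dl.dlist dt"):
    # Each <dt> label is followed by its <dd> value inside the same <div>.
    dd = dt.find_next_sibling("dd")
    if dd is not None:
        details[dt.get_text(strip=True)] = dd.get_text(strip=True)

for label, value in details.items():
    print(f"{label}: {value}")
```

Pairing each dt with its own sibling, rather than zipping two independent lists, keeps labels and values aligned even if a posting has a label with no value.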
