
BeautifulSoup4 - Concatenating multiple html elements between two different tags
I am scraping a page using Python and bs4.
The HTML source I get from bs4 looks like this (cleaned up a little for readability):
<p style="text-align:justify;font-size:12.0px;font-family:Arial, Helvetica, sans-serif">
<span style="font-size:14.0px"><span style="font-family:Arial, Helvetica, sans-serif">
<strong>COMPANY DESCRIPTION</strong><br>
Here goes the first para of company description</span></span></p>
<p style="text-align:justify;font-size:12.0px;font-family:Arial, Helvetica, sans-serif">
<span style="font-size:14.0px"><span style="font-family:Arial, Helvetica, sans-serif">
Here goes the second para of company description</span></span></p>
<p><strong>PURPOSE AND OBJECTIVES</strong></p>
<p>To address requirements in the area of Supply Chain Management Extended Warehouse Management solutions, Build competencies at Solution Delivery Center to deliver solutions<br>
<strong>EXPECTATIONS AND TASKS </strong></p>
<ul>
<li>Independently handle large implementation projects with focus on Warehouse Management processes such as inbound, outbound and internal processes. RF Device functions and Barcode support experience is desirable</li>
<li>Able to lead EWM discussions, assessments and detail requirement studies with customers</li>
</ul>
<strong>KEY PERFORMANCE INDICATORS</strong></p>
<ul>
<li>Customer Feedback/customer satisfaction scores</li>
<li>Productive days/utilization as defined by the organization for projects/assessments/etc.</li>
<li>Knowledge Management and creation of effective reusable components</li>
</ul>
<strong>EXPERIENCE REQUIREMENTS</strong></p>
<ul>
<li>Minimum of 4+ years industry experience and a minimum of 5 to 6 years of SAP EWM experience</li>
<li>Domain knowledge in Supply Chain Management in the areas of Planning, Manufacturing & warehousing processes is a must</li>
</ul>
<p><strong>EDUCATION AND QUALIFICATIONS/SKILLS AND COMPETENCIES</strong></p>
<ul>
<li>Degree in Engineering or IT</li>
<li>SAP Certification in Extended Warehouse Management (EWM) desirable</li>
</ul>
<p><span style="font-family:Arial,Helvetica,sans-serif"><span style="font-size:14.0px"><strong>WHAT YOU GET FROM US </strong></span></span></p>
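Observation:
In the code above, all section headings sit between <strong> </strong> tags. The headings may differ from page to page.
My requirement:
I want to extract all the text starting at the <strong> tag that contains PURPOSE AND OBJECTIVES and ending just before the tag that contains WHAT YOU GET FROM US. The url of the page I am scraping is in the code below.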
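This is my Python code snippet:
import requests
from bs4 import BeautifulSoup

def scrape_url(url, method='bs4'):
    # Fetch the page and parse it; `method` is currently unused
    session = requests.session()
    page = session.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup

url = 'https://jobs.sap.com/job/Mumbai-Senior-Account-Executive-Job-MH/539212101/'
soup = scrape_url(url)
# The job description lives in the div with class "job"
job_page = soup.body.find('div', attrs={'class': 'job'})
print(job_page)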
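First use a regular expression to find the tag that contains the starting text, then call find_next_siblings() to get all of its following siblings, and stop as soon as a sibling contains the text WHAT YOU GET FROM US.
Code: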
import re
import requests
from bs4 import BeautifulSoup

def scrape_url(url, method='bs4'):
    # Fetch the page and parse it; `method` is currently unused
    session = requests.session()
    page = session.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup

url = 'https://jobs.sap.com/job/Kuala-Lumpur-Business-Processes-Consultant-%28FICO%29-Job-14/541909901/'
soup = scrape_url(url)
# string= is the current name of the older text= argument (bs4 >= 4.4.0)
findtag = soup.find('p', string=re.compile("PURPOSE AND OBJECTIVES"))
print(findtag.text)
# Walk the following siblings and stop at the closing section heading
for item in findtag.find_next_siblings():
    if 'WHAT YOU GET FROM US' in item.text:
        break
    else:
        print(item.text.strip())
Output on the console:
PURPOSE AND OBJECTIVES
To address requirements in the area of Supply Chain Management Extended Warehouse Management solutions, Build competencies at Solution Delivery Center to deliver solutions especially in areas relating to SAP EWM
EXPECTATIONS AND TASKS
Independently handle large implementation projects with focus on Warehouse Management processes such as inbound, outbound and internal processes. RF Device functions and Barcode support experience is desirable
Able to lead EWM discussions, assessments and detail requirement studies with customers
Leading the team that are assigned to, in functional capacity, adding value to the project and to the final deliverables
Be actively involved in the preparation, conception, realization and Go Live of customer implementation projects
Demonstrate the ability to plan, run, and manage blueprint workshops / meetings with internal and external clients
Responsible for defining the scope of a project / opportunities, estimating efforts and project timelines
Participating in RFP discussions and estimating under guidance from a Bid Manager
Providing a creative source of ideas/solutions to address problems
Delivering billable components that meets a customer’s needs
KEY PERFORMANCE INDICATORS
Customer Feedback/customer satisfaction scores
Productive days/utilization as defined by the organization for projects/assessments/etc.
Knowledge Management and creation of effective reusable components
EXPERIENCE REQUIREMENTS
Minimum of 4+ years industry experience and a minimum of 5 to 6 years of SAP EWM experience
Domain knowledge in Supply Chain Management in the areas of Planning, Manufacturing & warehousing processes is a must
Must have strong ERP implementation experience
Experience in SAP Material Flow Systems (MFS) or any other third party automation tools will be desirable
Experience in EWM technical knowledge will be an added advantage
Knowledge on SAP S/4HANA Public Cloud solution and SAP IOT/Leonardo portfolio will be preferred but not mandatory
Good understanding of S/4HANA Order to Cash and Procure to Pay business processes
Good understanding of SAP ACTIVATE implementation methodology
Use of Solution Manager as a part of implementation life cycle is desirable
Good Communication skill in English.
EDUCATION AND QUALIFICATIONS/SKILLS AND COMPETENCIES
Degree in Engineering or IT
SAP Certification in Extended Warehouse Management (EWM) desirable
Minimum 4 to 5 full life cycle SAP EWM implementations
Strong knowledge in SAP SCM Extended Warehouse Management Solutions and S/4HANA Embedded EWM Solution
Good integration knowledge with other components with SAP S/4HANA (WM, SD, MM, PP) and other SAP or Non-SAP legacy applications
Knowledge of SCOR, APICS certification preferable
Strong client-facing experience and well-developed customer focus
Solid oral and written communication skills, with the demonstrated ability to communicate complex technical topics to management and non-technical audiences
Mobility is must – candidate must be ready to travel to project locations (short term and long term)
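The same sibling walk can also be wrapped in a function that returns the collected text instead of printing it, which makes it easier to reuse across several urls. The following is only a minimal sketch under stated assumptions: `text_between` and `SAMPLE_HTML` are hypothetical names introduced here, and the inline HTML is a shortened stand-in for the real job page so that the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the job page markup
SAMPLE_HTML = """
<div class="job">
<p><strong>COMPANY DESCRIPTION</strong><br>Here goes the company description</p>
<p><strong>PURPOSE AND OBJECTIVES</strong></p>
<p>To address requirements in the area of Supply Chain Management</p>
<ul>
<li>Degree in Engineering or IT</li>
</ul>
<p><span><strong>WHAT YOU GET FROM US </strong></span></p>
<p>Benefits text</p>
</div>
"""

def text_between(soup, start_text, end_text):
    """Collect text from the <p> containing start_text up to, but not
    including, the first following sibling that contains end_text."""
    # Find the first <p> whose text contains the starting heading
    start = soup.find(lambda tag: tag.name == 'p' and start_text in tag.get_text())
    if start is None:
        return []
    parts = [start.get_text(strip=True)]
    for sibling in start.find_next_siblings():
        text = sibling.get_text(" ", strip=True)
        if end_text in text:
            break  # reached the closing heading, stop collecting
        if text:
            parts.append(text)
    return parts

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
section = text_between(soup, 'PURPOSE AND OBJECTIVES', 'WHAT YOU GET FROM US')
print("\n".join(section))
```

Returning a list keeps the text of each sibling separate, so the caller can decide whether to join the pieces with newlines, spaces, or something else. Like the answer above, this assumes the start and end headings are siblings under the same parent tag.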