
How do I exclude a particular text with tag while scraping a website?

So, I am trying to scrape a website.

# importing modules
import requests
from bs4 import BeautifulSoup
search = requests.get('https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&txtKeywords=C%23&txtLocation=').text
soup = BeautifulSoup(search,'lxml')
jobs = soup.find_all("li",class_="clearfix job-bx wht-shd-bx")
for i in jobs:
    date_publishment = i.find("span",class_= "sim-posted").span.text
    if "few" in date_publishment:
        company_name = i.find("h3",class_= "joblist-comp-name" ).text.replace(" ","")
        company_skills = i.find("span",class_="srp-skills").text.replace(" ","")
        description =i.find("ul",class_='list-job-dtl clearfix').text
        #prints data---v
        print(f"Company Name:{company_name.strip()}")
        print(f"Skills:{company_skills.strip()}")
        print(f"Description:{description}")
        print("")

The relevant HTML for the description:

<li>
      <label>Job Description:</label>
Sophus is  looking for a Full stack developer with good experience in Dot net technologies for our product Talpal.What is the product you will work for?Talpal is a cloud based... <a href="https://www.timesjobs.com/job-detail/c-net-full-stack-developer-sophus-infotech-india-private-limited-chennai-2-to-4-yrs-jobid-w1ZrmDvR5__PLUS__1zpSvf__PLUS__uAgZw==&amp;source=srp" target="_blank">More Details</a>
      </li>

When I scrape the description, the text of the other tags nested inside the main (li) tag is included as well. Is there any way to scrape only this part: "Sophus is looking for a Full stack developer with good experience in Dot net technologies for our product Talpal.What is the product you will work for?Talpal is a cloud based..."

You can use :contains to target the right label tag, then next_sibling to move to the description. E.g. within the loop over jobs:

i.select_one('label:contains("Job Description:")').next_sibling.strip()
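
To see the idea in isolation, here is a minimal sketch that runs against a shortened copy of the markup from the question rather than the live site. The helper name extract_description, the shortened text and the "#" href are made up for illustration. It also assumes a reasonably recent soupsieve, where :-soup-contains is the current spelling of the selector (plain :contains is deprecated there), and it uses the built-in html.parser so lxml is not needed.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the <li> from the question.
# The text is shortened and the href is a placeholder.
html = """
<ul class="list-job-dtl clearfix">
  <li>
    <label>Job Description:</label>
Sophus is looking for a Full stack developer... <a href="#" target="_blank">More Details</a>
  </li>
</ul>
"""


def extract_description(job):
    """Return the text right after the 'Job Description:' label, or None."""
    # Hypothetical helper: find the label by the text it contains.
    label = job.select_one('label:-soup-contains("Job Description:")')
    if label is None or label.next_sibling is None:
        return None
    # In this markup the label's next sibling is the text node that sits
    # between </label> and <a>, so the link text is not part of it.
    return str(label.next_sibling).strip()


soup = BeautifulSoup(html, "html.parser")
description = extract_description(soup)
print(description)
```

Returning None when the label is missing keeps the loop from raising AttributeError on a listing that has no description; in the original loop you would call the helper with i instead of soup.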
