How to remove a hyperlink tag under a header tag using BeautifulSoup

Question

I am trying to scrape a web page. I want to extract only the company name, Freelancer, from the h3 header, but when I run the code below I also get "(More Jobs)", which comes from the 'a' tag nested inside the h3. How can I extract only Freelancer from the page below?

https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&txtKeywords=work+from+home&txtLocation=

My code is:

company_name = job.find('h3', class_='joblist-comp-name').text

Result is: Freelancer (More Jobs)

Expected: Freelancer

Answer 1

You can split the string on spaces and take the first element. Note that this only works when the company name is a single word:

from bs4 import BeautifulSoup

html = """<h3 class="joblist-comp-name">Freelancer <a class="jobs-frm-comp" href="/candidate/companySearchResult.html?from=submit&encid=V1VUNYG9OfxywnPTmYOKIg==&searchType=byCompany&luceneResultSize=25">(More Jobs)</h3>
"""
soup = BeautifulSoup(html, "lxml")
# .text is "Freelancer (More Jobs)"; the first space-separated token is the company name
soup.find("h3", class_="joblist-comp-name").text.split(" ")[0]

Output:

'Freelancer'
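
Splitting on a space would cut a multi-word company name down to its first word. A minimal sketch of one alternative, assuming the markup is well formed, is to read only the text nodes that belong directly to the h3. The company name and href below are made up for illustration, and "html.parser" is used only to avoid the lxml dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical listing with a multi-word company name (not taken from the site).
html = """<h3 class="joblist-comp-name">Acme Web Services <a class="jobs-frm-comp" href="/whatever">(More Jobs)</a></h3>"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3", class_="joblist-comp-name")

# recursive=False keeps only strings that are direct children of <h3>,
# so the "(More Jobs)" text inside <a> is skipped.
own_strings = h3.find_all(string=True, recursive=False)
company_name = "".join(own_strings).strip()
print(company_name)  # Acme Web Services
```

This leaves the parsed tree untouched, so the 'a' tag is still available if you need its href later.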

Update, using the URL given:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&txtKeywords=work+from+home&txtLocation=")
soup = BeautifulSoup(res.text, "lxml")

This finds the main ul tag and then all li tags inside it, which are returned as a list. From that list we take the first element and extract the text of its h3, keeping only the part before the opening parenthesis:

all_li = soup.find("ul", class_="new-joblist").find_all("li")
all_li[0].find("h3", class_="joblist-comp-name").get_text(strip=True).split("(")[0]

Output:

'Freelancer'
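
The lines above only look at the first listing. A sketch of how the same expression might be applied to every li follows. The function name `company_names` and the sample markup are assumptions for illustration, and the real page structure may differ or change:

```python
from bs4 import BeautifulSoup


def company_names(page_html):
    """Return the company name from every job listing in page_html."""
    soup = BeautifulSoup(page_html, "html.parser")
    job_list = soup.find("ul", class_="new-joblist")
    if job_list is None:
        return []  # the page has no job list

    names = []
    for li in job_list.find_all("li"):
        h3 = li.find("h3", class_="joblist-comp-name")
        if h3 is None:
            continue  # skip list items that are not job listings
        # Same expression as above: keep the text before "(More Jobs)".
        names.append(h3.get_text(strip=True).split("(")[0].strip())
    return names


# Hypothetical sample markup, shaped like the snippet in the question.
sample = """
<ul class="new-joblist">
  <li><h3 class="joblist-comp-name">Freelancer <a href="/x">(More Jobs)</a></h3></li>
  <li><div>not a job listing</div></li>
  <li><h3 class="joblist-comp-name"> Acme Corp </h3></li>
</ul>
"""
print(company_names(sample))  # ['Freelancer', 'Acme Corp']
```

With the live page you would pass `res.text` instead of `sample`. Note that splitting on "(" would truncate a company name that itself contains a parenthesis.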

Answer 2

Your HTML is not well formed, but if it's fixed like this:

<h3 class="joblist-comp-name"> Freelancer 
  <a class="jobs-frm-comp" href="/whatever">  More Jobs</a>
</h3>

something like the below should get you there. It uses the lxml library and an XPath search to zero in on the target. Obviously, you'll have to modify it to fit your actual HTML:

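import lxml.html as lh
# The corrected HTML string from above
company = """<h3 class="joblist-comp-name"> Freelancer
  <a class="jobs-frm-comp" href="/whatever">  More Jobs</a>
</h3>"""
job = lh.fromstring(company)
# text() selects only the h3's own text nodes, not the text inside <a>
job.xpath('//h3[@class="joblist-comp-name"]/text()')[0].strip()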

Output:

'Freelancer'
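
If you would rather stay within BeautifulSoup and literally remove the hyperlink tag, as the question title asks, one possible sketch is to call `decompose()` on the 'a' tag before reading the text. It assumes the corrected HTML above and uses "html.parser" only to avoid extra dependencies:

```python
from bs4 import BeautifulSoup

html = """<h3 class="joblist-comp-name"> Freelancer
  <a class="jobs-frm-comp" href="/whatever">  More Jobs</a>
</h3>"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3", class_="joblist-comp-name")

# Remove the nested link from the parsed tree, if there is one.
link = h3.find("a")
if link is not None:
    link.decompose()

# Only the h3's own text is left now.
company_name = h3.get_text(strip=True)
print(company_name)  # Freelancer
```

Keep in mind that `decompose()` modifies the tree in place, so read the link's href first if you still need it.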

2 answers

solution1
0 ACCEPTED 2021-07-13 10:48:23

solution2
0 2021-07-13 10:50:24
