Scraping specific part of HTML website ID using beautifulsoup

Question

I am trying to scrape only the numeric part (1217428) of the id attribute in the HTML below, without the rest of the attribute value, but I have no idea how to isolate just that portion.

<td class="pb-15 text-center">
<a href="#" id="1217428_1_10/6/2020 12:00:00 AM" class="slotBooking">
    8:15 AM ✔ 
</a>
</td>

So far I have come up with this:

from bs4 import BeautifulSoup as bs

lesson_id = [] # I wish to fit the lesson id in this list
soup = bs(html, "html.parser")  # html holds the page source
slots = soup.find(attrs={"class" : "pb-15 text-center"})
tag = slots.find("a")
ID = tag.attrs["id"]
print(ID)

But this only gives me the full id as output:

1217428_1_10/6/2020 12:00:00 AM

Is there any way I could edit my code so that the output would be:

1217428

I have also tried using a regex:

lesson_id = []
soup = bs(html, "html.parser")
slots = soup.find(attrs={"class" : "pb-15 text-center"})
tag = slots.find("a")
ID = tag.attrs["id"]
lesson_id.append(ID(re.findall("\d{7}")))

But I receive this error:

TypeError: findall() missing 1 required positional argument: 'string'
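The error comes from the call itself: re.findall(pattern, string) needs the text to search as its second argument. Once that is fixed, ID(...) would fail next, because ID is a plain str and is not callable. A minimal sketch of the corrected call, assuming ID holds the attribute value printed above and that lesson ids are always exactly seven digits (which is what \d{7} encodes):

```python
import re

# Value of the <a> tag's id attribute, as printed in the question
ID = "1217428_1_10/6/2020 12:00:00 AM"

lesson_id = []
# re.findall(pattern, string) returns a list of every non-overlapping match;
# the raw string keeps the backslash in \d intact
matches = re.findall(r"\d{7}", ID)
if matches:
    lesson_id.append(matches[0])

print(lesson_id)  # ['1217428']
```

Note that findall returns a list, so taking the first element (rather than appending the whole list) keeps lesson_id a flat list of strings.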

Answer 1

You can simply split the string on the first underscore:

id_list = ID.split('_', 1)
# gives ['1217428', '1_10/6/2020 12:00:00 AM']
lesson = id_list[0]  # '1217428'
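A self-contained sketch of the split approach, with str.partition shown alongside as an equivalent standard-library option. The sample value is the one from the question; the no-underscore case is a hypothetical edge case added only to show how each form behaves:

```python
ID = "1217428_1_10/6/2020 12:00:00 AM"

# maxsplit=1 stops after the first underscore, so the date part stays intact
first, rest = ID.split("_", 1)
print(first)  # 1217428
print(rest)   # 1_10/6/2020 12:00:00 AM

# str.partition always returns a 3-tuple (head, separator, tail);
# if "_" is missing, head is the whole string and the other two are empty
head, sep, tail = ID.partition("_")
print(head)   # 1217428

# Edge case: no underscore at all; indexing [0] still works for both
print("1217428".split("_", 1)[0])    # 1217428
print("1217428".partition("_")[0])   # 1217428
```

Unpacking split() into two names, as in the first example, raises ValueError when there is no underscore, so indexing with [0] is the safer form if the id format is not guaranteed.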

You can use a regular expression as well:

import re

match = re.search(r'\d+', ID)
lesson = match.group()  # '1217428'
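Here \d{1,} and \d+ mean the same thing, and re.search returns the first run of digits anywhere in the string, which in this id happens to be the lesson number. A slightly stricter sketch, assuming the id always starts with the lesson number, anchors the match at the start with re.match and checks for None before calling .group() (the second id below is a made-up value used only to show the no-match case):

```python
import re

ID = "1217428_1_10/6/2020 12:00:00 AM"

# re.match only matches at the beginning of the string,
# so this captures the leading digits and nothing later in the id
match = re.match(r"\d+", ID)
lesson = match.group() if match else None
print(lesson)  # 1217428

# Without the None check, .group() would raise AttributeError
# when the id does not start with a digit
no_match = re.match(r"\d+", "slot_10/6/2020")
print(no_match)  # None
```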

Answer 2

I think you can solve your problem by splitting the id on "_" and taking the first part (this is how I read your example above):

from bs4 import BeautifulSoup as bs

lesson_id = [] # I wish to fit the lesson id in this list
soup = bs(html, "html.parser")
slots = soup.find(attrs={"class" : "pb-15 text-center"})
tag = slots.find("a")
ID = tag.get("id")  # returns None instead of raising KeyError if there is no id
if ID:
    ID = ID.split("_")[0]
    lesson_id.append(ID)
print(ID)
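If the page lists several time slots, the same idea can be applied to every booking link to fill the lesson_id list from the question. A sketch, assuming each slot is an <a> with class slotBooking like the one shown; the second slot in the sample markup is a made-up example for illustration, not real page data:

```python
from bs4 import BeautifulSoup

# Sample markup: the first slot is from the question, the second is hypothetical
html = """
<td class="pb-15 text-center">
<a href="#" id="1217428_1_10/6/2020 12:00:00 AM" class="slotBooking">8:15 AM</a>
</td>
<td class="pb-15 text-center">
<a href="#" id="1217429_1_10/6/2020 12:00:00 AM" class="slotBooking">9:00 AM</a>
</td>
"""

soup = BeautifulSoup(html, "html.parser")
lesson_id = []
# find_all with class_ matches every <a> whose class list includes "slotBooking"
for tag in soup.find_all("a", class_="slotBooking"):
    slot_id = tag.get("id")  # None instead of KeyError if the attribute is missing
    if slot_id:
        lesson_id.append(slot_id.split("_", 1)[0])

print(lesson_id)  # ['1217428', '1217429']
```

Matching on the link's own class avoids depending on the layout classes of the surrounding <td>, which are more likely to be shared with unrelated cells.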

2 answers

solution1 1 ACCEPTED 2020-04-08 08:08:36

solution2 1 2020-04-08 08:09:10
