I'm just starting to play around with BeautifulSoup and I'm trying to build something in Python, but when I scrape for the information, the tags are included in the results, which I do not want. Is there any way I can separate the product ID from the tags?
Example of my results:
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
Try something like this if you want just the text inside the first product-id tag:
# find() returns only the first matching tag; get_text() drops the markup
data = soup.find('product-id').get_text()
print(data)
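For a version that runs on its own, here is a minimal sketch. The sample markup and the variable names are assumptions made up for illustration, since the document being scraped isn't shown in the question, and it assumes Python's built-in html.parser is an acceptable parser for it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real scraped document.
sample = """
<products>
  <product><product-id type="integer">8422899464</product-id></product>
  <product><product-id type="integer">8422899464</product-id></product>
</products>
"""

soup = BeautifulSoup(sample, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
tag = soup.find("product-id")

# get_text() returns only the text inside the tag, without the markup.
first_id = tag.get_text() if tag is not None else None
print(first_id)
```

Checking for None before calling get_text() avoids an AttributeError when the tag is missing from the page.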
# soup('product-id') is shorthand for soup.find_all('product-id')
[i.text for i in soup('product-id')]
out:
['8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464',
'8422899464']
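If the IDs are needed as numbers rather than strings, the list comprehension above can be extended. This is only a sketch under assumptions: the sample string and the names `sample`, `tags`, and `product_ids` are hypothetical, and it assumes every product-id tag contains a valid integer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the output shown in the question.
sample = (
    '<product-id type="integer">8422899464</product-id>'
    '<product-id type="integer">8422899464</product-id>'
    '<product-id type="integer">8422899464</product-id>'
)
soup = BeautifulSoup(sample, "html.parser")

# find_all() returns every matching tag.
tags = soup.find_all("product-id")

# strip=True trims surrounding whitespace before converting to int.
product_ids = [int(tag.get_text(strip=True)) for tag in tags]
print(product_ids)

# Tag attributes can be read like dictionary entries, e.g. the declared type.
print(tags[0]["type"])
```

int() raises a ValueError on non-numeric text, so real scraped data may need a check before the conversion.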