
How to pull some information from a string with Python?

I'm just starting to play around with BeautifulSoup and I'm trying to build something in Python. When I scrape for the information, the tags are included in the results, which I do not want. Is there any way I can separate the product ID from the tags?

Example of my results:

<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>
<product-id type="integer">8422899464</product-id>

Try something like this if you want to get the text inside the first product-id tag:

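# find() returns only the first matching tag
data = soup.find('product-id').get_text()
print(data)

Alternatively, to get the text of every product-id tag, use a list comprehension:

# calling soup(...) is shorthand for soup.find_all(...)
[i.text for i in soup('product-id')]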

out:

['8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464',
 '8422899464']
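
For reference, both approaches above can be combined into one self-contained sketch. The sample markup, the second ID, and the variable names here are hypothetical stand-ins for the scraped page, and the conversion to int is an assumption based on the type="integer" attribute rather than something the original answers do:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the results shown in the question;
# the second ID is made up so the two entries can be told apart.
SAMPLE = """
<products>
  <product-id type="integer">8422899464</product-id>
  <product-id type="integer">8422899465</product-id>
</products>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# find() returns the first matching tag; get_text() drops the markup.
first_id = soup.find("product-id").get_text()

# find_all() returns every matching tag, so this yields one string per tag.
all_ids = [tag.get_text(strip=True) for tag in soup.find_all("product-id")]

# Assumption: the IDs are wanted as numbers, as type="integer" suggests.
numeric_ids = [int(text) for text in all_ids]

print(first_id)
print(all_ids)
print(numeric_ids)
```

Note that find() returns None when nothing matches, so on real pages it may be worth checking the result before calling get_text() on it.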

