
BeautifulSoup creates an incomplete result?

I want BeautifulSoup to parse a page from the Lexico dictionary.

What I want to extract is the content under the ul tag with class semb.

I think the following code should print the whole ul (the output is long, sorry for the wall of text):

import requests
from bs4 import BeautifulSoup

url = 'https://www.lexico.com/definition/iron'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')
semb = soup.find('ul', attrs={'class':'semb'})
print(semb)

However, the output it actually gives is incomplete (sorry again).

It seems that BeautifulSoup stops parsing, for some reason, in the middle of the second li tag. It doesn't look related to JavaScript to me; am I wrong? Thanks, everyone.

BeautifulSoup version: 4.11.1, Python version: 3.9.12

Since you want just the definitions, try the following, which gets all of the definitions under each part of speech (for the example page linked, it gets both the noun and the verb definitions):

import requests
from bs4 import BeautifulSoup

url = 'https://www.lexico.com/definition/iron'
req = requests.get(url)
# I actually ran this with 'html.parser' because I didn't have lxml installed; either should work.
soup = BeautifulSoup(req.text, 'lxml')
# Note the keyword is class_ because class is a reserved word in Python.
definitions = soup.find_all("span", class_='ind one-click-content')

# definitions is now a list of all the <span> tags containing the definitions.
for n, d in enumerate(definitions, start=1):
    print(f"{n}. {d.text}")

Output:
 1. A strong, hard magnetic silvery-grey metal, the chemical element of atomic number 26, much used as a material for construction and manufacturing, especially in the form of steel.
 2. Used figuratively as a symbol or type of firmness, strength, or resistance.
 3. A tool or implement now or originally made of iron.
 4. Metal supports for a malformed leg.
 5. Fetters or handcuffs.
 6. Stirrups.
 7. A handheld implement, typically an electrical one, with a heated flat steel base, used to smooth clothes, sheets, etc.
 8. A golf club with a metal head (typically with a numeral indicating the degree to which the head is angled in order to loft the ball)
 9. A shot made with an iron.
 10. A meteorite containing a high proportion of iron.
 11. Smooth (clothes, sheets, etc.) with an iron.
 12. Firmness or ruthlessness cloaked in outward gentleness.
 13. Have a range of options or courses of action available, or be involved in many activities or commitments at the same time.
 14. Have other options or courses of action available, or be involved in other activities or commitments at the same time.
 15. Having the feet or hands fettered.
 16. (of a sailing vessel) stalled head to wind and unable to come about or tack either way.
 17. Solve or settle difficulties or problems.
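
One caveat about the class_ lookup above: when the string passed to class_ contains a space, BeautifulSoup compares it against the exact value of the class attribute, so the order of the class names matters. If the page ever emits the same classes in a different order, a CSS selector via select() is more forgiving. Here is a minimal sketch on made-up markup (the three spans below are invented for illustration, not taken from the Lexico page):

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only (not the real Lexico HTML).
html = """
<span class="ind one-click-content">first</span>
<span class="one-click-content ind">second</span>
<span class="ind">third</span>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is matched against the exact attribute value.
exact = [s.get_text() for s in soup.find_all("span", class_="ind one-click-content")]

# A CSS selector matches tags that carry both classes, in any order.
by_selector = [s.get_text() for s in soup.select("span.ind.one-click-content")]

# A single class name matches any tag that has that class among its classes.
single = [s.get_text() for s in soup.find_all("span", class_="ind")]

print(exact)
print(by_selector)
print(single)
```

For the page in the question the exact-string form was enough, so this only matters if the markup is less regular than it looks.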

If you want the definitions for each part of speech separately, that is possible too. Use find_all to get every element with class "semb", then call find_all on each of those to get the spans as above, and also extract the label (noun, verb, and so on) for each section.
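
A rough sketch of that grouping, run against a small hand-written HTML snippet so it works offline. The snippet is a simplified stand-in, and the assumption that the label sits in a span with class "pos" just before each ul is mine; check the real page source and adjust the selector. definitions_by_section is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hand-written, simplified stand-in for the page (an assumption, not the real markup).
SAMPLE_HTML = """
<section class="gramb">
  <h3 class="ps pos"><span class="pos">noun</span></h3>
  <ul class="semb">
    <li><span class="ind one-click-content">A strong, hard magnetic silvery-grey metal.</span></li>
    <li><span class="ind one-click-content">A tool or implement now or originally made of iron.</span></li>
  </ul>
</section>
<section class="gramb">
  <h3 class="ps pos"><span class="pos">verb</span></h3>
  <ul class="semb">
    <li><span class="ind one-click-content">Smooth (clothes, sheets, etc.) with an iron.</span></li>
  </ul>
</section>
"""


def definitions_by_section(html):
    """Return a dict mapping each section label to its list of definitions."""
    soup = BeautifulSoup(html, "html.parser")
    grouped = {}
    for semb in soup.find_all("ul", class_="semb"):
        # Assumption: the label is the nearest preceding <span class="pos">.
        label_tag = semb.find_previous("span", class_="pos")
        label = label_tag.get_text(strip=True) if label_tag else "unknown"
        spans = semb.find_all("span", class_="ind one-click-content")
        grouped.setdefault(label, []).extend(s.get_text(strip=True) for s in spans)
    return grouped


grouped = definitions_by_section(SAMPLE_HTML)
for label, defs in grouped.items():
    print(label)
    for n, d in enumerate(defs, start=1):
        print(f"  {n}. {d}")
```

To use it on the live page, pass req.text from the code above instead of SAMPLE_HTML.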
