
BeautifulSoup.find_all(): can I select multiple tags and strings within those tags?

I am looking to scrape some data from a website. To preface, I am a novice. Specifically, I want to filter all the XML data returned based on the postal code, which appears in the 'item_teaser' attribute.

<item lat="43.6437075296758" long="-80.083111524582" item_name="Acton Golf Club" item_url="http://www.ontariogolf.com/courses/acton/acton-gc/" item_teaser="4955 Dublin Line Acton, Ontario L7J 2M2"/>

Above is an example of what I am trying to pull. I want to filter everything by specific postal areas (the first three characters of the postal code, e.g. L7J).

Can find_all() go through the item_teaser values, find strings such as "L7J", "L2S", "L2O", etc., and return the items in those postal areas, including the entire item? The code below is wrong, since I can't pull anything with it, but it's what I currently have.

from bs4 import BeautifulSoup
import requests

url = "http://www.ontariogolf.com/app/map/cfeed.php?e=-63&w=-106&n=55&s=36"
xml = requests.get(url)
# I was just seeing if I could grab everything from the website, which worked when I printed it.
soup = BeautifulSoup(xml.content, 'lxml')
# I am trying to show all item teasers just to try it out, but I can't figure it out
tag = soup.find_all(id="item_teaser")
print(tag)

When you do:

tag = soup.find_all(id="item_teaser")

BeautifulSoup looks for tags whose id attribute equals "item_teaser". However, "item_teaser" is not an id value; it is the name of an attribute, so nothing matches.

To find every tag that has an item_teaser attribute, pass the attribute name as a keyword argument to find_all() with the value True:

for tag in soup.find_all(item_teaser=True):
    print(tag)
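To see the difference without downloading the live feed, here is a minimal, self-contained sketch. It parses only the single <item> from the question, and it uses Python's built-in "html.parser" instead of 'lxml' (an assumption made only so the example does not need lxml installed). It compares id="item_teaser" with attrs={"item_teaser": True}, which is the dictionary form of the keyword filter above and is handy when an attribute name clashes with a Python keyword or a find_all() parameter.

```python
from bs4 import BeautifulSoup

# The single <item> copied from the question; the real feed returns many of these.
SAMPLE = (
    '<item lat="43.6437075296758" long="-80.083111524582" '
    'item_name="Acton Golf Club" '
    'item_url="http://www.ontariogolf.com/courses/acton/acton-gc/" '
    'item_teaser="4955 Dublin Line Acton, Ontario L7J 2M2"/>'
)

soup = BeautifulSoup(SAMPLE, "html.parser")

# id="item_teaser" means "a tag whose id attribute equals 'item_teaser'": nothing matches.
by_id = soup.find_all(id="item_teaser")

# attrs={"item_teaser": True} means "any tag that has an item_teaser attribute".
by_attr = soup.find_all(attrs={"item_teaser": True})

print(len(by_id), len(by_attr))  # 0 1
```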

Additionally, to read the value of the item_teaser attribute, index the tag like a dictionary, tag["item_teaser"]:

for tag in soup.find_all(item_teaser=True):
    print(tag["item_teaser"])
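find_all() can also filter on the attribute's value rather than just its presence, which lets the search itself pick out postal areas. The sketch below passes a compiled regular expression as the item_teaser filter; BeautifulSoup keeps a tag when the pattern's search() matches the value. It assumes the postal code is the last thing in item_teaser (as in the Acton example), the area codes in the pattern are only examples, and the second <item> is made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# First item is from the question; the second is hypothetical sample data.
SAMPLE = """
<item item_name="Acton Golf Club" item_teaser="4955 Dublin Line Acton, Ontario L7J 2M2"/>
<item item_name="Hypothetical Course" item_teaser="1 Example Rd Toronto, Ontario M5V 1A1"/>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Assumption: the full postal code ("L7J 2M2") ends the teaser, so anchor with $.
pattern = re.compile(r"\b(?:L7J|L1S|L2A)\s?\d[A-Z]\d\s*$")

matched = [tag["item_name"] for tag in soup.find_all("item", item_teaser=pattern)]
print(matched)  # ['Acton Golf Club']
```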

To filter by several postal areas at once, check whether any of the strings in a list (matches) occurs in each tag's item_teaser value:

from bs4 import BeautifulSoup
import requests

url = "http://www.ontariogolf.com/app/map/cfeed.php?e=-63&w=-106&n=55&s=36"
xml = requests.get(url)
soup = BeautifulSoup(xml.content, 'lxml')
input_tag = soup.find_all('item')

# put the list of associated strings here
matches = ["L7J", "L1S", "L2A"]

# print the teaser of every item that contains one of the strings in matches
for tag in input_tag:
    text = tag["item_teaser"]
    if any(x in text for x in matches):
        print(text)
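Note that any(x in text for x in matches) is a plain substring test, so an area code could in principle match somewhere other than the postal code, and the loop prints only the teaser text while the question asked for the whole item. A minimal sketch that addresses both, assuming the full postal code ends the item_teaser string: a hypothetical postal_area() helper extracts the first three characters of the code, and a function passed as the item_teaser filter keeps only items whose area is in the wanted set. BeautifulSoup calls that function with each tag's attribute value (None when the attribute is missing), and find_all() returns the complete <item> tags. The second sample item is invented for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical helper pattern: captures the postal area ("L7J") from a full
# postal code ("L7J 2M2") that is assumed to sit at the end of the teaser.
POSTAL_CODE_AT_END = re.compile(r"([A-Z]\d[A-Z])\s?\d[A-Z]\d\s*$")


def postal_area(teaser):
    """Return the 3-character postal area, or None if no code is found."""
    m = POSTAL_CODE_AT_END.search(teaser or "")
    return m.group(1) if m else None


def items_in_areas(soup, areas):
    """Return the whole <item> tags whose postal area is in `areas`."""
    wanted = set(areas)
    # The attribute filter is a function; it receives the item_teaser value.
    return soup.find_all("item", item_teaser=lambda v: postal_area(v) in wanted)


SAMPLE = """
<item item_name="Acton Golf Club" item_teaser="4955 Dublin Line Acton, Ontario L7J 2M2"/>
<item item_name="Hypothetical Course" item_teaser="1 Example Rd Toronto, Ontario M5V 1A1"/>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
for item in items_in_areas(soup, ["L7J", "L1S", "L2A"]):
    # Each result is the full tag, so every attribute is still available.
    print(item["item_name"], "|", item["item_teaser"])
```

Comparing the extracted area against a set keeps the match exact (L7J matches only L7J), at the cost of depending on the teaser format; if some teasers place the postal code elsewhere, the regular expression in the helper would need adjusting.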

