
BeautifulSoup: get all distinct attribute values in divs of a given class

Let's say I have an HTML file with divs like this:

<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>

How can I get a list of all users who sent messages?

If I use the find method I get only the first user; if I use find_all I get user1 twice.

Can I somehow do it in one step, without removing duplicates from the list returned by find_all?

Here are the only two ways I can think of doing it:

import bs4

r = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(r,'html.parser')
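# All <div> tags whose class list contains "message"
messages = soup.find_all('div', {'class':'message'})

users_list = []

# Keep each title only the first time it appears, so document order is preserved
for message in messages:
    user_id = message.get('title')
    if user_id not in users_list:
        users_list.append(user_id)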

or

import bs4

r = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(r,'html.parser')
messages = soup.find_all('div', {'class':'message'})

# A set drops duplicates but does not keep the users in document order
users_list = list({message.get('title') for message in messages})
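
The set-based one-liner removes duplicates but loses the order in which users appear. If order matters, a minimal sketch of an order-preserving variant could look like the following. It assumes Python 3.7+ (where dicts keep insertion order) and a BeautifulSoup version whose select method supports CSS attribute selectors; the variable names html and titles are only illustrative.

```python
import bs4

html = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# div.message[title] matches only message divs that carry a title attribute
titles = (div['title'] for div in soup.select('div.message[title]'))

# dict.fromkeys keeps the first occurrence of each key, in insertion order
users_list = list(dict.fromkeys(titles))
print(users_list)  # ['user1', 'user2', 'user3']
```

This still removes duplicates after the search, but it does so in a single expression and keeps the users in the order they first appear in the document.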

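You could use a custom finder function that records each username in a set while find_all walks the document:

# Assumes soup was built as in the question
seen_users = set()
def users(tag):
    username = tag.get('title')
    # class is a multi-valued attribute, so bs4 returns it as a list
    if username and 'message' in tag.get('class', []):
        seen_users.add(username)
        return True
    return False

tags = soup.find_all(users)  # still contains every matching div, duplicates included
print(sorted(seen_users))  # ['user1', 'user2', 'user3']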
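
A variation on the same idea, if you want find_all itself to return only one div per user: let the finder accept a tag only when it is the first one seen for its username. This is a sketch, not part of the original answer; the names first_tag_for_user and first_message_per_user are hypothetical.

```python
import bs4

html = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# Maps each username to the first message div that carried it
first_tag_for_user = {}

def first_message_per_user(tag):
    # Hypothetical finder: accept a message div only if it is the first for its user
    if tag.name != 'div' or 'message' not in tag.get('class', []):
        return False
    username = tag.get('title')
    if not username:
        return False
    # setdefault stores the tag only the first time this username is seen,
    # so the identity check is true for that first tag and false for later ones
    return first_tag_for_user.setdefault(username, tag) is tag

tags = soup.find_all(first_message_per_user)
users_list = [tag['title'] for tag in tags]
print(users_list)  # ['user1', 'user2', 'user3']
```

The finder keeps state in a module-level dict, so clear first_tag_for_user before reusing it on another document.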

