
BeautifulSoup, Selenium, Python Data Extraction Problem With For Looping

I need to loop over the 'ul' tag shown in my script. Can anyone help me do this? Thank you very much for your time.

As I said, I want to loop through a UL tag that contains at least 10 LI tags and extract the text of each LI tag, but I couldn't find a way to loop inside that UL tag.

Here is my code:

page_source = driver.page_source
soup = BeautifulSoup(page_source, features='html.parser')
searchResCon = soup.find('div', {'class':'search-results-container'})
followerCol = searchResCon.find('div', {'class':'ph0 pv2 artdeco-card mb2'})
searchList = followerCol.find('ul', {'class':'reusable-search__entity-result-list list-style-none'})
singleCon = searchList.find('li', {'class':'reusable-search__result-container'})

for li in searchList:  # I want to loop inside the 'ul' tag stored in searchList.
        # That 'ul' tag has at least 10 'li' tags, and I want to iterate over them.


        # These are the fields I collect from the 'li' tags inside the 'ul' tag.

        name = singleCon.find('span', {'aria-hidden':'true'}).get_text().strip()
        title = singleCon.find('div', {'class':'entity-result__primary-subtitle t-14 t-black t-normal'}).get_text().strip()
        location = singleCon.find('div', {'class':'entity-result__secondary-subtitle t-14 t-normal'}).get_text().strip()
        hashtag = singleCon.find('p', {'class':'entity-result__summary entity-result__summary--2-lines t-12 t-black--light mb1'}).get_text().strip()
        follower = singleCon.find('span',{'class':'entity-result__simple-insight-text entity-result__simple-insight-text--small'}).get_text().strip()

        # I have a list called contactsInfo, and I append all of this information to it.

        contactsInfo.append(f'-' * 30)
        contactsInfo.append('\n')
        contactsInfo.append(f'-' * 30)
        contactsInfo.append('\n')
        contactsInfo.append(f'Name: {name}')
        contactsInfo.append('\n')
        contactsInfo.append(f'Title: {title}')
        contactsInfo.append('\n')
        contactsInfo.append(f'Location: {location}')
        contactsInfo.append('\n')
        contactsInfo.append(f'Hashtag: {hashtag}')
        contactsInfo.append('\n')
        contactsInfo.append(f'Follower & Mutual: {follower}')
        contactsInfo.append('\n')

When I used find_all on the searchList variable instead, BeautifulSoup raised this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [242], in <cell line: 8>()
      6 singleCon = searchList.find_all('li', {'class':'reusable-search__result-container'})
      8 for li in singleCon:
---> 10     name = singleCon.find('span', {'aria-hidden':'true'}).get_text().strip()
     11     title = singleCon.find('div', {'class':'entity-result__primary-subtitle t-14 t-black t-normal'}).get_text().strip()
     12     location = singleCon.find('div', {'class':'entity-result__secondary-subtitle t-14 t-normal'}).get_text().strip()

File ~/Desktop/linkedin/emv/lib/python3.10/site-packages/bs4/element.py:2289, in ResultSet.__getattr__(self, key)
   2287 def __getattr__(self, key):
   2288     """Raise a helpful exception to explain a common code fix."""
-> 2289     raise AttributeError(
   2290         "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2291     )

AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Thank you very much for your time.

By the way, I use Python 3.10 and Jupyter Notebook.

The code and the error message don't really fit together: the posted code calls find() for singleCon, while the traceback shows find_all(). Based on the message, the code should look like the snippet that follows. Inside the loop, use li to find your information, not singleCon, which is still the ResultSet you are iterating over:

singleCon = searchList.find_all('li', {'class':'reusable-search__result-container'})
for li in singleCon:
    name = li.find('span', {'aria-hidden':'true'}).get_text().strip()
    title = li.find('div', {'class':'entity-result__primary-subtitle t-14 t-black t-normal'}).get_text().strip()
    location = li.find('div', {'class':'entity-result__secondary-subtitle t-14 t-normal'}).get_text().strip()
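
To make the fix easy to try outside the notebook, here is a minimal, self-contained sketch of the same loop. The HTML string is a made-up stand-in for driver.page_source, and the text_or_none helper and the contacts list are hypothetical names that are not part of the original question; only the class names are taken from it.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page source returned by Selenium.
html = """
<ul class="reusable-search__entity-result-list list-style-none">
  <li class="reusable-search__result-container">
    <span aria-hidden="true">Alice Example</span>
    <div class="entity-result__primary-subtitle t-14 t-black t-normal">Engineer</div>
    <div class="entity-result__secondary-subtitle t-14 t-normal">Berlin</div>
  </li>
  <li class="reusable-search__result-container">
    <span aria-hidden="true">Bob Example</span>
    <div class="entity-result__primary-subtitle t-14 t-black t-normal">Designer</div>
  </li>
</ul>
"""


def text_or_none(parent, name, attrs):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.find(name, attrs)
    return node.get_text().strip() if node is not None else None


soup = BeautifulSoup(html, features='html.parser')
searchList = soup.find('ul', {'class': 'reusable-search__entity-result-list list-style-none'})

contacts = []
# find_all() returns a ResultSet; each li in it is a single Tag that supports find().
for li in searchList.find_all('li', {'class': 'reusable-search__result-container'}):
    contacts.append({
        'name': text_or_none(li, 'span', {'aria-hidden': 'true'}),
        'title': text_or_none(li, 'div', {'class': 'entity-result__primary-subtitle t-14 t-black t-normal'}),
        'location': text_or_none(li, 'div', {'class': 'entity-result__secondary-subtitle t-14 t-normal'}),
    })

for contact in contacts:
    print(contact)
```

The helper is there because find() returns None when an element is missing, and chaining .get_text() onto None raises an AttributeError. Note also that iterating directly over a Tag (for li in searchList) walks all of its children, including the whitespace text nodes between the li elements, which is one reason find_all('li', ...) is the safer way to loop.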
