
BeautifulSoup - Converting string values into int inside of nested for loop then sort

I'm trying to figure out how to convert string values to int inside a for loop while scraping, so that I can sort by that int ("views" in the script below).

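Below is a brief overview of the problem: a working script that returns strings, my failed attempt to solve the problem, and the desired output.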

Working script that returns strings:

import requests  
from bs4 import BeautifulSoup  
import pprint

res = requests.get('https://www.searchenginejournal.com/category/news/')
soup = BeautifulSoup(res.text, 'html.parser')
links = soup.find_all('h2', class_='sej-ptitle')
subtext = soup.find_all('ul', class_='sej-meta-cells')


def sort_stories_by_views(sejlist):
    return sorted(sejlist, key=lambda k: k['views'], reverse=True)


def create_custom_sej(links, subtext):
    sej = []

    for idx, item in enumerate(links):
        title = links[idx].getText()
        href = links[idx].a.get('href', None)
        views = subtext[idx].find_all(
            'li')[2].text.strip().replace(' Reads', '')
        sej.append({'title': title, 'link': href, 'views': views})
    return sort_stories_by_views(sej)


create_custom_sej(links, subtext)
pprint.pprint(create_custom_sej(links, subtext))

In the above, the output contains dictionaries like the following:

 {
'link': 'https://www.searchenginejournal.com/google-answers-if-site-section-can-impact-ranking-scores-of-
'title': 'Google Answers If Site Section Can Impact Ranking Score of Entire ''Site                ',
'views': '4.5K'
}

The desired output is:

 {
'link': 'https://www.searchenginejournal.com/google-answers-if-site-section-can-impact-ranking-scores-of-
'title': 'Google Answers If Site Section Can Impact Ranking Score of Entire ''Site                ',
'views': '4500'
}

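My failed attempt to solve the problem is below. The script returns a single value rather than a list of all applicable values, and honestly I'm not sure I'm going about this the right way.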

import requests
from bs4 import BeautifulSoup
import pprint

res = requests.get('https://www.searchenginejournal.com/category/news/')
soup = BeautifulSoup(res.text, 'html.parser')
links = soup.find_all('h2', class_='sej-ptitle')
subtext = soup.find_all('ul', class_='sej-meta-cells')


def sort_stories_by_views(sejlist):
    return sorted(sejlist, key=lambda k: k['views'], reverse=True)


def create_custom_sej(links, subtext):
    sej = []

    for idx, item in enumerate(links):
        title = links[idx].getText()
        href = links[idx].a.get('href', None)
        views = subtext[idx].find_all(
            'li')[2].text.strip().replace(' Reads', '').replace(' min  read', '')
# below is my unsuccessful attempt to change the strings to int
        for item in views:
            if views:
                multiplier = 1
                if views.endswith('K'):
                    multiplier = 1000
                    views = views[0:len(views)-1]
                return int(float(views) * multiplier)
            else:
                return views
        sej.append({'title': title, 'link': href, 'views': views})
    return sort_stories_by_views(sej)


create_custom_sej(links, subtext)
pprint.pprint(create_custom_sej(links, subtext))

Any help would be greatly appreciated!

Thanks.
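
A note on why the attempt above yields a single value: the `return` statements sit inside the `for` loop, so `create_custom_sej` exits on the first story before `sej.append` ever runs, and `for item in views` iterates over the characters of one string rather than over stories. A minimal sketch of one way around this is to move the conversion into a helper so the loop itself never returns. The helper name `to_int` and the hard-coded `raw_views` list are hypothetical stand-ins for the scraped values.

```python
def to_int(views):
    """Convert a string such as '4.5K' or '870' to an int."""
    multiplier = 1
    if views.endswith('K'):
        multiplier = 1000
        views = views[:-1]  # drop the trailing 'K'
    return int(float(views) * multiplier)


# Hypothetical sample values standing in for the scraped text.
raw_views = ['4.5K', '870', '11K']

converted = []
for views in raw_views:
    # The helper returns the number; the loop keeps going,
    # so every story is processed.
    converted.append(to_int(views))

print(converted)  # [4500, 870, 11000]
```

In the original script, the same idea would mean calling the helper once per story and appending its result to `sej`.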

You can try this code to convert the views to integers:

import requests  
from bs4 import BeautifulSoup  
import pprint

res = requests.get('https://www.searchenginejournal.com/category/news/')
soup = BeautifulSoup(res.text, 'html.parser')
links = soup.find_all('h2', class_='sej-ptitle')
subtext = soup.find_all('ul', class_='sej-meta-cells')


def convert(views):
    # '4.5K' -> 4500; a plain number such as '870' -> 870
    if 'K' in views:
        return int(float(views.split('K')[0]) * 1000)
    else:
        return int(views)

def sort_stories_by_views(sejlist):
    return sorted(sejlist, key=lambda k: k['views'], reverse=True)


def create_custom_sej(links, subtext):
    sej = []

    for idx, item in enumerate(links):
        title = links[idx].getText()
        href = links[idx].a.get('href', None)
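        # Locate the eye icon in this story's container, then take the
        # first word of the text that follows it; default to '0' if absent.
        views = item.parent.find('i', class_='sej-meta-icon fa fa-eye')
        views = views.find_next(text=True).split()[0] if views else '0'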
        sej.append({'title': title, 'link': href, 'views': convert(views)})
    return sort_stories_by_views(sej)


create_custom_sej(links, subtext)
pprint.pprint(create_custom_sej(links, subtext))

Prints:

[{'link': 'https://www.searchenginejournal.com/microsoft-clarity-analytics/385867/',
  'title': 'Microsoft Announces Clarity – Free Website '
           'Analytics                ',
  'views': 11000},
 {'link': 'https://www.searchenginejournal.com/wordpress-5-6-feature-removed-for-subpar-experience/385414/',
  'title': 'WordPress 5.6 Feature Removed For Subpar '
           'Experience                ',
  'views': 7000},
 {'link': 'https://www.searchenginejournal.com/whatsapp-shopping-payment-customer-service/385362/',
  'title': 'WhatsApp Announces Shopping and Payment Tools for '
           'Businesses                ',
  'views': 6800},
 {'link': 'https://www.searchenginejournal.com/google-noindex-meta-tag-proper-use/385538/',
  'title': 'Google Shares How Noindex Meta Tag Can Cause '
           'Issues                ',
  'views': 6500},

...and so on.
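
The conversion matters for sorting because raw strings are compared character by character, so '6.8K' would rank above '11K'. The `convert` function above assumes every value is either a plain integer or ends in `K`. A slightly more defensive sketch follows, under assumptions that are mine rather than anything confirmed about the site: that counts might also use an `M` suffix or thousands separators, and that an unparseable value should count as 0. The name `parse_views` and the sample `stories` list are hypothetical, and no network request is made.

```python
import re

# Assumed suffixes; only 'K' appears in the scraped output shown above.
SUFFIXES = {'K': 1_000, 'M': 1_000_000}


def parse_views(text):
    """Return a view count such as '4.5K' or '1,234' as an int (0 if unparseable)."""
    match = re.fullmatch(r'\s*([\d.,]+)\s*([KM]?)\s*', text, flags=re.IGNORECASE)
    if not match:
        return 0
    number, suffix = match.groups()
    try:
        value = float(number.replace(',', ''))
    except ValueError:
        return 0
    # round() guards against float results such as 4499.999...
    return round(value * SUFFIXES.get(suffix.upper(), 1))


# Hypothetical sample data in the same shape as the scraped dictionaries.
stories = [
    {'title': 'A', 'views': '6.8K'},
    {'title': 'B', 'views': '11K'},
    {'title': 'C', 'views': '950'},
    {'title': 'D', 'views': 'n/a'},
]

for story in stories:
    story['views'] = parse_views(story['views'])

# Sorting now compares ints, so 11000 ranks above 6800.
ranked = sorted(stories, key=lambda s: s['views'], reverse=True)
print([s['title'] for s in ranked])  # ['B', 'A', 'C', 'D']
```

Whether falling back to 0 is appropriate depends on the use case; raising an error instead would surface markup changes on the page sooner.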

