Extracting array values from JavaScript with Beautiful Soup

Question

I'm trying to build a scraper in Python that reads a variable from JavaScript code embedded in the HTML of a webpage. The variable changes over time. Here is the JavaScript code; I need the first number in the yValues array:

jQuery(document).ready(function() {
  var draw = true;
  
  if ("Biblioteca di Ingegneria" == "") {
    draw = false;
  }
  
  if (draw) {
    var yValues = [
        "28",
        "100"
      ];
    var Titolo = "Biblioteca di Ingegneria";
    var sottoTitolo = "Posti Totali: 128";
    var barColors = [
        "#167d21",
        "#ed2135"
      ];
    var xValues = [
        "Liberi (28)",
        "Occupati (100)"
      ];
    
    new Chart("InOutChart", {
      type: "pie",
      data: {
        labels: xValues,
        datasets: [
          {
            backgroundColor: barColors,
            data: yValues
          }
        ]
      },
      options: {
        plugins: {
          title: {
            display: true,
            text: Titolo,
            font: {
              size: 25,
              style: 'normal',
              lineHeight: 1.2
            },
            // padding: {
            //   top: 10,
            //   bottom: 30
            // }
          },
          subtitle: {
            display: true,
            text: sottoTitolo,
            font: {
              size: 20,
              style: 'normal',
              lineHeight: 1.2
            },
            padding: {
              bottom: 30
            }
          },
          legend: {
            display: true,
            position: "bottom",
            labels: {
              font: {
                size: 20,
                style: 'normal',
                lineHeight: 1.2
              }
            }
          }
        },
        responsive: true,
        maintainAspectRatio: false,
        scales: {
          yAxes: [
            {
              display: true,
              ticks: {
                beginAtZero: true
              }
            }
          ]
        }
      }
    });
  }
});

This is the best I could do:

from bs4 import BeautifulSoup
import requests

# Make a GET request to the URL of the web page.
base_url = 'https://qrbiblio.unipi.it/Home/Chart?IdCat=a96d84ba-46e8-47a1-b947-ab98a8746d6f'
response = requests.get(base_url)

# Parse the HTML content of the page.
soup = BeautifulSoup(response.text, "html.parser")

# Find all the `<script>` elements on the page.
scripts = soup.find_all("script")

# Get the 8th `<script>` element.
script8 = scripts[7]

# Transform the 8th `<script>` into a string.
script8_txt = "".join(script8)

# Get the useful string from the 8th `<script>`.
usefull_txt = script8_txt[248:251]
        
# Get the int from the string.
pl = int("".join(filter(str.isdigit, usefull_txt)))

print(pl)

This works, but I want to parse the JavaScript code automatically to find the variable and get its value, because, as you can see, I found the position of the characters I needed by hand. I'm looking for a better solution because I plan to use this code for other similar webpages, where the position of the variable is different each time. One last detail: I want to put this Python code in an Alexa skill, so I don't know whether the Selenium package will work well.

Answer 1

Try this:

import ast

import requests
from bs4 import BeautifulSoup

base_url = 'https://qrbiblio.unipi.it/Home/Chart?IdCat=a96d84ba-46e8-47a1-b947-ab98a8746d6f'
response = requests.get(base_url)

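# Take the text of the 8th <script> element, which holds the chart code.
script = (
    BeautifulSoup(response.text, "html.parser")
    .find_all("script")[7]
    .string
)
# Cut out the array literal between "var yValues = " and the next ";",
# then evaluate it safely as a Python list of strings.
numbers = ast.literal_eval(
    script.strip().split("var yValues = ")[1].split(";")[0]
)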
print(numbers)
print(numbers[0])

Output:

['130', '0']
130
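
Since the question mentions reusing the code on similar pages where the script position differs, here is a minimal sketch that searches for the variable by name instead of relying on a script index or character offset. It is not part of the original answer. The helper `extract_js_array` and the `sample_html` string are hypothetical, and the sketch assumes the array contains only JSON-compatible literals (double-quoted strings or numbers) with no `]` character inside any string.

```python
import json
import re


def extract_js_array(source, name):
    """Return the array assigned to `var <name>` in `source` as a Python list.

    Hypothetical helper: assumes the array literal is valid JSON and that
    no string inside it contains a closing bracket.
    """
    pattern = re.compile(
        r"var\s+" + re.escape(name) + r"\s*=\s*(\[.*?\])", re.DOTALL
    )
    match = pattern.search(source)
    if match is None:
        raise ValueError(f"variable {name!r} not found")
    return json.loads(match.group(1))


# Hypothetical sample standing in for response.text from the question.
sample_html = """
<script>var unrelated = 1;</script>
<script>
jQuery(document).ready(function() {
  if (draw) {
    var yValues = [
        "28",
        "100"
      ];
    var xValues = [
        "Liberi (28)",
        "Occupati (100)"
      ];
  }
});
</script>
"""

y_values = extract_js_array(sample_html, "yValues")
first_value = int(y_values[0])
print(y_values)
print(first_value)
```

The same helper could be applied to `response.text` directly, or to the `.string` of each element returned by `soup.find_all("script")`, so the position of the script on the page no longer matters. Because it only needs `requests` plus the standard library, it avoids running a browser through Selenium.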

Answer 1: accepted, 2022-12-15 15:37:53
