Extracting data from Airtasker using Beautiful Soup

Question

I am trying to extract data from this website: https://www.airtasker.com/users/brad-n-11346775/

So far, I have managed to extract everything except the license number. The problem I'm facing is bizarre, as the license number is plain text. I was able to extract everything else, like the Name, Address, etc. For example, to extract the Name, I just did this:

name.append(pro.find('div', class_= 'name').text)

And it works just fine.

This is what I have tried for the license number, but the output is None:

license_number.append(pro.find('div', class_= 'sub-text'))

When I do:

license_number.append(pro.find('div', class_= 'sub-text').text)

It gives me the following error:

AttributeError: 'NoneType' object has no attribute 'text'

That means it does not recognise the license number as text, even though it is text.

Can someone please give me a workable solution and tell me what I am doing wrong? Regards,

Answer 1

The badge with the license number is added to the HTML dynamically from a bootstrap JSON that sits in one of the <script> tags. It is not in the markup that requests downloads, so find() has nothing to match and returns None.
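To make the None result concrete, here is a minimal sketch. It assumes a made-up HTML snippet (the "Example Tasker" name and the `window.data` script are placeholders, not the real Airtasker markup) in which the name div exists but the badge div does not:

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML standing in for what requests receives:
# the name is in the markup, the badge exists only as script data.
static_html = """
<div class="pro">
  <div class="name">Example Tasker</div>
  <script>window.data = JSON.parse("{}");</script>
</div>
"""

pro = BeautifulSoup(static_html, "html.parser")

name_tag = pro.find("div", class_="name")       # present in the markup
badge_tag = pro.find("div", class_="sub-text")  # absent, so find() returns None

print(name_tag.text)  # Example Tasker
print(badge_tag)      # None

# Guard before touching .text to avoid the AttributeError.
license_text = badge_tag.text if badge_tag is not None else None
```

The guard only avoids the crash; it does not recover the license number, which still has to come from the script data.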

You can find the tag with bs4, scoop out the data with regex, and parse it with json.

Here's how:

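import ast
import json
import re

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.airtasker.com/users/brad-n-11346775/").text
# The bootstrap data sat in the fourth-from-last <script> tag when this was written;
# the hard-coded index will break if the page layout changes.
script = BeautifulSoup(page, "lxml").find_all("script")[-4]
# Capture the string literal passed to parse(...), unescape it, then decode the JSON.
bootstrap_JSON = json.loads(
    ast.literal_eval(re.search(r"parse\((.*)\)", script.string).group(1))
)
print(bootstrap_JSON["profile"]["badges"]["electrical_vic"]["reference_code"])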

Output:

Licence No. 28661
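If the hard-coded script index worries you, one possible variation is to scan every script's text for the parse(...) call instead. This is only a sketch under assumptions: the sample script text and the `window.__BOOTSTRAP__` name are invented for illustration, and `extract_bootstrap` is a hypothetical helper, not part of any library. It uses only the standard library so it can be tried offline:

```python
import ast
import json
import re

# Matches the string literal handed to JSON.parse(...).
PARSE_CALL = re.compile(r"JSON\.parse\((.*)\)", re.DOTALL)


def extract_bootstrap(script_texts):
    """Return the first decoded JSON.parse(...) payload found, else None."""
    for text in script_texts:
        if not text:
            continue  # scripts without inline text
        match = PARSE_CALL.search(text)
        if match is None:
            continue
        try:
            # literal_eval unescapes the quoted literal; json.loads decodes it.
            return json.loads(ast.literal_eval(match.group(1)))
        except (ValueError, SyntaxError, TypeError):
            continue  # not a payload we can decode, try the next script
    return None


# Invented sample standing in for the texts of the page's <script> tags.
sample_scripts = [
    None,
    "console.log('hi')",
    r'window.__BOOTSTRAP__ = JSON.parse("{\"profile\": {\"badges\": '
    r'{\"electrical_vic\": {\"reference_code\": \"Licence No. 28661\"}}}}");',
]

data = extract_bootstrap(sample_scripts)
badge = data["profile"]["badges"].get("electrical_vic", {})
print(badge.get("reference_code"))  # Licence No. 28661
```

With the real page you would pass something like `(s.string for s in soup.find_all("script"))`. Using `.get()` for the badge also avoids a KeyError on profiles that have no `electrical_vic` badge.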

Answer 1: score 2, accepted, 2021-05-16 13:31:05
