Extracting data from Airtasker using Beautiful Soup
I am trying to extract data from this website: https://www.airtasker.com/users/brad-n-11346775/
So far, I have managed to extract everything except the license number. The problem is bizarre, because the license number is plain text. I was able to extract everything else, like the name and address. For example, to extract the name, I just did this:
name.append(pro.find('div', class_= 'name').text)
And it works just fine.
This is what I tried for the license number, but I get None as the output:
license_number.append(pro.find('div', class_= 'sub-text'))
When I do:
license_number.append(pro.find('div', class_= 'sub-text').text)
It gives me the following error:
AttributeError: 'NoneType' object has no attribute 'text'
That means it does not recognise the license number as text, even though it is text.
Can someone please give me a workable solution and tell me what I am doing wrong? Regards,
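A minimal offline illustration of what the AttributeError actually means: BeautifulSoup's find() returns None when nothing in the parsed HTML matches, and None has no .text attribute. The HTML string and the "Example User" name below are hypothetical stand-ins for the downloaded page, not Airtasker's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML: the name div is present, but there is no
# "sub-text" div, mimicking a page where the badge is not in the raw HTML.
html = """
<div class="profile">
  <div class="name">Example User</div>
</div>
"""

pro = BeautifulSoup(html, "html.parser")

name_tag = pro.find("div", class_="name")
badge_tag = pro.find("div", class_="sub-text")

print(name_tag.text)  # the matching div's text
print(badge_tag)      # None, because no element matches

# Guard before touching .text instead of calling it on None.
license_text = badge_tag.text if badge_tag is not None else None
print(license_text)
```

So the error does not say the license number "isn't text"; it says the element was never found in the HTML that was parsed.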
The badge with the license number is added to the HTML dynamically, from a bootstrap JSON blob that sits in one of the <script> tags. It is therefore not in the HTML that requests downloads, which is why find() returns None. You can find the tag with bs4, scoop out the data with regex, and parse it with json.
Here's how:
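import ast
import json
import re
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.airtasker.com/users/brad-n-11346775/").text
# The profile data is embedded as a JSON string inside one of the <script>
# tags (here the fourth from last; this index breaks if the page layout changes).
script = BeautifulSoup(page, "lxml").find_all("script")[-4]
# The script passes a quoted JSON string to a parse(...) call: grab that quoted
# argument, unescape it as a string literal, then decode the JSON itself.
raw = re.search(r"parse\((.*)\)", script.string).group(1)
bootstrap_JSON = json.loads(ast.literal_eval(raw))
print(bootstrap_JSON["profile"]["badges"]["electrical_vic"]["reference_code"])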
Output:
Licence No. 28661
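To try the same regex, literal_eval and json pipeline without hitting the live site, here is a minimal offline sketch. It assumes (hypothetically) that the data lives in a script calling JSON.parse() on a double-quoted string, and that the string only uses escapes Python also understands, which is what makes ast.literal_eval work. Instead of the positional [-4] index, it picks the first script containing "JSON.parse(", and get_path is a hypothetical helper that returns None rather than raising a KeyError when a badge is missing. The page markup and the licence value are placeholders.

```python
import ast
import json
import re

from bs4 import BeautifulSoup

# Hypothetical page; the real Airtasker markup may differ.
page = r'''
<html><body>
<script>console.log("unrelated");</script>
<script>window.bootstrap = JSON.parse("{\"profile\": {\"badges\": {\"electrical_vic\": {\"reference_code\": \"Licence No. 12345\"}}}}");</script>
</body></html>
'''

soup = BeautifulSoup(page, "html.parser")

# Select the script by content rather than by position.
script = next(
    s for s in soup.find_all("script")
    if s.string and "JSON.parse(" in s.string
)

# Greedy (.*) runs to the last ")" so the whole quoted argument is captured.
match = re.search(r"parse\((.*)\)", script.string)
data = json.loads(ast.literal_eval(match.group(1)))


def get_path(obj, *keys):
    """Walk nested dicts, returning None if any key along the way is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


licence = get_path(data, "profile", "badges", "electrical_vic", "reference_code")
print(licence)
```

Selecting by content is a design choice: it survives extra scripts being added to the page, at the cost of depending on the "JSON.parse(" text staying the same.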