How do I get the content inside window.data with Beautiful Soup and jsonify it so I can choose what key and value I want to print?

I didn't know how to phrase the title, so it's rather long. Feel free to edit it.

I am trying to scrape data from this site, but I can't figure out how to access the individual keys and values within 'window.data' with Beautiful Soup.

For example, I'd like to get the values of yyuid, birthday, etc.

The code is as follows:

import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return:
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)

get_profile_html()

Preferably I would like to have it as JSON, but any solution is welcome.

Thank you in advance for your help!

I tweaked your code so that the function returns the script text instead of only printing it.

import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return:
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)
    return results # add return

res=get_profile_html() # save the result

Then convert the result to JSON:

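import json

# Keep the text before the first ";" and after "window.data =", then parse it as JSON
userinfo = json.loads(res.split(";")[0].split("window.data =")[1])['userinfo']
print(userinfo)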
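One caveat with splitting on ";" is that it cuts the payload short if any string value happens to contain a semicolon. A possible alternative, sketched below, is to let the JSON decoder itself find where the object ends. The helper name extract_window_data and the sample string are made up for illustration (the sample only stands in for `res`, and the real payload has more keys), and the sketch assumes the assigned value is valid JSON, as the one-liner above already does.

```python
import json


def extract_window_data(script_text, marker="window.data"):
    """Return the object assigned to `marker` inside a script's text.

    Hypothetical helper: assumes the assigned value is valid JSON.
    """
    # Find the assignment, then the opening brace of the object after it.
    start = script_text.index(marker) + len(marker)
    start = script_text.index("{", start)
    # raw_decode parses exactly one JSON value starting at `start` and
    # ignores anything after it, so a ";" inside a string is harmless.
    data, _end = json.JSONDecoder().raw_decode(script_text, start)
    return data


# Made-up sample standing in for `res`; the values are placeholders.
sample = (
    'window.data = {"userinfo": {"yyuid": "123", '
    '"birthday": "2000-01-01", "bio": "hi; there"}}; window.other = 1;'
)

userinfo = extract_window_data(sample)["userinfo"]

# Pick only the keys of interest; .get() gives None for a missing key.
wanted = {key: userinfo.get(key) for key in ("yyuid", "birthday")}
print(json.dumps(wanted, indent=2))
```

With the real page, passing `res` instead of `sample` should work the same way, provided the script still assigns a JSON object to window.data.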
