
How do I get the content inside window.data with Beautiful Soup and parse it as JSON so I can choose which keys and values to print?

I wasn't sure how to phrase the title, so it's rather long. Feel free to edit it.

I am trying to scrape data from this site, but I can't figure out how to access the individual keys and values inside the window.data object using Beautiful Soup.

For example, I'd like to get the values of yyuid, birthday, etc.

My code is as follows:

import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return:
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)

get_profile_html()

Preferably I would like to have it as JSON, but any solution is welcome.

Thank you in advance for your help!

I tweaked your code so that the function returns the script text instead of only printing it:

import urllib.request
from bs4 import BeautifulSoup

username = "itsahardday"
url = "https://likee.video/@" + username  # profile URL: https://likee.video/@account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/@account_name
    :return: text of the <script> tag that contains 'userinfo'
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    script = soup.select_one("script:-soup-contains('userinfo')")
    if script is None:  # select_one returns None when nothing matches
        raise ValueError("no <script> containing 'userinfo' found")
    results = script.string
    print(results)
    return results  # return the script text so the caller can parse it

res = get_profile_html()  # save the result

Then convert it to JSON:

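import json

# Take the text between "window.data =" and the first ";" and parse it as JSON.
data = json.loads(res.split(";")[0].split("window.data =")[1])
userinfo = data['userinfo']
print(userinfo)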
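Splitting on the first ";" breaks if any string value inside the object (a bio, for instance) contains a semicolon. A minimal sketch of a more tolerant variant, assuming the script text contains window.data = followed by valid JSON (which the json.loads approach above already relies on), uses json.JSONDecoder.raw_decode: it parses exactly one JSON value starting at a given index and ignores whatever comes after it. The helper names extract_window_data and pick are hypothetical, the sample string is made up, and it assumes yyuid and birthday live under userinfo.

```python
import json
import re


def extract_window_data(script_text: str) -> dict:
    """Parse the JSON object assigned to window.data in a script body."""
    # Locate "window.data =" allowing optional whitespace around "=".
    match = re.search(r"window\.data\s*=\s*", script_text)
    if match is None:
        raise ValueError("no 'window.data =' assignment found")
    # raw_decode reads one complete JSON value starting at match.end()
    # and returns (value, end_index); trailing ";" or further statements
    # are ignored, and a ";" inside a string cannot truncate the object.
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


def pick(record: dict, keys):
    """Return only the requested keys; missing keys map to None."""
    return {key: record.get(key) for key in keys}


# Hypothetical sample shaped like the page's script (not real page data).
sample = 'window.data = {"userinfo": {"yyuid": "123", "bio": "a; b"}};window.x = 1;'
userinfo = extract_window_data(sample)["userinfo"]
print(pick(userinfo, ["yyuid", "birthday"]))  # {'yyuid': '123', 'birthday': None}
```

With the res string returned by get_profile_html(), the same call would be extract_window_data(res)["userinfo"], after which pick (or plain indexing) selects the fields you want to print.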
