How do I get the content inside window.data with Beautiful Soup and parse it as JSON so I can choose which keys and values to print?
I wasn't sure how to phrase the title, so it's rather long. Feel free to edit it.
I am trying to scrape data from this site, but I can't figure out how to access the individual keys and values within 'window.data' with Beautiful Soup.
For example, I'd like to get the values of yyuid, birthday, etc.
The code is as follows:
import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username  # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return:
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)

get_profile_html()
Preferably I would like to have it as JSON, but any solution is welcome.
Thank you in advance for your help!
I tweaked your code so that the function returns the script text:
import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import re

username = "itsahardday"
url = "https://likee.video/@" + username  # profile url - https://likee.video/account_name

def get_profile_html():
    '''
    Get profile data from HTML - https://likee.video/account_name
    :return: text of the script tag that contains 'userinfo'
    '''
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    results = soup.select_one("script:-soup-contains('userinfo')").string
    print(results)
    return results  # return the script text so the caller can parse it

res = get_profile_html()  # save the result
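Then cut the object literal out of the script text and parse it as JSON:
import json

# Keep the text between "window.data =" and the first ";", then parse it.
# Note: this assumes no ";" appears inside the JSON itself.
json.loads(res.split(";")[0].split("window.data =")[1])['userinfo']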
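If a value inside the object ever contains a semicolon (a profile bio, say), splitting on ";" would cut the JSON short. A slightly more defensive variant is sketched below. It is not part of the original answer: the helper name extract_window_data and the sample string are hypothetical, and it assumes the script text has the form window.data = {...}; as the snippet above implies. It uses json.JSONDecoder().raw_decode, which parses one JSON value and ignores anything that follows it.

```python
import json

def extract_window_data(script_text, marker="window.data"):
    """Return the object assigned to `marker` in the script text as a dict.

    Hypothetical helper: assumes the text looks like `window.data = {...};`.
    """
    start = script_text.find(marker)
    if start == -1:
        raise ValueError(f"{marker!r} not found in script text")
    # The JSON object is assumed to begin at the first "{" after the marker.
    brace = script_text.find("{", start)
    if brace == -1:
        raise ValueError("no JSON object found after the marker")
    # raw_decode parses a single JSON value and reports where it ended,
    # so trailing JavaScript (and ";" inside strings) does not matter.
    data, _end = json.JSONDecoder().raw_decode(script_text[brace:])
    return data

# Made-up sample for illustration; the real page content may differ.
sample = (
    'window.data = {"userinfo": {"yyuid": "123", '
    '"birthday": "2000-01-01", "bio": "hi; there"}}; window.other = 1;'
)
userinfo = extract_window_data(sample)["userinfo"]
print(userinfo["yyuid"], userinfo["birthday"])
```

With the code above, the same call would be extract_window_data(res)["userinfo"], after which individual keys such as yyuid or birthday can be read like any dict entry.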