I'm trying to scrape product reviews from a page, but I'm not sure how to extract a var inside a <script> tag.
Here is my Python code:
import csv

import requests
from bs4 import BeautifulSoup

# newline="" keeps the csv module from writing blank rows on Windows
a_file = open("ProductReviews.csv", "a", newline="", encoding="utf-8")
writer = csv.writer(a_file)
# Write the titles of the columns to the CSV file
writer.writerow(["created_at", "reviewer_name", "rating", "content", "source"])
url = 'https://www.lazada.com.my/products/iron-gym-total-upper-body-workout-bar-i467342383.html'
# Connect to the URL
response = requests.get(url)
# Parse HTML and save to BeautifulSoup object
soup = BeautifulSoup(response.content, "html.parser")
# Hard-coded position of the script tag on this page
data = soup.find_all('script')[123]
# data.string can be None, so check it before searching
if data.string and 'var __moduleData__' in data.string:
    print("Yes")
a_file.close()
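The fixed index `[123]` only works as long as the page keeps exactly the same number of script tags. As a rough sketch of an alternative, the tag could be located by its content instead. The helper name `find_script_text` and the tiny `sample_html` string are made up for illustration and are not part of the real page.

```python
from bs4 import BeautifulSoup

MARKER = "var __moduleData__"


def find_script_text(html, marker=MARKER):
    """Return the text of the first <script> containing marker, else None.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # .string is None for empty scripts, e.g. ones that only have src=
        text = script.string
        if text and marker in text:
            return text
    return None


# Made-up stand-in for response.content, just to exercise the helper
sample_html = """
<html><body>
<script src="app.js"></script>
<script>var __moduleData__ = {"data": {}};</script>
</body></html>
"""

script_text = find_script_text(sample_html)
print(script_text)
```

With the real page, `response.content` would be passed in place of `sample_html`.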
Here is the page source (I removed the unnecessary code):
<html>
<head>
<title></title>
</head>
<body>
<script>
var __moduleData__ = {
  "data": {
    "root": {
      "fields": {
        "review": {
          "reviews": [{
            "rating": 5,
            "reviewContent": "tq barang dah sampai",
            "reviewTime": "24 May 2021",
            "reviewer": "Jaharinbaharin",
          }, {
            "rating": 5,
            "reviewContent": "Beautiful quality👌👌👌",
            "reviewTime": "08 Sep 2021",
            "reviewer": "M***.",
          }, {
            "rating": 5,
            "reviewContent": "the box was badly dented but the item was intact...just that my door frame is shallow and slippery....I can't pull up without worrying of falling down",
            "reviewTime": "25 Aug 2021",
            "reviewer": "David S.",
          }, {
            "rating": 5,
            "reviewContent": "Haven’t really opened it yet but please put some effort on the packaging for future improvement thanks it was really fast",
            "reviewTime": "14 Dec 2020",
            "reviewer": "Yasir A.",
          }, {
            "rating": 5,
            "reviewContent": "Seems to be ok, good quality.. No weight restriction mentioned on the box.. I'm about 90kg, it could handle my weight so far..",
            "reviewTime": "22 May 2020",
            "reviewer": "Kevin",
          }]
        },
      }
    },
  },
};
</script>
</body>
</html>
I only want the review data, so I'd like to know how to extract the value of var __moduleData__.
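One direction that might work, as a minimal sketch: take the text of the script tag, find the first `{` after `var __moduleData__`, and let `json.JSONDecoder().raw_decode` parse the object from there, since it stops at the end of the object and ignores the trailing `;`. This assumes the real page embeds valid JSON; the trimmed excerpt above has trailing commas that `json` would reject, so the sample string below removes them. The function names, the mapping of `reviewTime`/`reviewer`/`reviewContent` onto the CSV columns, and the `"lazada"` value for `source` are all assumptions for illustration.

```python
import json

MARKER = "var __moduleData__"


def parse_module_data(script_text):
    """Decode the object assigned to __moduleData__ in the script text.

    Raises ValueError if the marker or a following "{" is missing.
    """
    start = script_text.index("{", script_text.index(MARKER))
    # raw_decode returns (object, end_index); text after the object is ignored
    data, _end = json.JSONDecoder().raw_decode(script_text, start)
    return data


def review_rows(module_data, source="lazada"):
    """Build rows in the order created_at, reviewer_name, rating, content, source."""
    reviews = module_data["data"]["root"]["fields"]["review"]["reviews"]
    return [
        [r.get("reviewTime"), r.get("reviewer"), r.get("rating"),
         r.get("reviewContent"), source]
        for r in reviews
    ]


# Two reviews from the excerpt, with the trailing commas removed
script_text = """
var __moduleData__ = {"data": {"root": {"fields": {"review": {"reviews": [
  {"rating": 5, "reviewContent": "tq barang dah sampai",
   "reviewTime": "24 May 2021", "reviewer": "Jaharinbaharin"},
  {"rating": 5, "reviewContent": "Beautiful quality👌👌👌",
   "reviewTime": "08 Sep 2021", "reviewer": "M***."}
]}}}}};
"""

rows = review_rows(parse_module_data(script_text))
for row in rows:
    print(row)
```

Each row could then be passed to `writer.writerow(row)`, with `script_text` coming from the `.string` of the matching script tag.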