
Python: looping over API pages to get data?

So I have this item, whose URL is this:

I was checking the Network tab in the browser's Inspect mode and found that clicking on the second page of reviews sends a request to an API address.
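One way to reuse an API address copied from the Network tab is to split its query string into a params dict with the standard library. This is a minimal sketch. The captured URL below is an assumption, rebuilt from the parameters in the code that follows, not copied from the real request.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed example: an API address as it might appear in the Network tab,
# rebuilt from the parameters used in the code below.
captured = (
    "https://shopee.sg/api/v2/item/get_ratings"
    "?filter=1&flag=1&itemid=3734876291&limit=6&offset=0"
    "&shopid=234663159&type=0"
)

parts = urlsplit(captured)
# Base URL without the query string, to pass as requests.get(url, params=...)
url = f"{parts.scheme}://{parts.netloc}{parts.path}"
# Query string as a dict; note that every value comes back as a string
params = dict(parse_qsl(parts.query))

print(url)
print(params)
```

Because every value is a string, convert limit and offset with int() before doing arithmetic on them.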

I am trying to scrape only the reviews "With Comments".

import requests
import pandas as pd
import csv

rows = []
    
url = "https://shopee.sg/api/v2/item/get_ratings"

params = {
    "filter": "1",
    "flag": "1",
    "itemid": "3734876291",
    "limit": 6, # shows 6 reviews at a time
    "offset": 0, # starting at first page, raise increment by 6?
    "shopid": "234663159",
    "type": "0"
    }
    
r = requests.get(url, params=params)
data = r.json()
    
# Total number of reviews
reviews = data['data']['item_rating_summary']['rcount_with_context']
    
for i in range (0, reviews, 6):
    params["offset"] = i

    limit = reviews - i
    if limit < 6:
        params["limit"] = limit

#    offset = reviews - i
#    if offset < 6:
#       params["offset"] = offset

            
    for num, item in enumerate(data['data']['ratings']):
        row = {}

        print('Review:', i + num + 1)
            
        # User Name
        user_name = item['author_username']
        row['Name'] = user_name
        print(user_name)
            
        # Content
        content = item['comment']
        row['Content'] = content
        print(content)
            
        # Rating
        rating = item['rating_star']
        row['Rating'] = rating
        print(rating)
            
        rows.append(row)
            
df = pd.DataFrame(rows)
df = df[['Name', 'Content', 'Rating']]
df.to_csv('API Review DF.csv', index=False)

This is what I have so far, but I keep getting the same first six reviews over and over, 246 rows in total (246 is the total number of reviews with comments).

I'm definitely doing something wrong in the for loop, but I'm not sure what I need to fix. What do I need to change so that I can properly get the data for all 246 reviews?

Most of your code is right; you just forgot to send a new request inside the loop, so data always holds the first page of reviews.
I modified your code; you can check the changes in it.

  • I changed the first request's limit to 1, because the first request only needs rcount_with_context; there is no need to fetch many reviews back.
  • In the outer for loop, I set limit back to 6.
  • The limit/offset combination means: "give me limit items (for example 6), after skipping the first offset items (for example 10)". For example, given the list 1,2,3,4,5,6,7,8,9,10, if limit is 2 and offset is 2, the result is 3,4.
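The offset/limit pairing in the last bullet behaves like Python list slicing: skip offset items, then take up to limit items. A small sketch using a plain list in place of the API; the page helper is hypothetical and only for illustration.

```python
def page(items, offset, limit):
    """Return up to `limit` items after skipping the first `offset` items."""
    return items[offset:offset + limit]

numbers = list(range(1, 11))  # 1, 2, ..., 10

print(page(numbers, offset=2, limit=2))  # [3, 4]

# Stepping offset by the page size visits every item exactly once;
# the last page is simply shorter when the total is not a multiple of 6.
pages = [page(numbers, offset, 6) for offset in range(0, len(numbers), 6)]
print(pages)  # [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]]
```

This is also why the loop below can keep limit at 6 for every request: a final short page just returns fewer items.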

import requests
import pandas as pd

rows = []
    
url = "https://shopee.sg/api/v2/item/get_ratings"

params = {
    "filter": "1",
    "flag": "1",
    "itemid": "3734876291",
    "limit": 1, # only needed the first time, to get the count
    "offset": 0, # start at the first review
    "shopid": "234663159",
    "type": "0"
    }
    
r = requests.get(url, params=params)
data = r.json()
    
# Total number of reviews
reviews = data['data']['item_rating_summary']['rcount_with_context']
    
for i in range(0, reviews, 6):
    params["offset"] = i  # skip the first i reviews
    params["limit"] = 6   # fetch 6 reviews per page

    # after setting params, call requests again to fetch the new page
    r = requests.get(url, params=params)
    data = r.json()
            
    for num, item in enumerate(data['data']['ratings']):
        row = {}

        print('Review:', i + num + 1)
            
        # User Name
        user_name = item['author_username']
        row['Name'] = user_name
        print(user_name)
            
        # Content
        content = item['comment']
        row['Content'] = content
        print(content)
            
        # Rating
        rating = item['rating_star']
        row['Rating'] = rating
        print(rating)
            
        rows.append(row)
            
df = pd.DataFrame(rows)
df = df[['Name', 'Content', 'Rating']]
df.to_csv('API Review DF.csv', index=False)
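The same fix can also be written as a loop that keeps requesting pages until the API returns fewer than limit ratings, so it does not depend on rcount_with_context being exact. This is a sketch under assumptions: fetch_all and fake_fetch are hypothetical names, the fetch function is passed in so the loop can be tried without network access, and it assumes the response has the data['data']['ratings'] shape used above, treating a missing or null ratings field as the end.

```python
from typing import Callable

def fetch_all(fetch_page: Callable[[int, int], dict], limit: int = 6) -> list:
    """Collect every rating by requesting successive offset/limit pages."""
    rows = []
    offset = 0
    while True:
        # A fresh request for each page: the step missing from the question
        data = fetch_page(offset, limit)
        ratings = (data.get("data") or {}).get("ratings") or []
        for item in ratings:
            rows.append({
                "Name": item["author_username"],
                "Content": item["comment"],
                "Rating": item["rating_star"],
            })
        if len(ratings) < limit:  # short or empty page: nothing left to fetch
            break
        offset += limit
    return rows

# Hypothetical stand-in for the real API: 14 fake ratings served page by page
FAKE = [{"author_username": f"user{n}", "comment": f"c{n}", "rating_star": 5}
        for n in range(14)]

def fake_fetch(offset: int, limit: int) -> dict:
    return {"data": {"ratings": FAKE[offset:offset + limit]}}

rows = fetch_all(fake_fetch)
print(len(rows))  # 14
```

Against the real endpoint, the fetch function could wrap the existing call, for example fetch_all(lambda o, l: requests.get(url, params={**params, "offset": o, "limit": l}).json()), and the result can go straight into pd.DataFrame.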
