
Google Analytics Reporting API v4: nextPageToken cannot go beyond 10,000 rows (Python)

I'm new to the Google Analytics API and I'm trying to query more than 10,000 rows using Python. Each row in my request is a client ID. From what I've read, I have to set the pageToken and pageSize parameters to do this. The following function shows my basic query structure.

from datetime import datetime, timedelta

def get_report(analytics, pageToken=None):
    # One ReportRequest: per-client metrics for the last 30 days
    sample_request = {
      'viewId': '1111111',
      'pageSize': 2000,
      'pageToken': pageToken,  # None on the first request
      # dateRanges is documented as a list of DateRange objects
      'dateRanges': [{
          'startDate': datetime.strftime(datetime.now() - timedelta(days=30), '%Y-%m-%d'),
          'endDate': datetime.strftime(datetime.now(), '%Y-%m-%d')
      }],
      'dimensions': [{'name': 'ga:clientid'}],
      'metrics': [{'expression': 'ga:sessions'},
                  {'expression': 'ga:avgSessionDuration'},
                  {'expression': 'ga:bounceRate'},
                  {'expression': 'ga:goalConversionRateAll'},
                  {'expression': 'ga:pageviews'}
                 ],
      'orderBys': [{"fieldName": "ga:sessions", "sortOrder": "DESCENDING"}],
    }

    # reportRequests is documented as a list of ReportRequest objects
    return analytics.reports().batchGet(
      body={
        'reportRequests': [sample_request],
      }
    ).execute()

I implemented my pagination function using an idea similar to the one in this post: "Update the input of a loop with a result from the previous iteration of the loop". Basically, I request 2,000 rows at a time and convert them into a dataframe. As long as the previous response contains a pageToken, I request the next 2,000 rows with it and append them to my existing dataframe. Here is my pagination code.

import pandas as pd

# pagination
def main():
    global result

    # Initial request
    analytics = initialize_analyticsreporting()
    response = get_report(analytics)
    response_data = response.get('reports', [])[0]
    pageToken = response_data.get('nextPageToken')
    # convert the report into a pandas dataframe
    result = pd.DataFrame(prase_response(response_data)[0])

    while pageToken is not None:   # more data available
        print(pageToken)
        print("still running")
        # reuse the same client and pass nextPageToken back unchanged
        # (adding 1 to it would skip one row per page)
        response = get_report(analytics, pageToken)
        response_data = response.get('reports', [])[0]
        pageToken = response_data.get('nextPageToken')  # update the pageToken
        # temp is the new dataframe to be appended
        temp = pd.DataFrame(prase_response(response_data)[0])
        result = pd.concat([result, temp], axis=0)


if __name__ == '__main__':
    main()
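For anyone who wants to try the loop without touching the API, here is a minimal sketch of the same pagination pattern with the fetch step injected. The names `iter_pages` and `fake_fetch` are hypothetical, and `fake_fetch` only imitates the response shape (a `reports` list whose items may carry `nextPageToken`); it is not the real service.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch: Callable[[Optional[str]], dict],
               max_pages: int = 1000) -> Iterator[dict]:
    """Yield one report per page until no nextPageToken is returned.

    `fetch` is a hypothetical callable: it takes a page token (None for the
    first page) and returns the raw batchGet response as a dict.
    """
    token = None
    for _ in range(max_pages):  # guard against an endless loop
        report = fetch(token)["reports"][0]
        yield report
        token = report.get("nextPageToken")  # passed back unchanged
        if token is None:
            return


def fake_fetch(token, total=5000, size=2000):
    """Offline stand-in for the API: serves `total` integer rows in pages."""
    start = int(token or 0)
    end = min(start + size, total)
    report = {"data": {"rows": list(range(start, end)), "rowCount": total}}
    if end < total:
        report["nextPageToken"] = str(end)
    return {"reports": [report]}


pages = list(iter_pages(fake_fetch))
rows = [row for page in pages for row in page["data"]["rows"]]
print(len(pages), len(rows))  # prints "3 5000"
```

With the function from the question, the call would presumably look like `iter_pages(lambda token: get_report(analytics, token))`, collecting one dataframe per page and concatenating once at the end.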

This program works as expected, but it stops when pageToken reaches "10000"; that is, the main function concatenates 5 dataframes of 2,000 rows each. I should have more than 60,000 rows available. I know a single request can return at most 10,000 rows, but I also understand that the pageToken parameter can be used to get around this limit. I'm not sure which part of my code is wrong. If I set pageSize to 10,000, the main function creates just one dataframe with 10,000 rows and stops.
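One check that might help narrow this down is comparing the total the API itself reports with what the loop actually collected. As far as I know, each report in a v4 response has a `data.rowCount` field holding the total number of matching rows. The helper name `rows_reported_vs_fetched` is hypothetical and the pages below are invented placeholders shaped like the situation described above, not real output.

```python
def rows_reported_vs_fetched(reports):
    """Return (rowCount reported by the API, rows actually received)."""
    fetched = sum(len(r.get("data", {}).get("rows", [])) for r in reports)
    # Read the reported total from the first page, if there is one.
    reported = reports[0].get("data", {}).get("rowCount") if reports else None
    return reported, fetched


# Invented placeholder pages: 5 pages of 2,000 empty rows each,
# with the API claiming 60,000 matching rows in total.
example_pages = [
    {"data": {"rows": [{}] * 2000, "rowCount": 60000}} for _ in range(5)
]

reported, fetched = rows_reported_vs_fetched(example_pages)
print(reported, fetched)  # prints "60000 10000"
```

If the reported total is well above the fetched count, the missing rows are being withheld on the API side. If the two numbers match, the view simply has fewer rows for that query than expected.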

So how can I get all 60,000 rows, and why does my pagination function fail? Or is it impossible to request more than 10,000 client IDs? Any help is greatly appreciated! Thank you!

Update (05/15/2020): I used my pagination function with other dimensions, such as "ga:pagePath", and it works perfectly. So I guess it may simply be impossible to query more than 10,000 rows of client IDs. Please correct me if I'm wrong.
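If the cap really applies per query, one possible workaround is to split the 30-day range into shorter windows and page through each window separately. This is only a sketch under that assumption and has not been verified against the API. The helper `date_windows` is hypothetical, and the dates are just example values.

```python
from datetime import date, timedelta


def date_windows(start, end, days):
    """Split [start, end] into consecutive inclusive windows of at most `days` days."""
    windows = []
    current = start
    while current <= end:
        stop = min(current + timedelta(days=days - 1), end)
        # Same keys as a DateRange entry in the request body
        windows.append({"startDate": current.isoformat(),
                        "endDate": stop.isoformat()})
        current = stop + timedelta(days=1)
    return windows


windows = date_windows(date(2020, 4, 15), date(2020, 5, 15), 7)
for w in windows:
    print(w["startDate"], w["endDate"])
```

Each window would replace the date range in the request. A client ID can appear in several windows, so counts such as sessions and pageviews would have to be summed per client afterwards. Ratio metrics such as bounce rate cannot simply be added up and would need to be recomputed from their underlying counts.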

I think it's a permission or configuration problem. I have two different accounts: one of them returns everything, and the other returns only 10,000 rows. The worst part is that the one returning fewer rows is a 360 account.
