
Gmail API - Quickly access the dates of every email ever sent / received

I'm trying to analyse my 25k+ emails, similar to the post here: http://beneathdata.com/how-to/email-behavior-analysis/

While that script uses IMAP, I'm trying to implement this with the Gmail API for improved security. I'm using Python (and Pandas for data analysis), but the question applies to the Gmail API in general.

Following the docs, I can list messages with:

msgs = service.users().messages().list(userId='me', maxResults=500).execute()

and then access the data using a loop:

for msg in msgs['messages']:
    m_id = msg['id']  # get id of individual message
    message = service.users().messages().get(userId='me', id=m_id).execute()
    payload = message['payload']
    header = payload['headers']

    for item in header:
        if item['name'] == 'Date':
            date = item['value']
            # ... data storage functions etc. ...

but this is clearly very slow. On top of one get() request per message, I also have to call list() many times to page through all the emails.
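The repeated list() calls can at least be automated by following nextPageToken until a response no longer contains one. This is a minimal sketch, assuming service is an authorized client built with googleapiclient.discovery.build('gmail', 'v1', credentials=...); the generator name iter_message_ids is hypothetical. Each list() page returns only message ids and thread ids, so this part is cheap compared with the per-message get() calls.

```python
from typing import Any, Iterator


def iter_message_ids(service: Any, user_id: str = "me", page_size: int = 500) -> Iterator[str]:
    """Yield every message id by following nextPageToken until it is absent."""
    page_token = None
    while True:
        kwargs = {"userId": user_id, "maxResults": page_size}
        if page_token:
            kwargs["pageToken"] = page_token
        response = service.users().messages().list(**kwargs).execute()
        # 'messages' is omitted from the response when a page is empty
        for msg in response.get("messages", []):
            yield msg["id"]
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

For example, id_list = list(iter_message_ids(service)) collects all ids up front, which keeps listing separate from the slower per-message fetching.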

Is there a faster way to do this, e.g. asking the API to return only the date rather than all the other message information?
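However the Date values end up being retrieved, they arrive as raw RFC 2822 strings that need parsing before they are useful in Pandas. A minimal sketch, assuming each message is the dict returned by messages().get(...).execute(), with headers under payload['headers']; the helper names get_header and message_date are hypothetical. Parsing uses the standard library's email.utils.parsedate_to_datetime.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def get_header(message: dict, name: str) -> Optional[str]:
    """Return the first header called `name` (case-insensitive), or None."""
    headers = message.get("payload", {}).get("headers", [])
    for header in headers:
        if header.get("name", "").lower() == name.lower():
            return header.get("value")
    return None


def message_date(message: dict) -> Optional[datetime]:
    """Parse the Date header into a datetime; None if missing or unparseable."""
    raw = get_header(message, "Date")
    if raw is None:
        return None
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on garbage input, 3.10+ raise ValueError
        return None
```

Missing or malformed Date headers come back as None instead of raising, so one bad message does not abort a run over 25k emails.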

Thanks.

Reference: https://developers.google.com/resources/api-libraries/documentation/gmail/v1/python/latest/gmail_v1.users.messages.html

You can group your messages.get() calls into a batch request; see: https://developers.google.com/gmail/api/guides/batch

You can put up to 100 requests into a batch.
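Because of that limit, a 25k-message id list has to be split before batching. A minimal sketch with a hypothetical helper named chunked that slices any sequence into consecutive lists of at most 100 items:

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each chunk then becomes one batch: for ids in chunked(id_list), build a new batch, add one get() per id, and execute it.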

Note that "a set of n requests batched together counts toward your usage limit as n requests, not as one request." So you may need to do some pacing to stay below request rate limits.
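One common way to do that pacing is to retry rate-limited calls with exponential backoff. The sketch below is generic and makes several assumptions: with_backoff is a hypothetical helper, the caller supplies is_retryable to decide which errors count as rate limiting (for a googleapiclient.errors.HttpError, inspecting e.resp.status, e.g. for 429, is one option), and sleep is a parameter so the delay can be replaced in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a retryable error wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on non-retryable errors or once retries are exhausted
            if attempt == max_retries or not is_retryable(exc):
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable when max_retries >= 0")
```

Wrapping batch.execute() this way only retries failures of the batch call as a whole. Errors for individual requests inside a batch are passed to the callback's exception argument instead, so those messages have to be collected and re-queued separately.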

Here's a rough Python example that fetches the messages whose ids are in the list id_list:

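# Assumes gmail is an authorized Gmail API service object and that
# id_list holds at most 100 message ids (the per-batch limit).
fmt = 'metadata'  # assumed value: headers only, no body ('full', 'minimal', 'raw' also exist)

msgs = []
def fetch(rid, response, exception):
    # Called once for each request in the batch
    if exception is not None:
        print(exception)
    else:
        msgs.append(response)

# Make a batch request
batch = gmail.new_batch_http_request()
for message_id in id_list:
    t = gmail.users().messages().get(userId='me', id=message_id, format=fmt)
    batch.add(t, callback=fetch)

# Without an explicit http argument, the batch reuses the requests' authorized http
batch.execute()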
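Batching can be combined with the request in the question to return only the date: messages.get accepts format='metadata' together with metadataHeaders, which limits the returned headers to the ones named. The sketch below assumes service is an authorized Gmail service object and that id_list has no duplicates; fetch_date_headers is a hypothetical name, and passing each message id as the batch request_id is a design choice so the callback knows which message a failure belongs to.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def fetch_date_headers(
    service: Any, id_list: Sequence[str], batch_size: int = 100
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Fetch only the Date header of each message, batch_size requests per batch.

    Returns (dates, failed): raw Date strings keyed by message id, and the ids
    whose individual requests failed (for example because of rate limiting).
    """
    dates: Dict[str, Optional[str]] = {}
    failed: List[str] = []

    def on_response(request_id: str, response: Any, exception: Any) -> None:
        # request_id is the message id passed to batch.add() below
        if exception is not None:
            failed.append(request_id)
            return
        headers = response.get("payload", {}).get("headers", [])
        dates[request_id] = next(
            (h["value"] for h in headers if h["name"].lower() == "date"), None
        )

    for start in range(0, len(id_list), batch_size):
        batch = service.new_batch_http_request()
        for message_id in id_list[start:start + batch_size]:
            request = service.users().messages().get(
                userId="me",
                id=message_id,
                format="metadata",         # headers only, no body
                metadataHeaders=["Date"],  # and only the Date header
            )
            batch.add(request, callback=on_response, request_id=message_id)
        batch.execute()
    return dates, failed
```

Calling dates, failed = fetch_date_headers(service, id_list) yields raw Date strings keyed by message id; the ids in failed can be retried after a pause, and the strings parsed with email.utils.parsedate_to_datetime before loading them into Pandas.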
