
How to update field “type” for all documents in mongodb collection using python mongodb client library (pymongo)

This is the last step in completing an important data pipeline. We have the following newline-delimited JSON, which we exported from BigQuery into GCS and then downloaded locally:

{"name":"Terripins","fga":"42","fgm":"28","fgPct":0.67}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":true,"name":"","fga":"0","fgm":"0"}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":true,"name":"Crusaders","fga":"54","fgm":"33","fgPct":0.61}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":false,"name":"Greyhounds","fga":"54","fgm":"33","fgPct":0.61}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":false,"name":"Greyhounds","fga":"68","fgm":"20","fgPct":0.29}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":true,"name":"Crusaders","fga":"68","fgm":"20","fgPct":0.29}

We mongoimport this into our mongodb cluster, and the collection is successfully created:

[Image: screenshot of the imported collection]

Unfortunately, when we export the JSON from BigQuery, the integer columns are converted into strings (see fga and fgm), and the date columns are converted into strings as well. This image shows the original schema from BigQuery.

[Image: original BigQuery schema]

We are trying to use the Python MongoDB client library pymongo to convert fga and fgm into integer types. Presumably it is easier to (a) load the "stringified" JSON file into MongoDB and then use pymongo to update the types, rather than (b) fix the types directly in the JSON file before running mongoimport. So we are trying (a).
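For comparison, option (b) could look roughly like the sketch below, which rewrites each line of the export before import. The helper names (fix_types, convert_file) and the assumption that fga and fgm are the only stringified integer columns are illustrative, not part of the original pipeline. Dates are left untouched here; a plain string would still be imported as a string.

```python
import json

# Assumption: these are the only integer columns that BigQuery exported as strings.
INT_FIELDS = ("fga", "fgm")

def fix_types(line):
    """Parse one NDJSON line and cast the stringified integer fields to int."""
    doc = json.loads(line)
    for field in INT_FIELDS:
        if field in doc:
            doc[field] = int(doc[field])
    return doc

def convert_file(src_path, dst_path):
    """Rewrite an NDJSON file line by line, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(fix_types(line)) + "\n")

sample = '{"name":"Terripins","fga":"42","fgm":"28","fgPct":0.67}'
print(fix_types(sample))
```

The converted file could then be passed to mongoimport in place of the original export.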

import pymongo

# ... connect to db and set "db"
our_collection = db["our_coll_name"]

# query and set for "update"
myquery = {}  # for whole table
newvalues = {"$set": {"fga": int(fga)}}  # change to int (this is the line that fails)

# and update
new_output = our_collection.update_many(myquery, newvalues)
print(new_output.modified_count, "documents updated.")

This doesn't work: int(fga) raises NameError: name 'fga' is not defined, and if we instead run int("fga"), we get ValueError: invalid literal for int() with base 10: 'fga'.
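Both errors come from the same cause: int(...) is evaluated by Python on the client while the update document is being built, before anything is sent to MongoDB, so it never sees a stored value. One way around that is to read each document and write the converted values back. The following is a minimal sketch of that idea; the helper names are hypothetical, and collection is assumed to be a pymongo Collection.

```python
def int_update_for(doc, fields=("fga", "fgm")):
    """Build a (filter, update) pair that casts one document's string fields to int."""
    new_values = {f: int(doc[f]) for f in fields if isinstance(doc.get(f), str)}
    return {"_id": doc["_id"]}, {"$set": new_values}

def convert_ints_client_side(collection):
    """Assumes `collection` is a pymongo Collection; issues one update per document."""
    modified = 0
    # Fetch only the fields we need; _id is included by default.
    for doc in collection.find({}, {"fga": 1, "fgm": 1}):
        flt, update = int_update_for(doc)
        if update["$set"]:  # skip documents that have nothing to convert
            modified += collection.update_one(flt, update).modified_count
    return modified
```

This costs one round trip per document, so it is only reasonable for small collections or for servers that cannot run the conversion server-side.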

Both errors make complete sense to us, but we're still unsure how to update fga and fgm to int in this example. Also, are there MongoDB-specific date and timestamp types we can use for the three fields gameTime, gameDate, and updated, and how can we make these conversions using pymongo as well?
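On the date question: MongoDB stores dates as the BSON Date type, and pymongo writes a Python datetime.datetime value as a BSON Date. (BSON also has a Timestamp type, but the MongoDB documentation describes it as intended for internal use.) A client-side sketch for the three fields might look like this; the format strings are inferred from the sample rows above, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

def parse_utc_timestamp(value):
    """Parse strings like '2019-01-12 12:00:00 UTC' into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S UTC").replace(tzinfo=timezone.utc)

def parse_date(value):
    """Parse strings like '2019-01-12' as midnight UTC (BSON has no date-only type)."""
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)

# Assumption: which parser applies to which field, based on the sample data.
PARSERS = {
    "gameTime": parse_utc_timestamp,
    "updated": parse_utc_timestamp,
    "gameDate": parse_date,
}

def date_updates_for(doc):
    """Return the fields of `doc` that are still strings, converted to datetimes."""
    return {f: parse(doc[f]) for f, parse in PARSERS.items() if isinstance(doc.get(f), str)}

def convert_dates_client_side(collection):
    """Assumes `collection` is a pymongo Collection; one update per document."""
    for doc in collection.find({}):
        updates = date_updates_for(doc)
        if updates:
            collection.update_one({"_id": doc["_id"]}, {"$set": updates})
```

Documents that lack a field, like the first sample row, are simply left without it.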

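This assumes MongoDB 4.2 or later, the first version in which update commands accept an aggregation pipeline.

Use MongoDB's $toInt and $toDate aggregation operators inside an update pipeline.

I've split these into separate commands for clarity, but you could run them in one update_many() if you prefer.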

our_collection.update_many({}, [{'$set': {'fga': {'$toInt': '$fga'}}}])
our_collection.update_many({}, [{'$set': {'fgm': {'$toInt': '$fgm'}}}])
our_collection.update_many({}, [{'$set': {'gameTime': {'$toDate': '$gameTime'}}}])
our_collection.update_many({}, [{'$set': {'gameDate': {'$toDate': '$gameDate'}}}])
our_collection.update_many({}, [{'$set': {'updated': {'$toDate': '$updated'}}}])
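The five calls above can also be folded into a single update_many() with one $set stage. The sketch below builds that pipeline from two field lists; the function names are hypothetical, and collection is assumed to be a pymongo Collection on MongoDB 4.2 or later.

```python
INT_FIELDS = ["fga", "fgm"]
DATE_FIELDS = ["gameTime", "gameDate", "updated"]

def build_conversion_pipeline(int_fields=INT_FIELDS, date_fields=DATE_FIELDS):
    """Return a one-stage update pipeline that converts every listed field."""
    set_stage = {}
    for field in int_fields:
        set_stage[field] = {"$toInt": "$" + field}
    for field in date_fields:
        set_stage[field] = {"$toDate": "$" + field}
    return [{"$set": set_stage}]

def convert_all(collection):
    """Assumes `collection` is a pymongo Collection on MongoDB 4.2+."""
    result = collection.update_many({}, build_conversion_pipeline())
    return result.modified_count

print(build_conversion_pipeline())
```

One caveat worth checking on a copy of the data first: as far as I know, $toInt and $toDate return null when the input field is missing, so a document without gameTime (like the first sample row) would likely end up with gameTime set to null rather than left absent.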

Documentation:

https://docs.mongodb.com/manual/reference/operator/aggregation/toInt/
https://docs.mongodb.com/manual/reference/operator/aggregation/toDate/
