MongoDB collection to pandas Dataframe

My MongoDB document structure is as follows and some of the factors are NaN.

_id: ObjectId("5feddb959297bb2625db1450")
factors: Array
   0: Object
      factorId: "C24"
      Index: 0
      weight: 1
   1: Object
      factorId: "C25"
      Index: 1
      weight: 1
   2: Object
      factorId: "C26"
      Index: 2
      weight: 1
name: "Growth Led Momentum"

I want to convert it to a pandas DataFrame like the one below, using pymongo and pandas.

| name                | factorId | Index | weight |
|---------------------|----------|-------|--------|
| Growth Led Momentum | C24      | 0     | 1      |
| Growth Led Momentum | C25      | 1     | 1      |
| Growth Led Momentum | C26      | 2     | 1      |

Thank you

Wonderful answer by Matt. In case you want to do the flattening in pandas instead:

Use this after you have retrieved the documents from the database into a list named data:

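# Assumes `import pandas as pd` and that `data` is a list of documents,
# fetched without "_id" (e.g. with a {"_id": 0} projection).
df = pd.json_normalize(data)
# One row per factor, each factor dict expanded into columns, then joined back to the document fields.
df = df['factors'].explode().apply(pd.Series).join(df).drop(columns=['factors'])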

Output:

  factorId  Index  weight                 name
0      C24      0       1  Growth Led Momentum
0      C25      1       1  Growth Led Momentum
0      C26      2       1  Growth Led Momentum
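If the documents are already in memory as a list of dicts, `pd.json_normalize` can also do the flattening in one call through its `record_path` and `meta` parameters. A minimal sketch, assuming the documents look like the one in the question; the `data` list below is typed in by hand as a stand-in for what a query would return:

```python
import pandas as pd

# Hand-written stand-in for list(collection.find({}, {"_id": 0}));
# the shape follows the document shown in the question.
data = [
    {
        "name": "Growth Led Momentum",
        "factors": [
            {"factorId": "C24", "Index": 0, "weight": 1},
            {"factorId": "C25", "Index": 1, "weight": 1},
            {"factorId": "C26", "Index": 2, "weight": 1},
        ],
    }
]

# record_path: the array to turn into rows.
# meta: parent fields to repeat on every row.
flat = pd.json_normalize(data, record_path="factors", meta=["name"])

# Put the columns in the order asked for in the question.
flat = flat[["name", "factorId", "Index", "weight"]]
print(flat)
```

Unlike the `explode` approach, this gives a fresh 0..n-1 index. Documents that have no list under `factors` may need to be filtered out or handled separately before this call.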

Update

I broke out the ol' Python to give this a crack, and the following code works against my test collection.

from pymongo import MongoClient
import pandas as pd

uri = "mongodb://<your_mongo_uri>:27017"
database_name = "<your_database_name>"
collection_name = "<your_collection_name>"

mongo_client = MongoClient(uri)
database = mongo_client[database_name]
collection = database[collection_name]

# I used this code to insert a doc into a test collection
# before querying (just in case you wanted to know lol)
"""
data = {
    "_id": 1,
    "name": "Growth Lead Momentum",
    "factors": [
        {
            "factorId": "C24",
            "index": 0,
            "weight": 1
        },
        {
            "factorId": "D74",
            "index": 7,
            "weight": 9
        }
    ]
}

insert_result = collection.insert_one(data)
print(insert_result)
"""

# This is the query that
# answers your question

results = collection.aggregate([
  {
    "$unwind": "$factors"
  },
  {
    "$project": {
      "_id": 1, # Change to 0 if you wish to ignore "_id" field.
      "name": 1,
      "factorId": "$factors.factorId",
      "index": "$factors.index",
      "weight": "$factors.weight"
    }
  }
])

# This is how we turn the results into a DataFrame.
# We can simply pass `list(results)` into `DataFrame(..)`,
# due to how our query works.

results_as_dataframe = pd.DataFrame(list(results))
print(results_as_dataframe)

Which outputs:

   _id                  name factorId  index  weight
0    1  Growth Lead Momentum      C24      0       1
1    1  Growth Lead Momentum      D74      7       9
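The question mentions that some documents have no usable `factors`. By default `$unwind` drops documents whose array is missing, null, or empty; its `preserveNullAndEmptyArrays` option keeps them. Below is a sketch of a small helper that builds the same pipeline with that option switched on. `build_flatten_pipeline` and its parameters are hypothetical names for illustration, not part of pymongo:

```python
def build_flatten_pipeline(array_field, subfields, parent_fields=("name",), keep_empty=True):
    """Build an $unwind + $project pipeline as plain Python data (hypothetical helper)."""
    unwind = {
        "$unwind": {
            "path": f"${array_field}",
            # Keep documents whose array is missing, null, or empty.
            "preserveNullAndEmptyArrays": keep_empty,
        }
    }
    projection = {"_id": 0}
    for field in parent_fields:
        projection[field] = 1
    for sub in subfields:
        # Lift each nested field to the top level, e.g. "$factors.factorId".
        projection[sub] = f"${array_field}.{sub}"
    return [unwind, {"$project": projection}]


# Field names are case sensitive in MongoDB; the question's document
# uses "Index" with a capital I, so that spelling is used here.
pipeline = build_flatten_pipeline("factors", ["factorId", "Index", "weight"])
print(pipeline)
```

The resulting list can be passed to `collection.aggregate(pipeline)` and on to `pd.DataFrame(list(...))` as in the code above. Rows that come from documents without factors should then simply lack the factor fields, which pandas displays as NaN.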

Original Answer

You could use the aggregation pipeline to unwind factors and then project the fields you want.

Something like this should do the trick.

Live demo here.

Database Structure

[
  {
    "_id": 1,
    "name": "Growth Lead Momentum",
    "factors": [
      {
        "factorId": "C24",
        "index": 0,
        "weight": 1
      },
      {
        "factorId": "D74",
        "index": 7,
        "weight": 9
      }
    ]
  }
]

Query

db.collection.aggregate([
  {
    $unwind: "$factors"
  },
  {
    $project: {
      _id: 1,
      name: 1,
      factorId: "$factors.factorId",
      index: "$factors.index",
      weight: "$factors.weight"
    }
  }
])
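To see what the two stages do without a running server, here is a plain-Python sketch that mimics them on the sample document. It is only an illustration of the idea, not how MongoDB implements it, and the function names `unwind` and `project_factor` are made up for this example:

```python
docs = [
    {
        "_id": 1,
        "name": "Growth Lead Momentum",
        "factors": [
            {"factorId": "C24", "index": 0, "weight": 1},
            {"factorId": "D74", "index": 7, "weight": 9},
        ],
    }
]


def unwind(documents, field):
    """Yield one copy of each document per array element, like a default $unwind.

    Simplified: assumes the field holds a list when present; documents
    where it is missing, None, or empty produce no rows.
    """
    for doc in documents:
        for element in doc.get(field) or []:
            yield {**doc, field: element}


def project_factor(doc):
    """Lift the nested factor fields to the top level, like the $project stage."""
    factor = doc["factors"]
    return {
        "_id": doc["_id"],
        "name": doc["name"],
        "factorId": factor["factorId"],
        "index": factor["index"],
        "weight": factor["weight"],
    }


rows = [project_factor(doc) for doc in unwind(docs, "factors")]
for row in rows:
    print(row)
```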

Results

(.csv friendly)

[
  {
    "_id": 1,
    "factorId": "C24",
    "index": 0,
    "name": "Growth Lead Momentum",
    "weight": 1
  },
  {
    "_id": 1,
    "factorId": "D74",
    "index": 7,
    "name": "Growth Lead Momentum",
    "weight": 9
  }
]
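Since the flattened records have no nested fields left, they can be written out as CSV directly. A short sketch, assuming the records are already in memory as the list shown above; it writes to an in-memory buffer so nothing touches the disk, and a file path could be passed to `to_csv` instead:

```python
import io

import pandas as pd

# The flattened records from the Results section, copied by hand.
records = [
    {"_id": 1, "factorId": "C24", "index": 0, "name": "Growth Lead Momentum", "weight": 1},
    {"_id": 1, "factorId": "D74", "index": 7, "name": "Growth Lead Momentum", "weight": 9},
]

df = pd.DataFrame(records)

# index=False leaves out the DataFrame's own row index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```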
