
How can I convert a MongoDB document collection to a NumPy array in Python?

I am trying to get all documents with the same value for the key "Verlauf" from my MongoDB. That works so far. But then I want to convert this output to a NumPy array. How does this work?

I am getting the documents in a list from MongoDB with this command:

v1 = list(collection.find({"Verlauf": 1}))

The output looks like this (for 2 matching documents):

[{'_id': ObjectId('5f05aca208c3c86edf465953'), 'Verlauf': 1, 'Wie stark haben Sie den Kraftverlauf empfunden?': 2, 'Wie gut wurde dies empfunden?': 3, 'Dritte Frage hier einfügen': 4, 'Vierte Frage hier einfügen': 5, 'Fünfte Frage hier einfügen': 6, 'Sechste Frage hier einfügen': 7}, {'_id': ObjectId('5f05b89d48eb73c488a90efb'), 'Verlauf': 1, 'Wie stark haben Sie den Kraftverlauf empfunden?': 4, 'Wie gut wurde dies empfunden?': 5, 'Dritte Frage hier einfügen': 4, 'Vierte Frage hier einfügen': 5, 'Fünfte Frage hier einfügen': 4, 'Sechste Frage hier einfügen': 5}]

Is there a way to structure the data from this list in a NumPy array, where the first row contains all the values of the first key, the second row contains all the values of the second key, and so on?

So for this example:

[ [5f05aca208c3c86edf465953, 5f05b89d48eb73c488a90efb],
  [1, 1],  
  [2, 4],
  [3, 5],
  [4, 4],
  [5, 5],
  [6, 4],
  [7, 5]]

I am very new to all this data handling and would be very thankful for any advice.

Later I want to analyse this data to get the Minimum, Maximum, Lower Quartile, Upper Quartile, and Median for each key over all documents.

Thanks in Advance. Greetings Tom

I don't use Python too often, so I'm fairly confident a better way to do this exists.

When I do play around, it's usually with small-scale experiments, so I just use the brute-force approach (you can do this in several different ways).

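from pandas import DataFrame

# One row per document; db is assumed to be an existing pymongo database handle
columns = ["_id", "field1", "field2"]
data = list(map(lambda item: [item["_id"], item["field1"], item["field2"]], db.collection.find({})))

df = DataFrame(data, columns=columns)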

You can even add a little more sugar:

columns = ["_id", "field1", "field2"]
data = [[item[col] for col in columns] for item in db.collection.find({})]

df = DataFrame(data, columns=columns)
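To get from a DataFrame to the layout asked for in the question (one row per key), transposing the array is probably enough. The following is only a minimal sketch: the `docs` list stands in for `list(collection.find({"Verlauf": 1}))`, the `_id` values are plain strings instead of `ObjectId`, and the key names `question_1` to `question_3` are shortened, hypothetical replacements for the real question texts. It also shows one possible way to get the minimum, quartiles, median, and maximum per key.

```python
import numpy as np
import pandas as pd

# Stand-in for list(collection.find({"Verlauf": 1})).
# Assumption: ids are plain strings and the question keys are shortened.
docs = [
    {"_id": "5f05aca208c3c86edf465953", "Verlauf": 1,
     "question_1": 2, "question_2": 3, "question_3": 4},
    {"_id": "5f05b89d48eb73c488a90efb", "Verlauf": 1,
     "question_1": 4, "question_2": 5, "question_3": 4},
]

df = pd.DataFrame(docs)      # one row per document, one column per key

# Transpose so that each row holds all values of one key.
# The dtype is object because ids and numbers are mixed.
by_key = df.to_numpy().T

# Numeric part only (ids dropped), again one row per key
values = df.drop(columns=["_id"]).to_numpy(dtype=float).T

# Summary statistics per key; axis=1 runs across the documents
summary = {
    "min": values.min(axis=1),
    "lower_quartile": np.percentile(values, 25, axis=1),
    "median": np.median(values, axis=1),
    "upper_quartile": np.percentile(values, 75, axis=1),
    "max": values.max(axis=1),
}
```

Keeping the ids out of the numeric array avoids an object-typed array, which NumPy's statistics functions do not handle well. If you stay in pandas, `df.describe()` reports the same quantities per column.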

Note that looking up item[col] this way won't work with nested fields, i.e. "field1.nested" values, as Python wants you to use item["field1"]["nested"] to access nested dictionary values. In that case I usually just use a for loop to achieve the required result.
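The for loop for nested fields could look roughly like this. It is a sketch, not a tested recipe for any particular schema: `get_nested` is a hypothetical helper introduced here, and the `docs` list is made-up data standing in for `db.collection.find({})`. As an alternative, `pandas.json_normalize` flattens nested dictionaries on its own and names the columns with a dot separator.

```python
import pandas as pd


def get_nested(item, dotted_key):
    """Follow a dotted path such as "field1.nested" through nested dicts."""
    value = item
    for part in dotted_key.split("."):
        value = value[part]   # raises KeyError if a level is missing
    return value


# Made-up documents standing in for db.collection.find({})
docs = [
    {"_id": "a", "field1": {"nested": 1}, "field2": 10},
    {"_id": "b", "field1": {"nested": 2}, "field2": 20},
]

columns = ["_id", "field1.nested", "field2"]
data = [[get_nested(item, col) for col in columns] for item in docs]
df = pd.DataFrame(data, columns=columns)

# Alternative: let pandas flatten the nested dicts itself
flat = pd.json_normalize(docs)
```

The helper fails loudly on documents that lack a field. If some documents are incomplete, you would need to decide on a default value instead.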
