How to store MongoDB's nested documents in pandas without duplication

I am reading data from MongoDB and storing it in a pandas dataframe for further exploratory analysis and machine learning. The MongoDB documents look like this.

{ 
   "user_id"    : "user_9",
   "order_id"   : "order_9",
   "meals"      :  5,
   "order_area" : "London",

   "dish" : [
      {
         "dish_id"          : "012" ,
         "dish_name"        : "ABC",
         "dish_type"        : "Non-Veg",                
         "dish_price"       : 135,
         "dish_quantity"    : 2,
         "ratings"          : 4,
         "reviews"          : "blah blah blah",
         "coupon_type"      : "Rs 20 off"
      },
      {
        "dish_id"          : "013" ,
        "dish_name"        : "XYZ",
        "dish_type"        : "Non-Veg",                
        "dish_price"       : 125,
        "dish_quantity"    : 3,
        "ratings"          : 4,
        "reviews"          : "blah blah blah",
        "coupon_type"      : "Rs 20 off"
      },
   ],
}

Once I have the data in Python, I use json_normalize to split out the dish-related attributes while loading it into a dataframe:

from pandas import json_normalize  # pandas >= 1.0

df = json_normalize(db.dataset2.find(), 'dish',
                    ['_id', 'user_id', 'order_id', 'order_time', 'meals', 'order_area'])

This gives me the following pandas dataframe:

  coupon_type     dish_id  dish_name  dish_price  dish_quantity
0     Rs 20 off     012      ABC      135            2
1     Rs 20 off     013      XYZ      125            3

  ratings    reviews      coupon_type  user_id order_id  meals order_area
0   4     blah blah blah  Rs 20 off      9       9         5     London
1   4     blah blah blah  Rs 20 off      9       9         5     London

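The problem is that the order-level data (user_id, order_id, meals, _id and order_area) is duplicated on every dish row. Is there any other way to store the data in a dataframe without this duplication?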
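The duplication can be reproduced without a MongoDB connection. The sketch below assumes pandas 1.0 or later (for `pd.json_normalize`) and uses an in-memory copy of the sample document, with the dish fields trimmed for brevity; the variable names `order`, `meta`, `flat` and `repeated` are illustrative only.

```python
import pandas as pd

# In-memory copy of the sample order shown above (dish fields trimmed).
order = {
    "user_id": "user_9",
    "order_id": "order_9",
    "meals": 5,
    "order_area": "London",
    "dish": [
        {"dish_id": "012", "dish_name": "ABC", "dish_price": 135, "dish_quantity": 2},
        {"dish_id": "013", "dish_name": "XYZ", "dish_price": 125, "dish_quantity": 3},
    ],
}

# Order-level fields to carry alongside each dish record.
meta = ["user_id", "order_id", "meals", "order_area"]

# One row per dish; the order-level fields are copied onto every row.
flat = pd.json_normalize([order], record_path="dish", meta=meta)

# Number of distinct values per order-level column across the dish rows.
repeated = flat[meta].nunique()
print(flat)
print(repeated)
```

Every column listed in `meta` holds a single distinct value that is repeated once per dish, which is the duplication being asked about.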

You may be looking for a MultiIndex, which at least visually avoids the duplication (see the docs):

df = json_normalize(data, 'dish', ['user_id', 'order_id', 'meals', 'order_area'])
df = df.set_index(['user_id','order_id', 'meals', 'order_area'])

                                  coupon_type dish_id dish_name  dish_price  \
user_id order_id meals order_area                                             
user_9  order_9  5     London       Rs 20 off     012       ABC         135   
                                    Rs 20 off     013       XYZ         125   

                                   dish_quantity dish_type  ratings  \
user_id order_id meals order_area                                     
user_9  order_9  5     London                  2   Non-Veg        4   
                                               3   Non-Veg        4   

                                          reviews  
user_id order_id meals order_area                  
user_9  order_9  5     London      blah blah blah  
                                   blah blah blah 
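A self-contained sketch of the MultiIndex approach follows, together with one possible alternative that is not part of the answer above: keeping order-level and dish-level data in two separate tables linked by `order_id`, and joining them only when a flat view is needed. The sample data is an in-memory copy of the document from the question with trimmed dish fields, and the names `orders_raw`, `indexed`, `dishes`, `orders` and `flat` are assumptions made for illustration. It assumes pandas 1.0 or later.

```python
import pandas as pd

# In-memory stand-in for the MongoDB query result (dish fields trimmed).
orders_raw = [
    {
        "user_id": "user_9",
        "order_id": "order_9",
        "meals": 5,
        "order_area": "London",
        "dish": [
            {"dish_id": "012", "dish_name": "ABC", "dish_price": 135},
            {"dish_id": "013", "dish_name": "XYZ", "dish_price": 125},
        ],
    }
]
meta = ["user_id", "order_id", "meals", "order_area"]

# Approach from the answer: order-level fields become index levels,
# so repeated values are hidden when the frame is printed.
indexed = pd.json_normalize(orders_raw, record_path="dish", meta=meta).set_index(meta)

# Alternative sketch: two tables linked by order_id.
# Dish table: one row per dish, carrying only the order_id key.
dishes = pd.json_normalize(orders_raw, record_path="dish", meta=["order_id"])

# Order table: one row per order, without the nested dish list.
orders = pd.DataFrame([{k: doc[k] for k in meta} for doc in orders_raw]).set_index("order_id")

# Rebuild the flat view on demand by joining on the key.
flat = dishes.join(orders, on="order_id")
print(indexed)
print(flat)
```

With the two-table layout, each order-level value is stored once in `orders`, and the flat frame is rebuilt only for the analysis steps that need it.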
