
Obtaining distinct values from MongoDB as a list of dictionaries using pymongo

I have a MongoDB collection with documents in the following format:

[{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 11:20", "ml_pred":"Invalid","hum_pred":"valid"},
 {"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 12:20", "ml_pred":"Valid","hum_pred":"null"},
 {"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 22:20", "ml_pred":"Invalid","hum_pred":"valid"},
 {"name":"axe3","base-url":"www.example3.com","date":"2022-06-22 02:20", "ml_pred":"Valid","hum_pred":"null"},
 {"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 06:20", "ml_pred":"Invalid","hum_pred":"valid"},
 {"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 14:20", "ml_pred":"Invalid","hum_pred":"null"},
 {"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 10:20", "ml_pred":"Invalid","hum_pred":"invalid"},
 {"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 01:20", "ml_pred":"Invalid","hum_pred":"null"}]

I am trying to get the unique base-url and name pairs as a response. To do that, I use pymongo's distinct as shown below:

filter_stuff = {'base-url': 1, 'name':1,'_id': 0}
data = list(crawlcol.find({},filter_stuff).distinct("base-url"))

This returned a flat list of base URLs, but I am expecting output like:

[{"name":"axe1","base-url":"www.example1.com"},
 {"name":"axe2","base-url":"www.example2.com"},
 {"name":"axe3","base-url":"www.example3.com"}]

How can this be obtained?

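Grouping on both fields and then promoting the grouped key to the top level gives the required result:

result = list(crawlcol.aggregate(
    [
        # Group on the (name, base-url) pair so each combination appears once
        {"$group": {"_id": {"name": "$name", "base-url": "$base-url"}}},
        # Promote the grouped key to the top level, dropping the _id wrapper
        {"$replaceRoot": {"newRoot": "$_id"}},
        # $group does not guarantee output order, so sort explicitly
        {"$sort": {"name": 1}},
    ]
))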
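
If an aggregation is not convenient, the same unique pairs can also be computed client-side from a projected find(). The following is only a minimal sketch, assuming the projected documents fit in memory. `unique_pairs` is a hypothetical helper, not part of pymongo, and the hard-coded `sample` list stands in for list(crawlcol.find({}, {'base-url': 1, 'name': 1, '_id': 0})).

```python
def unique_pairs(docs, keys=("name", "base-url")):
    """Return one dict per distinct combination of `keys`, in first-seen order."""
    seen = set()
    result = []
    for doc in docs:
        # Build a hashable tuple from the fields of interest
        combo = tuple(doc.get(k) for k in keys)
        if combo not in seen:
            seen.add(combo)
            result.append(dict(zip(keys, combo)))
    return result


# Stand-in for the projected find() result (an assumption for illustration)
sample = [
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe2", "base-url": "www.example2.com"},
    {"name": "axe1", "base-url": "www.example1.com"},
    {"name": "axe3", "base-url": "www.example3.com"},
]

pairs = unique_pairs(sample)
print(pairs)
```

This transfers every document to the client, so for large collections the server-side aggregation is likely the better choice.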

