
How can I find the number of duplicates for a field using MongoDB Java?

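How can I find the number of duplicates across the documents of a collection with Java and MongoDB? My collection looks like this. Sample collection: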

{
    "_id": {
        "$oid": "5fc8eb07d473e148192fbecd"
    },
    "ip_address": "192.168.0.1",
    "mac_address": "00:A0:C9:14:C8:29",
    "url": "https://people.richland.edu/dkirby/141macaddress.htm",
    "datetimes": {
        "$date": "2021-02-13T02:02:00.000Z"
    }
}
{
    "_id": {
        "$oid": "5ff539269a10d529d88d19f4"
    },
    "ip_address": "192.168.0.7",
    "mac_address": "00:A0:C9:14:C8:30",
    "url": "https://people.richland.edu/dkirby/141macaddress.htm",
    "datetimes": {
        "$date": "2021-02-12T19:00:00.000Z"
    }
}
{
    "_id": {
        "$oid": "60083d9a1cad2b613cd0c0a2"
    },
    "ip_address": "192.168.1.5",
    "mac_address": "00:0A:05:C7:C8:31",
    "url": "www.facebook.com",
    "datetimes": {
        "$date": "2021-01-24T17:00:00.000Z"
    }
}

Sample query:

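            import com.mongodb.BasicDBObject;
            import com.mongodb.DBCursor;
            import com.mongodb.DBObject;
            import java.util.Date;

            // table1 is presumably a com.mongodb.DBCollection (legacy driver API)
            BasicDBObject whereQuery = new BasicDBObject(); // empty filter matches every document
            DBCursor cursor = table1.find(whereQuery);
            while (cursor.hasNext()) {
                DBObject obj = cursor.next();
                String ip_address = (String) obj.get("ip_address");
                String mac_address = (String) obj.get("mac_address");
                Date datetimes = (Date) obj.get("datetimes");
                String url = (String) obj.get("url");
                // println accepts a single argument, so join the fields into one string
                System.out.println(ip_address + ", " + mac_address + ", " + datetimes + ", " + url);
            }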

In Java, how can I tell whether the "url" field contains duplicate values, and how many duplicates there are?

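If I understand your question correctly, you are trying to find the number of duplicate entries for the field url. You can iterate over all documents and add each url value to a Set. A Set stores only unique values: adding a value that is already present does not add it again. The difference between the number of documents and the number of entries in the Set is therefore the number of duplicate entries for that field.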

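If you want to know which URLs are non-unique, check the return value of Set.add(Object): it returns false when the value was already in the Set, which means you have found a duplicate.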
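The Set approach described above could look roughly like the sketch below. It works on a plain list of url strings, assuming they have already been read from the cursor; the class and method names (UrlDuplicateCounter, countDuplicates, findNonUnique) are made up for illustration and are not part of any MongoDB API.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: counts duplicate url values with a Set.
class UrlDuplicateCounter {

    /** Number of documents minus number of distinct url values. */
    static int countDuplicates(List<String> urls) {
        Set<String> unique = new HashSet<>(urls);
        return urls.size() - unique.size();
    }

    /** The url values that occur more than once. */
    static Set<String> findNonUnique(List<String> urls) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (String url : urls) {
            // add() returns false when the value was already in the set
            if (!seen.add(url)) {
                repeated.add(url);
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        // url values copied from the three sample documents in the question
        List<String> urls = List.of(
            "https://people.richland.edu/dkirby/141macaddress.htm",
            "https://people.richland.edu/dkirby/141macaddress.htm",
            "www.facebook.com");
        System.out.println("duplicates: " + countDuplicates(urls));
        System.out.println("non-unique: " + findNonUnique(urls));
    }
}
```

Note that this pulls every document to the client, so it is mainly suitable for small collections.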

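In MongoDB you can solve this with an aggregation pipeline. You will need to implement this pipeline with the MongoDB Java Driver. It returns only the url values that occur more than once, together with their counts and the matching documents.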

db.getCollection('table1').aggregate([
    {
        "$group": {
            // group by url and calculate count of duplicates by url 
            "_id": "$url",
            "url": {
                "$first": "$url"
            },
            "duplicates_count": {
                "$sum": 1
            },
            "duplicates": {
                "$push": {
                    "_id": "$_id",
                    "ip_address": "$ip_address",
                    "mac_address": "$mac_address",
                    "url": "$url",
                    "datetimes": "$datetimes"
                }
            }
        }
    },
    {   // keep only the groups whose duplicates_count is greater than 1
        "$match": {
            "duplicates_count": {
                "$gt": 1
            }
        }
    },
    {
        "$project": {
            "_id": 0
        }
    }
]);

Output:

{
    "url" : "https://people.richland.edu/dkirby/141macaddress.htm",
    "duplicates_count" : 2.0,
    "duplicates" : [ 
        {
            "_id" : ObjectId("5fc8eb07d473e148192fbecd"),
            "ip_address" : "192.168.0.1",
            "mac_address" : "00:A0:C9:14:C8:29",
            "url" : "https://people.richland.edu/dkirby/141macaddress.htm",
            "datetimes" : {
                "$date" : "2021-02-13T02:02:00.000Z"
            }
        }, 
        {
            "_id" : ObjectId("5ff539269a10d529d88d19f4"),
            "ip_address" : "192.168.0.7",
            "mac_address" : "00:A0:C9:14:C8:30",
            "url" : "https://people.richland.edu/dkirby/141macaddress.htm",
            "datetimes" : {
                "$date" : "2021-02-12T19:00:00.000Z"
            }
        }
    ]
}
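
To make concrete what the $group and $match stages compute, here is a plain-Java sketch that does the same grouping in memory. It is not driver code: Access, UrlGrouping and duplicatesByUrl are hypothetical names, and the documents are assumed to be already loaded into a list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalent of the $group + $match stages.
class UrlGrouping {

    /** Minimal stand-in for one document of the collection. */
    static final class Access {
        final String ipAddress;
        final String url;

        Access(String ipAddress, String url) {
            this.ipAddress = ipAddress;
            this.url = url;
        }
    }

    /** Groups documents by url and keeps only groups with more than one entry. */
    static Map<String, List<Access>> duplicatesByUrl(List<Access> docs) {
        // like $group with _id: "$url" and $push of each document
        Map<String, List<Access>> groups = docs.stream()
            .collect(Collectors.groupingBy(a -> a.url, LinkedHashMap::new, Collectors.toList()));
        // like $match with duplicates_count > 1
        groups.values().removeIf(group -> group.size() <= 1);
        return groups;
    }

    public static void main(String[] args) {
        // ip_address and url values taken from the sample documents in the question
        List<Access> docs = List.of(
            new Access("192.168.0.1", "https://people.richland.edu/dkirby/141macaddress.htm"),
            new Access("192.168.0.7", "https://people.richland.edu/dkirby/141macaddress.htm"),
            new Access("192.168.1.5", "www.facebook.com"));
        duplicatesByUrl(docs).forEach((url, group) ->
            System.out.println(url + " duplicates_count=" + group.size()));
    }
}
```

Running the aggregation on the server is usually preferable for large collections, since only the duplicate groups are sent back to the client.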

