[英]How can I find the number of duplicates for a field using MongoDB Java?
我如何在 Java-MongoDB 中找到每个文档中的重复数,我有这样的集合。 集合示例:
{
"_id": {
"$oid": "5fc8eb07d473e148192fbecd"
},
"ip_address": "192.168.0.1",
"mac_address": "00:A0:C9:14:C8:29",
"url": "https://people.richland.edu/dkirby/141macaddress.htm",
"datetimes": {
"$date": "2021-02-13T02:02:00.000Z"
}
{
"_id": {
"$oid": "5ff539269a10d529d88d19f4"
},
"ip_address": "192.168.0.7",
"mac_address": "00:A0:C9:14:C8:30",
"url": "https://people.richland.edu/dkirby/141macaddress.htm",
"datetimes": {
"$date": "2021-02-12T19:00:00.000Z"
}
}
{
"_id": {
"$oid": "60083d9a1cad2b613cd0c0a2"
},
"ip_address": "192.168.1.5",
"mac_address": "00:0A:05:C7:C8:31",
"url": "www.facebook.com",
"datetimes": {
"$date": "2021-01-24T17:00:00.000Z"
}
}
示例查询:
BasicDBObject whereQuery = new BasicDBObject();
DBCursor cursor = table1.find(whereQuery);
while (cursor.hasNext()) {
DBObject obj = cursor.next();
String ip_address = (String) obj.get("ip_address");
String mac_address = (String) obj.get("mac_address");
Date datetimes = (Date) obj.get("datetimes");
String url = (String) obj.get("url");
System.out.println(ip_address, mac_address, datetimes, url);
}
在 Java 中,我如何知道计算“url”的重复数据。 以及有多少重复。
如果我正确理解您的问题,您正在尝试查找字段url
的重复条目数量。 您可以遍历所有文档并将它们添加到Set
。 Set
具有仅存储唯一值的属性。 添加值时,不会再次添加已在Set
中的值。 因此, Set
中的条目数与文档数之差就是给定字段的重复条目数。
如果你想知道哪些 URL 是非唯一的,你可以评估Set.add(Object)
的返回值,它会告诉你给定的值是否事先已经在Set
中。 如果有,你就得到了一个副本。
在 mongodb 中,您可以使用“聚合管道”解决此问题。 您需要在“Mongodb Java Driver”中实现此管道。 它只给出重复的结果及其重复计数。
db.getCollection('table1').aggregate([
{
"$group": {
// group by url and calculate count of duplicates by url
"_id": "$url",
"url": {
"$first": "$url"
},
"duplicates_count": {
"$sum": 1
},
"duplicates": {
"$push": {
"_id": "$_id",
"ip_address": "$ip_address",
"mac_address": "$mac_address",
"url": "$url",
"datetimes": "$datetimes"
}
}
}
},
{ // select documents that only duplicates count higher than 1
"$match": {
"duplicates_count": {
"$gt": 1
}
}
},
{
"$project": {
"_id": 0
}
}
]);
Output 结果:
{
"url" : "https://people.richland.edu/dkirby/141macaddress.htm",
"duplicates_count" : 2.0,
"duplicates" : [
{
"_id" : ObjectId("5fc8eb07d473e148192fbecd"),
"ip_address" : "192.168.0.1",
"mac_address" : "00:A0:C9:14:C8:29",
"url" : "https://people.richland.edu/dkirby/141macaddress.htm",
"datetimes" : {
"$date" : "2021-02-13T02:02:00.000Z"
}
},
{
"_id" : ObjectId("5ff539269a10d529d88d19f4"),
"ip_address" : "192.168.0.7",
"mac_address" : "00:A0:C9:14:C8:30",
"url" : "https://people.richland.edu/dkirby/141macaddress.htm",
"datetimes" : {
"$date" : "2021-02-12T19:00:00.000Z"
}
}
]
}
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.