
How to use mapReduce in MongoDB?

I have the following code in Python:

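from pymongo import MongoClient

# Connect to the local MongoDB server and select the "twitter" database.
client = MongoClient()
db = client.twitter
users = db.users_from_united_states

ids = users.distinct("user.id")

for i in ids:
    # Number of documents in which this user appears.
    count = users.count_documents({"user.id": i})
    # Copy one document whose tweets_text array has exactly `count` entries.
    for u in users.find({"user.id": i, "tweets_text": {"$size": count}}).limit(1):
        db.my_usa_fitness_network.insert_one(u)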

I need to go through all the users and, for each user, find the document whose tweets_text array has as many entries as the number of times that user appears in the collection (meaning that this document contains ALL the tweets that the user posted). Then I need to save it in another collection, or just group it within the same collection.

When I run this code, it gives me fewer documents than there are ids.

I saw something about mapReduce, but I just can't figure out how to use it in my case.

I also tried to run the following code directly in the mongo shell, but it didn't work at all:

var ids = db.users_from_united_states.distinct("user.id");

for (var i = 0; i < ids.length; i++) {
    // Number of documents in which this user appears.
    var count = db.users_from_united_states.find({"user.id": ids[i]}).count();
    // Copy one document whose tweets_text array has exactly `count` entries.
    db.users_from_united_states
        .find({"user.id": ids[i], "tweets_text": {$size: count}})
        .limit(1)
        .forEach(function (doc) { db.my_usa_fitness_network.insert(doc); });
}

Can you help me, please? I have a huge project and I need help. Thank you.

[
    {
        "$group": {
            "_id": "$user.id",
            "my_fitness_data": {
                "$push": "$text"
            }
        }
    },
    {
        "$project": {
            "UserId": "$_id",
            "TweetsCount": {
                "$size": "$my_fitness_data"
            },
            "Tweets": "$my_fitness_data"
        }
    }
]
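
To make the pipeline above easier to follow, here is a minimal Python sketch of what its two stages compute. It is an illustration under assumptions, not a definitive implementation: the sample documents are made up, the helper name `emulate_pipeline` is hypothetical, and every document is assumed to hold one tweet in a `text` field with the user id under `user.id` (which is what the pipeline's field paths suggest). The emulation runs on an in-memory list, so it needs no MongoDB server.

```python
from collections import defaultdict

# The pipeline from the answer above, written as a Python list of stages.
PIPELINE = [
    {"$group": {"_id": "$user.id", "my_fitness_data": {"$push": "$text"}}},
    {"$project": {
        "UserId": "$_id",
        "TweetsCount": {"$size": "$my_fitness_data"},
        "Tweets": "$my_fitness_data",
    }},
]


def emulate_pipeline(docs):
    """Pure-Python stand-in for PIPELINE, for illustration only.

    Assumes every document has both "user.id" and "text".
    """
    # $group: collect the "text" of every document under its "user.id".
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["user"]["id"]].append(doc["text"])
    # $project: expose the id, the number of tweets, and the tweets themselves.
    return [
        {"_id": uid, "UserId": uid, "TweetsCount": len(texts), "Tweets": texts}
        for uid, texts in grouped.items()
    ]


# Hypothetical sample data: one document per tweet.
sample = [
    {"user": {"id": 1}, "text": "ran 5k"},
    {"user": {"id": 2}, "text": "leg day"},
    {"user": {"id": 1}, "text": "rest day"},
]
result = emulate_pipeline(sample)
```

Against a real database, the same `PIPELINE` list could be passed to pymongo as `db.users_from_united_states.aggregate(PIPELINE)`. To store the grouped documents in another collection, one option is to append a final `{"$out": "my_usa_fitness_network"}` stage. The order of the groups returned by the server is not guaranteed, unlike in this emulation.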
