
Running advanced MongoDB queries in R with rmongodb

As MySQL is driving me nuts, I'm trying to get acquainted with my first "NoSQL" DBMS, which happens to be MongoDB. I'm connecting to it via rmongodb.

The more I play around with rmongodb, the more questions and problems come up with respect to running advanced queries.

First I present some example data, then I go into detail about the different types of queries that I can't seem to specify correctly.

Example Data

The example is taken from the MongoDB website and has been simplified a bit.

pkg <- "rmongodb"
if (!require(pkg, character.only=TRUE)) {
    install.packages(pkg)
    require(pkg, character.only=TRUE)   
}

# Connect to DB
db <- "test"
ns <- "posts"
mongo <- mongo.create(db=db)

# Insert document to collection 'test.users'
b <- mongo.bson.from.list(list(
    "_id"="alex", 
    name=list(first="Alex", last="Benisson"),
    karma=1.0,
    age=30,
    test=c("a", "b")
))
mongo.insert(mongo, "test.users", b)

# Insert document to collection 'test.posts'
b <- mongo.bson.from.list(list(
        "_id"="abcd",
        when=mongo.timestamp.create(strptime("2011-09-19 02:00:00",
            "%Y-%m-%d %H:%M:%S"), increment=1),
        author="alex",
        title="Some title",
        text="Some text.",
        tags=c("tag.1", "tag.2"),
        votes=5,
        voters=c("jane", "joe", "spencer", "phyllis", "li"),
        comments=list(
            list(
                who="jane", 
                when=mongo.timestamp.create(strptime("2011-09-19 04:00:00",
                    "%Y-%m-%d %H:%M:%S"), increment=1),
                comment="Some comment."
            ),
            list(
                who="meghan", 
                when=mongo.timestamp.create(strptime("2011-09-20 13:00:00",
                    "%Y-%m-%d %H:%M:%S"), increment=1),
                comment="Some comment."
            )
        )
    )
)
b
mongo.insert(mongo, "test.posts", b)

Two questions related to inserting JSON/BSON objects:

  1. Document 'test.posts', field voters: is it correct to use c() in this case?
  2. Document 'test.posts', field comments: what's the right way to specify this, c() or list()?

Top Level Queries: they work a treat

Top-level queries work just fine:

# Get all posts by 'alex' (only titles)
res <- mongo.find(mongo, "test.posts", query=list(author="alex"), 
    fields=list(title=1L))
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
[[1]]
                       _id                      title 
                     "abcd"               "Some title" 

Question 1: Basic Sub Level Queries

How can I run simple "sub-level queries" (as opposed to top-level queries) that need to reach into arbitrarily deep sublevels of a JSON/BSON-style MongoDB object? These sub-level queries make use of MongoDB's dot notation, and I can't figure out how to map that to a valid rmongodb query.

In plain MongoDB syntax, something like

> db.posts.find( { "comments.who" : "meghan" } )

would work, but I can't figure out how to do that with rmongodb functions.

Here's what I've tried so far:

# Get all comments by 'meghan' from 'test.posts'

#--------------------
# Approach 1)
#--------------------
res <- mongo.find(mongo, "test.posts", query=list(comments=list(who="meghan")))
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

#--------------------
# Approach 2) 
#--------------------
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "comments")
mongo.bson.buffer.append(buf, "who", "meghan")
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
res <- mongo.find(mongo, "test.posts", query=query)
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

Question 2: Queries Using $ Operators

These work:

Query 1

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "age")
mongo.bson.buffer.append(buf, "$lte", 30)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria

> mongo.find.one(mongo, "test.users", query=criteria)
    _id : 2      alex
    name : 3     
        first : 2    Alex
        last : 2     Benisson

    karma : 1    1.000000
    age : 1      30.000000
    test : 4     
        0 : 2    a
        1 : 2    b

Query 2

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "test")
mongo.bson.buffer.append(buf, "$in", c("a", "z"))
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria
mongo.find.one(mongo, "test.users", query=criteria)

However, notice that passing a single value (a length-one vector) instead of a longer vector results in a return value of NULL:

mongo.bson.buffer.append(buf, "$in", "a")
# Instead of 'mongo.bson.buffer.append(buf, "$in", c("a", "z"))'

Trying the same with sub-level queries, I'm lost again:

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "name")
mongo.bson.buffer.start.object(buf, "first")
mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst"))
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
> criteria
    name : 3     
        first : 3    
            $in : 4      
                0 : 2    Alex
                1 : 2    Horst

> mongo.find.one(mongo, "test.users", query=criteria)
NULL

Either c() or list() can be OK. It depends on whether the components are named and whether they all have the same type (for list). The best thing to do is to look at the generated BSON and see if you are getting what you want. For the best control of the generated object, use mongo.bson.buffer and the functions that operate on it.

In fact, this is why the sub-queries are failing: 'comments' is being created as a subobject rather than an array. mongo.bson.from.list() is handy, but it doesn't give you the same control, and sometimes it guesses wrong about what to generate from complicated structures.
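To check what mongo.bson.from.list() actually generates, it helps to print the BSON and read the type code next to each field name. The following is a minimal sketch, assuming rmongodb is installed (no database connection is needed just to build and print BSON). The variable names b1, b2 and b3 are made up for illustration, and the field values are borrowed from the example data:

```r
library(rmongodb)

# An unnamed character vector, as used for 'voters'
b1 <- mongo.bson.from.list(list(voters=c("jane", "joe")))

# A named list, as used for 'name'
b2 <- mongo.bson.from.list(list(name=list(first="Alex", last="Benisson")))

# An unnamed list of named lists, as used for 'comments'
b3 <- mongo.bson.from.list(list(comments=list(
    list(who="jane", comment="Some comment."),
    list(who="meghan", comment="Some comment.")
)))

# Printing shows each field as 'name : type'. In BSON, type 3 is an
# embedded object and type 4 is an array, so check which one
# 'comments' is reported as before relying on array-style queries.
print(b1)
print(b2)
print(b3)
```

In the printed output quoted in the question, test (built with c()) shows up as type 4 and name (a named list) as type 3. What b3 prints may depend on the rmongodb version, which is exactly why it is worth looking.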

The query on the other set of data can be corrected like so:

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "name.first")
mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst"))
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)

Note that you definitely need to use a buffer here, since R will choke on the dotted name.
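The same dotted-name idea can be applied to the sub-level query from Question 1. This is a sketch rather than a verified fix: it assumes that 'comments' is stored in a form MongoDB can traverse with dot notation (an array of sub-documents, or a sub-document with a who field). The helper names comments_who_query and find_all are hypothetical and are not part of rmongodb:

```r
library(rmongodb)

# Hypothetical helper: build the query { "comments.who" : <who> }
# with a buffer, so the dotted field name reaches MongoDB verbatim.
comments_who_query <- function(who) {
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "comments.who", who)
    mongo.bson.from.buffer(buf)
}

# Hypothetical helper: run a query on an open connection and
# collect every matching document in a list.
find_all <- function(mongo, ns, query) {
    res <- mongo.find(mongo, ns, query=query)
    out <- list()
    while (mongo.cursor.next(res))
        out[[length(out) + 1]] <- mongo.bson.to.list(mongo.cursor.value(res))
    mongo.cursor.destroy(res)
    out
}

query <- comments_who_query("meghan")
print(query)  # inspect the generated BSON before sending it
```

With an open connection created by mongo.create(), calling find_all(mongo, "test.posts", query) would then run the query. Whether it returns the post depends on how 'comments' was stored at insert time.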

I hope this straightens out your problem. Let me know if you have any further questions.

I'm still not very clear on what the preferred way is here on SO to proceed once a question has been posted but one wishes to elaborate a bit more, possibly adding further questions and answer approaches.

As I was often told not to blow up my original question with future edits, in this "answer" I'm simply taking the suggestions by Gerald Lindsly and trying to put them into actual code (because it still didn't work out for me):

Preparations

pkg <- "rmongodb"
if (!require(pkg, character.only=TRUE)) {
    install.packages(pkg)
    require(pkg, character.only=TRUE)   
}

# Connect to DB
db <- "test"
ns <- "posts"
mongo <- mongo.create(db=db)

# Make sure we start with an empty collection
mongo.drop(mongo, paste(db, ns, sep="."))

Insert document

As Gerald has pointed out in his answer, mongo.bson.from.list() sometimes makes wrong guesses about the resulting BSON structure, so I tried to go ahead and explicitly create BSON array objects:

buf <- mongo.bson.buffer.create()

# 'REGULAR' APPENDING
mongo.bson.buffer.append(buf, "_id", "abcd")
mongo.bson.buffer.append(buf, "when", 
    mongo.timestamp.create(strptime("2011-09-19 02:00:00",
        "%Y-%m-%d %H:%M:%S"), increment=1))
mongo.bson.buffer.append(buf, "author", "alex")
mongo.bson.buffer.append(buf, "title", "Some title")
mongo.bson.buffer.append(buf, "text", "Some text.")
mongo.bson.buffer.append(buf, "tags", c("tag.1", "tag.2"))
mongo.bson.buffer.append(buf, "votes", 5)
# /

# VOTERS ARRAY
mongo.bson.buffer.start.array(buf, "voters")
voters <- c("jane", "joe", "spencer", "phyllis", "li")
i=1
for (i in seq(along=voters)) {
    mongo.bson.buffer.append(buf, as.character(i), voters[i])
}
mongo.bson.buffer.finish.object(buf)
# /

# COMMENTS ARRAY
mongo.bson.buffer.start.array(buf, "comments")

mongo.bson.buffer.start.object(buf, "1")
mongo.bson.buffer.append(buf, "who", "jane")
mongo.bson.buffer.append(buf, "when", 
    mongo.timestamp.create(strptime("2011-09-19 04:00:00",
            "%Y-%m-%d %H:%M:%S"), increment=1))
mongo.bson.buffer.append(buf, "comment", "some comment.")
mongo.bson.buffer.finish.object(buf)

mongo.bson.buffer.start.object(buf, "2")
mongo.bson.buffer.append(buf, "who", "meghan")
mongo.bson.buffer.append(buf, "when", 
    mongo.timestamp.create(strptime("2011-09-20 13:00:00",
            "%Y-%m-%d %H:%M:%S"), increment=1))
mongo.bson.buffer.append(buf, "comment", "some comment.")
mongo.bson.buffer.finish.object(buf)
# /

# FINALIZE
mongo.bson.buffer.finish.object(buf)
b <- mongo.bson.from.buffer(buf)
> b
_id : 2      abcd
when : 17    i: 1, t: 1316390400
author : 2   alex
title : 2    Some title
text : 2     Some text.
tags : 4     
    0 : 2    tag.1
    1 : 2    tag.2

votes : 1    5.000000
voters : 4   
    1 : 2    jane
    2 : 2    joe
    3 : 2    spencer
    4 : 2    phyllis
    5 : 2    li

comments : 4     
    1 : 3    
        who : 2      jane
        when : 17    i: 1, t: 1316397600
        comment : 2      some comment.

    2 : 3    
        who : 2      meghan
        when : 17    i: 1, t: 1316516400
        comment : 2      some comment.

mongo.insert(mongo, "test.posts", b)

Basic sub-level query

# Get all comments by 'meghan' from 'test.posts'

#--------------------
# Approach 1)
#--------------------
res <- mongo.find(mongo, "test.posts", query=list(comments=list(who="meghan")))
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

#--------------------
# Approach 2) 
#--------------------
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "comments")
mongo.bson.buffer.append(buf, "who", "meghan")
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
res <- mongo.find(mongo, "test.posts", query=query)
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

I must still be doing something wrong here when specifying the document ;-)

Regarding atomic queries and the $in operator, I got Query 2 from your first question to work as follows:

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "test")
mongo.bson.buffer.start.array(buf, "$in")
mongo.bson.buffer.append(buf, "0", "a")  # BSON array elements are keyed "0", "1", ...
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria

I guess explicitly starting and ending the array does the trick if the array is going to end up holding only one element.
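To avoid special-casing the single-value situation, the explicit-array approach can be wrapped in a small function that always writes an array, whatever the number of values. This is only a sketch under the assumption that rmongodb is installed. The name make_in_criteria is hypothetical and not part of the package:

```r
library(rmongodb)

# Hypothetical helper: build { field : { "$in" : [ values... ] } },
# always writing an explicit array, even for a single value.
make_in_criteria <- function(field, values) {
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.start.object(buf, field)
    mongo.bson.buffer.start.array(buf, "$in")
    for (i in seq_along(values)) {
        # BSON array elements are keyed "0", "1", ...
        mongo.bson.buffer.append(buf, as.character(i - 1), values[[i]])
    }
    mongo.bson.buffer.finish.object(buf)  # closes the "$in" array
    mongo.bson.buffer.finish.object(buf)  # closes the field object
    mongo.bson.from.buffer(buf)
}

# Works the same way for one value or several
criteria_one  <- make_in_criteria("test", "a")
criteria_many <- make_in_criteria("test", c("a", "z"))
print(criteria_one)
print(criteria_many)
```

Either result can be passed as the query argument of mongo.find.one() or mongo.find() on an open connection.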

One thing that can be useful is monitoring the mongod console or log (after starting mongod with the -v option). Running your old query, you'll see:

Tue Nov 20 16:09:04 [conn23] User Assertion: 12580:invalid query
Tue Nov 20 16:09:04 [conn23] assertion 12580 invalid query ns:test.users query:{ test: { $in: "a" } }
Tue Nov 20 16:09:04 [conn23] problem detected during query over test.users : { $err: "invalid query", code: 12580 }
Tue Nov 20 16:09:04 [conn23] query test.users query: { test: { $in: "a" } } ntoreturn:0 keyUpdates:0 exception: invalid query code:12580 locks(micros) r:440 reslen:59 0ms

Running the modified query, it looks OK:

Tue Nov 20 16:10:14 [conn23] query test.users query: { test: { $in: [ "a" ] } } ntoreturn:0 keyUpdates:0 locks(micros) r:168 nreturned:1 reslen:142 0ms
