
Unexpected Count & Filter Behaviour in AWS Neptune

I'm getting an unexpected StopIteration error with some Gremlin queries that contain a count step inside nested filter steps.

This error can be reproduced with the following code (using Gremlin-Python, 3.5.0 in my case):

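# Assumes `g` is a GraphTraversalSource already connected to the graph.
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P

filter_header = g.addV().id().next()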
count_headers = [g.addV().id().next() for _ in range(10)]

for i, c in enumerate(count_headers):
    # Add 10 nodes
    sub_nodes = [g.addV().id().next() for _ in range(10)]
    # Connect them all to the header
    for s in sub_nodes:
        g.V(c).addE('edge').to(__.V(s)).iterate()
    # Connect i of them to the filter header
    for s in sub_nodes[:i]:
        g.V(filter_header).addE('edge').to(__.V(s)).iterate()

# This raises StopIteration
g.V(count_headers).filter(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().is_(P.gt(1))
).count().next()

(Equivalently, if I use toList instead of next, I get an empty list.)
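For reference, the count the query should return can be worked out from the setup alone, without touching the graph. This is only a sanity check on the expected value, derived from the loop above (header i has exactly i sub-nodes that are also connected to the filter header); the variable names are illustrative.

```python
# Header i (i = 0..9) has exactly i sub-nodes that also have an
# incoming 'edge' from filter_header, so the inner count for header i is i.
shared_counts = list(range(10))

# The outer filter keeps only headers whose inner count is greater than 1.
expected = sum(1 for n in shared_counts if n > 1)

print(expected)  # 8
```

So a result of 8 is what I'd expect, rather than an empty result.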

However, this error doesn't happen if you unfold after the count:

# No StopIteration
g.V(count_headers).filter(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().unfold().is_(P.gt(1))
).count().next()

Neither does it happen if you use map instead of filter:

# No StopIteration
g.V(count_headers).as_('c').map(
    __.out('edge').filter(
        __.in_('edge').hasId(filter_header)
    ).count().is_(P.gt(1))
).select('c').count().next()

I've tested and this error doesn't happen when using TinkerGraph, so I suspect this is specific to AWS Neptune.

I'd really appreciate any guidance on why this happens, whether I'm doing anything wrong, or what differences cause this to happen only in Neptune. Alternatively, if the consensus is that this is a bug, I'd appreciate it if anyone could let me know where to raise it.

When using a Gremlin client such as Gremlin-Python, calling next() on a query that has no results throws an error (StopIteration in Python). I prefer to always use toList(), because that way you are guaranteed to get at least an empty list back. If you use TinkerGraph locally with the Gremlin Console, you will not see the same behavior. If getting no result at all is also unexpected, that is a second issue to explore.

As an example of the Python next() behavior, here is a simple experiment using the Python console. If you run the same tests against a Gremlin Server backed by TinkerGraph, you will see the same results.


>>> g.V().hasId('I do not exist').next()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 89, in next
    return self.__next__()
  File "/home/ec2-user/.local/lib/python3.6/site-packages/gremlin_python/process/traversal.py", line 50, in __next__
    self.last_traverser = next(self.traversers)
StopIteration
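If you would rather keep using next(), one option is to wrap it so that an empty result becomes a default value instead of an exception. The following is a minimal sketch, not part of Gremlin-Python: next_or_default is a hypothetical helper, and the two stand-in classes exist only so the example runs without a graph connection. It relies solely on next() raising StopIteration, as in the traceback above.

```python
def next_or_default(traversal, default=None):
    """Return traversal.next(), or `default` if the traversal has no results."""
    try:
        return traversal.next()
    except StopIteration:
        return default


class EmptyTraversal:
    """Hypothetical stand-in for a traversal that yields no results."""

    def next(self):
        raise StopIteration


class SingleResultTraversal:
    """Hypothetical stand-in for a traversal that yields one result."""

    def __init__(self, value):
        self._value = value

    def next(self):
        return self._value


print(next_or_default(EmptyTraversal(), default=0))
print(next_or_default(SingleResultTraversal(8), default=0))
```

With a real connection, the same helper would be called as next_or_default(g.V().hasId('I do not exist')). Using toList() and checking for an empty list remains the simpler choice.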

For anyone who finds themselves here: this was a bug that was fixed in Neptune Engine release 1.1.1.0.

"Fixed a rare Gremlin bug where no results were returned when using nested filter() and count() steps in combination"

(Thanks to the Neptune team for fixing!)
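To decide whether a given cluster still needs the unfold() or map() workaround, you can compare its engine version against 1.1.1.0. This is a rough sketch under stated assumptions: the function names are hypothetical, the version string is something you look up yourself for your cluster, and the handling of a trailing suffix such as ".R4" is an assumption about how engine versions may be written.

```python
import re

# Engine release that contains the fix, per the release note quoted above.
FIXED_IN = (1, 1, 1, 0)


def engine_version_tuple(version):
    """Parse the first four numeric components of an engine version string.

    Anything after the fourth component (assumed to be a patch suffix
    such as '.R4') is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized engine version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_nested_count_fix(version):
    """Return True if the engine version is 1.1.1.0 or later."""
    return engine_version_tuple(version) >= FIXED_IN


print(has_nested_count_fix("1.0.5.1"))
print(has_nested_count_fix("1.1.1.0"))
```

On engines older than 1.1.1.0, the unfold() or map() variants from the question avoid the problem.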
