
Wildcard or “LIKE” search in Azure Cosmos DB with Gremlin Graph API

I'm trying to do a wildcard-style search for vertices. In SQL it would be: "where name like '%abc%'". Neither the Gremlin graph traversal nor the SQL query API in Cosmos DB seems to support this.

The use case is to filter a 1:n dependency, e.g. "Show me all my customers whose name contains 'Sam'". This is pretty basic and easy with SQL. It is not a general full-text search but simply a filter within this specific 1:n relationship.

The following SQL works:

SELECT * FROM g 
  where (g.label = "person" and g.name[0]._value = 'Sam')

which is equivalent to:

g.V().hasLabel("person").has("name", "Sam")

The following SQL does not work ("Syntax error, incorrect syntax near 'like'"):

SELECT * FROM g 
  where (g.label = "person" and g.name[0]._value like 'Sam')

Trying to use a lambda in a Gremlin "filter" step results in an error, too.

Is it a good idea to write a UDF or stored procedure for this kind of search? How is indexing handled in that case? Are there any alternatives?

Thanks a lot

What about something like this:

g.V().has("person", "name", between("Sam", "San"))
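A rough way to see why this range behaves like a prefix filter: in TinkerPop, between is inclusive of its first argument and exclusive of its second, so every string that starts with "Sam" sorts inside the range from "Sam" up to (but not including) "San". The Python sketch below only mimics that comparison on an in-memory list. The helper names are made up for illustration, and it assumes the database orders strings the way Python does (by code point, case-sensitive), which I have not verified for Cosmos DB.

```python
def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string starting with `prefix`.

    Hypothetical helper: bumps the last character by one code point,
    e.g. "Sam" -> "San".
    """
    if not prefix:
        raise ValueError("prefix must not be empty")
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def in_prefix_range(value: str, prefix: str) -> bool:
    """Mimics between(prefix, upper): lower bound inclusive, upper bound exclusive."""
    return prefix <= value < prefix_upper_bound(prefix)


names = ["Sam", "Samantha", "Sammy", "San", "Sally", "Osama"]
matches = [name for name in names if in_prefix_range(name, "Sam")]
print(matches)
```

Note that "Osama" is not matched: the range trick is a prefix ("starts with") search, not a true '%abc%' contains search, and it is case-sensitive.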

Kelvin Lawrence wrote a book on Gremlin that you might find helpful (I did!).

I hit this brick-wall limitation of Azure's Gremlin implementation with fuzzy matching too.

There are a couple of approaches you can take at this stage, but the best solution depends on your goals and constraints. I hope these provide some inspiration...

One way to approach this is to implement a caching layer in your repository/data access layer and query the in-memory collection in code.
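As a minimal sketch of that caching idea (not tied to any particular driver): load the customers once, keep them in memory, and do the "contains" filtering in application code. Everything here is an assumption for illustration: the class name, the loader callback, and the flat {"id", "name"} record shape. In a real setup the loader would run something like the g.V().hasLabel("person") query from the question and map the results.

```python
from typing import Callable, Dict, Iterable, List, Optional


class CustomerNameCache:
    """Keeps a local copy of customer records and filters them in memory."""

    def __init__(self, loader: Callable[[], Iterable[Dict[str, str]]]) -> None:
        # `loader` is a hypothetical callback that fetches all customers
        # from the graph; it is only called when the cache is (re)filled.
        self._loader = loader
        self._rows: Optional[List[Dict[str, str]]] = None

    def refresh(self) -> None:
        """Reload the cached records from the data source."""
        self._rows = list(self._loader())

    def name_contains(self, fragment: str) -> List[Dict[str, str]]:
        """Case-insensitive substring match, i.e. LIKE '%fragment%'."""
        if self._rows is None:
            self.refresh()
        needle = fragment.casefold()
        return [row for row in self._rows if needle in row["name"].casefold()]


def fake_loader() -> List[Dict[str, str]]:
    # Stand-in data instead of a real graph query.
    return [
        {"id": "1", "name": "Sam Smith"},
        {"id": "2", "name": "Samantha Jones"},
        {"id": "3", "name": "Bob Sampson"},
        {"id": "4", "name": "Alice Brown"},
    ]


cache = CustomerNameCache(fake_loader)
hit_ids = [row["id"] for row in cache.name_contains("sam")]
print(hit_ids)
```

The obvious trade-off is staleness and memory: this only makes sense when the set of customers per 1:n relationship is small enough to hold locally and you have a sensible point at which to call refresh().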

Another option, depending on the complexity of the patterns you will be searching for, is to break the name down into segments (each one less precise than the last) and search liberally for parts of the name.

E.g.

Name = Sam Smith

Properties:

Firstname = Sam,

FirstnameInitial = S,

Lastname = Smith,

LastnameInitial = S

Init Data

g.addV('Person').property('Firstname', 'Sam').property('FirstnameInitial', 'S').property('Lastname', 'Smith').property('LastnameInitial', 'S')
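The derived properties used above could be computed in application code before the vertex is written. A small sketch, assuming a name is simply "first token, last token" separated by whitespace (the function name and the splitting rule are my own simplification, real names are messier):

```python
from typing import Dict


def name_segments(full_name: str) -> Dict[str, str]:
    """Split a full name into the segment properties stored on the vertex."""
    parts = full_name.split()
    if len(parts) < 2:
        raise ValueError("expected at least a first and a last name")
    first, last = parts[0], parts[-1]  # middle names are ignored in this sketch
    return {
        "Firstname": first,
        "FirstnameInitial": first[0],
        "Lastname": last,
        "LastnameInitial": last[0],
    }


props = name_segments("Sam Smith")
print(props)
```

Each key/value pair then maps onto one .property(key, value) step of the addV call.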

Query Data

g.V().has('label', 'Person').has('Firstname', 'Sa').has('Lastname', 'Smit').fold().coalesce(
    unfold(),
    g.V().has('label', 'Person').has('FirstnameInitial', 'S').has('Lastname', 'Smit'),
    g.V().has('label', 'Person').has('Firstname', 'Sam').has('LastnameInitial', 'S'),
    g.V().has('label', 'Person').has('FirstnameInitial', 'S').has('LastnameInitial', 'S')
)

coalesce evaluates its comma-separated arguments in order and returns the first one that produces a non-empty result. Used this way together with fold() and unfold(), the first query (everything before .fold().coalesce()) is returned if it matched anything (via unfold() inside the coalesce step); otherwise each of the following comma-separated queries is tried in sequence.
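If the fall-through behaviour is hard to picture, here is the same "first non-empty tier wins" logic as a plain Python sketch over an in-memory list. It is not Gremlin and the helper names are invented; has() here just means exact equality on the given properties, like the chained has() steps in the query.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vertex = Dict[str, str]


def has(**expected: str) -> Callable[[Vertex], bool]:
    """Predicate requiring exact equality on every given property."""
    def predicate(vertex: Vertex) -> bool:
        return all(vertex.get(key) == value for key, value in expected.items())
    return predicate


def first_non_empty(
    vertices: List[Vertex], *tiers: Callable[[Vertex], bool]
) -> Tuple[Optional[int], List[Vertex]]:
    """Try each tier in order; return the index and matches of the first hit."""
    for index, predicate in enumerate(tiers):
        matches = [v for v in vertices if predicate(v)]
        if matches:
            return index, matches  # later, broader tiers are never evaluated
    return None, []


people = [
    {"Firstname": "Sam", "FirstnameInitial": "S", "Lastname": "Smith", "LastnameInitial": "S"},
    {"Firstname": "Sara", "FirstnameInitial": "S", "Lastname": "Stone", "LastnameInitial": "S"},
]

tier, found = first_non_empty(
    people,
    has(Firstname="Sa", Lastname="Smit"),          # most specific
    has(FirstnameInitial="S", Lastname="Smit"),
    has(Firstname="Sam", LastnameInitial="S"),
    has(FirstnameInitial="S", LastnameInitial="S"),  # most generic
)
print(tier, [p["Firstname"] for p in found])
```

With this sample data the first two tiers find nothing, the third one matches only Sam Smith, and the broadest tier (which would also have returned Sara Stone) is never reached.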

In this way, you can start your search quite specifically and fall back towards a more generic search. Obviously, you can take this concept and evolve it to include the likes of 'Lastname2Initials', 'Lastname3Initials' and so on.

Hope this helps you on your way, let me know what you end up settling on!

