
How do you handle lists that require joined data from multiple data sources in AppSync/GraphQL?

type Employee {
    id: String!
    name: String
    lastObservedStatus: String
}

type Query {
    employees: [Employee]
}

This is a fictional schema to illustrate my question. I have two separate data sources that return lists, and those lists need to be joined to populate the response. The first data source, 'employee list api', is an HTTP API I can query to get an authoritative list of employees, which I can use to populate the id and name fields. For example, I get a response like this:

[
    {"id": "001", "name": "Harry"},
    {"id": "002", "name": "Jerry"},
    {"id": "003", "name": "Larry"}
]

I have a second HTTP API, 'employee observation log', that I can query to get a list of statuses together with the associated employee ids. The id lets me associate each status with an entry in the employee list, and each record carries a timestamp. There may be more than one status record per employee, but in the GraphQL response I want only the most recent one. Example response:

[
    {"id":"002", "TimeStamp":"2021-07-01T12:30:00Z", "status": "eating"},
    {"id":"002", "TimeStamp":"2021-07-01T13:10:00Z", "status": "staring out the window"},
    {"id":"001", "TimeStamp":"2021-07-01T16:00:00Z", "status": "sleeping in lobby"}
]

Now, I want the GraphQL response to look something like this:

{
  "data": {
    "employees": [
      {
        "id": "001",
        "name": "Harry",
        "lastObservedStatus": "sleeping in lobby"
      },
      {
        "id": "002",
        "name": "Jerry",
        "lastObservedStatus": "staring out the window"
      },
      {
        "id": "003",
        "name": "Larry",
        "lastObservedStatus": null
      }
    ]
  }
}

Since 'employee list api' is the authoritative source for which employees exist, every query on the 'employees' field should trigger a call to that API, but the 'employee observation log' API should only be called if the 'lastObservedStatus' field is selected in the query.

For a schema like this, where should the resolvers be registered? I've read that the best practice is to always attach resolvers at the leaf nodes, but I'm not sure how that can be done in this situation. I'm not even sure what happens if you attach a resolver on subfields of a list.

I feel like the correct way to handle this is to attach a Lambda resolver to the employees field and, inside the Lambda, inspect the query's selectionSetList to see whether the 'lastObservedStatus' field has been selected. If not, the Lambda only queries 'employee list api'; otherwise it also queries 'employee observation log' and does something similar to a SQL left join before returning the result. But is that the correct way to handle this?
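To make the idea concrete, here is a minimal TypeScript sketch of that Lambda logic. It is only an illustration: the `Sources` fetchers are hypothetical stand-ins for the two HTTP APIs, and the caller is assumed to pass in the `selectionSetList` array that AppSync supplies under `info` in the resolver event.

```typescript
interface Employee { id: string; name: string }
interface Observation { id: string; TimeStamp: string; status: string }
interface EmployeeResult extends Employee { lastObservedStatus: string | null }

// Hypothetical fetchers standing in for the two HTTP APIs.
interface Sources {
  listEmployees: () => Promise<Employee[]>;
  listObservations: () => Promise<Observation[]>;
}

// Keep only the most recent observation for each employee id.
function latestStatusById(observations: Observation[]): Map<string, Observation> {
  const latest = new Map<string, Observation>();
  for (const obs of observations) {
    const seen = latest.get(obs.id);
    if (!seen || Date.parse(obs.TimeStamp) > Date.parse(seen.TimeStamp)) {
      latest.set(obs.id, obs);
    }
  }
  return latest;
}

// Left join: every employee is kept; employees with no observation get null.
function joinEmployees(employees: Employee[], observations: Observation[]): EmployeeResult[] {
  const latest = latestStatusById(observations);
  return employees.map((e) => ({
    ...e,
    lastObservedStatus: latest.get(e.id)?.status ?? null,
  }));
}

// Core of the resolver: the observation log is only queried when
// 'lastObservedStatus' appears in the selection set.
async function resolveEmployees(
  selectionSetList: string[],
  sources: Sources
): Promise<Employee[]> {
  const employees = await sources.listEmployees();
  if (!selectionSetList.includes("lastObservedStatus")) {
    return employees;
  }
  const observations = await sources.listObservations();
  return joinEmployees(employees, observations);
}
```

In an actual Lambda handler this would be called with something like `resolveEmployees(event.info.selectionSetList, sources)`, where `sources` wraps whatever HTTP client is in use.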

It sounds like what you need is a resolver on the lastObservedStatus field that uses your second API ('employee observation log') as the data source, where the Query field employees is using the first API as its data source.

This resolver should build its query from the context's source field, which holds the 'parent' values (in this case the id and name of the Employee being resolved). In VTL you can reference these as $ctx.source.id, for example, or $ctx.source.name if you need the name.

This resolver should only query the status for a single Employee, since it will be invoked once for each result returned by your Query field employees.
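As a rough illustration of that per-field resolver, here is a TypeScript sketch written as a Lambda-style function rather than VTL. The `fetchObservations` parameter is a hypothetical helper, and it assumes the observation log can be queried for one employee id; only the `source` part of the resolver event is modelled.

```typescript
interface Observation { id: string; TimeStamp: string; status: string }

// Only the part of the resolver event used here; `source` is the parent Employee.
interface FieldEvent { source: { id: string; name?: string } }

// Hypothetical fetcher: returns observation log entries for one employee id.
type FetchObservations = (employeeId: string) => Promise<Observation[]>;

// Pick the most recent status for one employee, or null if none was recorded.
function pickLatestStatus(employeeId: string, observations: Observation[]): string | null {
  let latest: Observation | null = null;
  for (const obs of observations) {
    // Guard in case the API returns entries for other employees as well.
    if (obs.id !== employeeId) continue;
    if (latest === null || Date.parse(obs.TimeStamp) > Date.parse(latest.TimeStamp)) {
      latest = obs;
    }
  }
  return latest === null ? null : latest.status;
}

// Resolver for Employee.lastObservedStatus: runs once per employee in the list.
async function resolveLastObservedStatus(
  event: FieldEvent,
  fetchObservations: FetchObservations
): Promise<string | null> {
  const observations = await fetchObservations(event.source.id);
  return pickLatestStatus(event.source.id, observations);
}
```

Because this runs once per employee, a list of N employees results in N calls to the observation log whenever lastObservedStatus is selected, and none when it is not.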

There is another option as well: a two-function pipeline resolver in which each function points at a different data source:

  • Step 1 resolves all fields except for lastObservedStatus
  • Step 2 resolves lastObservedStatus and stitches it into the step 1 results available in $ctx.prev.result.

This will be messier to implement, but will require fewer API calls if designed properly, since the observation log can be queried once per request instead of once per employee.
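The stitching in step 2 might look roughly like the TypeScript sketch below. The `StepTwoContext` type is an assumption that models only two things: the step 1 output (what VTL exposes as $ctx.prev.result) and the raw body of the HTTP response from the observation log. It is plain TypeScript for illustration and may need adapting to whichever resolver runtime is used.

```typescript
interface Employee { id: string; name: string }
interface Observation { id: string; TimeStamp: string; status: string }
type StitchedEmployee = Employee & { lastObservedStatus: string | null };

// Assumed shape of the context seen by step 2 of the pipeline.
interface StepTwoContext {
  prev: { result: Employee[] };   // output of step 1 (employee list)
  result: { body: string };       // raw JSON body from the observation log
}

function stitchStepTwo(ctx: StepTwoContext): StitchedEmployee[] {
  const observations: Observation[] = JSON.parse(ctx.result.body);
  const latest: Record<string, Observation> = {};
  for (const obs of observations) {
    const seen = latest[obs.id];
    // Timestamps in the same ISO-8601 UTC format compare correctly as strings.
    if (seen === undefined || obs.TimeStamp > seen.TimeStamp) {
      latest[obs.id] = obs;
    }
  }
  // Every employee from step 1 is kept; missing observations become null.
  return ctx.prev.result.map((e) => ({
    ...e,
    lastObservedStatus: latest[e.id] ? latest[e.id].status : null,
  }));
}
```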
