How do I denormalize my relational data for AWS CloudSearch documents?

Question

AWS CloudSearch expects you to send it flattened documents of your data to index for search, which look something like this:

[
 {"type": "add",
  "id":   "123456",
  "fields": {
     "account_id": "123456",
     "name": "foo",
     "addresses": []
  }
 }
]

Let's assume I have a database with an accounts table and an addresses table.

There are many addresses for each account. The addresses table has the fields:

address_1
address_2
city
state
zip
account_id (reference field)

How would I denormalize addresses in the CloudSearch document structure so that I can search across all of the columns in accounts and addresses?

Or should I be creating a separate search domain for each table?

Answer 1

I'm assuming your use cases to be:

Retrieving addresses by their account_id
Retrieving account_ids by an address
Finding accounts in a particular city/state/zip

I would recommend the following two things:

Index each address as a separate document:

Having a separate document for each address lets you keep the relationships between different fields (which you would lose if you had an array of cities and an array of states for each account).

Index each field separately:

Breaking out each field (city, state, etc.) lets you search them independently (e.g. get all the addresses in Cleveland, OH), use them as facets, boost scores based on them, etc.
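To make the "lost relationships" point concrete, here is a small self-contained sketch in plain Python (no CloudSearch calls). The `cities`/`states` array fields and both helper functions are hypothetical, made up only to imitate how independent field matching behaves on an account-level document compared with one document per address.

```python
# Account-level document with parallel arrays: nothing records which
# city belongs to which state. (Hypothetical field names.)
flattened = {
    "account_id": "123456",
    "cities": ["Davenport", "Lincoln"],
    "states": ["IA", "NE"],
}

# One document per address: each city stays attached to its own state.
per_address = [
    {"account_id": "123456", "city": "Davenport", "state": "IA"},
    {"account_id": "123456", "city": "Lincoln", "state": "NE"},
]


def matches_flattened(doc, city, state):
    """Array-style match: each field is checked independently."""
    return city in doc["cities"] and state in doc["states"]


def matches_per_address(docs, city, state):
    """Per-address match: both fields must agree within a single document."""
    return any(d["city"] == city and d["state"] == state for d in docs)


# "Davenport, NE" is not a real address on this account, yet the
# array-style document still matches it.
false_hit = matches_flattened(flattened, "Davenport", "NE")
correct = matches_per_address(per_address, "Davenport", "NE")
```

Here `false_hit` is `True` while `correct` is `False`, which is the mismatch that indexing one document per address avoids.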

Here's an example of some documents in my proposed schema:

[
 {"type": "add",
  "id":   "<see below>",
  "fields": {
     "account_id": "123456",
     "name": "John Smith",
     "address_1": "1 Main St",
     "address_2": "Apt 1",
     "city": "Davenport",
     "state": "IA",
     "zip": 52081
  }
 },
 {"type": "add",
  "id":   "<see below>",
  "fields": {
     "account_id": "123456",
     "name": "John Smith",
     "address_1": "2 Elm St",
     "city": "Lincoln",
     "state": "NE",
     "zip": 23452
  }
 }
]
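
A batch like the one above could be generated from the two tables roughly as follows. This is only a sketch under assumptions: the rows are plain dicts as you might get from a database cursor, `build_batch` is a hypothetical helper, and `address_id` is an assumed extra column on the addresses table used to make each document ID unique. Uploading the resulting JSON to the domain is left out.

```python
import json

ADDRESS_COLUMNS = ("address_1", "address_2", "city", "state", "zip")


def build_batch(accounts, addresses):
    """Join each address row to its account and emit one 'add' document per address."""
    accounts_by_id = {a["account_id"]: a for a in accounts}
    batch = []
    for addr in addresses:
        account = accounts_by_id[addr["account_id"]]
        # Account columns are repeated on every address document (denormalized).
        fields = {"account_id": account["account_id"], "name": account["name"]}
        for column in ADDRESS_COLUMNS:
            value = addr.get(column)
            if value not in (None, ""):  # leave out empty columns such as address_2
                fields[column] = value
        batch.append({
            "type": "add",
            # address_id is an assumed unique column, not part of the original schema
            "id": f'{account["account_id"]}_{addr["address_id"]}',
            "fields": fields,
        })
    return batch


accounts = [{"account_id": "123456", "name": "John Smith"}]
addresses = [
    {"address_id": 1, "account_id": "123456", "address_1": "1 Main St",
     "address_2": "Apt 1", "city": "Davenport", "state": "IA", "zip": 52081},
    {"address_id": 2, "account_id": "123456", "address_1": "2 Elm St",
     "address_2": None, "city": "Lincoln", "state": "NE", "zip": 23452},
]

batch = build_batch(accounts, addresses)
payload = json.dumps(batch, indent=1)  # JSON text in the same shape as the example batch
```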

Generating Document IDs:

Note that you'd need some non-random way to construct unique document IDs (unique per account+address, not just per account). Something like the account_id plus a hash of the address, city, state and zip would work, or you could add another column to your table to uniquely identify each address (I prefer the latter).
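
As a rough illustration of the hash-based option, the sketch below builds an ID from the account_id plus a SHA-1 digest of the address columns. The function name `make_document_id`, the choice of SHA-1, the `|` separator, the lowercasing, and the inclusion of address_2 in the hash are all assumptions made for the example, not something CloudSearch requires.

```python
import hashlib


def make_document_id(account_id, address_1, address_2, city, state, zip_code):
    """Build a deterministic ID: account_id + '_' + SHA-1 hex digest of the address."""
    parts = [address_1, address_2 or "", city, state, str(zip_code)]
    # Normalize so trivial whitespace or case differences do not change the ID.
    normalized = "|".join(p.strip().lower() for p in parts)
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return f"{account_id}_{digest}"


first = make_document_id("123456", "1 Main St", "Apt 1", "Davenport", "IA", 52081)
second = make_document_id("123456", "2 Elm St", None, "Lincoln", "NE", 23452)
```

The same address always produces the same ID, so re-sending a document overwrites the existing one rather than creating a duplicate. One drawback of this approach: editing an address changes its hash, so the document stored under the old ID would have to be deleted separately, which is one reason a dedicated ID column can be simpler.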

solution1
0 2015-05-05 14:54:20
