
How do I denormalize my relational data for AWS CloudSearch documents?

AWS CloudSearch expects you to send it flattened documents of your data to index for search, which look something like:

[
 {"type": "add",
  "id":   "123456",
  "fields": {
     "account_id": "123456",
     "name": "foo",
     "addresses": []
  }
 }
]

Let's assume I have a database with an accounts table and an addresses table.

There are many addresses for each account. The addresses table has the fields:

  • address_1
  • address_2
  • city
  • state
  • zip
  • account_id (reference field)

How would I denormalize addresses in the CloudSearch document structure so that I can search across all of the columns in accounts and addresses?

Or should I be creating a separate search domain for each table?

I'm assuming your use cases are:

  • Retrieving addresses by their account_id
  • Retrieving account_ids by an address
  • Finding accounts in a particular city/state/zip
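With the one-document-per-address schema recommended below, each of these use cases becomes a simple structured query. The following is a hedged Python sketch that only builds the query strings; it assumes separate `account_id`, `address_1`, `city`, `state` and `zip` fields and that the string is sent as the `q` parameter with `q.parser=structured`. The helper names are hypothetical, and the backslash-escaping rule is an assumption worth checking against the CloudSearch docs.

```python
"""Structured-query strings for the use cases above (a hedged sketch).

Assumes one CloudSearch document per address with separate account_id,
address_1, city, state and zip fields; send the result as the q parameter
with q.parser=structured.
"""


def quote(value: str) -> str:
    # Values are wrapped in single quotes; backslashes and single quotes inside
    # them are escaped with a backslash (assumed rule, verify in the docs).
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def match_all(**fields: str) -> str:
    # One field:'value' clause per field; several clauses are ANDed together.
    clauses = [f"{name}:{quote(value)}" for name, value in fields.items()]
    if len(clauses) == 1:
        return clauses[0]
    return "(and " + " ".join(clauses) + ")"


def addresses_for_account(account_id: str) -> str:
    return match_all(account_id=account_id)


def accounts_at_address(address_1: str, city: str, state: str, zip_code: str) -> str:
    return match_all(address_1=address_1, city=city, state=state, zip=zip_code)


def accounts_in(city: str, state: str) -> str:
    return match_all(city=city, state=state)


if __name__ == "__main__":
    print(accounts_in("Cleveland", "OH"))  # (and city:'Cleveland' state:'OH')
```

Because every address is its own document, a query that combines city and state can only match a document where both belong to the same address.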

I would recommend the following two things:

  • Index each address as a separate document

    I would index each address as a separate document. Having a separate doc for each address preserves the relationships between the fields of that address, e.g. which city goes with which state (you would lose that if each account had an array of cities and an array of states).

  • Index each field separately

    I would index each field (city, state, etc.) separately. Breaking out each field will enable you to search them independently (e.g. get all the addresses in Cleveland, OH), use them as facets, boost scores based on them, etc.
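To make the first point concrete: if an account with addresses in Davenport, IA and Lincoln, NE were stored as one document with a `city` array and a `state` array, a search for Lincoln + IA would match it even though no such address exists. The Python below imitates that with plain dicts and hypothetical sample data; the matching functions are illustrative stand-ins for a search engine's field filters, not CloudSearch calls.

```python
"""Why one document per address: arrays lose which city goes with which state.

Hypothetical sample data; the *_matches functions stand in for a search
engine's field filters and are not CloudSearch APIs.
"""

ADDRESSES = [
    {"account_id": "123456", "city": "Davenport", "state": "IA"},
    {"account_id": "123456", "city": "Lincoln", "state": "NE"},
]


def per_account_doc(addresses):
    # Naive flattening: one document per account, one array per column.
    return {
        "account_id": addresses[0]["account_id"],
        "city": [a["city"] for a in addresses],
        "state": [a["state"] for a in addresses],
    }


def array_doc_matches(doc, city, state):
    # Each array is tested on its own, so the city/state pairing is lost.
    return city in doc["city"] and state in doc["state"]


def address_doc_matches(doc, city, state):
    # A per-address document keeps city and state together.
    return doc["city"] == city and doc["state"] == state


if __name__ == "__main__":
    account_doc = per_account_doc(ADDRESSES)
    print(array_doc_matches(account_doc, "Lincoln", "IA"))  # True (false positive)
    print(any(address_doc_matches(d, "Lincoln", "IA") for d in ADDRESSES))  # False
```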

Here's an example of some documents in my proposed schema:

[
 {"type": "add",
  "id":   "<see below>",
  "fields": {
     "account_id": "123456",
     "name": "John Smith",
     "address_1": "1 Main St",
     "address_2": "Apt 1",
     "city": "Davenport",
     "state": "IA",
     "zip": "52081"
  }
 },
 {"type": "add",
  "id":   "<see below>",
  "fields": {
     "account_id": "123456",
     "name": "John Smith",
     "address_1": "2 Elm St",
     "city": "Lincoln",
     "state": "NE",
     "zip": "23452"
  }
 }
]
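Documents like these can be produced mechanically from the two tables. A minimal Python sketch, assuming rows come back as dicts (e.g. from your database driver), that `make_id` is a function you supply following the ID section that comes next, and that blank or NULL columns are left out of the document (as `address_2` is in the second example) rather than sent as null. The `address_id` column in the demo is a hypothetical surrogate key, not part of the original table.

```python
"""Denormalize accounts + addresses into a CloudSearch "add" batch (sketch).

Assumptions: rows are dicts, make_id is caller-supplied, and empty/NULL
columns are omitted instead of being sent as null.
"""
import json
from typing import Callable, Iterable

ADDRESS_FIELDS = ("address_1", "address_2", "city", "state", "zip")


def address_documents(
    accounts: Iterable[dict],
    addresses: Iterable[dict],
    make_id: Callable[[dict], str],
) -> list[dict]:
    # Account columns are copied onto every address document (denormalization).
    names = {acct["account_id"]: acct["name"] for acct in accounts}
    batch = []
    for addr in addresses:
        fields = {"account_id": addr["account_id"], "name": names[addr["account_id"]]}
        for column in ADDRESS_FIELDS:
            value = addr.get(column)
            if value not in (None, ""):
                # Send values as strings (store zip as a string in the DB so
                # leading zeros survive).
                fields[column] = str(value)
        batch.append({"type": "add", "id": make_id(addr), "fields": fields})
    return batch


if __name__ == "__main__":
    accounts = [{"account_id": "123456", "name": "John Smith"}]
    addresses = [
        # address_id is a hypothetical surrogate-key column.
        {"address_id": 1, "account_id": "123456", "address_1": "1 Main St",
         "address_2": "Apt 1", "city": "Davenport", "state": "IA", "zip": "52081"},
        {"address_id": 2, "account_id": "123456", "address_1": "2 Elm St",
         "address_2": None, "city": "Lincoln", "state": "NE", "zip": "23452"},
    ]
    docs = address_documents(
        accounts, addresses, make_id=lambda a: f"{a['account_id']}_{a['address_id']}"
    )
    print(json.dumps(docs, indent=1))
```

The printed JSON array is in the batch format shown in the question.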

Generating Document IDs:

Note that you'd need some deterministic (non-random) way to construct unique document IDs (unique per account + address, not just per account). Something like the account_id plus a hash of the address, city, state, and zip would work, or you could add another column to the addresses table to uniquely identify each row (I prefer the latter).
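A sketch of the hash-based option, using Python's `hashlib` with SHA-1 (any stable hash would do). The strip-and-lowercase normalization is an assumption about which addresses should count as the same; to my knowledge CloudSearch document IDs are limited to 128 characters and allow `_`, but check the current limits in the docs.

```python
"""Deterministic document IDs: account_id plus a hash of the address columns.

Sketch: the normalization (strip + lowercase) is an assumption; pick whatever
should count as "the same address" for your data.
"""
import hashlib

ADDRESS_FIELDS = ("address_1", "address_2", "city", "state", "zip")


def address_doc_id(row: dict) -> str:
    # Missing/NULL columns become "" so the hash input is always well defined.
    parts = [str(row.get(f) or "").strip().lower() for f in ADDRESS_FIELDS]
    # Join with the ASCII unit separator (should not occur in address text) so
    # ("ab", "c") and ("a", "bc") hash differently.
    digest = hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()
    # 40 hex chars plus a short account_id stays well under a 128-char limit.
    return f"{row['account_id']}_{digest}"
```

One trade-off, and likely why a dedicated column is preferable: because the ID is derived from the address text, editing an address produces a new ID, so the old document has to be removed with a separate `delete` operation. Two identical addresses on the same account also collapse into one document.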
