简体   繁体   中英

AWS S3 - Search a specific text in nested JSON file

{
    "repo": [
        {
            "book": 47,
            "version": 1,
            "bookName": "Book1",
            "chapters": [
                {
                    "chapterId": 1,
                    "chapterContents": [
                        {
                            "line": 1,
                            "Text": "1. The first text of chapter 1 book 1"
                        },
                        {
                            "line": 2,
                            "Text": "2. The second text of chapter 1 book 1"
                        }
                    ]
                },
                {
                    "chapterId": 2,
                    "chapterContents": [
                        {
                            "line": 1,
                            "Text": "1. The first text of chapter 2 book 1"
                        }
                    ]
                }
            ]
        }
    ]
}

This is the format for the data model stored as JSON file in Amazon S3. There are lot of books with lot of chapters and text contents. The requirement is to search a particular text across all books and list the line, chapter, book and the version where the text is found. How can I make S3 queries on nested array JSON file?

You can use Amazon Athena to query nested JSON data from S3.

Searching for values inside json should be possible. You can find an example here :

WITH dataset AS (
  SELECT * FROM (VALUES
    (JSON '{"name": "Bob Smith", "org": "legal", "projects": ["project1"]}'),
    (JSON '{"name": "Susan Smith", "org": "engineering", "projects": ["project1", "project2", "project3"]}'),
    (JSON '{"name": "Jane Smith", "org": "finance", "projects": ["project1", "project2"]}')
  ) AS t (users)
)
SELECT json_extract_scalar(users, '$.name') AS user
FROM dataset
WHERE json_array_contains(json_extract(users, '$.projects'), 'project2')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM