
Problem loading Snowflake tables from SAP Data Services using an S3 bucket

I'm trying to load tables from SAP Data Services into Snowflake through an S3 bucket (this is required for bulk loading the data).

I can't get the output files in the S3 bucket formatted correctly. I have problems with line breaks (the lines don't break) and with dates (extra precision), and I'll probably have problems with commas if any text contains one (the separator is currently a comma).

I've seen that it's possible to write the file to the S3 bucket as JSON with a nested schema, but if I do that, I don't know how to call COPY INTO from Snowflake.

This project is a migration: I'm replacing the old database with Snowflake. The jobs in SAP DS already exist, and the idea is to change only the destination, not the information flow.

Any help would be awesome. Thanks!
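If you keep the comma-separated export, the comma and line-break problems are usually handled by quoting text fields, so that a comma or newline inside quotes is treated as data rather than as a separator. The sketch below assumes Data Services is configured to wrap text fields in double quotes, writes one header row, and writes timestamps in a fixed pattern. The bucket URL, the storage integration `my_s3_integration`, the target table `my_table`, and the timestamp pattern are hypothetical placeholders; the pattern must match whatever the export actually writes.

```sql
/* CSV file format: fields wrapped in double quotes may contain commas and line breaks. */
create or replace file format ds_csv_format
  type = 'CSV'
  field_delimiter = ','
  field_optionally_enclosed_by = '"'
  skip_header = 1                                 -- assumes one header row per file
  timestamp_format = 'YYYY-MM-DD HH24:MI:SS.FF';  -- hypothetical export pattern

/* External stage over the S3 folder the Data Services job writes to. */
create or replace stage ds_s3_stage
  url = 's3://my-bucket/ds_exports/'              -- hypothetical bucket and path
  storage_integration = my_s3_integration         -- hypothetical storage integration
  file_format = ds_csv_format;

/* Bulk load every CSV file in that folder into the existing target table. */
copy into my_table
  from @ds_s3_stage
  pattern = '.*[.]csv';
```

With this approach the existing relational target tables stay unchanged; only the export format and the load step differ.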

You can load the JSON file into a table with a single column of type VARIANT.

Here is an example:

/* Create a JSON file format that strips the outer array. */

create or replace file format json_format
  type = 'JSON'
  strip_outer_array = true;

/* Create an internal stage that references the JSON file format. */

create or replace stage mystage
  file_format = json_format;

/* Stage the JSON file (PUT must be run from SnowSQL or a client driver, not from a web UI worksheet). */

put file:///tmp/sales.json @mystage auto_compress=true;

/* Create a target table for the JSON data. */

create or replace table house_sales (src variant);

/* Copy the JSON data into the target table. */

copy into house_sales
   from @mystage/sales.json.gz;

select * from house_sales;

+---------------------------+
| SRC                       |
|---------------------------|
| {                         |
|   "location": {           |
|     "city": "Lexington",  |
|     "zip": "40503"        |
|   },                      |
|   "price": "75836",       |
|   "sale_date": "4-25-16", |
|   "sq__ft": "1000",       |
|   "type": "Residential"   |
| }                         |
| {                         |
|   "location": {           |
|     "city": "Belmont",    |
|     "zip": "02478"        |
|   },                      |
|   "price": "92567",       |
|   "sale_date": "6-18-16", |
|   "sq__ft": "1103",       |
|   "type": "Residential"   |
| }                         |
| {                         |
|   "location": {           |
|     "city": "Winchester", |
|     "zip": "01890"        |
|   },                      |
|   "price": "89921",       |
|   "sale_date": "1-31-16", |
|   "sq__ft": "1122",       |
|   "type": "Condo"         |
| }                         |
+---------------------------+
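Since the goal is to keep loading ordinary relational tables, the VARIANT column can then be projected into typed columns with path expressions and casts. This is a sketch only: the table name `house_sales_flat` and the column names are hypothetical, and the `'MM-DD-YY'` date pattern is an assumption that should be checked against the real data (`try_to_date` returns NULL instead of failing when it does not match).

```sql
/* Project the VARIANT column into typed relational columns. */
create or replace table house_sales_flat as
select
  src:location.city::string                      as city,
  src:location.zip::string                       as zip,        -- string keeps leading zeros
  src:price::number                              as price,
  try_to_date(src:sale_date::string, 'MM-DD-YY') as sale_date,  -- NULL if the pattern does not match
  src:sq__ft::number                             as sq_ft,
  src:type::string                               as sale_type
from house_sales;
```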

For more information, have a look here.
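Because the files in this case are already in S3, the internal stage and PUT step can be replaced by an external stage, and COPY INTO can select nested fields directly while loading, which skips the intermediate VARIANT table. A minimal sketch, reusing `json_format` from above; the bucket URL, the storage integration `my_s3_integration`, and the table `house_sales_typed` are hypothetical.

```sql
/* External stage over the S3 folder holding the JSON exports (hypothetical URL and integration). */
create or replace stage ds_json_stage
  url = 's3://my-bucket/ds_json/'
  storage_integration = my_s3_integration
  file_format = json_format;

/* Target table with ordinary typed columns. */
create or replace table house_sales_typed (
  city      string,
  zip       string,
  price     number,
  sale_type string
);

/* COPY with a transformation: pick nested fields out of each JSON record while loading. */
copy into house_sales_typed (city, zip, price, sale_type)
  from (
    select $1:location.city::string,
           $1:location.zip::string,
           $1:price::number,
           $1:type::string
    from @ds_json_stage
  )
  pattern = '.*[.]json';
```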

You can also query a staged JSON file directly, as in this example:

create or replace file format my_json_format type = 'json';
select * from @~/example_2.json.gz 
(
  file_format => my_json_format
);

I get:

{
          "quiz": {
                    "maths": {
                              "q1": {
                                        "answer": "12",
                                        "options": [
                                                  "10",
                                                  "11",
                                                  "12",
                                                  "13"
                                        ],
                                        "question": "5 + 7 = ?"
                              },
                              "q2": {
                                        "answer": "4",
                                        "options": [
                                                  "1",
                                                  "2",
                                                  "3",
                                                  "4"
                                        ],
                                        "question": "12 - 8 = ?"
                              }
                    },
                    "sport": {
                              "q1": {
                                        "answer": "Huston Rocket",
                                        "options": [
                                                  "New York Bulls",
                                                  "Los Angeles Kings",
                                                  "Golden State Warriros",
                                                  "Huston Rocket"
                                        ],
                                        "question": "Which one is correct team name in NBA?"
                              }
                    }
          }
}

I can also do:

select parse_json($1):quiz.maths from @~/example_2.json.gz 
(
  file_format => my_json_format
);

And I get:

{
          "q1": {
                    "answer": "12",
                    "options": [
                              "10",
                              "11",
                              "12",
                              "13"
                    ],
                    "question": "5 + 7 = ?"
          },
          "q2": {
                    "answer": "4",
                    "options": [
                              "1",
                              "2",
                              "3",
                              "4"
                    ],
                    "question": "12 - 8 = ?"
          }
}
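To turn a nested object like `quiz.maths` into one row per question, the staged-file query can be combined with `LATERAL FLATTEN`, which returns a `key` and `value` for each entry of the object. This is a sketch under the assumption that the same staged file and `my_json_format` are used; it should yield one row per key under `quiz.maths` (here `q1` and `q2`). If FLATTEN over a staged-file query does not suit your setup, load the file into a VARIANT table first and flatten that instead.

```sql
/* One row per maths question, read straight from the staged file. */
select q.key                    as question_id,
       q.value:question::string as question,
       q.value:answer::string   as answer,
       q.value:options          as options
from @~/example_2.json.gz (file_format => my_json_format) t,
     lateral flatten(input => t.$1:quiz.maths) q;
```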
