简体   繁体   中英

How to load correct data in redshift spectrum table?

I am trying to load data from s3 bucket to spectrum table but data is not showing correct. Could you please help me to resolve the below issue.

Example:

website is one column where data is - www.calidadhh.com/\

but while loading into table it is taking other column value - www.calidadhh.com/Home Health Agency Corporation

I am running below query -

drop table schema.dhs_account_ovrvw;
create external table schema.dhs_account_ovrvw(
hospital_id INTEGER
,hospital_name VARCHAR(255)
,hq_address VARCHAR(255)
,hq_address1 VARCHAR(255)
,hq_city VARCHAR(255)
,hq_state VARCHAR(50)
,hq_zip_code VARCHAR(50)
,hq_county VARCHAR(255)
,website VARCHAR(20000)
,firm_type VARCHAR(510)
,rec_add_user_nm VARCHAR(20)
,rec_add_tms TIMESTAMP
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ('quoteChar' = '\"','escapeChar' = '\\','separatorChar' = '\001')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://*******/Refined_Layer/DHS_ACC_OVRVW/'
TABLE PROPERTIES ('skip.header.line.count' = '1');

Output it is generating -

hospital_id,hospital_name,hq_address,hq_address1,hq_city,hq_state,hq_zip_code,hq_county,website,firm_type,rec_add_user_nm,rec_add_tms
929719  Calidad Home Health & Hospice   1600 E Expy 83      La Feria    TX  78559   TX - Cameron    www.calidadhh.com/Home Health Agency Corporation   CDP_USER    2021-11-22 07:39:57 

Output should be -

hospital_id,hospital_name,hq_address,hq_address1,hq_city,hq_state,hq_zip_code,hq_county,website,firm_type,rec_add_user_nm,rec_add_tms
929719  Calidad Home Health & Hospice   1600 E Expy 83      La Feria    TX  78559   TX - Cameron    www.calidadhh.com/\  Home Health Agency Corporation CDP_USER    2021-11-22 07:39:57

The problem looks to be that '' is being seen as the escape character (serde default). It is not clear what character is following the '' in your desired output (looks like a space) so it is confusing. If you change the escape character to something that is not in your data these characters ('\ ') should come through uninterpreted. I think adding

ESCAPECHAR='^' 

to the SERDEPROPERTIES will get you closer.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM