
PostgreSQL (AWS Redshift) error 1204: String length exceeds DDL length

I am trying to import a CSV file into AWS Redshift (PostgreSQL 8.x).

The data flow is: MySQL -> Parquet files on S3 -> CSV files on S3 -> Redshift.

Table structure

The MySQL table SQL:

create table orderitems
(
  id                         char(36) collate utf8_bin not null
    primary key,
  store_id                   char(36) collate utf8_bin not null,
  ref_type                   int                       not null,
  ref_id                     char(36) collate utf8_bin not null,
  store_product_id           char(36) collate utf8_bin not null,
  product_id                 char(36) collate utf8_bin not null,
  product_name               varchar(50)               null,
  main_image                 varchar(200)              null,
  price                      int                       not null,
  count                      int                       not null,
  logistics_type             int                       not null,
  time_create                bigint                    not null,
  time_update                bigint                    not null,
  ...
);

I used the same SQL to create the table in Redshift, but it got an error while importing the CSV.

My code to import the CSV into Redshift (Python):

import pandas as pd
import psycopg2
import smart_open

# The parquet files were dumped by Sqoop.
path = 'xxx'  # s3 path of the input parquet
p2 = 'xxx'    # s3 path of the output csv
df = pd.read_parquet(path)

with smart_open.smart_open(p2, 'w') as f:
    df.to_csv(f, index=False)  # python3 default encoding is utf-8

# CONN_STRING, to_table and AWS_IAM_ROLE are defined elsewhere.
conn = psycopg2.connect(CONN_STRING)

sql = """COPY %s FROM '%s' credentials 'aws_iam_role=%s' region 'cn-north-1'
delimiter ',' FORMAT AS CSV IGNOREHEADER 1 ; commit ;""" % (to_table, p2, AWS_IAM_ROLE)
print(sql)
cur = conn.cursor()
cur.execute(sql)  # the trailing "commit;" makes the COPY permanent
conn.close()

Got error

By checking STL_LOAD_ERRORS (roughly with the query sketched after this list), I found the error is on the product_name column:

  • raw_field_value : .............................................215g/...
  • err_code: 1204
  • err_reason: String length exceeds DDL length
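
A minimal sketch of this check, reusing the psycopg2 connection from the script above (the exact query is illustrative; the column names come from the Redshift STL_LOAD_ERRORS system table):

cur.execute("""
    SELECT colname, col_length, raw_field_value, err_code, err_reason
    FROM stl_load_errors
    ORDER BY starttime DESC
    LIMIT 5;
""")
print(cur.fetchall())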

The real value is 伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶 (Chinese).

So it looks like an encoding problem. Since MySQL is UTF-8 and the CSV is UTF-8 too, I don't know what is wrong.

Your column is a varchar data type with length 50. In Redshift, that is 50 bytes, not 50 characters. The string in your example contains 16 Chinese characters, which are 3 bytes each in UTF-8, plus five ASCII characters ("215g/") at one byte each, so 16 × 3 + 5 = 53 bytes. That is longer than the declared byte length of the column, so the import fails with error 1204.
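
You can verify the byte count directly; a minimal sketch in Python, using the failing value quoted in the question:

# Characters vs. UTF-8 bytes for the value that failed to load.
s = '伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶'
print(len(s))                  # 21 characters
print(len(s.encode('utf-8')))  # 53 bytes, which exceeds varchar(50) in Redshift

The fix is to size the Redshift column in bytes rather than characters, for example by declaring product_name as varchar(200); Redshift can also widen an existing VARCHAR column in place with ALTER TABLE ... ALTER COLUMN ... TYPE.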
