Special characters show up as question marks in AWS Athena

Question

I've added a table in AWS Athena from a csv file that contains the special characters "æøå". These show up as � in the output. The csv file is encoded using unicode. I've also tried changing the encoding to UTF-8, with no luck. I uploaded the csv to S3 and then added the table to Athena using the following DDL:

CREATE EXTERNAL TABLE `regions_dk`(
  `postnummer` string COMMENT 'from deserializer', 
  `kommuner` string COMMENT 'from deserializer', 
  `regioner` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'separatorChar'='\;') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket/path'
TBLPROPERTIES (
  'classification'='csv')

I have another table that also includes the characters "æøå", which I added using an ETL script, and there the characters display correctly.

What am I overlooking?

Answer 1

I uploaded an ANSI-encoded file to S3 and some of the data came out unreadable. I changed the file's encoding to UTF-8 on my PC, uploaded it again, and everything was fine.

I did the conversion with Sublime Text.
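If you'd rather script the conversion than use an editor, something like the following Python sketch should do the same job. It assumes "ANSI" means Windows-1252 (`cp1252`), which is the usual case on Western European Windows machines but is only a guess for your file. The function names and file paths are made up for illustration.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    # Assumption: the caller knows (or has guessed) the real source encoding.
    # Decoding with the wrong one silently produces the wrong characters.
    return raw.decode(source_encoding).encode("utf-8")


def reencode_file(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of src to dst; only the character encoding changes."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))


if __name__ == "__main__":
    # Hypothetical file names; upload the converted copy to S3 afterwards.
    reencode_file("regions_dk.csv", "regions_dk_utf8.csv")
```

If the file was saved as "Unicode" from a Windows tool, that often means UTF-16, in which case passing `source_encoding="utf-16"` may be what you need. After replacing the object in S3, run the query again to see whether "æøå" comes out right.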
