AWS Redshift Spectrum not working with Apache Parquet files

The following is my sample CSV file.

id,name,gender
1,isuru,male
2,perera,male
3,kasun,male
4,ann,female

I converted the above CSV file to Apache Parquet using the pandas library. The following is my code.

import pandas as pd
    
df = pd.read_csv('./data/students.csv')
df.to_parquet('students.parquet')

After that, I uploaded the Parquet file to S3 and created an external table like the one below.

create external table imp.s1 (
id integer,
name varchar(255),
gender varchar(255)
)
stored as PARQUET 
location 's3://sample/students/';

After that, I ran a select query, but I got the following error.

select * from imp.s1

Spectrum Scan Error. File 'https://s3.ap-southeast-2.amazonaws.com/sample/students/students.parquet' 
has an incompatible Parquet schema for column 's3://sample/students.id'. 
Column type: INT, Parquet schema:\noptional int64 id [i:0 d:1 r:0] 
(s3://sample/students.parquet)

Could you please help me figure out what the problem is here?

By default, pandas.read_csv infers integer columns as the 64-bit dtype int64 (the nullable integer dtype Int64 is 64-bit as well). Parquet stores that as Int64, which corresponds to Bigint in the Parquet Amazon S3 mapping, so it does not match a column declared as integer.

Parquet Amazon S3 File Data Type | Transformation | Description
Int32 | Integer | -2,147,483,648 to 2,147,483,647 (precision of 10, scale of 0)
Int64 | Bigint | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (precision of 19, scale of 0)
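
Before narrowing id to a 32-bit type, it may be worth confirming that every value actually fits in the Int32 range listed above. The following is a minimal sketch using only the standard library; the helper name fits_int32 and the inline sample data are illustrative assumptions, not part of the original question.

```python
import csv
import io

# Int32 bounds, taken from the range listed for Int32 / Integer.
INT32_MIN, INT32_MAX = -2_147_483_648, 2_147_483_647


def fits_int32(csv_text, column):
    """Return True if every value in `column` lies within the Int32 range.

    Raises ValueError if a value cannot be parsed as an integer.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return all(INT32_MIN <= int(row[column]) <= INT32_MAX for row in reader)


# Inline copy of the sample CSV from the question.
sample = """id,name,gender
1,isuru,male
2,perera,male
3,kasun,male
4,ann,female
"""

print(fits_int32(sample, "id"))
```

If this check fails for real data, keeping the column as a 64-bit integer and declaring it as bigint in the external table is probably the safer choice.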

You need to explicitly set the column type of id when calling pandas.read_csv, so that the column is written to Parquet as Int32:

df = pd.read_csv('./data/students.csv', dtype={'id': 'int32'})
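
To see the effect of the dtype argument without touching S3, the dtypes can be compared directly in pandas. This is a small sketch that assumes pandas is installed and uses an inline, shortened copy of the sample data instead of ./data/students.csv; the variable names are illustrative.

```python
import io

import pandas as pd

# Shortened inline copy of the sample CSV (assumption: same header as the question).
csv_text = "id,name,gender\n1,isuru,male\n2,perera,male\n"

# Default inference: integer columns are read as 64-bit.
default_df = pd.read_csv(io.StringIO(csv_text))

# Explicit dtype: id is read as a 32-bit integer instead.
fixed_df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "int32"})

print(default_df["id"].dtype)
print(fixed_df["id"].dtype)
```

Writing fixed_df with to_parquet, as in the question, should then produce an Int32 column that matches the integer type declared in the external table.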
