Creating a Table from Partitioned Parquet Data by Using Conditions

I am trying to create a table from partitioned data in Amazon S3 on a Databricks cluster. The data is partitioned on the following columns:

id, report and date

So I have mounted the data:

%python
ACCESS_KEY = "xxxxxxxxx"
SecretKey = "xxxxxxxxxx"
ENCODED_SECRET_KEY = SecretKey.replace("/", "%2F")
AWS_BUCKET_NAME = "path/parent_directory"
MOUNT_NAME = "parent"
dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, 
AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)
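The mount call above embeds the credentials in the s3a URI, which is why the secret key is escaped first. Replacing only "/" leaves other reserved characters (such as "+") untouched, so a slightly more defensive variant is sketched below using the standard library's urllib.parse.quote. The helper name build_mount_source and the sample key values are hypothetical and only serve the illustration; the dbutils.fs.mount call itself is left as a comment because it only exists inside Databricks.

```python
from urllib.parse import quote


def build_mount_source(access_key: str, secret_key: str, bucket: str) -> str:
    """Return an s3a URI with the secret key fully percent-encoded.

    build_mount_source is a hypothetical helper, not part of any Databricks API.
    """
    # safe="" makes quote() encode "/" as well, along with "+", "=" and similar.
    encoded_secret = quote(secret_key, safe="")
    return "s3a://%s:%s@%s" % (access_key, encoded_secret, bucket)


# Placeholder credentials, for illustration only.
source = build_mount_source("AKIAEXAMPLE", "abc/def+ghi", "path/parent_directory")
print(source)  # s3a://AKIAEXAMPLE:abc%2Fdef%2Bghi@path/parent_directory

# On a Databricks cluster the result would then be passed to the mount call:
# dbutils.fs.mount(source, "/mnt/parent")
```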

Given how the data is laid out, a path looks something like this:

/dbfs/parent/id/report/date

Now I want to create a table based on the partitions. I want to specify a where condition in the create table statement that selects a report by name. There are 5 reports inside the id folder. My query is something like this:

%sql
Create table if not exists abc
(col1 string,
 col2 string,
 col3 bigint)using parquet
OPTIONS (path "/mnt/parent/")
partitioned by (id,report,date) where 
report="report1" ;

I am getting a syntax error:

Error in SQL statement: ParseException:mismatched input 'where' expecting <EOF>

I also tried:

Create table if not exists report1
(
col1 string,
col2 string,
col3 bigint  )using parquet
OPTIONS (path "/mnt/parent/")
partitioned by (id,report="report1",date)

Can anyone help me with this? Or can anyone help me load the data through spark-shell?

Thanks

A create table statement only defines the table; it has no where clause, so the filter has to live in a query or a view. I think what you really want is an unmanaged table over the data and a view that filters by that partition condition.

create table report
using parquet
options (
  path '/mnt/parent'
);

msck repair table report;

create or replace view report1
as select * from report where report = 'report1';
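The msck repair step and the filter on report rely on the partition values being recoverable from the directory names. As a rough illustration, assuming the folders follow the Hive-style column=value convention (the question does not say whether they do), the mapping from a file path to partition columns can be sketched in plain Python. The function parse_partition_path and the example path are hypothetical and are not part of Spark.

```python
from pathlib import PurePosixPath


def parse_partition_path(path: str) -> dict:
    """Collect column=value directory segments from a data file path.

    Hypothetical helper that mimics, in a simplified way, how partition
    columns are read from Hive-style directory names.
    """
    partitions = {}
    # Only the directories are inspected; the file name itself is skipped.
    for segment in PurePosixPath(path).parent.parts:
        if "=" in segment:
            column, value = segment.split("=", 1)
            partitions[column] = value
    return partitions


# Assumed layout, for illustration only.
example = "/mnt/parent/id=42/report=report1/date=2020-01-01/part-00000.parquet"
print(parse_partition_path(example))
# {'id': '42', 'report': 'report1', 'date': '2020-01-01'}
```

If the directories are named with bare values instead (for example /mnt/parent/42/report1/...), this sketch finds no partition columns at all, so it is worth checking the actual folder names before relying on the view.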
