Newest questions tagged [hive-partitions]

How to read filtered partitioned parquet files efficiently using pandas's read_parquet?

Let's say my data is stored in object storage, say S3, with a date-time partition like this: According to pandas's read_parquet API docs, I can use filter ...
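Hive-style partitioning encodes partition values in directory names (key=value), which is what lets a reader skip whole directories before opening any Parquet file. As a rough, pure-Python illustration of that pruning step, the sketch below parses such paths and keeps only those matching a list of (column, op, value) tuples. The tuple shape loosely mirrors the filters accepted by pyarrow, which pandas uses under the hood, but this is not pandas or pyarrow code; the bucket paths and the parse_partition_path/prune helpers are hypothetical.

```python
import operator

# Comparison operators supported by this sketch (a subset, chosen for illustration).
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def parse_partition_path(path):
    """Extract {key: value} from Hive-style 'key=value' path segments."""
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts


def prune(paths, filters):
    """Keep paths whose partition values satisfy every (column, op, value) filter.

    Values are compared as strings; zero-padded years/months/days sort correctly that way.
    """
    kept = []
    for path in paths:
        parts = parse_partition_path(path)
        if all(col in parts and OPS[op](parts[col], val) for col, op, val in filters):
            kept.append(path)
    return kept


# Hypothetical object-store layout with date partitions.
paths = [
    "s3://bucket/data/year=2021/month=01/day=30/part-0.parquet",
    "s3://bucket/data/year=2021/month=02/day=01/part-0.parquet",
    "s3://bucket/data/year=2021/month=02/day=15/part-0.parquet",
    "s3://bucket/data/year=2021/month=03/day=01/part-0.parquet",
]

print(prune(paths, [("year", "=", "2021"), ("month", "=", "02")]))
```

Only the two month=02 files survive; a real reader would then open just those files.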

Hive insert into partitioned table with column list from select

I want to insert into a partitioned Hive table tb_1(a, b, c, d, p1) only columns (a, b) from a select statement. Ex: insert into table tb_1 partition ...

In Foundry, how can I Hive partition with only 1 parquet file per value?

I'm looking to improve the performance of running filtering logic. To accomplish this, the idea is to use Hive partitioning by setting the part ...

Databricks / Spark storage mechanism for Delta Tables, Delta Logs, Partitions etc.

I am trying to understand how data is stored and managed in the Databricks environment. I have a fairly decent understanding of what is going on under ...

Need to merge multiple Hive partitions into one partition in Spark

I have around 50 partitions in a Hive table. I need to merge each set of partitions into one partition. I tried to use the rename partition command, but get ...

How to automatically update the Hive external table metadata partitions for streaming data

I am writing Spark streaming data into HDFS partitions using PySpark. Please find the code. After writing the data into HDFS, I am creating the ...
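One common approach, after each write, is to register any new directories with the metastore, either with MSCK REPAIR TABLE or with explicit ALTER TABLE ... ADD IF NOT EXISTS PARTITION statements. The sketch below assumes Hive-style key=value directories and only builds the latter statement as a string; the table name events_ext, the paths, and the add_partition_ddl helper are hypothetical, and executing the statement (for example through spark.sql) is left out.

```python
def add_partition_ddl(table, base, partition_dirs):
    """Build one ALTER TABLE ... ADD IF NOT EXISTS statement covering several partitions.

    partition_dirs are relative Hive-style paths such as 'dt=2021-05-01/hr=07'.
    Values are quoted as strings, which suits string-typed partition columns.
    """
    clauses = []
    for rel in partition_dirs:
        spec = ", ".join(
            f"{key}='{value}'"
            for key, _, value in (seg.partition("=") for seg in rel.strip("/").split("/"))
        )
        location = f"{base.rstrip('/')}/{rel.strip('/')}"
        clauses.append(f"PARTITION ({spec}) LOCATION '{location}'")
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(clauses)


# Hypothetical table, base path and newly written directories.
ddl = add_partition_ddl(
    "events_ext",
    "hdfs:///data/events/",
    ["dt=2021-05-01/hr=07", "dt=2021-05-01/hr=08"],
)
print(ddl)
```

Because of IF NOT EXISTS, rerunning the statement for directories that are already registered should be harmless, which suits a job triggered after every micro-batch.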

Performance of PySpark + Hive when a table has many partition columns

I am trying to understand the performance impact of the partitioning scheme when Spark is used to query a Hive table. As an example: Table 1 has 3 pa ...

Hive - incomplete rows in select from managed partitioned table

I need to copy data from a CSV file to a managed partitioned table in Hive. CSV file rows are: I created a managed partitioned table on r ...

How to retain the last N partitions for a Hive external table?

I need to retain, say, the last 7 partitions and their data for a given Hive external table. This can be done either via a shell script or a Hive HQL script. The ...
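A minimal sketch of the selection logic, assuming a single-level partition whose names come from SHOW PARTITIONS in the form dt=YYYY-MM-DD, so that string order equals date order: sort, keep the newest N, and emit ALTER TABLE ... DROP IF EXISTS PARTITION statements for the rest. For an external table, dropping a partition by default removes only the metastore entry, so the old directories would still need to be deleted separately (for example with hdfs dfs -rm -r). The table name sales_ext and both helpers are hypothetical.

```python
def partitions_to_drop(partitions, keep=7):
    """Return the partitions older than the newest `keep`, oldest first.

    Assumes single-level names like 'dt=2021-04-30' whose values sort
    chronologically as strings.
    """
    ordered = sorted(partitions, key=lambda p: p.split("=", 1)[1])
    return ordered[:-keep] if keep > 0 else ordered


def drop_statements(table, partitions):
    """Turn 'key=value' partition names into DROP IF EXISTS statements."""
    stmts = []
    for p in partitions:
        key, value = p.split("=", 1)
        stmts.append(f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({key}='{value}');")
    return stmts


# Ten hypothetical daily partitions, as SHOW PARTITIONS might list them.
shown = [f"dt=2021-04-{day:02d}" for day in range(21, 31)]
old = partitions_to_drop(shown, keep=7)
for stmt in drop_statements("sales_ext", old):
    print(stmt)
```

The same list of old partitions can drive both the metastore DROP statements and the file-system cleanup, which keeps the two in step.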

Querying based on Partition and non-partition column in Hive

I have an external Hive table as follows, where the table is partitioned by 3 columns (countryCode, sourceNbr, date). I know that if I query based ...

Hive: read table partitions defined in subselect

I have a Hive table which is partitioned by the partitionDate field. I can read a partition of my choice via a simple My task is to specify the partition o ...

Filtering is supported only on partition keys of type string Hive

We recently started facing issues with Spark 2.4.4 and Hive 1.2.1. When we try to read data from a table which is partitioned by string ...

Data Loaded wrongly into Hive Partitioned table after adding a new column using ALTER

I already have a Hive partitioned table. I needed to add a new column to the table, so I used ALTER to add the column like below. I have my final t ...

Hive load multiple partitioned HDFS files to table

I have some twice-partitioned files in HDFS with the following structure: and would like to load these into a hive table as elegantly as possible. ...

How to make the max function in a Hive query ignore __HIVE_DEFAULT_PARTITION__

I have a view which uses max to show the latest partition (in the format 2021-01, 2021-02, 2021-03, 2021-04). The Hive table has __HIVE_DEFAULT_PA ...
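Hive puts rows whose dynamic-partition value is NULL or empty into a partition named __HIVE_DEFAULT_PARTITION__ (the name is configurable via hive.exec.default.partition.name). Because '_' sorts after every digit, a plain string max returns that name instead of the latest month, so the usual fix is to filter it out before taking the max (in HiveQL, something like a WHERE clause excluding that literal). The sketch below shows the same logic in Python; the latest_partition helper and the extra YYYY-MM format check are assumptions for illustration.

```python
import re

DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"
MONTH_FORMAT = re.compile(r"\d{4}-\d{2}")  # partition values like '2021-04'


def latest_partition(values):
    """Return the max 'YYYY-MM' value, ignoring Hive's default partition.

    A plain max() would pick the default partition, because '_' sorts after
    every digit in ASCII/Unicode order.
    """
    candidates = [v for v in values if v != DEFAULT_PARTITION and MONTH_FORMAT.fullmatch(v)]
    return max(candidates) if candidates else None


values = ["2021-01", "2021-02", "2021-03", "2021-04", DEFAULT_PARTITION]
print(max(values))               # → __HIVE_DEFAULT_PARTITION__
print(latest_partition(values))  # → 2021-04
```

The format check also guards against any other stray value that would sort above real months.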

Will Hive do a full table scan with both partition and non-partition conditions?

I have a Hive table partitioned by one date column named datetime. If I do a query like with an extra 'and id in (1,2)' condition, will Hive do a full tab ...
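Conceptually, a predicate on the partition column lets the engine choose which partition directories to read, while a predicate on an ordinary column such as id can only be evaluated against rows inside the partitions that were selected. The toy model below, with an in-memory dict standing in for partition directories, illustrates that two-stage evaluation. It is a simplification of partition pruning, not Hive's planner, and the table contents are made up; running EXPLAIN DEPENDENCY on the real query is a way to check which partitions Hive actually reads.

```python
# Toy table: partition value -> rows. Each key stands in for a datetime=... directory.
table = {
    "2021-05-01": [{"id": 1, "v": "a"}, {"id": 3, "v": "b"}],
    "2021-05-02": [{"id": 2, "v": "c"}, {"id": 4, "v": "d"}],
    "2021-05-03": [{"id": 1, "v": "e"}],
}


def run_query(table, partition_pred, row_pred):
    """Two-stage evaluation: prune partitions first, then filter rows.

    Returns (matching_rows, partitions_read) so the pruning is visible.
    """
    partitions_read = [p for p in table if partition_pred(p)]  # stage 1: pruning
    rows = [
        row
        for p in partitions_read
        for row in table[p]
        if row_pred(row)  # stage 2: only rows of the selected partitions are scanned
    ]
    return rows, partitions_read


rows, read = run_query(
    table,
    partition_pred=lambda dt: dt == "2021-05-02",
    row_pred=lambda r: r["id"] in (1, 2),
)
print(read)  # → ['2021-05-02']
print(rows)  # → [{'id': 2, 'v': 'c'}]
```

In this model the id condition narrows the result but never widens the set of partitions read; only dropping the partition predicate would make every partition eligible.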

Hive Update partition vs MSCK Repair

I have a table with thousands of partitions. I want to change all the partition locations to a different cluster. Ex: for table test_table and partition day=20 ...
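If the directory layout is the same on the new cluster, one option is an ALTER TABLE ... PARTITION (...) SET LOCATION statement per partition. MSCK REPAIR TABLE is designed to add partitions that exist on the file system but are missing from the metastore, so as far as I know it does not rewrite the locations of partitions that are already registered. The sketch below only generates the statements by swapping a location prefix; the NameNode addresses, the example partition values, and the relocation_statements helper are hypothetical.

```python
def relocation_statements(table, partition_locations, old_prefix, new_prefix):
    """Build SET LOCATION statements that swap old_prefix for new_prefix.

    partition_locations maps a partition spec string, e.g. "day='20210105'",
    to its current location (as reported by DESCRIBE FORMATTED ... PARTITION).
    """
    stmts = []
    for spec, location in partition_locations.items():
        if not location.startswith(old_prefix):
            continue  # leave partitions that live elsewhere untouched
        new_location = new_prefix + location[len(old_prefix):]
        stmts.append(f"ALTER TABLE {table} PARTITION ({spec}) SET LOCATION '{new_location}';")
    return stmts


# Hypothetical partitions and NameNode addresses.
current = {
    "day='20210105'": "hdfs://old-nn:8020/warehouse/test_table/day=20210105",
    "day='20210106'": "hdfs://old-nn:8020/warehouse/test_table/day=20210106",
}
for stmt in relocation_statements(
    "test_table", current, "hdfs://old-nn:8020", "hdfs://new-nn:8020"
):
    print(stmt)
```

With thousands of partitions, the generated statements can be written to a file and run in one Hive session.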

How do partitioning and CLUSTERED BY work in a Hive table?

I'm trying to understand, from the query below, how the data is going to be placed. The keyword PARTITIONED BY will distribute the data in below lik ...
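A simplified model of where a row ends up: PARTITIONED BY picks a directory from the partition column's value, and CLUSTERED BY ... INTO n BUCKETS picks one of n bucket files inside that directory from a hash of the bucketing column. The sketch uses a plain value % n on an integer column as a stand-in for Hive's bucketing hash, which is not guaranteed to match Hive's exact function (it differs across versions), so treat the bucket numbers as illustrative. The column names and bucket file names are hypothetical.

```python
from collections import defaultdict


def place_rows(rows, partition_col, bucket_col, num_buckets):
    """Group rows by (partition directory, bucket file).

    The bucket is value % num_buckets, a simplified stand-in for Hive's
    bucketing hash; it assumes an integer bucketing column.
    """
    layout = defaultdict(list)
    for row in rows:
        directory = f"{partition_col}={row[partition_col]}"
        bucket = row[bucket_col] % num_buckets
        layout[(directory, f"bucket_{bucket:05d}")].append(row[bucket_col])
    return dict(layout)


# Hypothetical rows for a table PARTITIONED BY (country) CLUSTERED BY (user_id) INTO 4 BUCKETS.
rows = [
    {"country": "US", "user_id": 1},
    {"country": "US", "user_id": 5},
    {"country": "US", "user_id": 6},
    {"country": "IN", "user_id": 2},
    {"country": "IN", "user_id": 7},
]
for (directory, bucket_file), ids in sorted(place_rows(rows, "country", "user_id", 4).items()):
    print(directory, bucket_file, ids)
```

The point of the model is the two levels: the partition value chooses the directory, and within every directory the same bucketing rule spreads rows across a fixed number of files.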

Can I use regular expression in PARTITION BY?

It has been partitioned by RequestTimestamp (2020-12-12T07:39:35.000+0000), but it has the format as below. Could I change the format to a different f ...
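Hive's PARTITIONED BY clause takes column definitions rather than expressions or regular expressions, so a common workaround is to derive a separate partition column (for example a date string) from the timestamp at load time and partition on that. The parsing step for the RequestTimestamp format shown in the question can be sketched in Python as follows; the partition_value helper and the YYYY-MM-DD default output format are assumptions.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"  # matches e.g. 2020-12-12T07:39:35.000+0000


def partition_value(request_timestamp, out_format="%Y-%m-%d"):
    """Derive a partition value (default: a date string) from RequestTimestamp."""
    ts = datetime.strptime(request_timestamp.strip(), TS_FORMAT)
    return ts.strftime(out_format)


print(partition_value("2020-12-12T07:39:35.000+0000"))           # → 2020-12-12
print(partition_value("2020-12-12T07:39:35.000+0000", "%Y-%m"))  # → 2020-12
```

The date is taken in the timestamp's own offset; if the data mixes offsets, converting to UTC first (ts.astimezone(timezone.utc)) would keep partition boundaries consistent.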

Dynamically partitioned table in Hive not updating the recent partitions

I have a folder in a GCS bucket with a folder structure as I am trying to create a dynamically partitioned table on top of the folder by executing the be ...