I've created a Hive table with a partition like this: Then with PySpark, I have a DataFrame and I've tried to write it to the Hive table like t ...
I have been trying to run this piece of code to drop the current day's partition from a Hive table, and for some reason it does not drop the partition from ...
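A frequent cause of a drop that silently does nothing is the partition value not matching the stored format exactly (quoting, or `yyyyMMdd` vs `yyyy-MM-dd`). As a minimal sketch, the statement could be built like this; the table and column names here are placeholders, not from the question:

```python
from datetime import date

def drop_partition_ddl(table, part_col, day=None):
    """Build a HiveQL DROP PARTITION statement for a given day.

    For string partition columns the value must be quoted and must match
    the stored format exactly, otherwise the drop is a silent no-op.
    """
    day = day or date.today()
    return (f"ALTER TABLE {table} DROP IF EXISTS "
            f"PARTITION ({part_col}='{day:%Y-%m-%d}')")

# Example with a fixed day so the output is reproducible
print(drop_partition_ddl("db.events", "ingest_date", date(2020, 3, 1)))
```

`IF EXISTS` keeps the statement from erroring when the partition is already gone, which makes it safer to run from a daily job.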
I am working on a data engineering case where I have a table Table_Movie partitioned by ingest date. Now, from time to time, I receive some old data. ...
I have an external Hive table as follows, where the table is partitioned by 3 columns (countryCode, sourceNbr, date). I know that if I query based ...
I need to drop specific rows from a partitioned Hive table. The rows to delete match certain conditions, so entire partitions can not ...
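On a non-ACID Hive table there is no row-level DELETE, so the usual workaround is to rewrite each affected partition, keeping every row that does *not* match the delete condition. A sketch of building that statement (all names and the condition are hypothetical examples):

```python
def rewrite_partition_keep(table, cols, part_col, part_val, delete_cond):
    """Build an INSERT OVERWRITE that rewrites one partition without the
    rows matching delete_cond.

    With a static PARTITION spec, the SELECT list must NOT include the
    partition column itself, only the data columns.
    """
    return (
        f"INSERT OVERWRITE TABLE {table} PARTITION ({part_col}='{part_val}') "
        f"SELECT {', '.join(cols)} FROM {table} "
        f"WHERE {part_col}='{part_val}' AND NOT ({delete_cond})"
    )

print(rewrite_partition_keep("sales", ["id", "amount"], "dt",
                             "2020-03-01", "amount < 0"))
```

Only the partitions that actually contain matching rows need to be rewritten, which keeps the job bounded.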
I'm trying to understand, using the query below, how the data is going to be placed. The keyword PARTITIONED BY will distribute the data in below lik ...
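The key point about PARTITIONED BY is that Hive stores the partition column in the directory name, not inside the data files: each distinct value becomes a `col=value` subdirectory holding only the remaining columns. A small sketch of that layout (the rows and column names are made up for illustration):

```python
def partition_layout(rows, part_col):
    """Group rows the way a PARTITIONED BY table lays them out on disk:
    one directory per partition value, with the partition column stripped
    from the stored rows (it lives in the directory name instead)."""
    layout = {}
    for row in rows:
        directory = f"{part_col}={row[part_col]}"
        data = {k: v for k, v in row.items() if k != part_col}
        layout.setdefault(directory, []).append(data)
    return layout

rows = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "IN"},
    {"id": 3, "country": "US"},
]
print(partition_layout(rows, "country"))
```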
I've installed Hadoop and I'm trying to run the MapReduce example in the terminal, but I'm getting a "command not found" message. Can someone ...
My script is failing due to a heap space issue while processing too many partitions. To avoid the issue I am trying to insert all the partitions into a single ...
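Loading many partitions in one statement usually means a dynamic-partition INSERT, which needs two Hive settings enabled first. A sketch of the statements involved; the table and column names are placeholders:

```python
def dynamic_partition_insert(target, source, cols, part_col):
    """Build the statement list for a single dynamic-partition INSERT.

    The partition column must come last in the SELECT list; Hive routes
    each row to its partition based on that value.
    """
    return [
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        (f"INSERT OVERWRITE TABLE {target} PARTITION ({part_col}) "
         f"SELECT {', '.join(cols)}, {part_col} FROM {source}"),
    ]

for stmt in dynamic_partition_insert("tgt", "src", ["id", "val"], "dt"):
    print(stmt)
```

If the job still runs out of memory with very many partitions, limits such as `hive.exec.max.dynamic.partitions` may also need raising.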
I'm using Hive as my metastore database and the Hive Standalone Metastore for handling the DDLs, via this Thrift client that implements the serve ...
I have 2 tables: q6_cms_list_key1 (bucketed by cm and se), partitioned by tr_dt ... 99 000 000 000 rows; q6_cm_first_visit (bucketed by cm and se), 25 0 ...
How does MapReduce work in this query, and what is the significance of "CLUSTER BY" in it? ...
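In Hive, `CLUSTER BY k` is shorthand for `DISTRIBUTE BY k` (hash each row's key to pick a reducer) followed by `SORT BY k` (sort within each reducer, with no global ordering). A toy model of that two-phase behavior, using plain integers as keys so the hashing is deterministic:

```python
def cluster_by(rows, key_fn, n_reducers):
    """Model CLUSTER BY: hash-distribute rows across reducers,
    then sort each reducer's rows locally by the same key."""
    buckets = [[] for _ in range(n_reducers)]
    for row in rows:
        # DISTRIBUTE BY: all rows with the same key land on one reducer
        buckets[hash(key_fn(row)) % n_reducers].append(row)
    # SORT BY: each reducer sorts its own rows; no total order across buckets
    return [sorted(b, key=key_fn) for b in buckets]

print(cluster_by([5, 2, 8, 1, 4], lambda x: x, 2))
```

This is why CLUSTER BY output looks sorted within each reducer's file but not across files.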
This might be a dumb question, but is there any difference between manually specifying the partition columns in a Parquet file, as opposed to loading ...
I am using the Kafka sink connector to write data from Kafka to S3. The output data is partitioned into hourly buckets: year=yyyy/month=MM/day=dd/hour=hh ...
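The hourly bucket layout described here can be reproduced from a timestamp with zero-padded fields, which matters because `month=5` and `month=05` are different directories to most readers. A small sketch:

```python
from datetime import datetime

def hourly_partition_path(ts):
    """Render the year=/month=/day=/hour= prefix for a timestamp.
    Fields are zero-padded, matching the yyyy/MM/dd/hh layout."""
    return f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"

print(hourly_partition_path(datetime(2021, 5, 3, 7)))
```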
I cannot completely understand the partitioning concept in Hive. I understand what partitions are and how to create them. What I cannot get is why people ...
I have a large dataset in Parquet format (~1TB in size) that is partitioned into 2 hierarchies: CLASS and DATE. There are only 7 classes, but the Date ...
I have a very large amount of data in my S3 bucket, partitioned by two columns, MODULE and DATE, such that the file structure of my Parquet files is: I ha ...
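With a MODULE/DATE layout like this, the payoff of partitioning is pruning: a reader that knows which modules and dates it needs can enumerate just those prefixes instead of scanning the whole bucket. A sketch, with a hypothetical bucket name:

```python
def pruned_paths(base, modules, dates):
    """List only the MODULE=/DATE= prefixes a query actually needs,
    which is what partition pruning does instead of a full-bucket scan."""
    return [f"{base}/MODULE={m}/DATE={d}" for m in modules for d in dates]

print(pruned_paths("s3://my-bucket", ["A"], ["2021-01-01", "2021-01-02"]))
```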
I am creating an external table that refers to ORC files in an HDFS location. The ORC files are stored in such a way that the external table is parti ...
Request: How can I insert the partition key pair into each Parquet file while inserting data into a Hive/Impala table? Hive table DDL [ create external ta ...
I have 2 types of values in the partition column of string datatype: yyyyMMdd and yyyy-MM-dd. E.g. there are partition column values 20200301, 2020 ...
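One way to handle a string partition column holding both formats is to normalize every value to a single canonical form before comparing or filtering. A minimal sketch, assuming `yyyy-MM-dd` as the canonical form:

```python
from datetime import datetime

def normalize_partition_value(value):
    """Parse a partition value in either yyyyMMdd or yyyy-MM-dd form
    and return it canonicalized as yyyy-MM-dd."""
    for fmt in ("%Y%m%d", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized partition value: {value!r}")

print(normalize_partition_value("20200301"))
print(normalize_partition_value("2020-03-01"))
```

Comparing the raw strings directly would be wrong, since `'20200301' < '2020-03-01'` is a lexicographic accident, not a date comparison.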
I have a Parquet partitioning issue that I am trying to solve. I have read a lot of material on partitioning on this site and on the web but still cou ...