I've created a Hive table with a partition like this: Then with PySpark, I have a DataFrame and I've tried to write it to the Hive table like t ...
I have been trying to run this piece of code to drop the current day's partition from a Hive table, and for some reason it does not drop the partition from ...
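A frequent cause of a drop that silently does nothing is the partition value not matching the stored format exactly (quoting, or `yyyyMMdd` vs `yyyy-MM-dd`). As a minimal sketch, the statement could be built like this; the table and column names here are placeholders, not from the question:

```python
from datetime import date

def drop_partition_ddl(table, part_col, day=None):
    """Build a HiveQL DROP PARTITION statement for a given day.

    For string partition columns the value must be quoted and must match
    the stored format exactly, otherwise the drop is a silent no-op.
    """
    day = day or date.today()
    return (f"ALTER TABLE {table} DROP IF EXISTS "
            f"PARTITION ({part_col}='{day:%Y-%m-%d}')")

# Example with a fixed day so the output is reproducible
print(drop_partition_ddl("db.events", "ingest_date", date(2020, 3, 1)))
```

`IF EXISTS` keeps the statement from erroring when the partition is already gone, which makes it safer to run from a daily job.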
I am working on a data engineering case where I have a table Table_Movie partitioned by ingest date. Now, from time to time, I receive some old data. ...
I have an external Hive table as follows, where the table is partitioned by 3 columns (countryCode, sourceNbr, date). I know that if I query based ...
I need to drop specific rows from a partitioned Hive table. The rows to delete match certain conditions, so entire partitions can not ...
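On a non-ACID Hive table there is no row-level DELETE, so the usual workaround is to rewrite each affected partition, keeping every row that does *not* match the delete condition. A sketch of building that statement (all names and the condition are hypothetical examples):

```python
def rewrite_partition_keep(table, cols, part_col, part_val, delete_cond):
    """Build an INSERT OVERWRITE that rewrites one partition without the
    rows matching delete_cond.

    With a static PARTITION spec, the SELECT list must NOT include the
    partition column itself, only the data columns.
    """
    return (
        f"INSERT OVERWRITE TABLE {table} PARTITION ({part_col}='{part_val}') "
        f"SELECT {', '.join(cols)} FROM {table} "
        f"WHERE {part_col}='{part_val}' AND NOT ({delete_cond})"
    )

print(rewrite_partition_keep("sales", ["id", "amount"], "dt",
                             "2020-03-01", "amount < 0"))
```

Only the partitions that actually contain matching rows need to be rewritten, which keeps the job bounded.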
I'm trying to understand, using the query below, how the data is going to be placed. The keyword PARTITIONED BY will distribute the data in below lik ...
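The key point about PARTITIONED BY is that Hive stores the partition column in the directory name, not inside the data files: each distinct value becomes a `col=value` subdirectory holding only the remaining columns. A small sketch of that layout (the rows and column names are made up for illustration):

```python
def partition_layout(rows, part_col):
    """Group rows the way a PARTITIONED BY table lays them out on disk:
    one directory per partition value, with the partition column stripped
    from the stored rows (it lives in the directory name instead)."""
    layout = {}
    for row in rows:
        directory = f"{part_col}={row[part_col]}"
        data = {k: v for k, v in row.items() if k != part_col}
        layout.setdefault(directory, []).append(data)
    return layout

rows = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "IN"},
    {"id": 3, "country": "US"},
]
print(partition_layout(rows, "country"))
```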
I've installed Hadoop and I'm trying to run the MapReduce example in the terminal, but I'm getting a "command not found" message. Can someone ...
My script is failing due to a heap space issue while processing too many partitions. To avoid the issue I am trying to insert all the partitions into a single ...
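Loading many partitions in one statement usually means a dynamic-partition INSERT, which needs two Hive settings enabled first. A sketch of the statements involved; the table and column names are placeholders:

```python
def dynamic_partition_insert(target, source, cols, part_col):
    """Build the statement list for a single dynamic-partition INSERT.

    The partition column must come last in the SELECT list; Hive routes
    each row to its partition based on that value.
    """
    return [
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        (f"INSERT OVERWRITE TABLE {target} PARTITION ({part_col}) "
         f"SELECT {', '.join(cols)}, {part_col} FROM {source}"),
    ]

for stmt in dynamic_partition_insert("tgt", "src", ["id", "val"], "dt"):
    print(stmt)
```

If the job still runs out of memory with very many partitions, limits such as `hive.exec.max.dynamic.partitions` may also need raising.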
I'm using Hive as my metastore database and the Hive Standalone Metastore for handling the DDLs, via this Thrift client that implements the serve ...
I have 2 tables: q6_cms_list_key1 (bucketed by cm and se), partitioned by tr_dt ... 99 000 000 000 rows; q6_cm_first_visit (bucketed by cm and se), 25 0 ...
How does MapReduce work in this query, and what is the significance of "CLUSTER BY" in it? ...
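In Hive, `CLUSTER BY k` is shorthand for `DISTRIBUTE BY k` (hash each row's key to pick a reducer) followed by `SORT BY k` (sort within each reducer, with no global ordering). A toy model of that two-phase behavior, using plain integers as keys so the hashing is deterministic:

```python
def cluster_by(rows, key_fn, n_reducers):
    """Model CLUSTER BY: hash-distribute rows across reducers,
    then sort each reducer's rows locally by the same key."""
    buckets = [[] for _ in range(n_reducers)]
    for row in rows:
        # DISTRIBUTE BY: all rows with the same key land on one reducer
        buckets[hash(key_fn(row)) % n_reducers].append(row)
    # SORT BY: each reducer sorts its own rows; no total order across buckets
    return [sorted(b, key=key_fn) for b in buckets]

print(cluster_by([5, 2, 8, 1, 4], lambda x: x, 2))
```

This is why CLUSTER BY output looks sorted within each reducer's file but not across files.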
This might be a dumb question, but is there any difference between manually specifying the partition columns in a Parquet file, as opposed to loading ...
I am using the Kafka sink connector to write data from Kafka to S3. The output data is partitioned into hourly buckets: year=yyyy/month=MM/day=dd/hour=hh ...
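The hourly bucket layout described here can be reproduced from a timestamp with zero-padded fields, which matters because `month=5` and `month=05` are different directories to most readers. A small sketch:

```python
from datetime import datetime

def hourly_partition_path(ts):
    """Render the year=/month=/day=/hour= prefix for a timestamp.
    Fields are zero-padded, matching the yyyy/MM/dd/hh layout."""
    return f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"

print(hourly_partition_path(datetime(2021, 5, 3, 7)))
```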
I cannot completely understand the partitioning concept in Hive. I understand what partitions are and how to create them. What I cannot get is why people ...
I have a large dataset in Parquet format (~1TB in size) that is partitioned into 2 hierarchies: CLASS and DATE. There are only 7 classes, but the Date ...
I have a very large amount of data in my S3 bucket, partitioned by two columns, MODULE and DATE, such that the file structure of my Parquet files is: I ha ...
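With a MODULE/DATE layout like this, the payoff of partitioning is pruning: a reader that knows which modules and dates it needs can enumerate just those prefixes instead of scanning the whole bucket. A sketch, with a hypothetical bucket name:

```python
def pruned_paths(base, modules, dates):
    """List only the MODULE=/DATE= prefixes a query actually needs,
    which is what partition pruning does instead of a full-bucket scan."""
    return [f"{base}/MODULE={m}/DATE={d}" for m in modules for d in dates]

print(pruned_paths("s3://my-bucket", ["A"], ["2021-01-01", "2021-01-02"]))
```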
I am creating an external table that refers to ORC files in an HDFS location. The ORC files are stored in such a way that the external table is parti ...
Request: How can I insert the partition key pair into each Parquet file while inserting data into a Hive/Impala table? Hive table DDL [ create external ta ...
I have 2 types of values in the partition column of string datatype: yyyyMMdd and yyyy-MM-dd. E.g. there are partition column values 20200301, 2020 ...
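One way to handle a string partition column holding both formats is to normalize every value to a single canonical form before comparing or filtering. A minimal sketch, assuming `yyyy-MM-dd` as the canonical form:

```python
from datetime import datetime

def normalize_partition_value(value):
    """Parse a partition value in either yyyyMMdd or yyyy-MM-dd form
    and return it canonicalized as yyyy-MM-dd."""
    for fmt in ("%Y%m%d", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized partition value: {value!r}")

print(normalize_partition_value("20200301"))
print(normalize_partition_value("2020-03-01"))
```

Comparing the raw strings directly would be wrong, since `'20200301' < '2020-03-01'` is a lexicographic accident, not a date comparison.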
I have a Parquet partitioning issue that I am trying to solve. I have read a lot of material on partitioning on this site and on the web but still cou ...