When I deleted the content of a NoSqlTarget (key-value store) in MLRun/v3io via a standard command-line utility such as: It took approx. 1 hour for 1 mil ...
I have followed the TutorialsPoint guide and completed every step for adding a new node to an existing Hadoop cluster, but I am facing difficult ...
At our company, we use AWS services for all our data infrastructure and service needs. Our Hive tables are external tables, and the actual data files ar ...
My periodically running process writes data to a table backed by Parquet files with the configuration "spark.sql.sources.partitionOverwriteMode" = "dynamic ...
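A minimal PySpark sketch of this setting, assuming a hypothetical partition column `dt` and target path; with "dynamic" mode only the partitions present in the written DataFrame are replaced, while other existing partitions are kept:

```python
from pyspark.sql import SparkSession

# Minimal sketch of dynamic partition overwrite; the table path and the
# partition column "dt" are hypothetical placeholders, not from the question.
spark = (
    SparkSession.builder
    .appName("dynamic-partition-overwrite-sketch")
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .getOrCreate()
)

df = spark.createDataFrame(
    [("2024-01-01", 1), ("2024-01-02", 2)],
    ["dt", "value"],
)

# Only the partitions present in `df` are overwritten; other partitions
# already under the target path are left untouched.
(
    df.write
    .mode("overwrite")
    .partitionBy("dt")
    .parquet("hdfs:///data/my_table")  # hypothetical path
)
```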
I want to train a model on a compute node using data (in Parquet format) from a storage cluster (HDFS), and I cannot copy the whole dataset ...
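One way to avoid copying the dataset is to stream it from HDFS in record batches; below is a sketch using pyarrow, assuming a hypothetical NameNode endpoint and dataset path and that libhdfs is configured on the compute node:

```python
import pyarrow.dataset as ds
from pyarrow import fs

# Hypothetical NameNode endpoint and dataset path; requires libhdfs and the
# Hadoop classpath to be available on the compute node.
hdfs = fs.HadoopFileSystem(host="namenode-host", port=9000)
dataset = ds.dataset("/data/train", format="parquet", filesystem=hdfs)

# Stream record batches instead of materializing the whole dataset locally,
# feeding each batch to the training loop as it arrives.
for batch in dataset.to_batches(batch_size=65_536):
    features = batch.to_pandas()
    # model.partial_fit(features[FEATURE_COLS], features[LABEL_COL])  # hypothetical model
```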
I have a directory "SmallFiles" that contains 8 files, I archived them using "hadoop archive -archiveName myArch.har -p /Files/SmallFiles /Files" then ...
I can navigate from node to node with an ssh connection without any problems, for example from parasilo-1 to parasilo-10. cat ~/.ssh/id_rsa.pub >> ...
For a project, I need to frequently (but not periodically) append about one thousand or more data files (tabular data) to one existing CSV or parque ...
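One possible approach is to treat the target as a Parquet dataset directory and append new files to it rather than rewriting a single file; a PySpark sketch with hypothetical paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("append-sketch").getOrCreate()

# Hypothetical paths: read the newly arrived tabular files and append them
# as additional Parquet files under the existing dataset directory.
new_df = spark.read.csv("hdfs:///incoming/batch_*.csv", header=True, inferSchema=True)
new_df.write.mode("append").parquet("hdfs:///warehouse/my_dataset")
```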
I am building a Flink pipeline and, based on live input data, need to read records from archive files in a RichFlatMapFunction (e.g. each day I want to ...
I provisioned an AWS EMR HBase cluster with 1 master and 1 core node (m5.xlarge). My cluster doesn't have any 'task' nodes, as I plan to use this cluste ...
I am running Spark in cluster mode, which is giving an error. I ran the command below and verified that the JKS files are present at the location. I have ...
I am new to Hadoop and am doing a project for university. I have a folder called 'docs' that contains several text files. When I look at it locally, ...
When I alter the partition column name of the partitioned table (named partitioned_table), the corresponding directory in HDFS does not change. However ...
For example, I want to recursively output the paths of zero-size files in recursive timestamp directories like the following: hdfs://<DIRECTORY>/<TIME ...
For example, I want to output the paths of all zero-size files in a specific directory such as hdfs://<DIRECTORY>. I want to use hdfs -ls or hdfs -du and a ...
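A sketch of one way to do this, parsing the output of `hdfs dfs -ls -R` from Python; the column positions assume the standard -ls output layout:

```python
import subprocess
import sys

def zero_byte_files(base_dir: str):
    """Yield paths of zero-byte files under base_dir, recursively."""
    out = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", base_dir],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        fields = line.split()
        # Standard -ls layout: perms, replication, owner, group, size, date, time, path.
        # Skip directory entries (permissions start with 'd') and malformed lines.
        if len(fields) >= 8 and not fields[0].startswith("d") and fields[4] == "0":
            yield fields[7]

if __name__ == "__main__":
    for path in zero_byte_files(sys.argv[1]):
        print(path)
```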
My problem is as follows: a PySpark script that runs perfectly on a local machine and on an EC2 instance is ported to EMR to scale up. There's a config fi ...
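A common pattern for this situation, sketched below with a hypothetical file name `config.json`, is to ship the config with `spark-submit --files` and resolve it through SparkFiles, so the script no longer relies on a path that exists only on the original machine:

```python
import json
from pyspark import SparkFiles
from pyspark.sql import SparkSession

# Sketch assuming the job is submitted as:  spark-submit --files config.json app.py
# SparkFiles.get() resolves the local copy that YARN distributed to the container,
# instead of a hard-coded path that only exists on the machine the job came from.
spark = SparkSession.builder.appName("emr-config-sketch").getOrCreate()

with open(SparkFiles.get("config.json")) as fh:  # "config.json" is a placeholder name
    config = json.load(fh)

print(config)
```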
NameNode default write behaviour: how does the NameNode select DataNodes for the data? DataNodes (a,b,c,d,e,f); HDFS client (z) -> write data -> put -> hello.txt; nn -> (? How ...
I have a multi-node Hadoop cluster with 1 master and 2 slaves. I want to try importing from MySQL and loading into HDFS. I want to have Hive write hive ...
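A hedged sketch of such an import, driving Sqoop from Python with hypothetical connection details; `--hive-import` loads the staged data into a Hive table after it lands in HDFS:

```python
import subprocess

# Hypothetical MySQL connection details, table name, and password file path.
cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://mysql-host:3306/mydb",
    "--username", "sqoop_user",
    "--password-file", "/user/hadoop/.mysql.password",
    "--table", "orders",
    "--hive-import",
    "--hive-table", "default.orders",
    "--num-mappers", "2",
]
subprocess.run(cmd, check=True)
```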
I was trying to read a file stored in the Hadoop cluster through the following code. The default port used is 9000 (since at 50700, it is not getting ...
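For comparison, a sketch that reads a file over the NameNode RPC port with pyarrow; the host name and file path are placeholders, and libhdfs must be available:

```python
from pyarrow import fs

# Hypothetical NameNode host; port 9000 matches the RPC port mentioned above.
hdfs = fs.HadoopFileSystem(host="namenode-host", port=9000)

with hdfs.open_input_stream("/user/hadoop/input.txt") as stream:  # hypothetical path
    print(stream.read().decode("utf-8"))
```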
I am trying to create a Hadoop cluster. `hdfs` is starting normally and I am also able to access it through the web interface. But the data nodes are not ...