Newest questions tagged [mapreduce]

Is Spark good for automatically running a statistical analysis script on many nodes for a speedup?

I have a Python script that runs statistical analysis and trains deep learning models on input data. The data size is fairly small (~5Mb); however, the ...

In Flink is it possible to use state with a non keyed stream?

Let's assume that I have an input DataStream and want to implement some functionality that requires "memory", so I need a ProcessFunction that gives me ac ...

Hadoop: get files from an existing archive file in HDFS

I have a directory "SmallFiles" that contains 8 files. I archived them using "hadoop archive -archiveName myArch.har -p /Files/SmallFiles /Files" then ...

A question about Spark distributed aggregation

I am reading up on Spark from here. At one point the blog says: consider an app that wants to count the occurrences of each word in a corpus and p ...

How to find out number of elements in MongoDB array?

My collection of products consists of _id, product/title, product/price and reviews. Reviews is an array with all reviews for that specific product. I ...

How to check if there is a key in collection that has more than one value?

My collection looks something like this: My goal is to find if there are any products that have more than one price. Values of the key "product/pri ...

How to debug and test MapReduce on a local Windows machine?

I have found debugging and testing a MapReduce project challenging. For debugging and testing, I usually get the script above and put it insid ...

In an AWS EMR job flow, does each step receive the output from the previous step?

I am making a MapReduce program in Java that has 4 steps. Each step operates on the output of the previous step. I ran those steps locally and m ...

OptionConverter.convertLevel Error in Hadoop Mapreduce job

I am getting a weird error while executing a mapreduce job in my Hadoop cluster. This error is intermittent. Sometimes, it fails the mapper and someti ...

How does Spark calculate the number of reducers in a hash shuffle?

I am trying to understand hash shuffle in Spark. I am reading this article Hash Shuffle: Each mapper task creates separate file for each separate ...

NoSuchMethodError: org/apache/hadoop/mapreduce/util/MRJobConfUtil.setTaskLogProgressDeltaThresholds

I am getting the following error while executing a mapreduce job in my hadoop cluster (distributed cluster). I found the error below in the applicati ...

Necessary to make periods from the lines

We have we need to get I tried to do it through map, but it turned out to be some kind of crap ...

Can I return a non-partitioned table with function mr?

I wrote the code below to return a table t using function mr. But joining table t with an in-memory table raises an error requiring both tables to ...

Hadoop MapReduce job failing in launch_container.sh

MapReduce job is failing with the following error even though JAVA_HOME is set. I am trying to set up Hadoop (3.3.4) on my Mac M1. I have set JAVA_HOME ...

What is an effective way to return sum distributed by days using Apache Spark or another similar solution?

Let's imagine we have a number of records with attributes: id, start_day, end_date, sum. These records have different periods defined by start and end ...

How to efficiently WordCount for each file?

I have tens of thousands of files in dir demotxt like: demotxt/ aa.txt this is aaa1 this is aaa2 this is aaa3 ...

PySpark MapReduce - how to get the number of occurrences in a list of tuples

I have a list like: and I applied the following map functions to map each row with the # of occurrences: map(lambda x: ((x.split(',')[0], x.split( ...

Reduce an array by ID and by item in JavaScript

I need to take an average of a sum based on the user who attended an activity. I have an array where there is a list of act ...

ERROR org.apache.hadoop.conf.Configuration: error parsing conf mapred-site.xml

There is an exception, I can't start Hadoop, ...

Calculate average temperature in reducer

I am trying to write code that would calculate average temperature (reducer.py) based on NCDC weather. ...