
MapReduce (Java): calculating average for array list

I have an assignment on MapReduce and I am very new to MapReduce programming. For each year and each city, I want to calculate the average, min, and max values. Here is my sample input:

Calgary,AB,2009-01-07,604680,12694,2.5207754,0.065721168,0.025668362,0.972051954,0.037000279,0.022319018,,,0.003641149,,,0.002936745,,,0.016723641 Calgary,AB,2009-12-30,604620,12694,2.051769654,0.060114973,0.034026918,1.503277516,0.054219005,0.023258217,,,0.00354166,,,0.003361414,,,0.122375131 Calgary,AB,2010-01-06,604680,12266,4.015745522,0.097792741,0.032738892,0.368454554,0.019228992,0.032882053,,,0.004778065,,,0.003190444,,,0.064203865 Calgary,AB,2010-01-13,604680,12551,3.006492921,0.09051656,0.041508534,0.215395047,0.012081755,0.023706119,,,0.004231772,,,0.003083003,,,0.155212503

I know how to find the city and the year. I am using this code:

    String line = value.toString();
    String[] tokens = line.split(",");
    String[] date = tokens[2].split("-");
    String year = date[0];
    String location = tokens[0];

Now I want to extract two numbers from each line (e.g. 2.5207754 and 0.065721168; the exact values differ from line to line, but they are always the numbers after the third and fourth comma) and compute their average, min, and max.

The output should look like this:

Calgary 2009 average: "" , min: "" , max: ""
Calgary 2010 average: "" , min: "" , max: ""

I tried the following code to read the values from each line, but because the lines in the data set do not all have the same layout, I get an error (when a field is empty, or when the value is shorter or longer than the fixed offsets assume):

    float number = 0;
    float number2 = 0;
    char a;
    char c;
    a = line.charAt(34);
    c = line.charAt(44);
    if (a == ',')
    {
        number = Float.parseFloat(line.substring(35, 44));
    }
    else
    {
        number = Float.parseFloat(line.substring(35, 46));
    }

    if (c == ',')
    {
        number2 = Float.parseFloat(line.substring(45, 56));
    }
    else
    {
        number = Float.parseFloat(line.substring(47, 58));
    }

    Text numbers = new Text(number + " " + number2 + " ");

Then I tried this code, but like the code above, it doesn't work:

String number = tokens[4];
String number2 = tokens[5];

Can you help me with this project?

Looking at your input, it seems your records are separated by spaces. You can first split on " ", then split each record on "," to get the individual values and use them in your calculation:

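    // Records are separated by spaces; fields within a record by commas.
    String[] arr = line.split(" ");
    for (String val : arr) {
        String[] dataArr = val.split(",");
        String city = dataArr[0];   // e.g. "Calgary"
        String date = dataArr[2];   // e.g. "2009-01-07"
        String v1 = dataArr[5];     // first value of interest
        String v2 = dataArr[6];     // second value of interest
        System.out.println("city: "+city +" date: "+ date +" v1: "+ v1+"v2: "+ v2);
    }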

city: Calgary date: 2009-01-07 v1: 2.5207754v2: 0.065721168
city: Calgary date: 2009-12-30 v1: 2.051769654v2: 0.060114973
city: Calgary date: 2010-01-06 v1: 4.015745522v2: 0.097792741
city: Calgary date: 2010-01-13 v1: 3.006492921v2: 0.09051656
city: Calgary date: 2009-01-07 v1: 2.5207754v2: 0.065721168
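
Once the values are split out like this, the remaining step is to group them by city and year and compute the average, min, and max. Below is a minimal plain-Java sketch of that step, not a full Hadoop job. The class name `CityYearStats` and the helpers `keyOf`, `valueAt`, and `summarize` are made up for illustration, and the sample records in `main` are the question's first three records trimmed to their first seven fields. It assumes city names contain no spaces and that missing or empty fields should simply be skipped. In an actual MapReduce job, the key and value extraction would presumably live in the mapper and the statistics in the reducer.

```java
import java.util.DoubleSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

class CityYearStats {

    /** Builds the grouping key "city year", or returns null if the record has no date field. */
    static String keyOf(String[] fields) {
        if (fields.length < 3) return null;
        String[] date = fields[2].split("-");
        return fields[0] + " " + date[0];
    }

    /** Parses fields[index] as a double; returns null if it is missing, empty, or not a number. */
    static Double valueAt(String[] fields, int index) {
        if (index >= fields.length || fields[index].trim().isEmpty()) return null;
        try {
            return Double.parseDouble(fields[index].trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Groups the value in column `index` by city and year and collects count/min/max/average. */
    static Map<String, DoubleSummaryStatistics> summarize(String input, int index) {
        Map<String, DoubleSummaryStatistics> stats = new TreeMap<>();
        for (String record : input.trim().split("\\s+")) {
            String[] fields = record.split(",");
            String key = keyOf(fields);
            Double value = valueAt(fields, index);
            if (key == null || value == null) continue; // skip incomplete records
            stats.computeIfAbsent(key, k -> new DoubleSummaryStatistics()).accept(value);
        }
        return stats;
    }

    public static void main(String[] args) {
        // First three records from the question, trimmed to their first seven fields.
        String input = "Calgary,AB,2009-01-07,604680,12694,2.5207754,0.065721168 "
                     + "Calgary,AB,2009-12-30,604620,12694,2.051769654,0.060114973 "
                     + "Calgary,AB,2010-01-06,604680,12266,4.015745522,0.097792741";
        summarize(input, 5).forEach((key, s) ->
            System.out.println(key + " average: " + s.getAverage()
                + " , min: " + s.getMin() + " , max: " + s.getMax()));
    }
}
```

Checking the field index and for empty strings before parsing is what avoids the exceptions you get from fixed character offsets: `split(",")` does not care how many digits each number has, and it drops trailing empty fields, so the array can be shorter than expected.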
