We are running a small POC with Aerospike to see whether Lua scripts can do the kind of processing we need.
In this case, we used the flights example: https://github.com/aerospike/flights-analytics
I created a new index on the flight time so that I can query by it.
The script runs over all the records and finds the latest arrival time of a flight for each destination city. For simplicity's sake, we inserted only flights to Buffalo.
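-- Aggregate step: keep the latest arrival time seen for each destination city.
local function aggregatCityToMax(result, record)
    -- Declared local so they do not leak into the global environment.
    local city = string.upper(record['DEST_CITY_NAME'])
    local flightTime = record['ARR_TIME']
    if result[city] == nil then
        info("CITY: |%s| | DATE: %d | MAX: null", city, flightTime)
        result[city] = flightTime
    else
        info("CITY: |%s| | DATE: %d | MAX: %d", city, flightTime, result[city])
        if result[city] < flightTime then
            info("new MAX %d", flightTime)
            result[city] = flightTime
        end
    end
    return result
end

-- Picks the larger of two values stored under the same key.
-- Defined before reduce_values: a local function is only visible to code
-- that comes after its declaration.
local function mergeFunction(a, b)
    info("merging: %s VS %s ", a, b)
    if a < b then
        return b
    end
    return a
end

-- Reduce step: merge two partial result maps, keeping the maximum per city.
local function reduce_values(a, b)
    return map.merge(a, b, mergeFunction)
end

function mapMax(stream)
    return stream : aggregate(map(), aggregatCityToMax) : reduce(reduce_values)
end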
The log shows odd results: 1. I don't get the maximum. 2. It looks like the maximum value is reset to null every 10 records.
LOG:
CITY: |BUFFALO| | DATE: 1253 | MAX: null
CITY: |BUFFALO| | DATE: 1221 | MAX: 1253
CITY: |BUFFALO| | DATE: 1600 | MAX: 1253
CITY: |BUFFALO| | DATE: 1203 | MAX: 1600
CITY: |BUFFALO| | DATE: 1424 | MAX: 1600
CITY: |BUFFALO| | DATE: 2141 | MAX: 1600
CITY: |BUFFALO| | DATE: 1821 | MAX: 2141
CITY: |BUFFALO| | DATE: 1221 | MAX: 2141
CITY: |BUFFALO| | DATE: 1424 | MAX: 2141
CITY: |BUFFALO| | DATE: 1550 | MAX: 2141
CITY: |BUFFALO| | DATE: 1703 | MAX: null
CITY: |BUFFALO| | DATE: 2312 | MAX: 1703
CITY: |BUFFALO| | DATE: 2251 | MAX: 2312
CITY: |BUFFALO| | DATE: 19 | MAX: 2312
CITY: |BUFFALO| | DATE: 1030 | MAX: 2312
CITY: |BUFFALO| | DATE: 1257 | MAX: 2312
CITY: |BUFFALO| | DATE: 803 | MAX: 2312
CITY: |BUFFALO| | DATE: 19 | MAX: 2312
CITY: |BUFFALO| | DATE: 1502 | MAX: 2312
CITY: |BUFFALO| | DATE: 2319 | MAX: 2312
CITY: |BUFFALO| | DATE: 1735 | MAX: null
CITY: |BUFFALO| | DATE: 1221 | MAX: 1735
CITY: |BUFFALO| | DATE: 1258 | MAX: 1735
CITY: |BUFFALO| | DATE: 2125 | MAX: 1735
CITY: |BUFFALO| | DATE: 2251 | MAX: 2125
CITY: |BUFFALO| | DATE: 1104 | MAX: 2251
CITY: |BUFFALO| | DATE: 2053 | MAX: 2251
CITY: |BUFFALO| | DATE: 1340 | MAX: 2251
CITY: |BUFFALO| | DATE: 2312 | MAX: 2251
CITY: |BUFFALO| | DATE: 2226 | MAX: 2312
CITY: |BUFFALO| | DATE: 2053 | MAX: null
CITY: |BUFFALO| | DATE: 1637 | MAX: 2053
CITY: |BUFFALO| | DATE: 1030 | MAX: 2053
CITY: |BUFFALO| | DATE: 1618 | MAX: 2053
CITY: |BUFFALO| | DATE: 1510 | MAX: 2053
CITY: |BUFFALO| | DATE: 1510 | MAX: 2053
CITY: |BUFFALO| | DATE: 2346 | MAX: 2053
CITY: |BUFFALO| | DATE: 2343 | MAX: 2346
CITY: |BUFFALO| | DATE: 1600 | MAX: 2346
CITY: |BUFFALO| | DATE: 1550 | MAX: 2346
CITY: |BUFFALO| | DATE: 1949 | MAX: null
CITY: |BUFFALO| | DATE: 1104 | MAX: 1949
CITY: |BUFFALO| | DATE: 2045 | MAX: 1949
CITY: |BUFFALO| | DATE: 2213 | MAX: 2045
Did I do something wrong? Did I miss anything?
Thanks,
Idob
Aerospike's aggregation is streaming in nature, i.e. it keeps pushing partial results out so that nothing stalls. The reduce step that runs on the client does the final job of merging all the partial results. This differs from Hadoop MapReduce, where the final reduce waits for all the local reduces to finish completely before it starts. The streaming model of Aerospike has its merits.
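To illustrate why merging partial results works, here is a minimal plain-Lua sketch of the final merge. It uses ordinary Lua tables in place of Aerospike maps, and merge_max is a hypothetical helper written for this illustration, not an Aerospike function. The three partial values are the per-batch maxima that can be read off the log in the question.

```lua
-- Plain-Lua sketch: ordinary tables stand in for Aerospike maps.
-- merge_max is a hypothetical helper, not part of the Aerospike API.
local function merge_max(a, b)
  local out = {}
  for city, t in pairs(a) do
    out[city] = t
  end
  for city, t in pairs(b) do
    if out[city] == nil or out[city] < t then
      out[city] = t
    end
  end
  return out
end

-- Partial results as they might arrive from different batches or nodes.
local p1 = { BUFFALO = 2141 }
local p2 = { BUFFALO = 2319 }
local p3 = { BUFFALO = 2346 }

-- Grouping and order of the merges do not change the final answer.
local left  = merge_max(merge_max(p1, p2), p3)
local right = merge_max(p3, merge_max(p2, p1))
print(left.BUFFALO, right.BUFFALO)
```

Both merge orders end with 2346 for BUFFALO. Taking a maximum is insensitive to the order and grouping of the merges, so it does not matter when each partial result is pushed out.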
The print statement is in your aggregate function. Once a partial result is pushed out, the seed map starts out empty again for the next batch, which is why the log shows MAX: null at the start of each batch. There is nothing wrong with your logic, and the end result should be fine. Are you seeing any issue with the end result?
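The reset seen in the log can be reproduced outside Aerospike with a small plain-Lua simulation. This is only a sketch: the batch size of 10 is an assumption read off the log, not a documented Aerospike setting, and aggregate_max is a hypothetical stand-in for the aggregate step without the logging.

```lua
-- Plain-Lua simulation; no Aerospike APIs are used.
-- BATCH_SIZE = 10 is an assumption read off the log in the question.
local BATCH_SIZE = 10

-- The first 20 ARR_TIME values from the log in the question.
local times = {
  1253, 1221, 1600, 1203, 1424, 2141, 1821, 1221, 1424, 1550,
  1703, 2312, 2251,   19, 1030, 1257,  803,   19, 1502, 2319,
}

-- Same comparison as the aggregate step, minus the logging.
local function aggregate_max(result, city, t)
  if result[city] == nil or result[city] < t then
    result[city] = t
  end
  return result
end

local partials = {}
local seed = {}                       -- fresh, empty seed
for i, t in ipairs(times) do
  aggregate_max(seed, "BUFFALO", t)
  if i % BATCH_SIZE == 0 then
    partials[#partials + 1] = seed    -- the partial result is pushed out
    seed = {}                         -- next batch starts empty ("MAX: null")
  end
end

-- Final reduce over the partial results.
local final = {}
for _, p in ipairs(partials) do
  for city, t in pairs(p) do
    aggregate_max(final, city, t)
  end
end
print(partials[1].BUFFALO, partials[2].BUFFALO, final.BUFFALO)
```

Each batch reports only its own maximum (2141 and 2319 here), and the final reduce still arrives at the overall maximum of these 20 values, 2319.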