I'm working in Spark using PySpark. I have an RDD of the format [(key, (num, (min, max, count))), ...].
When I use the lambda below
t = fullBids.filter(lambda (value, stats): (stats[2] > 10 and stats[0] < value and value < stats[1]))
it errors out with
tuple index out of range
But when I use the same lambda in a mapValues call, it runs successfully, returning either True or False correctly.
ti = fullBids.mapValues(lambda (value, stats): (stats[2] > 10 and stats[0] < value and value < stats[1]))
I would expect the filter to work, but it doesn't. Can someone explain what I'm missing here?
If you decompose your RDD format

(key, (num, (min, max, count)))

the lambda arguments in your filter bind as:

value = key
stats = (num, (min, max, count))
stats[0] = num
stats[1] = (min, max, count)
stats[1][0] = min
stats[1][1] = max
stats[1][2] = count
So your stats[2] is out of range.
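You can verify the binding with a hypothetical single record (Python 2, where tuple-parameter unpacking in lambdas is still allowed):

pair = ("someKey", (5, (1, 10, 20)))  # hypothetical record from fullBids
(value, stats) = pair                 # what filter's lambda receives
# value = "someKey"
# stats = (5, (1, 10, 20)) -- only two elements, so:
stats[2]                              # IndexError: tuple index out of range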
When you call filter, value is the key of the key-value pair RDD while stats is the value of the RDD ((num, (min, max, count))); that's why you get a tuple index out of range.
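A minimal sketch of a corrected filter, assuming you stay on Python 2's tuple-parameter lambda syntax (as in your code), is to unpack one level deeper:

# Unpack the full (key, value) pair that filter receives;
# here stats really is (min, max, count)
t = fullBids.filter(lambda (key, (value, stats)): stats[2] > 10 and stats[0] < value < stats[1])

In Python 3, where tuple parameters were removed from lambdas, the same test needs explicit indexing, e.g. lambda kv: kv[1][1][2] > 10 and kv[1][1][0] < kv[1][0] < kv[1][1][1].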
When you call mapValues, value is num while stats is (min, max, count). In fact, the mapValues transformation passes only the value of each key-value pair to the function, leaving the key unchanged.
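To see the difference end to end, here is a small hypothetical run (assuming an existing SparkContext sc and the sample data below, under Python 2):

data = [("a", (5, (1, 10, 20))), ("b", (50, (1, 10, 3)))]
fullBids = sc.parallelize(data)

# mapValues hands the lambda only (num, (min, max, count)),
# so value = num and stats = (min, max, count)
fullBids.mapValues(lambda (value, stats): stats[2] > 10 and stats[0] < value < stats[1]).collect()
# => [('a', True), ('b', False)]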