I am trying to pass a list of RDDs to groupWith instead of manually specifying them by index.
Here is the sample data:
w = sc.parallelize([("1", 5), ("3", 6)])
x = sc.parallelize([("1", 1), ("3", 4)])
y = sc.parallelize([("2", 2), ("4", 3)])
z = sc.parallelize([("2", 42), ("4", 43), ("5", 12)])
I then put them into a list:
m = [w,x,y,z]
The manual, hard-coded way is:
[(k, tuple(map(list, v))) for k, v in sorted(m[0].groupWith(m[1], m[2], m[3]).collect())]
which produces the following result:
[('1', ([5], [1], [], [])),
('2', ([], [], [2], [42])),
('3', ([6], [4], [], [])),
('4', ([], [], [3], [43])),
('5', ([], [], [], [12]))]
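To make the grouping semantics concrete, here is a minimal pure-Python sketch that mimics what groupWith computes, using plain lists of (key, value) pairs instead of RDDs. The function name `group_with` is hypothetical and this is not PySpark's implementation. It only reproduces the shape of the result: one list per input, in argument order, with an empty list wherever a key is absent from that input.

```python
from collections import defaultdict


def group_with(first, *others):
    """Hypothetical pure-Python stand-in for RDD.groupWith (not PySpark code).

    Each argument is a list of (key, value) pairs. Returns a sorted list of
    (key, tuple_of_lists) with one list per input, in argument order.
    """
    inputs = (first,) + others
    # Every new key starts with one fresh empty list per input
    grouped = defaultdict(lambda: tuple([] for _ in inputs))
    for i, pairs in enumerate(inputs):
        for key, value in pairs:
            grouped[key][i].append(value)
    return sorted(grouped.items())


w = [("1", 5), ("3", 6)]
x = [("1", 1), ("3", 4)]
y = [("2", 2), ("4", 3)]
z = [("2", 42), ("4", 43), ("5", 12)]
m = [w, x, y, z]

# Mirrors the hard-coded call: each input passed by index
result = group_with(m[0], m[1], m[2], m[3])
for row in result:
    print(row)
```

The printed rows match the output shown above, including the empty lists for keys that an input does not contain.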
But I would like to pass m[1:] instead of listing each RDD by hand, something like:
[(x, tuple(map(list, y))) for x, y in sorted(list(m[0].groupWith(m[1:]).collect()))]
Passing the list directly raises the first error below. I also tried removing the brackets, but that means converting the list to a string, which raises the second:
AttributeError: 'list' object has no attribute 'mapValues'
AttributeError: 'str' object has no attribute 'mapValues'
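The first error is consistent with how PySpark declares the method, `groupWith(other, *others)`: every positional argument is expected to be an RDD, so `m[1:]` arrives as a single `other` that happens to be a Python list, and the first RDD method called on it (`mapValues`, per the error message) fails. The sketch below reproduces that failure mode with a hypothetical `FakeRDD` class. It models only the argument handling, not the actual grouping, and is not PySpark's source.

```python
class FakeRDD:
    """Hypothetical stand-in for a PySpark RDD that only supports mapValues."""

    def __init__(self, pairs):
        self.pairs = pairs

    def mapValues(self, f):
        # Apply f to each value, keeping the keys unchanged
        return FakeRDD([(k, f(v)) for k, v in self.pairs])

    def groupWith(self, other, *others):
        # Same parameter shape as PySpark's RDD.groupWith: every positional
        # argument is assumed to be RDD-like, so mapValues is called on each.
        rdds = (self, other) + others
        return [rdd.mapValues(lambda v, i=i: (i, v)) for i, rdd in enumerate(rdds)]


w, x, y = FakeRDD([("1", 5)]), FakeRDD([("1", 1)]), FakeRDD([("2", 2)])
m = [w, x, y]

message = None
try:
    m[0].groupWith(m[1:])  # the slice is received as one `other` argument
except AttributeError as e:
    message = str(e)

print(message)  # 'list' object has no attribute 'mapValues'
```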
Since groupWith accepts varargs, all you need to do is unpack the arguments:
w.groupWith(*m[1:])
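The `*` in `*m[1:]` is ordinary Python argument unpacking, not anything Spark-specific. A minimal sketch with a hypothetical function that has the same `(other, *others)` parameter shape shows the difference: without the star, the slice arrives as one list argument; with it, each element becomes its own positional argument.

```python
def show_args(other, *others):
    """Hypothetical function with the same parameter shape as groupWith."""
    return (other, others)


m = ["w", "x", "y", "z"]  # strings stand in for the four RDDs

print(show_args(m[1:]))   # (['x', 'y', 'z'], ())  -> one list argument
print(show_args(*m[1:]))  # ('x', ('y', 'z'))      -> three separate arguments
```

Applied to the question's RDDs, this is `m[0].groupWith(*m[1:])`, which is equivalent to `w.groupWith(*m[1:])` because `m[0]` is `w`. Assuming the `(other, *others)` signature, `other` is required, so `m` needs at least two RDDs for the unpacked call to work.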