
How do I convert a list of elements to 1 or 0 in an RDD in Python?

For each key in my_dict, I want a list with one entry per element of [1, 2, 3, 4, 5]: 1 if that element appears in the key's values, and 0 if it does not. How do I do this?

I have a dictionary and a list:

number_list = sc.parallelize([1, 2, 3, 4, 5])
my_dict = sc.parallelize([(101, [1, 2, 5]), (102, [2, 4] ), (103, [2, 3, 5] ), (104,[1, 5])])

**I want the output to be the following:**

([(101, [1, 1, 0, 0, 1]), (102, [0, 1, 0, 1, 0]), (103, [0, 1, 1, 0, 1]), (104, [1, 0, 0, 0, 1])])


...

...

I tried this code, but it is wrong and does not work:

transformed_dict = my_dict.map(lambda x: (x[0], 1 if x[1] in my_test else 0))
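# Build the 0/1 lists on the driver, then parallelize the result.
# Assumes every value is an integer in 1..5, so value i maps to index i - 1.
my_list = [(101, [1, 2, 5]), (102, [2, 4]), (103, [2, 3, 5]), (104, [1, 5])]
res = []
for (ele, l) in my_list:
    origin = [0, 0, 0, 0, 0]  # fresh list for each key
    for i in l:
        origin[i - 1] = 1
    res.append((ele, origin))
print(res)
# `spark` is assumed to be an existing SparkSession.
res_rdd = spark.sparkContext.parallelize(res)
res_rdd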

This should do the job.
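
As a possible alternative to the index-based loop, here is a minimal sketch that checks membership instead, so it does not rely on the values being exactly 1..5. The helper name `to_indicator` is hypothetical (it is not from the question or from Spark), and the data is held in plain Python lists so the sketch runs without a Spark session.

```python
def to_indicator(values, universe):
    """Return a 0/1 list aligned with `universe`: 1 where the item is in `values`."""
    present = set(values)  # a set makes each membership check cheap
    return [1 if item in present else 0 for item in universe]


# Local stand-ins for the data in the question (plain lists, no Spark needed).
number_list = [1, 2, 3, 4, 5]
my_list = [(101, [1, 2, 5]), (102, [2, 4]), (103, [2, 3, 5]), (104, [1, 5])]

result = [(key, to_indicator(values, number_list)) for key, values in my_list]
print(result)

# With the RDDs from the question, the same helper could be applied per key.
# An RDD cannot be referenced inside another RDD's transformation, so the
# small reference list would be collected to the driver first:
#   numbers = number_list.collect()
#   my_dict.mapValues(lambda vs: to_indicator(vs, numbers)).collect()
```

The commented lines at the end use `collect()` and `mapValues()`, which are standard RDD methods; whether collecting the reference list is acceptable depends on its size, which is an assumption here (five elements in the question).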
