Have a PySpark broadcast value with content like this:
[('b000jz4hqo', {'rom': 2.4051362683438153, 'clickart': 56.65432098765432, '950': 254.94444444444443, 'image': 3.6948470209339774, 'premier': 9.27070707070707, '000': 6.218157181571815, 'dvd': 1.287598204264871, 'broderbund': 22.169082125603865, 'pack': 2.98180636777128}), ('b0006zf55o', {'laptops': 11.588383838383837, 'desktops': 12.74722222222222, 'backup': 2.8015873015873014, 'win': 0.501859142607174, 'ca': 9.10515873015873, 'v11': 50.98888888888888, '30u': 84.98148148148148, '30pk': 254.94444444444443, 'desktop': 2.23635477582846, '1': 0.3231235037318687, 'arcserve': 24.28042328042328, 'computer': 0.6965695203400122, 'lap': 127.47222222222221, 'oem': 46.35353535353535, 'international': 9.44238683127572, 'associates': 7.284126984126985})]
So it is a key->list broadcast variable.
Attempts to convert broadcast.value into a dictionary results in
TypeError: unhashable type: 'dict'
Using code like
from itertools import izip
amazonWeightsBroadcast = sc.broadcast(amazonWeightsRDD.collect())
i = iter(amazonWeightsBroadcast.value)
amazonWeightsDict = dict(izip(i, i))
Also tried (gives the same "unshapable" error):
amazonWeightsDict = dict(amazonWeightsBroadcast.value[i:i+2] for i in range(0, len(amazonWeightsBroadcast.value), 2))
So if it's not possible to convert a broadcast variable into a dictionary, what will be a better solution to lookup a value-list by a key?
Python 2.7.6 Spark 1.3.1
Took me a while.. problem was in how the broadcast variable was created. Had to use .collectAsMap() and not just .collect() Now it is working as expected.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.