I need to do the following operation in Python.
I have a list of tuples:
    data = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]
I have a list of fields:
    fields = ["name", "age", "vol", "status", "limit"]
Each tuple in data holds one value per field, in the same order as fields.
I have a dict that maps each field to its type:
    desc = { "name" : "string", "age" : "int", "vol" : "double", "status" : "byte", "limit" : "int" }
I need to generate a message to be sent over in the following format:
    [{"column": "name", "value": {"String": "John"}}, {"column": "age", "value": {"Int": 14}}, {"column": "vol", "value": {"Double": 12132.213}}, {"column": "status", "value": {"Byte": 89}}, {"column": "limit", "value": {"Int": 34}},
    {"column": "name", "value": {"String": "Andrew"}}, {"column": "age", "value": {"Int": 23}}, {"column": "vol", "value": {"Double": 2121.21}}, {"column": "status", "value": {"Byte": 78}}, {"column": "limit", "value": {"Int": 66}}]
I have two functions that generate this:
    def get_value(data_type, res):
        if data_type == 'string':
            return {'String' : res.strip()}
        elif data_type == 'byte' :
            return {'Byte' : ord(res[0])}
        elif data_type == 'int':
            return {'Int' : int(res)}
        elif data_type == 'double':
            return {'Double' : float(res)}

    def generate_message(data, fields, desc):
        result = []
        for row in data:
            for field, res in zip(fields, row):
                data_type = desc[field]
                val = {'column' : field,
                       'value' : get_value(data_type, res)}
                result.append(val)
        return result
However, the data is really large, with a huge number of tuples (~200,000), and it takes a long time to generate the above message format for all of them. Is there a more efficient way of doing this?
PS: I need a message in this format because I am sending it on a queue, and the consumer is a C++ client that needs the type information.
A list comprehension should be somewhat quicker, since it avoids the repeated result.append lookup and call. It is also readable and concise.
In [94]: def generate_message_faster(data, fields, desc):
    ...:     return [
    ...:         {'column': field, 'value': get_value(desc[field], res)}
    ...:         for row in data for field, res in zip(fields, row)
    ...:     ]
    ...:
In [95]: generate_message_faster(data, fields, desc)
Out[95]:
[{'column': 'name', 'value': {'String': 'John'}},
{'column': 'age', 'value': {'Int': 14}},
{'column': 'vol', 'value': {'Double': 12132.213}},
{'column': 'status', 'value': {'Byte': 89}},
{'column': 'limit', 'value': {'Int': 34}},
{'column': 'name', 'value': {'String': 'Andrew'}},
{'column': 'age', 'value': {'Int': 23}},
{'column': 'vol', 'value': {'Double': 2121.21}},
{'column': 'status', 'value': {'Byte': 78}},
{'column': 'limit', 'value': {'Int': 66}}]
In [96]: %timeit(generate_message(data, fields, desc))
100000 loops, best of 3: 7.84 µs per loop
In [97]: %timeit(generate_message_faster(data, fields, desc))
The slowest run took 4.24 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.9 µs per loop
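The timings above are for the two-row sample, so they may not say much about the ~200,000-row case from the question. Below is a minimal sketch of how the comparison could be repeated at a larger scale with the standard timeit module. The helpers make_rows and benchmark are hypothetical names introduced here, and the synthetic data (the two sample rows repeated) is an assumption, so real data may behave differently.

```python
import timeit


def get_value(data_type, res):
    if data_type == 'string':
        return {'String': res.strip()}
    elif data_type == 'byte':
        return {'Byte': ord(res[0])}
    elif data_type == 'int':
        return {'Int': int(res)}
    elif data_type == 'double':
        return {'Double': float(res)}


def generate_message(data, fields, desc):
    # Explicit loop with append, as in the question.
    result = []
    for row in data:
        for field, res in zip(fields, row):
            result.append({'column': field, 'value': get_value(desc[field], res)})
    return result


def generate_message_faster(data, fields, desc):
    # List comprehension version from this answer.
    return [
        {'column': field, 'value': get_value(desc[field], res)}
        for row in data for field, res in zip(fields, row)
    ]


fields = ["name", "age", "vol", "status", "limit"]
desc = {"name": "string", "age": "int", "vol": "double",
        "status": "byte", "limit": "int"}


def make_rows(n_rows):
    # Hypothetical helper: synthetic data built by repeating the two sample rows.
    sample = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]
    return [sample[i % 2] for i in range(n_rows)]


def benchmark(n_rows=20_000, repeat=3):
    # Hypothetical helper: best wall-clock time in seconds over `repeat`
    # single runs of each function on the same synthetic rows.
    rows = make_rows(n_rows)
    timings = {}
    for func in (generate_message, generate_message_faster):
        runs = timeit.repeat(lambda: func(rows, fields, desc), number=1, repeat=repeat)
        timings[func.__name__] = min(runs)
    return timings


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f} s")
```

Passing n_rows=200_000 to benchmark gets closer to the size in the question, but the resulting list then holds about a million nested dicts, so memory use grows accordingly.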
Building on aydow's answer and speeding up a bit more:
    # Map each type name to a converter that returns the typed value dict,
    # matching what get_value() produces in the question.
    dt_action = {
        'string': (lambda res: {'String': res.strip()}),
        'byte': (lambda res: {'Byte': ord(res[0])}),
        'int': (lambda res: {'Int': int(res)}),
        'double': (lambda res: {'Double': float(res)}),
    }

    def generate_message_faster(data, fields, desc):
        return [
            {'column': field, 'value': dt_action[desc[field]](res)}
            for row in data for field, res in zip(fields, row)
        ]
Timings:
    6.44 µs per loop
    dt_action: 5.54 µs per loop
    dt_action and list comp: 4.92 µs per loop
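One further step in the same direction, sketched here under the assumption that fields and desc are the same for every row: the desc[field] and dispatch-table lookups can be resolved once per column instead of once per cell. The names DT_ACTION and generate_message_precomputed are hypothetical, and this variant has not been timed here, so any gain would need to be measured on real data.

```python
# Dispatch table: type name -> converter returning the typed value dict.
DT_ACTION = {
    'string': lambda res: {'String': res.strip()},
    'byte': lambda res: {'Byte': ord(res[0])},
    'int': lambda res: {'Int': int(res)},
    'double': lambda res: {'Double': float(res)},
}


def generate_message_precomputed(data, fields, desc):
    # Resolve each column's converter once, outside the row loop.
    columns = [(field, DT_ACTION[desc[field]]) for field in fields]
    return [
        {'column': field, 'value': convert(res)}
        for row in data
        for (field, convert), res in zip(columns, row)
    ]


data = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]
fields = ["name", "age", "vol", "status", "limit"]
desc = {"name": "string", "age": "int", "vol": "double",
        "status": "byte", "limit": "int"}

message = generate_message_precomputed(data, fields, desc)
print(message[3])  # {'column': 'status', 'value': {'Byte': 89}}
```

An unknown type name in desc raises a KeyError when the columns list is built, before any row is processed, rather than silently producing a None value.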