
Improving the efficiency of repetitive code in Python

I need to do the following operation in Python.

I have a list of tuples:

data = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]

I have a list of fields:

fields = ["name", "age", "vol", "status", "limit"]

The values in each tuple correspond to the fields, in order.

I have a dict that maps each field to its type:

desc = { "name" : "string", "age" : "int", "vol" : "double", "status" : "byte", "limit" : "int" }

I need to generate a message to send, in the following format:

[{"column": "name", "value": {"String": "John"}}, {"column": "age", "value": {"Int": 14}}, {"column": "vol", "value": {"Double": 12132.213}}, {"column": "status", "value": {"Byte": 89}}, {"column": "limit", "value": {"Int": 34}},
{"column": "name", "value": {"String": "Andrew"}}, {"column": "age", "value": {"Int": 23}}, {"column": "vol", "value": {"Double":2121.21}}, {"column": "status", "value": {"Byte": 78}}, {"column": "limit", "value": {"Int": 66}}]
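The "Byte" entries are the character codes of the one-letter status values: ord maps "Y" to 89 and "N" to 78, which is what the byte branch of get_value below computes. A quick check:

```python
# The 'status' values are single characters; the message carries their code points.
codes = {status: ord(status) for status in ("Y", "N")}
print(codes)  # {'Y': 89, 'N': 78}
```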

I have two functions that generate this:

def get_value(data_type, res):
    if data_type == 'string':
        return {'String': res.strip()}
    elif data_type == 'byte':
        return {'Byte': ord(res[0])}
    elif data_type == 'int':
        return {'Int': int(res)}
    elif data_type == 'double':
        return {'Double': float(res)}

def generate_message(data, fields, desc):
    result = []
    for row in data:
        for field, res in zip(fields, row):
            data_type = desc[field]
            val = {'column': field,
                   'value': get_value(data_type, res)}
            result.append(val)
    return result

However, the data is very large (~200,000 tuples), and generating the message in this format for all of them takes a long time. Is there a more efficient way of doing this?

P.S. I need this message format because I am sending it over a queue, and the consumer is a C++ client that needs the type information.
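The type tags ("String", "Int", "Double", "Byte") survive serialization, which is what lets the C++ consumer dispatch on them. Assuming the queue carries JSON text (the question does not say which format or transport is used), a minimal sketch with the standard json module on a two-entry excerpt of the message:

```python
import json

# Assumption: the queue accepts JSON text and the C++ consumer parses JSON.
message = [
    {"column": "name", "value": {"String": "John"}},
    {"column": "status", "value": {"Byte": 89}},
]

# Compact separators drop the spaces after ',' and ':' to shrink the payload.
payload = json.dumps(message, separators=(",", ":"))
print(payload)
# [{"column":"name","value":{"String":"John"}},{"column":"status","value":{"Byte":89}}]

# Round-trip check: decoding gives back the same structure.
assert json.loads(payload) == message
```

With around a million entries per message, the compact separators save two bytes per key/value pair, which adds up.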

A list comprehension should be quicker than the explicit loop, since it avoids looking up and calling result.append for every element. It is also readable and concise.

In [94]: def generate_message_faster(data, fields, desc):
    ...:     return [
    ...:        {'column': field, 'value': get_value(desc[field], res)} 
    ...:        for row in data for field, res in zip(fields, row)
    ...:     ]
    ...:

In [95]: generate_message_faster(data, fields, desc)
Out[95]:
[{'column': 'name', 'value': {'String': 'John'}},
 {'column': 'age', 'value': {'Int': 14}},
 {'column': 'vol', 'value': {'Double': 12132.213}},
 {'column': 'status', 'value': {'Byte': 89}},
 {'column': 'limit', 'value': {'Int': 34}},
 {'column': 'name', 'value': {'String': 'Andrew'}},
 {'column': 'age', 'value': {'Int': 23}},
 {'column': 'vol', 'value': {'Double': 2121.21}},
 {'column': 'status', 'value': {'Byte': 78}},
 {'column': 'limit', 'value': {'Int': 66}}]

In [96]: %timeit(generate_message(data, fields, desc))
100000 loops, best of 3: 7.84 µs per loop

In [97]: %timeit(generate_message_faster(data, fields, desc))
The slowest run took 4.24 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.9 µs per loop
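%timeit is an IPython magic, so it is not available in a plain script, and the timings above are for the two-row sample only. At the question's scale, 200,000 rows × 5 fields means 1,000,000 entries (each with an inner dict) per message. A sketch using the standard-library timeit module; replicating the sample rows to build larger input is an assumption, and benchmark is a hypothetical helper:

```python
import timeit

FIELDS = ["name", "age", "vol", "status", "limit"]
DESC = {"name": "string", "age": "int", "vol": "double",
        "status": "byte", "limit": "int"}
SAMPLE = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]


def get_value(data_type, res):
    if data_type == 'string':
        return {'String': res.strip()}
    elif data_type == 'byte':
        return {'Byte': ord(res[0])}
    elif data_type == 'int':
        return {'Int': int(res)}
    elif data_type == 'double':
        return {'Double': float(res)}


def generate_message(data, fields, desc):
    result = []
    for row in data:
        for field, res in zip(fields, row):
            result.append({'column': field, 'value': get_value(desc[field], res)})
    return result


def generate_message_faster(data, fields, desc):
    return [
        {'column': field, 'value': get_value(desc[field], res)}
        for row in data for field, res in zip(fields, row)
    ]


def benchmark(n_rows=200_000, number=3):
    """Return the best-of-`number` seconds per call for each version.

    Hypothetical helper: the input is the two sample rows repeated
    until there are n_rows of them.
    """
    data = (SAMPLE * (n_rows // len(SAMPLE) + 1))[:n_rows]
    results = {}
    for fn in (generate_message, generate_message_faster):
        # Each lambda is called right away inside this iteration, so it sees the current fn.
        times = timeit.repeat(lambda: fn(data, FIELDS, DESC), number=1, repeat=number)
        results[fn.__name__] = min(times)
    return results


# Kept small so it runs quickly; raise n_rows toward 200_000 to match the question.
print(benchmark(n_rows=20_000))
```

Absolute numbers depend on the machine and Python version, so no expected timings are given here.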

Building on aydow's answer, you can speed this up a bit more by replacing the if/elif chain in get_value with a dict lookup:

# Map each type name to a function that wraps the value in its type tag
dt_action = {
    'string': lambda res: {'String': res.strip()},
    'byte': lambda res: {'Byte': ord(res[0])},
    'int': lambda res: {'Int': int(res)},
    'double': lambda res: {'Double': float(res)},
}

def generate_message_faster(data, fields, desc):
    return [
        {'column': field, 'value': dt_action[desc[field]](res)}
        for row in data for field, res in zip(fields, row)
    ]
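The comprehension still does two dict lookups (desc[field] and dt_action[...]) for every cell. A further variant, not taken from either answer and not timed here: pair each field with its converter once per call, so the inner loop only zips precomputed pairs with the row. A minimal sketch, assuming every dt_action entry returns the tagged {"<Type>": value} wrapper; generate_message_hoisted is a hypothetical name:

```python
dt_action = {
    'string': lambda res: {'String': res.strip()},
    'byte': lambda res: {'Byte': ord(res[0])},
    'int': lambda res: {'Int': int(res)},
    'double': lambda res: {'Double': float(res)},
}


def generate_message_hoisted(data, fields, desc):
    # Resolve each column's converter once per call, not once per row.
    converters = [(field, dt_action[desc[field]]) for field in fields]
    return [
        {'column': field, 'value': convert(res)}
        for row in data
        for (field, convert), res in zip(converters, row)
    ]


data = [("John", 14, 12132.213, "Y", 34), ("Andrew", 23, 2121.21, "N", 66)]
fields = ["name", "age", "vol", "status", "limit"]
desc = {"name": "string", "age": "int", "vol": "double",
        "status": "byte", "limit": "int"}

message = generate_message_hoisted(data, fields, desc)
print(message[3])  # {'column': 'status', 'value': {'Byte': 89}}
```

Whether this beats the version above by a measurable margin depends on the row count and the Python version, so it is worth timing on real data before adopting it.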

Timings:

  • original: 6.44 µs per loop
  • with dt_action: 5.54 µs per loop
  • with dt_action and a list comprehension: 4.92 µs per loop
