简体   繁体   中英

What is the fastest way to get multiple copies of a tree in python?

In my python program I need multiple copies of a tree. Initially, I use deepcopy from the copy module, which turns out to be very slow. Then I write my own code to copy a tree, the code traverses the tree being copied and create a new node at each node being visited. Then I call this subroutines multiple times to get multiple copies. This solution is much faster (~40 times faster) than deepcopy.

Solution 2: Then I think, traversing a tree needs time T, make n copies, the time required is nT; but if I create n new nodes for each node being copied, I only need to traverse the tree being copied once, although at each node, you copy multiple nodes. Will this be faster? The result turns out to be: not much.

Still the copy operation is the bottleneck of my program. Is there any faster way to do that? Thanks! Stats -- using custom copy_tree function;

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   10.406   10.406 <string>:1(<module>)
        1    0.002    0.002   10.406   10.406 C:\Python27\sdk.py:1431(algorithm1)
       26    0.005    0.000    4.602    0.177 C:\Python27\sdk.py:1310(engage)
     1342    0.005    0.000    4.208    0.003 C:\Python27\lib\idlelib\rpc.py:594(__call__)
     1342    0.007    0.000    4.203    0.003 C:\Python27\lib\idlelib\rpc.py:208(remotecall)
     1342    0.017    0.000    3.992    0.003 C:\Python27\lib\idlelib\rpc.py:238(asyncreturn)
     1342    0.005    0.000    3.972    0.003 C:\Python27\lib\idlelib\rpc.py:279(getresponse)
     1342    0.033    0.000    3.961    0.003 C:\Python27\lib\idlelib\rpc.py:295(_getresponse)
   411/26    0.202    0.000    3.930    0.151 C:\Python27\sdk.py:1227(NodeEngage)
     1338    0.014    0.000    3.909    0.003 C:\Python27\lib\threading.py:235(wait)
     5356    3.877    0.001    3.877    0.001 {method 'acquire' of 'thread.lock' objects}
       27    0.001    0.000    3.798    0.141 C:\Python27\sdk.py:888(pick_best_group)
      378    0.003    0.000    3.797    0.010 C:\Python27\sdk.py:862(group_info)
46947/378    0.155    0.000    3.786    0.010 C:\Python27\sdk.py:833(core_possibilities)
    27490    0.114    0.000    3.547    0.000 C:\Python27\sdk.py:779(find_cores)
    46569    1.046    0.000    3.424    0.000 C:\Python27\sdk.py:798(find_a_true_core)
   280274    0.873    0.000    1.464    0.000 C:\Python27\sdk.py:213(next)
       27    0.002    0.000    1.393    0.052 C:\Python27\sdk.py:1008(s)
    28196    0.016    0.000    1.070    0.000 C:\Python27\sdk.py:1000(copy_tree)

.............................Compare with deepcopy approach

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  191.193  191.193 <string>:1(<module>)
        1    0.002    0.002  191.193  191.193 C:\Python27\sdk.py:1431(algorithm1)
       26    0.006    0.000  185.611    7.139 C:\Python27\sdk.py:1310(engage)
   411/26    1.200    0.003  185.013    7.116 C:\Python27\sdk.py:1227(NodeEngage)
30033397/28196   56.608    0.000  177.885    0.006 C:\Python27\lib\copy.py:145(deepcopy)
3340177/28196   15.354    0.000  177.741    0.006 C:\Python27\lib\copy.py:283(_deepcopy_inst)
6680354/28196   23.276    0.000  177.261    0.006 C:\Python27\lib\copy.py:253(_deepcopy_dict)
3340177/150307   22.345    0.000  171.525    0.001 C:\Python27\lib\copy.py:234(_deepcopy_tuple)
 13360708   23.793    0.000   23.793    0.000 {hasattr}
 13614747   12.483    0.000   15.349    0.000 C:\Python27\lib\copy.py:267(_keep_alive)
     1342    0.005    0.000    7.281    0.005 C:\Python27\lib\idlelib\rpc.py:594(__call__)
     1342    0.008    0.000    7.276    0.005 C:\Python27\lib\idlelib\rpc.py:208(remotecall)
     1342    0.019    0.000    7.039    0.005 C:\Python27\lib\idlelib\rpc.py:238(asyncreturn)
     1342    0.005    0.000    7.018    0.005 C:\Python27\lib\idlelib\rpc.py:279(getresponse)
     1342    0.035    0.000    7.006    0.005 C:\Python27\lib\idlelib\rpc.py:295(_getresponse)
 43649486    6.971    0.000    6.971    0.000 {method 'get' of 'dict' objects}
     1341    0.015    0.000    6.950    0.005 C:\Python27\lib\threading.py:235(wait)
     5365    6.917    0.001    6.917    0.001 {method 'acquire' of 'thread.lock' objects}
  6680354    5.325    0.000    5.325    0.000 {method 'iteritems' of 'dict' objects}
 57037048    4.854    0.000    4.854    0.000 {id}

@ThomasH: this is the copy function, which is very simple and custom. See my comment to Ross for the content of tree nodes

def r_copy_tree(node_to_copy, dad_info):
    new_node = node(dad_info)

    for (a,son_to_copy) in node_to_copy.sons.items():
        new_node.sons[a]=r_copy_tree(son_to_copy,(new_node,a))

    return new_node

def copy_tree(root):
    return r_copy_tree(root,(None,None))

When trying to improve performance you should almost always start with profiling data and then optimize based on what you see there. Start by using cProfile.run to run your top-level tree copy code, then use pstats.Stats class to inspect the profiling data and see where you should really focus your optimization. I recommend starting by sorting your stats by cumulative time.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM