简体   繁体   中英

What is the most efficient way to concatenate two strings and remove everything before the first ',' in Python?

In Python, I have a string which is a comma separated list of values. eg '5,2,7,8,3,4'

I need to add a new value onto the end and remove the first value,

eg '5,22,7,814,3,4' -> '22,7,814,3,4,1'

Currently, I do this as follows:

mystr = '5,22,7,814,3,4'
latestValue='1'
mylist = mystr.split(',')
mystr = ''
for i in range(len(mylist)-1):
   if i==0:
      mystr += mylist[i+1]
   if i>0:
      mystr += ','+mylist[i+1]

mystr += ','+latestValue

This runs millions of times in my code and I've identified it as a bottleneck, so I'm keen to optimize it to make it run faster.

What is the most efficient to do this (in terms of runtime)?

Use this:

if mystr == '':
    mystr = latestValue
else:
    mystr = mystr[mystr.find(",")+1:] + "," + latestValue

This should be much faster than any solution which splits the list. It only finds the first occurrence of , and "removes" the beginning of the string. Also, if the list is empty, then mystr will be just latestValue (insignificant overhead added by this) -- thanks Paulo Scardine for pointing that out.

mystr = mystr.partition(",")[2]+","+latestValue

improvement suggested by Paulo to work if mystr has < 2 elements.
In the case of 0 elements, it does extend mystr to hold one element.

_,_,mystr = (mystr+','+latestValue).partition(',')

$ python -m timeit -s "mystr = '5,22,7,814,3,4';latestValue='1'" "mystr[mystr.find(',')+1:]+','+latestValue"
1000000 loops, best of 3: 0.847 usec per loop
$ python -m timeit -s "mystr = '5,22,7,814,3,4';latestValue='1'" "mystr = mystr.partition(',')[2]+','+latestValue"
1000000 loops, best of 3: 0.703 usec per loop
_, sep, rest = mystr.partition(",")
mystr = rest + sep + latestValue

It also works without any changes if mystr is empty or a single item (without comma after it) due to str.partition returns empty sep if there is no sep in mystr .

You could use mystr.rstrip(",") before calling partition() if there might be a trailing comma in the mystr .

best version : gnibbler's answer


Since you need speed (millions of times is a lot), I profiled. This one is about twice as fast as splitting the list:

i = 0
while 1:
    if mystr[i] == ',': break
    i += 1
mystr = mystr[i+1:] + ', ' + latest_value

It assumes that there is one space after each comma. If that's a problem, you can use:

i = 0
while 1:
    if mystr[i] == ',': break
    i += 1
mystr = mystr[i+1:].strip() + ', ' + latest_value

which is only slightly slower than the original but much more robust. It's really up to you to decide how much speed you need to squeeze out of it. They both assume that there will be a comma in the string and will raise an IndexError if one fails to appear. The safe version is:

i = 0
while 1:
    try:
        if mystr[i] == ',': break
    except IndexError:
        i = -1
        break
    i += 1
mystr = mystr[i+1:].strip() + ', ' + latest_value

Again, this is still significantly faster than than splitting the string but does add robustness at the cost of speed.


Here's the timeit results. You can see that the fourth method is noticeably faster than the third (most robust) method, but slightly slower than the first two methods. It's the fastest of the two robust solutions though so unless you are sure that your strings will have commas in them (ie it would already be considered an error if they didn't) then I would use it anyway.

$ python -mtimeit -s'from strings import tests, method1' 'method1(tests[0], "10")' 1000000 loops, best of 3: 1.34 usec per loop

$ python -mtimeit -s'from strings import tests, method2' 'method2(tests[0], "10")' 1000000 loops, best of 3: 1.34 usec per loop

$ python -mtimeit -s'from strings import tests, method3' 'method3(tests[0], "10")' 1000000 loops, best of 3: 1.5 usec per loop

$ python -mtimeit -s'from strings import tests, method4' 'method4(tests[0], "10")' 1000000 loops, best of 3: 1.38 usec per loop


$ python -mtimeit -s'from strings import tests, method5' 'method5(tests[0], "10")' 100000 loops, best of 3: 1.18 usec per loop

This is gnibbler's answer

mylist = mystr.split(',')
mylist.append(latestValue);
mystr = ",".join(mylist[1:])

String concatenation in python isn't very efficient (since strings are immutable). It's easier to work with them as lists (and more efficient). Basically in your code you are copying your string over and over again each time you concatenate to it.

Edited: Not the best, but I love one-liners. :-)

mystr = ','.join(mystr.split(',')[1:]+[latestValue])

Before testing I would bet it would perform better.

> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'" \
"mylist = mystr.split(',')" "mylist.append(latestValue);" \
"mystr = ','.join(mylist[1:])"
1000000 loops, best of 3: 1.37 usec per loop
> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'"\
"','.join(mystr.split(',')[1:]+[latestValue])"
1000000 loops, best of 3: 1.5 usec per loop
> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'"\
'mystr=mystr[mystr.find(",")+1:]+","+latestValue'
1000000 loops, best of 3: 0.625 usec per loop

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM