Python: Find first non-matching character

Question

In Python, when you want to obtain the index of the first occurrence of a substring or character within a string, you use something like this:

s.find("f")

However, I'd like to find the index of the first character in the string that does not match a given character. Currently, I'm using the following:

iNum = 0  # stays 0 if every character matches mark
for i, c in enumerate(line):
    if c != mark:
        iNum = i  # index of the first character that differs from mark
        break

Is there a more efficient way to do this, such as a built-in function I don't know about?

Answer 1

You can use regular expressions, for example:

>>> import re
>>> re.search(r'[^f]', 'ffffooooooooo').start()
4

[^f] matches any character except f, and the start() method of the Match object (returned by re.search()) gives the index at which the match occurred.

To handle empty strings, or strings that contain only f, check that the result of re.search() is not None, which is what it returns when the regex cannot be matched. For example:

first_index = -1
match = re.search(r'[^f]', line)
if match:
    first_index = match.start()
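
The pattern above hard-codes f. If the character to skip is held in a variable, as with mark in the question, it needs escaping before it goes inside the character class. A minimal sketch, assuming mark is a single character; the function name first_non_matching is hypothetical:

```python
import re

def first_non_matching(line, mark):
    """Return the index of the first character in line that is not mark, or -1."""
    # re.escape protects marks such as "^", "]" or "\" that are special inside [...]
    pattern = "[^" + re.escape(mark) + "]"
    match = re.search(pattern, line)
    return match.start() if match else -1

print(first_non_matching("ffffooooooooo", "f"))  # 4
```

Returning -1 mirrors the convention of str.find() for "not found".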

If you prefer not to use regex, you won't do any better than your current method. You could use something like next(i for i, c in enumerate(line) if c != mark), but you would need to wrap it in a try / except StopIteration block to handle empty lines or lines that consist only of mark characters.
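
The next() approach with its StopIteration handling could look roughly like this. It is only a sketch: the function name first_mismatch_index and the -1 return value for "no mismatch" are assumptions, chosen to match the regex example.

```python
def first_mismatch_index(line, mark):
    """Return the index of the first character in line that differs from mark, or -1."""
    try:
        # The generator is lazy, so scanning stops at the first mismatch
        return next(i for i, c in enumerate(line) if c != mark)
    except StopIteration:
        # Raised for empty lines and for lines made up only of mark
        return -1

print(first_mismatch_index("ffffooooooooo", "f"))  # 4
```

next() also accepts a default as its second argument, so next((i for i, c in enumerate(line) if c != mark), -1) avoids the try block entirely.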

Answer 2

As Pythonic and as simple as possible. For Python 2.x, replace print(counter) with print counter.

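s = "ffffff5tgbh44frff"
counter = 0
for c in s:
    if c != "f":
        break
    # Count only the characters that matched, so counter is a 0-based index
    counter = counter + 1

print(counter)  # 6; equals len(s) if every character is "f"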

Answer 3

Now I am curious how these two fare.

>>> # map with a partial function
>>> import functools
>>> import operator
>>> f = functools.partial(operator.eq, 'f')
>>> list(map(f, 'fffffooooo')).index(False)  # list() is needed on Python 3, where map() returns an iterator
5
>>> # list comprehension
>>> [c == 'f' for c in 'ffffoooo'].index(False)
4

Answer 4

I had this same problem and looked into timing the solutions here (except the map/list-comp ones from @wwii which are significantly slower than any other options). I also added in a Cython version of the original version.

I made and tested these all in Python v2.7. I was using byte-strings (instead of Unicode strings). I am unsure if the regular-expression methods need something different to work with byte-strings in Python v3. The 'mark' is hard-coded to being the null byte. This could be easily changed.

All methods return -1 if the entire byte-string is the null-byte. All of these were tested in IPython (lines starting with % are special).

import re

def f1(s): # original version
    for i, c in enumerate(s):
        if c != b'\0': return i
    return -1

def f2(s): # @ChristopherMahan's version
    i = 0
    for c in s:
        if c != b'\0': return i
        i += 1
    return -1

def f3(s): # @AndrewClark's alternate version
    # modified to use optional default argument instead of catching StopIteration
    return next((i for i, c in enumerate(s) if c != b'\0'), -1)

def f4(s): # @AndrewClark's version
    match = re.search(br'[^\0]', s)
    return match.start() if match else -1

_re = re.compile(br'[^\0]')
def f5(s): # @AndrewClark's version w/ precompiled regular expression
    match = _re.search(s)
    return match.start() if match else -1

%load_ext cythonmagic
%%cython
# original version optimized in Cython
import cython
@cython.boundscheck(False)
@cython.wraparound(False)
def f6(bytes s):
    cdef Py_ssize_t i
    for i in xrange(len(s)):
        if s[i] != b'\0': return i
    return -1

The timing results:

s = (b'\x00' * 32) + (b'\x01' * 32) # test string

In [11]: %timeit f1(s) # original version
100000 loops, best of 3: 2.48 µs per loop

In [12]: %timeit f2(s) # @ChristopherMahan's version
100000 loops, best of 3: 2.35 µs per loop

In [13]: %timeit f3(s) # @AndrewClark's alternate version
100000 loops, best of 3: 3.07 µs per loop

In [14]: %timeit f4(s) # @AndrewClark's version
1000000 loops, best of 3: 1.91 µs per loop

In [15]: %timeit f5(s) # @AndrewClark's version w/ precompiled regular expression
1000000 loops, best of 3: 845 ns per loop

In [16]: %timeit f6(s) # original version optimized in Cython
1000000 loops, best of 3: 305 ns per loop

Overall, @ChristopherMahan's version is slightly faster than the original (apparently enumerate is slower than using your own counter). The next() method (@AndrewClark's alternate version) is slower than the original even though it is essentially the same thing in a one-line form.

Using regular expressions (@AndrewClark's version) is significantly faster than a loop, especially if you precompile the regex!

Then, if you can use Cython, it is by far the fastest. The OP's concern that using a regex is slow is validated, but a loop in Python is even slower. The loop in Cython is quite fast.
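
The functions above were tested only on Python 2.7, and the answer leaves the Python 3 behaviour open. A hedged sketch of how the loop and the precompiled regex might be ported, assuming Python 3 and the same null-byte mark; the names first_nonzero_loop and first_nonzero_regex are hypothetical and no timings are claimed for them:

```python
import re

# A bytes pattern is required when searching a bytes object in Python 3
_NON_NULL = re.compile(rb"[^\x00]")

def first_nonzero_loop(s):
    # In Python 3, iterating over bytes yields ints, so compare with 0, not b'\0'
    for i, c in enumerate(s):
        if c != 0:
            return i
    return -1

def first_nonzero_regex(s):
    match = _NON_NULL.search(s)
    return match.start() if match else -1

s = (b"\x00" * 32) + (b"\x01" * 32)  # same test string as above
print(first_nonzero_loop(s), first_nonzero_regex(s))  # 32 32
```

Outside IPython, the standard timeit module can time these, for example timeit.timeit(lambda: first_nonzero_regex(s), number=100000).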

Answer 5

Here is a one-liner:

> print([a == b for (a_i, a) in enumerate("compare_me") for (b_i, b) in enumerate("compar me") if a_i == b_i].index(False))
> 6
> "compare_me"[6]
> 'e'
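
The one-liner compares two strings position by position, but its nested loops visit every pair of indices, and .index(False) raises ValueError when the strings never differ. A possible alternative using zip(), offered as a sketch only; the function name first_difference and the -1 fallback are assumptions:

```python
def first_difference(a, b):
    """Return the first index at which a and b differ, or -1 if they agree
    over the length of the shorter string."""
    # zip pairs characters at the same position and stops at the shorter string
    return next((i for i, (x, y) in enumerate(zip(a, b)) if x != y), -1)

print(first_difference("compare_me", "compar me"))  # 6
```

Because zip() stops at the shorter input, a length difference alone is not reported as a mismatch here.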

5 answers

solution1
8 ACCEPTED 2013-10-04 21:46:47

solution2
1 2013-10-04 21:55:52

solution3
0 2013-10-04 23:01:11

solution4
0 2015-06-10 05:07:23

solution5
0 2018-03-09 16:22:50
