
How to find the index of the first non-whitespace character in a string in Python?

Scenario:

>>> a='   Hello world'
index = 3

In this case the index of "H" is 3. But I need a more general method: for any string stored in the variable a, how do I find the index of the first non-whitespace character?

Alternative scenario:

>>> a='\tHello world'
index = 1

If you mean the first non-whitespace character, I'd use something like this ...

>>> a='   Hello world'
>>> len(a) - len(a.lstrip())
3

Another one which is a little fun:

>>> import itertools
>>> sum(1 for _ in itertools.takewhile(str.isspace, a))
3

But I'm willing to bet that the first version is faster, since lstrip runs essentially this same loop in C. It does have to construct a new string when it's done, but that cost should be small for typical inputs.
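The speed guess above can be checked rather than assumed. Here is a minimal sketch using timeit; the helper names via_lstrip and via_takewhile are made up for this illustration, and the timings depend on the machine and Python version, so no particular result is implied.

```python
import itertools
import timeit

a = "   Hello world"

def via_lstrip(s):
    # Length difference before and after stripping leading whitespace.
    return len(s) - len(s.lstrip())

def via_takewhile(s):
    # Count the leading run of whitespace characters one at a time.
    return sum(1 for _ in itertools.takewhile(str.isspace, s))

# Run each version the same number of times and report the total elapsed time.
t_lstrip = timeit.timeit(lambda: via_lstrip(a), number=10_000)
t_takewhile = timeit.timeit(lambda: via_takewhile(a), number=10_000)
print(f"lstrip: {t_lstrip:.4f}s  takewhile: {t_takewhile:.4f}s")
```

Both helpers return the same index for the same input, so only the timings differ.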


For completeness: if the string is empty or consists entirely of whitespace, both of these return len(a), which is not a valid index:

>>> a = "foobar"
>>> a[len(a)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
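One way to guard against that case is to compare the result with len(s) before using it. This is only a sketch: the name first_non_whitespace and the choice of None as the "not found" value are assumptions for illustration, not part of the answer above.

```python
def first_non_whitespace(s):
    """Return the index of the first non-whitespace character, or None."""
    idx = len(s) - len(s.lstrip())
    # idx == len(s) means the string is empty or all whitespace,
    # so there is no valid index to return.
    return idx if idx < len(s) else None

print(first_non_whitespace("   Hello world"))
print(first_non_whitespace("    "))
```

Returning None rather than len(s) forces the caller to handle the missing case explicitly before indexing.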

Using regex:

>>> import re
>>> a='   Hello world'
>>> re.search(r'\S',a).start()
3
>>> a='\tHello world'
>>> re.search(r'\S',a).start()
1

A function that handles the cases where the string is empty or contains only whitespace:

>>> def func(strs):
...     match = re.search(r'\S',strs)
...     if match:
...         return match.start()
...     else:
...         return 'No character found!'
...
>>> func('\t\tfoo')
2
>>> func('   foo')
3
>>> func('     ')
'No character found!'
>>> func('')
'No character found!'

You can also try:

a = '   Hello world'
a.index(a.lstrip()[0])
=> 3

It works as long as the string contains at least one non-whitespace character. To be more careful, check for that first:

a = '    '
-1 if not a or a.isspace() else a.index(a.lstrip()[0])
=> -1

Another method, just for fun, using a dedicated function:

>>> def first_non_space_index(s):
...     for idx, c in enumerate(s):
...         if not c.isspace():
...             return idx
...
>>> a = '   Hello world'
>>> first_non_space_index(a)
3
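Note that the loop version falls off the end and returns None when the string has no non-whitespace character. If an explicit sentinel is preferred, the same scan can be written with next() and a default. This is a sketch; the default parameter and its value of -1 are assumptions chosen for illustration.

```python
def first_non_space_index(s, default=-1):
    # Lazily scan (index, char) pairs; next() stops at the first match
    # and falls back to `default` if the generator is exhausted.
    return next((i for i, c in enumerate(s) if not c.isspace()), default)

print(first_non_space_index("   Hello world"))
print(first_non_space_index("   "))
```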

Following mgilson's answer, you can pass lstrip the set of characters you'd like to skip:

unwanted = ':!@#$%^&*()_+ \t\n'
a = '  _Hello world'
res = len(a) - len(a.lstrip(unwanted))
=> 3
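Keep in mind that the argument to lstrip is treated as a set of characters, not as a prefix, so leading characters are removed in any order and any combination. A small sketch follows; the helper name leading_count is hypothetical.

```python
def leading_count(s, chars):
    # Number of leading characters of `s` that belong to the set `chars`.
    return len(s) - len(s.lstrip(chars))

unwanted = ':!@#$%^&*()_+ \t\n'

# Underscores and spaces are skipped even though they alternate.
print(leading_count('_ _ Hello', unwanted))

# The order of the characters in `chars` makes no difference.
print(leading_count('_ _ Hello', ' _'))
```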
