
Python - How to extract specific numbers from strings?

I have a series of strings in the following pattern: string = 'ABCD 1NAME 123456'. I need to extract the trailing digits in order to make an ID. I tried to use the isdigit method, but the problem is that it also returns the digit before the name.

Caveats:

  1. Sometimes the name is not preceded by a digit.
  2. The length of the trailing digit group ranges from 5 to 9.

Could anyone suggest an alternative? I think I need to test whether the previous or next position is a digit in order to extract the ID, but I can't figure out how to implement this test.

You could use a regex:

import re


pattern = re.compile(r'\d{5,9}$')

for match in pattern.findall('ABCD 1NAME 123456'):
    print(match)

Output

123456

The above regex means:

  • \d{5,9} matches a group of 5 to 9 digits
  • $ means that the group of digits must be at the end of the string. If the group can be anywhere in the string, just remove this symbol.
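
As a minimal sketch of how the pattern above could be wrapped for reuse: the helper name extract_id is hypothetical, and the (?<!\d) lookbehind is an added assumption, not part of the original answer. Without it, a run of 10 or more digits would still match on its last 9 digits; with it, such a string is rejected. The helper uses re.search and returns None when there is no trailing group, which covers strings that have no usable ID.

```python
import re

# Assumption: a digit run longer than 9 digits is not a valid ID.
# (?<!\d) stops the pattern from matching only the tail of such a run.
ID_PATTERN = re.compile(r'(?<!\d)\d{5,9}$')


def extract_id(text):
    """Return the trailing group of 5 to 9 digits in text, or None."""
    match = ID_PATTERN.search(text)
    return match.group() if match else None


print(extract_id('ABCD 1NAME 123456'))  # 123456
print(extract_id('ABCD NAME 45678'))    # 45678
print(extract_id('ABCD 1NAME 1234'))    # None
```

The second call shows that a name without a leading digit makes no difference, since only the end of the string is inspected.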

I agree that @DanielMesejo's answer is the best solution to my knowledge, but here is an alternative idea.

Solution

You could create a list and store the last element (index [-1]) of each split string in it.

string = 'ABCD 1NAME 123456'
parts = string.split()      # ['ABCD', '1NAME', '123456']
num_ids = []
num_ids.append(parts[-1])   # the last token is the ID

You could use this in a loop to extract the ID from every string.

With loop:

strings = ['ABCD 1NAME 123456','BHDU 1NAME 45678','OIUS 1NAME 109028']
num_ids = []
for string in strings:
    string = string.split()
    num_ids.append(string[-1])
print(num_ids)

Optionally, with a list comprehension

As mentioned by @Alexander:

strings = ['ABCD 1NAME 123456','BHDU 1NAME 45678','OIUS 1NAME 109028']
num_ids = [string.split()[-1] for string in strings]
print(num_ids)

Output

(xenial)vash@localhost:~/python/AtBS$ python3.7 pattern.py
['123456', '45678', '109028']

You can use split to split the string by spaces and then index it with -1 to extract the id portion.

string = 'ABCD 1NAME 123456'
val = string.split()
print(val[-1])

How about rsplit()?

s = 'ABCD 1NAME 123456'
print(s.rsplit(' ', 1)[1])
# 123456
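
To make the behaviour of rsplit with maxsplit=1 more visible, here is a small sketch. The switch from index [1] to [-1] in the last line is a suggested variation, not part of the original answer: it avoids an IndexError when the string contains no space at all.

```python
s = 'ABCD 1NAME 123456'

# maxsplit=1 splits only at the last space, so the left part stays in one piece
parts = s.rsplit(' ', 1)
print(parts)        # ['ABCD 1NAME', '123456']
print(parts[1])     # 123456

# With no space in the string the result has a single element, so [1] would
# raise IndexError, while [-1] returns the whole string
print('123456'.rsplit(' ', 1)[-1])   # 123456
```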

I'm assuming that you are using a for loop to iterate over the string character by character, which also prints the digit before the name:

string = 'ABCD 1NAME 123456'

for i in string:
    if i.isdigit():
        print(i)

You should probably use split() instead, so that only whole tokens made up of digits are printed:

string = 'ABCD 1NAME 123456'

for i in string.split():
    if i.isdigit():
        print(i)

You could probably do:

a = string.split()
if a[-1].isdigit():
    print(a[-1])

This splits the string and prints the rightmost token only if it consists entirely of digits.
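
The same check can be combined with the length rule from the question (5 to 9 digits). This is only a sketch under that assumption; the function name last_token_id and its min_len and max_len parameters are hypothetical.

```python
def last_token_id(text, min_len=5, max_len=9):
    """Return the last whitespace-separated token if it looks like an ID, else None."""
    tokens = text.split()
    if not tokens:
        # Empty or whitespace-only input has no tokens at all
        return None
    candidate = tokens[-1]
    # Accept the token only if it is all digits and within the allowed length
    if candidate.isdigit() and min_len <= len(candidate) <= max_len:
        return candidate
    return None


strings = ['ABCD 1NAME 123456', 'BHDU NAME 45678', 'OIUS 1NAME']
print([last_token_id(s) for s in strings])   # ['123456', '45678', None]
```

Returning None instead of printing lets the caller decide what to do with strings that have no valid ID.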

