
Split a string on multiple characters - Python

I have a string:

V51M229D180728T132714_ACCEPT_EC_NC

This needs to be split into

String 1 : V51 (Can be variable but always ends before M)
String 2 : M229 (Can be variable but always ends before D)
String 3 : D180728 (Date in YYMMDD format)
String 4 : 132714 (Timestamp in HHMMSS format)
String 5 : ACCEPT (Occurs between "_")
String 6 : EC (Occurs between "_")
String 7 : NC (Occurs between "_")

I am new to Python and am hoping to get some help with this.

Thanks.

Use the re module:

import re
a = 'V51M229D180728T132714_ACCEPT_EC_NC'
re.search(r'(\w+)(M\w+)(D\d+)T(\d+)_(\w+)_(\w+)_(\w+)', a).groups()

You will get:

('V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC')

You probably want to use a regex with matching groups. See the re module.

For example,

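>>> import re
>>> mystr = 'V51M229D180728T132714_ACCEPT_EC_NC'
>>> re.match(r'(.*?)(M.*?)(D.*?)T(.*?)_(.*?)_(.*?)_(.*?)$', mystr).groups()
('V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC')

In the pattern, each () marks a capturing group, and .*? matches the fewest characters that still let the rest of the pattern match. The trailing $ anchors the match at the end of the string; without it, the final .*? would match nothing and the last group would be empty.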
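
To make the "minimal number of characters" point concrete, here is a small sketch comparing lazy and greedy quantifiers on the same string. The two patterns are illustrative only and are not part of the answer above.

```python
import re

s = 'V51M229D180728T132714_ACCEPT_EC_NC'

# Lazy (.*?): each group stops at the first underscore it can.
lazy = re.match(r'(.*?)_(.*?)_(.*)$', s).groups()

# Greedy (.*): the first group grabs as much as it can while
# still leaving two underscores for the rest of the pattern.
greedy = re.match(r'(.*)_(.*)_(.*)$', s).groups()

print(lazy)    # ('V51M229D180728T132714', 'ACCEPT', 'EC_NC')
print(greedy)  # ('V51M229D180728T132714_ACCEPT', 'EC', 'NC')
```

The same text is split at different underscores depending only on whether the quantifier is lazy or greedy.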

Use split(). From the docs:

str.split(sep=None, maxsplit=-1)

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).

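So you can use split('M', 1) to get the list ['V51', '229D180728T132714_ACCEPT_EC_NC'], then split the second element of that list on 'D' to get ['229', '180728T132714_ACCEPT_EC_NC'], and so on. Note that split() drops the delimiter, so the 'M' and 'D' have to be added back if you want them in the result.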

Hope you get the idea.
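
One way the chained splits might look in full is sketched below. The variable names are made up for illustration, and the sketch assumes the string always contains exactly three underscores and one each of 'M', 'D' and 'T' in the leading part.

```python
s = 'V51M229D180728T132714_ACCEPT_EC_NC'

# Separate the leading code from the three underscore-delimited fields.
head, status, field6, field7 = s.split('_')

# Peel one piece off the front at a time; maxsplit=1 keeps the rest intact.
first, rest = head.split('M', 1)     # 'V51', '229D180728T132714'
second, rest = rest.split('D', 1)    # '229', '180728T132714'
yymmdd, hhmmss = rest.split('T', 1)  # '180728', '132714'

# split() discards the delimiter, so add the letters back where wanted.
parts = [first, 'M' + second, 'D' + yymmdd, hhmmss, status, field6, field7]
print(parts)
# ['V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC']
```

If the string has a different number of underscores, the first unpacking raises ValueError, which may or may not be the behaviour you want.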

As mxmt said, use regular expressions. Here is another equivalent regex, which might be a little easier to read:

import re

s = 'V51M229D180728T132714_ACCEPT_EC_NC'

pattern = re.compile(r'''
    ^        # beginning of string
    (V\w+)   # first pattern, starting with V
    (M\w+)   # second pattern, starting with M
    (D\d{6}) # third string pattern, six digits starting with D
    T(\d{6}) # time, six digits after T
    _(\w+)
    _(\w+)
    _(\w+)   # final three patterns
    $        # end of string
    ''', re.VERBOSE
)

print(pattern.match(s).groups())
# ('V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC')
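
If positional groups get hard to keep track of, named groups are one option. The sketch below is a variation on the verbose pattern, not the answer's own code: the group names and the parse() helper are hypothetical labels chosen for illustration, since the question does not say what the fields mean.

```python
import re

# Group names are illustrative labels, not taken from the question.
PATTERN = re.compile(r'''
    ^
    (?P<first>V\w+)
    (?P<second>M\w+)
    (?P<date>D\d{6})
    T(?P<time>\d{6})
    _(?P<status>\w+)
    _(?P<flag_a>\w+)
    _(?P<flag_b>\w+)
    $
    ''', re.VERBOSE)


def parse(text):
    """Return the named fields as a dict, or None if text does not match."""
    match = PATTERN.match(text)
    return match.groupdict() if match else None


fields = parse('V51M229D180728T132714_ACCEPT_EC_NC')
print(fields['date'], fields['time'])  # D180728 132714
print(parse('not a valid record'))     # None
```

Checking for None before calling groupdict() avoids the AttributeError that .groups() raises when the input does not match.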

If your data follows a fixed pattern, plain string slicing and list indexing work.

aa = "V51M229D180728T132714_ACCEPT_EC_NC"
a = aa.split("_")
str1 = a[0][0:3]    # V51
str2 = a[0][3:7]    # M229
str3 = a[0][7:14]   # D180728
str4 = a[0][15:21]  # 132714 (index 14 is the 'T', which is skipped)
str5 = a[1]
str6 = a[2]
str7 = a[3]
print(str1, str2, str3, str4, str5, str6, str7)

Output

V51 M229 D180728 132714 ACCEPT EC NC
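
The question notes that the first two pieces can vary in length, in which case hard-coded offsets stop working. A possible variation, sketched here, is to look up the marker letters with str.index() and slice between them. It assumes the first piece contains no 'M' and the second contains no 'D'.

```python
s = 'V51M229D180728T132714_ACCEPT_EC_NC'

# Leading code, then the underscore-delimited fields.
head, *tail = s.split('_')

# Locate the marker letters instead of hard-coding their positions.
m = head.index('M')
d = head.index('D', m)
t = head.index('T', d)

# Slice between the markers; the 'T' itself is dropped.
parts = [head[:m], head[m:d], head[d:t], head[t + 1:], *tail]
print(parts)
# ['V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC']
```

str.index() raises ValueError when a marker is missing, so malformed input fails loudly rather than producing wrong slices.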
