
python parsing a string

I have a list with strings.

list_of_strings

They look like this:

'/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'

I want to extract the leading part of each string, /folder1/folder2/folder3/folder4/folder5/exp-*, and put it into a new list.

I thought of doing something like this, but I am missing the right snippet to do what I want:

list_of_stringparts = []

for string in sorted(list_of_strings):
    part= string.split('/')[7]  # or whatever returns the first part of my string
    list_of_stringparts.append(part)

Does anyone have an idea? Do I need a regex?

You are using subscripting (indexing), which extracts a single element (the eighth). To get the first seven elements, you need a slice of the form [N:M:S], like this:

>>> l = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> l.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']

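In this case N is omitted (it defaults to 0), M is 7, and S, the step, is omitted and defaults to 1, so you get elements 0 through 6 (the first seven) of the result of split. The first element is an empty string because the path starts with '/'.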

To construct your string back, use join() :

>>> '/'.join(l.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
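Applied to the whole list from the question, the slice-and-join approach might look like the sketch below. The helper name first_components and its count parameter are made up for illustration, and it assumes every path is absolute (starts with '/'):

```python
def first_components(path, count=6):
    """Return the first `count` components of an absolute '/'-separated path."""
    # The leading '/' produces an empty first element after split(),
    # so count + 1 items are needed to cover `count` real components.
    return '/'.join(path.split('/')[:count + 1])


list_of_strings = [
    '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file',
]

list_of_stringparts = [first_components(s) for s in sorted(list_of_strings)]
print(list_of_stringparts)
# ['/folder1/folder2/folder3/folder4/folder5/exp-*']
```

Note that this relies on the wanted part always being six components deep; if the depth varies, splitting on a fixed count will not work.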

I would do it like this:

>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> s.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
>>> '/'.join(s.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'

Using re.match

>>> import re
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> re.match(r'.*?\*', s).group()
'/folder1/folder2/folder3/folder4/folder5/exp-*'
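One caveat with the regex: re.match returns None when the string contains no *, so calling .group() on it raises AttributeError. A minimal sketch that guards against this is shown below; the names FIRST_STAR and prefix_through_star are hypothetical, and returning None for non-matching strings is just one possible choice:

```python
import re

# Non-greedy '.*?' matches as little as possible, so the match ends
# at the first literal '*' in the string.
FIRST_STAR = re.compile(r'.*?\*')


def prefix_through_star(s):
    """Return s up to and including its first '*', or None if it has none."""
    m = FIRST_STAR.match(s)
    return m.group() if m else None


print(prefix_through_star('/folder1/folder2/exp-*/exp-*/otherfolder/file'))
# /folder1/folder2/exp-*
print(prefix_through_star('/folder/blah/pow'))
# None
```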

Your example suggests that you want to partition the strings at the first * character. This can be done with str.partition() :

list_of_stringparts = []

list_of_strings = ['/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder/blah/pow']
for s in sorted(list_of_strings):
    head, sep, tail = s.partition('*')
    list_of_stringparts.append(head + sep)

>>> list_of_stringparts
['/folder/blah/pow', '/folder1/exp-*', '/folder1/folder2/folder3/folder4/folder5/exp-*']

Or this equivalent list comprehension:

list_of_stringparts = [''.join(s.partition('*')[:2]) for s in sorted(list_of_strings)]

This will retain any string that does not contain a * - not sure from your question if that is desired.
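If strings without a * should be dropped instead, one option is to test the separator that str.partition() returns, since it is an empty string when nothing was found. This is only a sketch of that variation, reusing the sample list from above:

```python
list_of_strings = [
    '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file',
    '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file',
    '/folder/blah/pow',
]

list_of_stringparts = [
    head + sep
    for head, sep, _tail in (s.partition('*') for s in sorted(list_of_strings))
    if sep  # sep is '' when the string contains no '*', so such strings are skipped
]

print(list_of_stringparts)
# ['/folder1/exp-*', '/folder1/folder2/folder3/folder4/folder5/exp-*']
```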
