
How to remove everything before a certain character in Python

I'm new to Python and am struggling with a certain task:

I have a string that could contain anything, but it always ends the same way. It can be just a filename, a complete path, or a random string, and it always ends with a version number.

Example:

C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1
string-anotherstring-15.1R7-S8.1
string-anotherstring.andanother-15.1R7-S8.1

What is always the same (looking from the end) is that if you find the second dot and go two characters in front of it, you reach the start of the part I'm interested in. Cutting everything after a certain string was "easy," and I solved it myself; that's why the string now ends with the version :)

Is there a way to tell Python, "look for the second dot from the end of the string, go two characters in front of it, and delete everything before that," so that I get the version as a string?

Happy for any pointers in the right direction.

Thanks

If you want the version number, can you use the hyphen (-) to split the string? Or do you need to depend on the dots only?

The use of rsplit and join shown below may help.

>>> a = 'string-anotherstring.andanother-15.1R7-S8.1'
>>> a.rsplit('-')
['string', 'anotherstring.andanother', '15.1R7', 'S8.1']
>>> a.rsplit('-')[-2:]  # Get everything from the second-to-last field to the end
['15.1R7', 'S8.1']
>>> '-'.join(a.rsplit('-')[-2:])  # Same slice, joined back together with a hyphen
'15.1R7-S8.1'
>>> 
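The hyphen approach can be wrapped in a small helper. This is only a sketch: the function name version_by_hyphen is made up for illustration, and it assumes the version always contains exactly one hyphen, as in the examples from the question. Passing a maxsplit of 2 to rsplit limits the splitting to the two rightmost hyphens.

```python
def version_by_hyphen(text: str) -> str:
    """Return the last two hyphen-separated fields, rejoined with '-'.

    Assumption: the version itself contains exactly one hyphen.
    """
    # maxsplit=2 stops after two splits from the right, so hyphens
    # earlier in the string (for example in a path) are left untouched.
    fields = text.rsplit("-", 2)
    return "-".join(fields[-2:])


examples = [
    r"C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1",
    "string-anotherstring-15.1R7-S8.1",
    "string-anotherstring.andanother-15.1R7-S8.1",
]
for text in examples:
    print(version_by_hyphen(text))  # prints 15.1R7-S8.1 each time
```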

To rely on the dots instead, apply the same approach:

>>> a
'string-anotherstring.andanother-15.1R7-S8.1'
>>> data = a.rsplit('.')
>>> [data[-3][-2:]]
['15']
>>> [data[-3][-2:]] + data[-2:]
['15', '1R7-S8', '1']
>>> '.'.join([data[-3][-2:]] + data[-2:])
'15.1R7-S8.1'
>>> 
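The rule from the question ("second dot from the end, then two characters back") can also be written with plain index arithmetic instead of splitting and rejoining. The sketch below uses str.rfind; the function name version_by_dots and the choice to raise ValueError when the rule cannot be applied are assumptions made for this example.

```python
def version_by_dots(text: str) -> str:
    """Keep everything from two characters before the second-to-last dot."""
    last_dot = text.rfind(".")
    # Search again, but only in the part of the string before the last dot.
    second_last_dot = text.rfind(".", 0, last_dot) if last_dot != -1 else -1
    # Fewer than two dots, or fewer than two characters before the dot.
    if second_last_dot < 2:
        raise ValueError(f"no version-like suffix in {text!r}")
    return text[second_last_dot - 2:]


print(version_by_dots("string-anotherstring.andanother-15.1R7-S8.1"))
# prints 15.1R7-S8.1
```

Unlike the hyphen split, this does not depend on how many hyphens the version contains, but it does assume the part before the second-to-last dot is exactly two characters long.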

You can build a regex that works backward from the end of the line by using the anchor $.

Following your own description, use the regex:

(\d\d\.[^.]*)\.[^.]*$


If you want the characters after the last dot included as well, just move the closing parenthesis of the capture group:

(\d\d\.[^.]*\.[^.]*)$


Explanation:

(\d\d\.[^.]*\.[^.]*)$

 ^ ^                     # digits
     ^                   # a literal '.'
       ^                 # anything OTHER THAN a '.'
            ^            # a literal '.'
              ^          # anything OTHER THAN a '.'
                    ^    # end of line
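For completeness, here is a sketch of how the second pattern above might be applied from Python with the re module. The names VERSION_RE and version_by_regex are made up for this example, and returning None on a failed match is just one possible choice.

```python
import re
from typing import Optional

# Two digits, a dot, non-dot characters, a second dot, more non-dot
# characters, then the end of the string.
VERSION_RE = re.compile(r"(\d\d\.[^.]*\.[^.]*)$")


def version_by_regex(text: str) -> Optional[str]:
    """Return the trailing version, or None when the pattern does not match."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


print(version_by_regex(r"C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1"))
# prints 15.1R7-S8.1
```

Because the pattern requires \d\d, it only matches when the two characters before the second-to-last dot are digits, which holds for all three examples in the question.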

Assuming I understand the question correctly, two ways to do this come to mind.

I'm including both for completeness, in case I've misread the question. I think the split/parts solution is cleaner, particularly when the 'certain character' is a dot.

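>>> import re

>>> msg = r'C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1'

>>> # two characters, a dot, non-dots, a second dot, non-dots, end of string
>>> re.search(r'(..\.[^.]*\.[^.]*)$', msg).group(1)
'15.1R7-S8.1'

>>> parts = msg.split('.')
>>> # last two characters before the second-to-last dot, plus everything after it
>>> ".".join((parts[-3][-2:], parts[-2], parts[-1]))
'15.1R7-S8.1'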

For your example, you can split the string on the separator '-' and then join the last two elements with the same separator, like so:

txt = "string-anotherstring-15.1R7-S8.1"

x = txt.split("-")

y = "-".join(x[-2:])

print(y)  # outputs 15.1R7-S8.1
