How to use regex in Python to get a string between two characters?

I have this as my input

content = 'abc.zip'\n

I want to extract abc from it. How do I do that using a regex in Python?

Edit:

No, this is not a homework question. I am trying to automate something and I am stuck at one point: I want the automation to work generically with any zip file I have.

os.system('python unzip.py -z data/ABC.zip -o data/')

After I take in the zip file, I unzip it. I plan to make this generic by getting the filename from the directory the zip file was placed in and then passing that filename to the command shown above.

As I implied in my comment, regular expressions are unlikely to be the best tool for the job (unless there is some artificial restriction on the problem, or it is far more complex than your example). The standard string and/or path libraries provide functions which should do what you are after. To better illustrate how these work, I'll use the following definition of content instead:

>>> content = 'abc.def.zip'

If it's a file, and you want the name and extension:

>>> import os.path
>>> filename, extension = os.path.splitext(content)
>>> print(filename)
abc.def
>>> print(extension)
.zip

If it is a string, and you want to remove the substring 'abc':

>>> noabc = content.replace('abc', '')
>>> print(noabc)
.def.zip

If you want to break it up on each occurrence of a period:

>>> broken = content.split('.')
>>> print(broken)
['abc', 'def', 'zip']

If it has multiple periods, and you want to break it on the first or last one:

>>> broken = content.split('.', 1)
>>> print(broken)
['abc', 'def.zip']
>>> broken = content.rsplit('.', 1)
>>> print(broken)
['abc.def', 'zip']

Edit: Changed the regexp to match for "content = 'abc.zip\n'" instead of the string "abc.zip".

import re

# Matching for "content = 'abc.zip\n'"; re.search skips the leading "content = "
matches = re.search(r"'(?P<filename>.*)\.zip\n'$", "content = 'abc.zip\n'")
matches = matches.groupdict()
print(matches)

# Matching for "abc.zip"; the dot is escaped so it matches only a literal '.'
matches = re.match(r"(?P<filename>.*)\.zip$", "abc.zip")
matches = matches.groupdict()
print(matches)

Output:

{'filename': 'abc'}
{'filename': 'abc'}

This prints the named group, which holds everything before .zip. The result of groupdict() can be accessed like a regular dictionary.
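
One caveat: re.match returns None when the pattern does not match, so calling groupdict() on the result directly raises an AttributeError for names that do not end in .zip. A hedged sketch of a guarded version follows; the names ZIP_NAME and zip_stem are hypothetical and only illustrate the dictionary-style access.

```python
import re

# Raw string so the backslash reaches the regex engine; \. matches a literal dot.
ZIP_NAME = re.compile(r"(?P<filename>.*)\.zip$")


def zip_stem(name):
    """Return the part of name before a trailing '.zip', or None if absent."""
    match = ZIP_NAME.match(name)
    if match is None:
        return None
    # groupdict() returns a plain dict keyed by group name.
    return match.groupdict()['filename']


print(zip_stem('abc.zip'))      # abc
print(zip_stem('abc.def.zip'))  # abc.def
print(zip_stem('abczip'))       # None
```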

If you're trying to break up parts of a path, you may find the os.path module to be useful. It has nice abstractions with clear semantics that are easy to use.
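
As a small illustration, assuming the path data/ABC.zip from the question, os.path.split and os.path.splitext can be combined to separate the directory, the base name, and the extension:

```python
import os.path

path = 'data/ABC.zip'

# Split off the directory first, then the extension from the file name.
directory, name = os.path.split(path)      # ('data', 'ABC.zip')
stem, extension = os.path.splitext(name)   # ('ABC', '.zip')

print(directory)
print(stem)
print(extension)
```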
