Python RegEx matching substrings on various conditions

Been struggling with this one for a while now - I simply can't wrap my brain around it.

Given the following string variations:

some text
some text http://a.link.to/something
some text - http://a.link.to/something
some text: http://a.link.to/something
http://a.link.to/something

I am looking for a RegEx that would produce the following:

{'text': 'some text',
 'link': ''}

{'text': 'some text',
 'link': 'http://a.link.to/something'}

{'text': '',
 'link': 'http://a.link.to/something'}

Cheers!

Use named capturing groups with re.match, so that groupdict() returns a dictionary with keys of your choosing.

>>> import re
>>> s = '''some text
some text http://a.link.to/something
some text - http://a.link.to/something
some text: http://a.link.to/something
http://a.link.to/something'''
>>> for i in s.split('\n'):
        re.match(r'(?P<text>(?:(?!http://).)*?)\W*\b(?P<link>http://.*)?$', i).groupdict()


{'link': None, 'text': 'some text'}
{'link': 'http://a.link.to/something', 'text': 'some text'}
{'link': 'http://a.link.to/something', 'text': 'some text'}
{'link': 'http://a.link.to/something', 'text': 'some text'}
{'link': 'http://a.link.to/something', 'text': ''}
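
The question asks for an empty string rather than None when a part is missing. A minimal sketch of how the pattern above could be wrapped to get that, using the `default` argument of `groupdict()`. The function name `split_text_and_link` and the fallback for lines the pattern does not match (for example a line ending in punctuation such as `some text.`, where the `\b` cannot be satisfied) are assumptions for illustration, not part of the original answer.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(r'(?P<text>(?:(?!http://).)*?)\W*\b(?P<link>http://.*)?$')


def split_text_and_link(line):
    """Return {'text': ..., 'link': ...} with '' for a part that is missing."""
    match = PATTERN.match(line)
    if match is None:
        # Assumed fallback: treat a line the pattern rejects as plain text.
        return {'text': line, 'link': ''}
    # default='' replaces the None that groupdict() gives for unmatched groups.
    return match.groupdict(default='')


for line in ['some text',
             'some text - http://a.link.to/something',
             'http://a.link.to/something']:
    print(split_text_and_link(line))
```

Only `http://` links are recognised, as in the answer's pattern; `https://` would need the pattern to be adjusted.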

You can use a regex like this:

(.+?)(http.*)?$

This does not fully achieve what you want for the case of:

some text - http://a.link.to/something

Since it generates:

{'text': 'some text - ',  'link': 'http://a.link.to/something'}
                    ^--- Dash here

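But you can clean up the text before or after matching. Note also that .+? needs at least one character, so a line that contains only a link ends up entirely in the text group.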
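
A rough sketch of what such a post-clean could look like with the simpler pattern. The function name `parse_line` and the set of characters stripped (space, dash, colon, taken from the separators in the question's examples) are assumptions for illustration.

```python
import re

# The simpler pattern from this answer.
SIMPLE = re.compile(r'(.+?)(http.*)?$')


def parse_line(line):
    """Split a line into text and link, then strip trailing separators."""
    match = SIMPLE.match(line)
    if match is None:
        # '.+?' needs at least one character, so an empty line does not match.
        return {'text': '', 'link': ''}
    text = match.group(1)
    link = match.group(2) or ''  # group 2 is None when no link was found
    # Post-clean: drop trailing spaces, dashes and colons left before the link.
    return {'text': text.rstrip(' -:'), 'link': link}


print(parse_line('some text - http://a.link.to/something'))
print(parse_line('some text: http://a.link.to/something'))
```

Using `rstrip` keeps the regex itself simple; the trade-off is that text which legitimately ends in a dash or colon loses that character too.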

I'm posting the answer since it might help you.
