
Pythonic way of parsing this string?

I'm parsing this line:

0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS

Basically, I need:

point = 0386
script = Greek

And I'm doing it like this:

point = line.split(";")[0].replace(" ","")
script = line.split("#")[0].split(";")[1].replace(" ","")

I'm not convinced this is the most Pythonic way to do it. Is there a more elegant approach, maybe a regex one-liner?

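If you want a regex one-liner:

point, script = re.search(r"^([0-9A-Fa-f]+)\s*;\s*(\S+)", s).groups()

where s is your string; you also need to import re first. The pattern is a raw string so the backslashes reach the regex engine unchanged, and it matches hexadecimal digits rather than \d because code points such as 03AB contain letters.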
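If the one-liner gets reused, it may be easier to read as a compiled pattern with named groups. The following is only a sketch: the names `LINE_RE` and `parse_line` are made up for illustration, and it assumes each data line starts with a single hexadecimal code point followed by `;` and a script name without spaces.

```python
import re

# Hypothetical helper names; the pattern assumes "<hex point> ; <script> # comment".
LINE_RE = re.compile(r"^\s*(?P<point>[0-9A-Fa-f]+)\s*;\s*(?P<script>\w+)")


def parse_line(line):
    """Return (point, script) as strings, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group("point"), match.group("script")


line = "0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS"
print(parse_line(line))
```

Returning None instead of raising lets the caller decide what to do with lines that are not data, such as comment-only lines.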

>>> code, desc = line[:line.rfind('#')].split(';')
>>> code.strip()
'0386'
>>> desc.strip()
'Greek'

Using map with str.strip:

>>> line = '0386      ; Greek # L&   GREEK CAPITAL LETTER ALPHA WITH TONOS'
>>> point, script = map(str.strip, line.split('#')[0].split(';'))
>>> point
'0386'
>>> script
'Greek'

Using list comprehension:

>>> point, script = [word.strip() for word in line.split('#')[0].split(';')]
>>> point
'0386'
>>> script
'Greek'
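The same split-and-strip idea can be extended to a whole file. This is a rough sketch rather than a definitive parser: `parse_lines` is a hypothetical name, and it assumes that blank lines and comment-only lines should be skipped and that every remaining line has exactly one `;`.

```python
def parse_lines(lines):
    """Yield (point, script) pairs, skipping blank and comment-only lines."""
    for raw in lines:
        # Drop everything from the first '#' onward, then trim whitespace.
        data = raw.split("#", 1)[0].strip()
        if not data:
            continue
        # Assumes exactly two ';'-separated fields; otherwise ValueError is raised.
        point, script = (field.strip() for field in data.split(";"))
        yield point, script


# Illustrative input: the line from the question plus a comment and a blank line.
sample = [
    "# a comment-only line",
    "",
    "0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS",
]
print(list(parse_lines(sample)))
```

Because it is a generator, it can be fed an open file object directly and will process one line at a time.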

This is how I would've done it:

>>> s = "0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS"
>>> point = s.split(';')[0].strip()
>>> point
'0386'
>>> script = s.split(';')[1].split('#')[0].strip()
>>> script
'Greek'

Note that you can reuse the result of s.split(';'), so saving it to a variable is a good idea:

>>> var = s.split(';')
>>> point = var[0].strip()  # strip() removes leading and trailing whitespace
>>> point
'0386'
>>> script = var[1].split('#')[0].strip()
>>> script
'Greek'
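A variation not used in the answers above is `str.partition`, which splits on the first occurrence of a separator only and always returns three parts, so no indexing is needed. A minimal sketch, assuming the same single-line input as in the question:

```python
s = "0386          ; Greek # L&       GREEK CAPITAL LETTER ALPHA WITH TONOS"

# partition() returns (before, separator, after) and splits only once.
point, _, rest = s.partition(";")
script, _, _comment = rest.partition("#")

point = point.strip()
script = script.strip()
print(point, script)
```

If a separator is missing, `partition` returns the whole string as the first part and two empty strings, so this version does not raise an IndexError on malformed lines.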
