Splitting a dot-delimited string into words, but with a special case
I'm not sure whether there is an easy way to split the following string:
'school.department.classes[cost=15.00].name'
into this:
['school', 'department', 'classes[cost=15.00]', 'name']
Note: I want to keep 'classes[cost=15.00]' intact.
>>> import re
>>> text = 'school.department.classes[cost=15.00].name'
>>> re.split(r'\.(?!\d)', text)
['school', 'department', 'classes[cost=15.00]', 'name']
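The lookahead `(?!\d)` only refuses to split on a dot that is followed by a digit, so it depends on the bracketed part containing nothing but decimal numbers. A small check illustrates where that holds and where it does not. The second input is made up for illustration and is not from the question:

```python
import re

pattern = re.compile(r'\.(?!\d)')

# Input from the question: the only dot inside the brackets precedes a digit.
first = pattern.split('school.department.classes[cost=15.00].name')
print(first)
# ['school', 'department', 'classes[cost=15.00]', 'name']

# Hypothetical input: a dot inside the brackets followed by a letter still splits.
second = pattern.split('school.department[school.type="TECHNICAL"].name')
print(second)
# ['school', 'department[school', 'type="TECHNICAL"]', 'name']
```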
A more specific version:
>>> re.findall(r'([^.\[]+(?:\[[^\]]+\])?)(?:\.|$)', text)
['school', 'department', 'classes[cost=15.00]', 'name']
Verbose:
>>> re.findall(r'''(   # main group
    [^.\[]+            # 1 or more of anything except . or [
    (?:                # (non-capture) optional [x=y,...]
        \[             # start [
        [^\]]+         # 1 or more of any non ]
        \]             # end ]
    )?                 # this group [x=y,...] is optional
)                      # end main group
(?:\.|$)               # find a dot or the end of string
''', text, flags=re.VERBOSE)
['school', 'department', 'classes[cost=15.00]', 'name']
Skip the dots inside brackets:
import re
s='school.department.classes[cost=15.00].name'
print(re.split(r'[.](?![^][]*\])', s))
Output:
['school', 'department', 'classes[cost=15.00]', 'name']
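The lookahead above assumes brackets are never nested. If they can be, one alternative is to walk the string and track bracket depth instead of using a regex. The following is a minimal sketch of that idea. The function name `split_outside_brackets` is made up here and is not part of any of the answers:

```python
def split_outside_brackets(text, sep='.'):
    """Split text on sep, ignoring any sep that appears inside [...]."""
    parts = []
    current = []
    depth = 0  # number of '[' currently open
    for ch in text:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth = max(depth - 1, 0)  # tolerate a stray ']'
        if ch == sep and depth == 0:
            # A separator outside all brackets ends the current piece.
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))
    return parts


print(split_outside_brackets('school.department.classes[cost=15.00].name'))
# ['school', 'department', 'classes[cost=15.00]', 'name']
```

Because it counts depth, the sketch also leaves a hypothetical nested input such as 'a[b.c[d]].e' as two pieces, 'a[b.c[d]]' and 'e'.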
This can get hairy in a hurry; you may need to actually parse this string rather than just split it:
from pyparsing import (Forward,Suppress,Word,alphas,quotedString,
alphanums,Regex,oneOf,Group,delimitedList)
# define some basic punctuation, numerics, operators
LBRACK,RBRACK = map(Suppress, '[]')
ident = Word(alphas+'_',alphanums+'_')
real = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
compOper = oneOf('= != < > <= >=')
# a full reference may be composed of full references, i.e., a recursive
# grammar - forward declare a full reference
fullRef = Forward()
# a value in a filtering expression could be a full ref or numeric literal
value = fullRef | real | integer | quotedString
filterExpr = Group(value + compOper + value)
# a single dotted ref could be one with a bracketed filter expression
# (which we would want to keep together in a group) or just a plain identifier
ref = Group(ident + LBRACK + filterExpr + RBRACK) | ident
# now insert the definition of a fullRef, using '<<=' instead of '='
fullRef <<= delimitedList(ref, '.')
# try it out
s = 'school.department.classes[cost=15.00].name'
print(fullRef.parseString(s))
s = 'school[size > 10000].department[school.type="TECHNICAL"].classes[cost=15.00].name'
print(fullRef.parseString(s))
Prints:
['school', 'department', ['classes', ['cost', '=', 15.0]], 'name']
[['school', ['size', '>', 10000]], ['department', ['school', 'type', '=', '"TECHNICAL"']], ['classes', ['cost', '=', 15.0]], 'name']
(If needed, it is not difficult to put 'classes[cost=15.00]' back together.)
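One possible way to do that regrouping is sketched below. It assumes the parse result has first been converted to plain nested lists (for example with the result's `asList()` method), and that every filter holds exactly one of the comparison operators defined in the grammar. The helper names `join_refs`, `format_ref` and `format_filter` are made up for this sketch:

```python
OPERATORS = {'=', '!=', '<', '>', '<=', '>='}


def join_refs(refs):
    """Rebuild a dotted reference from a list of parsed refs."""
    return '.'.join(format_ref(r) for r in refs)


def format_ref(ref):
    """Turn one parsed ref (a name, a value, or [name, filter]) into text."""
    if isinstance(ref, list):
        name, filter_tokens = ref
        return '%s[%s]' % (name, format_filter(filter_tokens))
    return str(ref)


def format_filter(tokens):
    """Rebuild 'left<op>right' from the flat token list of one filter."""
    for i, tok in enumerate(tokens):
        if isinstance(tok, str) and tok in OPERATORS:
            return join_refs(tokens[:i]) + tok + join_refs(tokens[i + 1:])
    raise ValueError('no comparison operator in %r' % (tokens,))


# Plain-list form of the first printed result above.
parsed = ['school', 'department', ['classes', ['cost', '=', 15.0]], 'name']
print([format_ref(r) for r in parsed])
# ['school', 'department', 'classes[cost=15.0]', 'name']
```

The rebuilt text is not byte-for-byte the original: the parse action has already turned 15.00 into the float 15.0, and whitespace around the operator is gone.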
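# The simplest way to split the string is to use .split('.'), as shown below:
s = 'school.department.classes[cost=15.00].name'
s.split('.')
Note that this does not give the desired output, because the dot inside the brackets is treated as a delimiter too: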
['school', 'department', 'classes[cost=15', '00]', 'name']