
Python re expression for decimal and scientific notation

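I need a regular expression that, given a text string, extracts only the numbers. Usually the numbers are written in decimal notation, and for that case I have been using the following expression: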

r'-?(?:\d+(?:\.\d*)?)'

It can be verified with the following example:

>>> import re
>>> text1 = u'''MULTIPOLYGON (((-0.026629449670668229 38.880267142395049,
                       -0.037640029706400797 38.887965291134428, 
                       -0.038243258379973236 38.886652370401961, 
                       -0.038324794358468445 38.886474904266947, 
                       -0.039081561703673183 38.885154939177824)))'''
>>> re.findall(r'-?(?:\d+(?:\.\d*)?)', text1)
[u'-0.026629449670668229', u'38.880267142395049', u'-0.037640029706400797', u'38.887965291134428', u'-0.038243258379973236', u'38.886652370401961', u'-0.038324794358468445', u'38.886474904266947', u'-0.039081561703673183', u'38.885154939177824']

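However, in some cases (which I did not consider at first) the numbers are written in scientific notation (AeN), which the expression above does not handle, as the following example shows: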

>>> text2 = u'''MULTIPOLYGON (((-1.1577490327131464e-05 38.865878133979862,
                       -0.037640029706400797 38.887965291134428, 
                       -0.038243258379973236 38.886652370401961, 
                       -0.038324794358468445 38.886474904266947, 
                       -0.039081561703673183 38.885154939177824)))'''
>>> re.findall(r'-?(?:\d+(?:\.\d*)?)', text2)
[u'-1.1577490327131464', u'-05', u'38.865878133979862', u'-0.037640029706400797', u'38.887965291134428', u'-0.038243258379973236', u'38.886652370401961', u'-0.038324794358468445', u'38.886474904266947', u'-0.039081561703673183', u'38.885154939177824']

I would like to know whether there is an expression that produces the following result for the previous example:

>>> re.findall(RE_EXPRESSION, text2)
[u'-1.1577490327131464e-05', u'38.865878133979862', u'-0.037640029706400797', u'38.887965291134428', u'-0.038243258379973236', u'38.886652370401961', u'-0.038324794358468445', u'38.886474904266947', u'-0.039081561703673183', u'38.885154939177824']

Alternation may help: try the exponent form first, then fall back to the plain decimal form.

re.findall(r'-?\d+\.\d+(?:[Ee][-+]?\d+)|-?\d+\.\d+',text2)

['-1.1577490327131464e-05', '38.865878133979862', '-0.037640029706400797', '38.887965291134428', '-0.038243258379973236', '38.886652370401961', '-0.038324794358468445', '38.886474904266947', '-0.039081561703673183', '38.885154939177824']
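
One thing to be aware of with this alternation: both branches require a decimal point, so bare integers are skipped. The sketch below illustrates that on a made-up sample string (`sample` and both pattern names are hypothetical, not from the question) and shows that an optional exponent group gives the same matches as the two-branch form.

```python
import re

# The alternation from the answer: exponent form first, plain decimal second.
ALTERNATION_RE = re.compile(r'-?\d+\.\d+(?:[Ee][-+]?\d+)|-?\d+\.\d+')

# Equivalent single-branch form: the exponent group is simply made optional.
OPTIONAL_EXP_RE = re.compile(r'-?\d+\.\d+(?:[Ee][-+]?\d+)?')

# Hypothetical sample text, chosen to include a bare integer.
sample = '42 3.5 1.0e3'

alt_matches = ALTERNATION_RE.findall(sample)
opt_matches = OPTIONAL_EXP_RE.findall(sample)

print(alt_matches)  # ['3.5', '1.0e3'] ; the integer 42 is not matched
print(opt_matches)  # ['3.5', '1.0e3']
```

If integers also need to be captured, the fractional part has to be optional as well, as in the original expression from the question.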

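Your regex looks fine. If you only want to cover exponent forms in addition to the normal forms, which can look like e-123, e+123, or e123 (where e may sometimes be E), you just need to append (?:[eE][-+]?\d+)? to your existing regex and use the following regex: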

-?(?:\d+(?:\.\d*)?)(?:[eE][-+]?\d+)?
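
As a quick check, the pattern above can be run against a shortened version of the second example from the question. This is a minimal sketch in Python 3, so the results come back as plain `str` without the `u` prefix; the names `NUMBER_RE`, `numbers`, and `values` are only illustrative.

```python
import re

# Optional sign, integer part, optional fraction, optional exponent.
NUMBER_RE = re.compile(r'-?(?:\d+(?:\.\d*)?)(?:[eE][-+]?\d+)?')

# Shortened version of text2 from the question.
text2 = '''MULTIPOLYGON (((-1.1577490327131464e-05 38.865878133979862,
                       -0.037640029706400797 38.887965291134428)))'''

# The pattern has no capturing groups, so findall returns whole matches.
numbers = NUMBER_RE.findall(text2)
print(numbers)
# ['-1.1577490327131464e-05', '38.865878133979862', '-0.037640029706400797', '38.887965291134428']

# Each match is valid input for float().
values = [float(n) for n in numbers]
```

Note that the pattern needs at least one digit before the decimal point, so a value written as `.5` would be matched as `5`.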

