Regex for matching a floating point number of fixed width

Short

I am looking for a regular expression that captures the longest possible floating point number of up to n characters. It should be able to directly replace C format specifiers such as "%5f" in C's scanf function.

Examples

For n=5, we want to capture:

string       captured group
123        ->   123
1.23       ->   1.23
123456     ->   12345
+12.34     ->   +12.3
-12.34     ->   -12.3
1.2-3      ->   1.2
1.2.3.4    ->   1.2

Background

The above behaviour is similar to C's "sscanf" function with "%5f" as the format specifier. From the manpage: "Reading of characters stops either when this maximum [n in our case] is reached or when a nonmatching character is found, whichever happens first." I am trying to build a Python analog to scanf. There are existing projects like this one, but without support for the maximum field width.

My approach

My question is somewhat similar to this question.

I tried the following regex: ((?=[+-]?\d+(?:\.\d*)?)[\d\.+-]{1,5})

It consists of a lookahead that checks the format of the number, followed by a character class with an interval quantifier that restricts the length of the captured group. The problem shows in the 1.2-3 example above, where the lookahead correctly matches only the first part of the string, but the interval extends up to the trailing '-3'. Do you have any suggestions? Can we create a regex in which the interval refers only to the characters matched by the lookahead?
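To make the failure reproducible, here is a minimal Python sketch of the attempt described above. The helper name `capture_attempt` is only an illustrative assumption; the pattern is the one quoted in the question.

```python
import re

# Pattern from the question: the lookahead validates the number format,
# then the character class consumes up to five characters on its own.
ATTEMPT = re.compile(r"((?=[+-]?\d+(?:\.\d*)?)[\d\.+-]{1,5})")

def capture_attempt(text):
    """Return the captured group at the start of text, or None."""
    m = ATTEMPT.match(text)
    return m.group(1) if m else None

for s in ["123456", "+12.34", "1.2-3", "1.2.3.4"]:
    print(f"{s!r:>10} -> {capture_attempt(s)!r}")
```

The first two inputs come out as wanted ('12345' and '+12.3'), but '1.2-3' is captured whole and '1.2.3.4' yields '1.2.3', because the character class is not tied to what the lookahead matched.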

Thank you very much in advance!

Well, you could possibly get along with just

^(?:-[\d.]{1,4}|[\d.]{1,5})

if your column only has these strings. See a demo on regex101.com.
Note that this expression would also allow something like ...5. If you have these sorts of strings, you'd need to make the expression stricter.
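A quick check in Python illustrates how permissive this expression is. This is only a sketch, and the helper name `capture_loose` is an assumption made for the example.

```python
import re

# The loose expression from this answer: an optional leading minus,
# then up to four (or five) characters that are digits or dots.
LOOSE = re.compile(r"^(?:-[\d.]{1,4}|[\d.]{1,5})")

def capture_loose(text):
    """Return the match at the start of text, or None."""
    m = LOOSE.match(text)
    return m.group(0) if m else None

for s in ["-12.34", "1.2-3", "...5", "1.2.3.4", "+12.34"]:
    print(f"{s!r:>10} -> {capture_loose(s)!r}")
```

'-12.34' and '1.2-3' give '-12.3' and '1.2', but '...5' is accepted as is, '1.2.3.4' yields '1.2.3', and a leading '+' is not matched at all, so this only suits well-behaved input.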

You can use

^[+-]?\d+(?:\.\d*)?(?<!.{6})
^[+-]?\d+(?:\.\d+)?(?<!.{6})

See the regex demo. Details:

  • ^ - start of string
  • [+-]? - an optional + or -
  • \d+ - one or more digits
  • (?:\.\d*)? - an optional occurrence of . followed by zero or more digits (one or more digits if you use \d+)
  • (?<!.{6}) - a negative lookbehind that fails the match if there are six chars other than line break chars immediately to the left of the current position. This forces the engine to backtrack until the match is at most five characters long.
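As a rough sketch of how this could be used for an arbitrary field width n in Python, the lookbehind length can be set to n + 1. The names `float_field_pattern` and `scan_float` are hypothetical helpers invented for this example, not part of any existing scanf library.

```python
import re

def float_field_pattern(width, require_fraction_digits=False):
    """Build the anchored pattern for a float of at most `width` chars."""
    frac = r"\d+" if require_fraction_digits else r"\d*"
    # (?<!.{width+1}) rejects any match end that has width+1 characters
    # to its left, so the engine backtracks to at most `width` chars.
    return re.compile(rf"^[+-]?\d+(?:\.{frac})?(?<!.{{{width + 1}}})")

def scan_float(text, width=5):
    """Return the longest float prefix of text up to `width` chars, or None."""
    m = float_field_pattern(width).match(text)
    return m.group(0) if m else None

# The examples from the question, with n = 5.
examples = {
    "123": "123", "1.23": "1.23", "123456": "12345", "+12.34": "+12.3",
    "-12.34": "-12.3", "1.2-3": "1.2", "1.2.3.4": "1.2",
}
for text, expected in examples.items():
    assert scan_float(text) == expected, (text, scan_float(text))
```

Because the pattern is anchored with ^ and the lookbehind only counts characters to the left of the match end, the count equals the match length as long as matching starts at the beginning of the string.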
