
Parsing scientific notation sensibly?

I want to be able to write a function which receives a number in scientific notation as a string and splits out of it the coefficient and the exponent as separate items. I could just use a regular expression, but the incoming number may not be normalised and I'd prefer to be able to normalise and then break the parts out.

A colleague has got part of the way to a solution using VB6, but it's not quite there, as the transcript below shows.

cliVe> a = 1e6
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 10 exponent: 5 

should have been 1 and 6

cliVe> a = 1.1e6
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.1 exponent: 6

correct

cliVe> a = 123345.6e-7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: -2

correct

cliVe> a = -123345.6e-7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: -2

should be -1.233456 and -2

cliVe> a = -123345.6e+7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: 12

exponent is correct, but the coefficient should be -1.233456 (the sign is dropped again)

Any ideas? By the way, Clive is a CLI based on VBScript and can be found on my weblog.

Googling "scientific notation regexp" turns up a number of matches, including this one (don't use it!!!!), which uses

*** warning: questionable ***
/[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?/

which matches cases such as -.5e7 and +00000e33 (both of which you may not want to allow).

Instead, I would highly recommend you use the syntax on Doug Crockford's JSON website, which explicitly documents what constitutes a number in JSON. Here's the corresponding syntax diagram taken from that page:

[Image: syntax diagram of a JSON number] (source: json.org)

If you look at line 456 of his json2.js script (safe conversion to/from JSON in JavaScript), you'll see this portion of a regexp:

/-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/

which, ironically, doesn't match his syntax diagram... (looks like I should file a bug). I believe a regexp that does implement that syntax diagram is this one:

/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

and if you want to allow an initial + as well, you get:

/[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

Add capturing parentheses to your liking.

I would also highly recommend you flesh out a bunch of test cases, to ensure you include those possibilities you want to include (or not include), such as:
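As a rough illustration of where the capturing parentheses might go, here is a minimal sketch that anchors the regexp above, captures sign, integer part, fraction and exponent, and then normalises by string manipulation (so no floating-point rounding is involved). The names SCI_NUMBER and splitScientific are hypothetical, and the choice to keep the digits exactly as written (so 10e5 becomes 1.0 and 6) is an assumption, not something from the question.

```javascript
// The syntax-diagram regexp with an optional leading sign, anchored, and
// with capturing groups: 1 = sign, 2 = integer part, 3 = fraction digits,
// 4 = exponent (with its own sign).
const SCI_NUMBER = /^([+\-]?)(0|[1-9]\d*)(?:\.(\d+))?(?:[eE]([+\-]?\d+))?$/;

// Hypothetical helper: returns { coefficient, exponent } with exactly one
// non-zero digit before the decimal point, or null if the text is rejected.
function splitScientific(text) {
  const m = SCI_NUMBER.exec(text);
  if (m === null) return null;

  const sign = m[1] === '-' ? '-' : '';
  const intPart = m[2];
  const fracPart = m[3] || '';
  const exponent = parseInt(m[4] || '0', 10);

  // All digits without the point; the point sat after intPart.length digits.
  const digits = intPart + fracPart;
  const firstNonZero = digits.search(/[1-9]/);
  if (firstNonZero === -1) return { coefficient: '0', exponent: 0 };

  // Drop leading zeros, then put the point after the first remaining digit.
  const significant = digits.slice(firstNonZero);
  const rest = significant.slice(1);
  return {
    coefficient: sign + significant[0] + (rest ? '.' + rest : ''),
    // Each position the point moves left adds one to the exponent.
    exponent: exponent + intPart.length - firstNonZero - 1,
  };
}

console.log(splitScientific('1e6'));
console.log(splitScientific('-123345.6e-7'));
```

With the inputs from the question, this gives coefficient 1 and exponent 6 for 1e6, and coefficient -1.233456 and exponent -2 for -123345.6e-7.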

allowed:
+3
3.2e23
-4.70e+9
-.2E-4
-7.6603

not allowed:
+0003   (leading zeros)
37.e88  (dot before the e)
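One way to keep such a list honest is to run it against the regexp. The sketch below does that for the signed, anchored version of the syntax-diagram regexp; the names STRICT, cases and classify are made up for this example.

```javascript
// Signed, anchored version of the syntax-diagram regexp from above.
const STRICT = /^[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?$/;

// The two lists from the answer, copied verbatim.
const cases = {
  allowed: ['+3', '3.2e23', '-4.70e+9', '-.2E-4', '-7.6603'],
  notAllowed: ['+0003', '37.e88'],
};

// Hypothetical helper: pairs each input with whether the regexp accepts it.
function classify(regex, inputs) {
  return inputs.map((input) => ({ input, matches: regex.test(input) }));
}

for (const [label, inputs] of Object.entries(cases)) {
  for (const { input, matches } of classify(STRICT, inputs)) {
    console.log(label.padEnd(11), input.padEnd(10), matches ? 'match' : 'no match');
  }
}
```

Note that -.2E-4 comes out as "no match" here: the syntax diagram requires at least one digit before the decimal point, so you either move that case to the "not allowed" list or make the integer part optional.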

Good luck!

Building off of the highest rated answer, I modified the regex slightly to be /^[+\-]?(?=.)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/ .

The benefits this provides are:

  1. allows matching numbers like .9 (I made the (?:0|[1-9]\d*) optional with ? )
  2. prevents matching just the sign at the beginning and prevents matching zero-length strings (uses a lookahead, (?=.) )
  3. prevents matching e9 because it requires the \d before the exponent marker

My goal in this is to use it for capturing significant figures and doing significant-figure math. So I'm also going to slice it up with capturing groups like so: /^[+\-]?(?=.)(0|[1-9]\d*)?(\.\d*)?(?:(\d)[eE][+\-]?\d+)?$/ .

An explanation of how to get significant figures from this:关于如何从中获得有效数字的解释:

  1. The entire match is the number you can hand to parseFloat()
  2. Groups 1-3 will show up as undefined or strings, so combining them (replacing each undefined with '' ) should give the digits of the original number, from which significant figures can be extracted.
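The two steps above can be sketched roughly as follows. This is only an illustration: the names SIG_NUMBER, mantissaOf and countSignificantFigures are invented here, the look-ahead is tightened to (?=\.\d|\d) so that a lone '.' is rejected, and counting every trailing zero as significant is an assumption you may not share.

```javascript
// Capturing version of the regex, with a look-ahead that insists on at
// least one digit. Group 1 = integer part, group 2 = '.' plus fraction
// digits, group 3 = the digit that sits directly before the e/E.
const SIG_NUMBER = /^[+\-]?(?=\.\d|\d)(0|[1-9]\d*)?(\.\d*)?(?:(\d)[eE][+\-]?\d+)?$/;

// Hypothetical helper: glue groups 1-3 back together, treating groups
// that did not participate (undefined) as empty strings.
function mantissaOf(text) {
  const m = SIG_NUMBER.exec(text);
  if (m === null) return null;
  return [m[1], m[2], m[3]]
    .map((part) => (part === undefined ? '' : part))
    .join('');
}

// Hypothetical helper: count digits after dropping the point and any
// leading zeros. ASSUMPTION: trailing zeros always count (so 100 has 3),
// and an all-zero mantissa reports 0.
function countSignificantFigures(text) {
  const mantissa = mantissaOf(text);
  if (mantissa === null) return null;
  return mantissa.replace('.', '').replace(/^0+/, '').length;
}

console.log(mantissaOf('1.5e3'), countSignificantFigures('1.5e3'));
console.log(mantissaOf('-4.70e+9'), countSignificantFigures('-4.70e+9'));
```

Because the regex needs a digit immediately before the e, backtracking moves the last mantissa digit into group 3, which is why all three groups have to be concatenated rather than just the first two.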

This regex also prevents matching left-padded zeros, which JavaScript sometimes accepts but which I have seen cause issues and which add nothing to significant figures, so I see preventing left-padded zeros as a benefit (especially in forms). However, I'm sure the regex could be modified to gobble up left-padded zeros.

Another problem I see with this regex is that it won't match 90.e9 or other such numbers. However, I find this or similar inputs highly unlikely, as it is the convention in scientific notation to avoid such numbers. Though you can enter it in JavaScript, you can just as easily enter 9.0e10 and achieve the same significant figures.

UPDATE

In my testing, I also caught the error that it could match '.'. So the look-ahead should be modified to (?=\.\d|\d), which leads to the final regex:

/^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/

Here is some Perl code I just hacked together quickly.

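use strict;
use warnings;
use feature 'say';

my $str = '123345.6e-7';    # example input

# Capture the sign, the integer digits, the fraction (including its dot)
# and the exponent; /i also accepts an upper-case E.
my ($sign, $coeffl, $coeffr, $exp) =
  $str =~ /^\s*([-+])?(\d+)(\.\d*)?e([-+]?\d+)\s*$/i
  or die "not in scientific notation: $str\n";

# Moving the point to just after the first digit skips this many digits,
# so the exponent has to grow by the same amount.
my $shift = length($coeffl) - 1;

my $coeff = substr($coeffl, 0, 1);

# Put the remaining integer digits behind the new decimal point.
if ($shift || $coeffr) {
  $coeff .= '.' . substr($coeffl, 1);
}

# Append the original fraction digits, minus their leading dot.
$coeff .= substr($coeffr, 1) if $coeffr;

$coeff = $sign . $coeff if $sign;

$exp += $shift;

# Leading zeros are not stripped, so 0.5e3 stays as 0.5 and 3.
say "coeff: $coeff exponent: $exp";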

Building on @Troy Weber, I would suggest

/^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d+)?(?:(?<=\d)(?:[eE][+\-]?\d+))?$/

to avoid matching 3. , per @Jason S's rules.
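For what it's worth, here is a small sketch comparing the two regexes on a handful of inputs. The names WEBER, LOOKBEHIND and compare are made up for this example, and the look-behind needs a JavaScript engine with ES2018 regexp support.

```javascript
// @Troy Weber's final regex: fraction digits optional, digit required
// directly before the exponent marker.
const WEBER = /^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/;

// The variant suggested here: at least one fraction digit after the point,
// and a look-behind instead of consuming the digit before the exponent.
const LOOKBEHIND = /^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d+)?(?:(?<=\d)(?:[eE][+\-]?\d+))?$/;

// Hypothetical helper: reports how each regex treats one input.
function compare(input) {
  return { input, weber: WEBER.test(input), lookbehind: LOOKBEHIND.test(input) };
}

for (const input of ['3.', '3.0', '.9', '1e9', '90.e9', 'e9']) {
  console.log(compare(input));
}
```

Of these inputs, only 3. is treated differently: the first regex accepts it and the second rejects it. Both accept 3.0, .9 and 1e9, and both reject 90.e9 and e9.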
