Non-exponential formatted float

I have a UTF-8 data file that contains thousands of floating-point numbers. When the format was designed, the developers decided to omit the 'e' from the exponential notation to save space, so the data looks like this:

 1.85783+16 0.000000+0 1.900000+6-3.855418-4 1.958263+6 7.836995-4
-2.000000+6 9.903130-4 2.100000+6 1.417469-3 2.159110+6 1.655700-3
 2.200000+6 1.813662-3-2.250000+6-1.998687-3 2.300000+6 2.174219-3
 2.309746+6 2.207278-3 2.400000+6 2.494469-3 2.400127+6 2.494848-3
-2.500000+6 2.769739-3 2.503362+6 2.778185-3 2.600000+6 3.020353-3
 2.700000+6 3.268572-3 2.750000+6 3.391230-3 2.800000+6 3.512625-3
 2.900000+6 3.750746-3 2.952457+6 3.872690-3 3.000000+6 3.981166-3
 3.202512+6 4.437824-3 3.250000+6 4.542310-3 3.402356+6 4.861319-3

The problem is that float.Parse() will not work with this format. The intermediate solution I came up with was:

    protected static float ParseFloatingPoint(string data)
    {

        int signPos;
        char replaceChar = '+';

        // Skip over first character so that a leading + is not caught
        signPos = data.IndexOf(replaceChar, 1);

        // Didn't find a '+', so let's see if there's a '-'
        if (signPos == -1)
        {
            replaceChar = '-';
            signPos = data.IndexOf('-', 1);
        }

        // Found either a '+' or '-'
        if (signPos != -1)
        {
            // Create a new char array with one extra slot to accommodate the 'e'
            char[] newData = new char[EntryWidth + 1];

            // Copy from string up to the sign
            for (int i = 0; i < signPos; i++)
            {
                newData[i] = data[i];
            }

            // Replace the sign with an 'e + sign'
            newData[signPos] = 'e';
            newData[signPos + 1] = replaceChar;

            // Copy the rest of the string
            for (int i = signPos + 2; i < EntryWidth + 1; i++)
            {
                newData[i] = data[i - 1];
            }

            return float.Parse(new string(newData), NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        else
        {
            return float.Parse(data, NumberStyles.Float, CultureInfo.InvariantCulture);
        }
    }

I can't call a simple String.Replace() because it would also replace any leading negative sign. I could use substrings, but then I'd be creating lots of extra strings, and I'm concerned about performance.

Does anyone have a more elegant solution to this?

    string test = "1.85783-16";
    char[] signs = { '+', '-' };

    int decimalPos = test.IndexOf('.');
    int signPos = test.LastIndexOfAny(signs);

    string result = (signPos > decimalPos) ?
         string.Concat(
             test.Substring(0, signPos),
             "E",
             test.Substring(signPos)) : test;

    float.Parse(result).Dump();  //1.85783E-16

The idea here is to check that the decimal point comes before the sign (which avoids any problems if the exponent is missing) and to use LastIndexOfAny() to search from the back (which ensures we find the exponent sign if one exists). If a "+" prefix is possible, the first if would need to include || signPos < decimalPos.

Other results:

    "1.85783" => "1.85783"; //Missing exponent is returned clean
    "-1.85783" => "-1.85783"; //Sign prefix returned clean
    "-1.85783-3" => "-1.85783E-3" //Sign prefix and exponent coexist peacefully.

According to the comments, a test of this method shows only a 5% performance hit (after avoiding String.Format(), which I should have remembered was awful). I think the code is much clearer: there is only one decision to make.
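The snippet above can be wrapped into a reusable helper. The following is only a sketch: the names ExponentFixer, InsertExponentMarker and Parse are made up for illustration, and it assumes every value contains a decimal point, as in the sample data. Unlike the snippet, it passes CultureInfo.InvariantCulture so that parsing does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original answer.
public static class ExponentFixer
{
    private static readonly char[] Signs = { '+', '-' };

    // Inserts an 'E' in front of the exponent sign, e.g. "1.85783-16" -> "1.85783E-16".
    // Assumes the value contains a decimal point; a sign that appears before the
    // decimal point is treated as the sign of the mantissa and left alone.
    public static string InsertExponentMarker(string text)
    {
        int decimalPos = text.IndexOf('.');
        int signPos = text.LastIndexOfAny(Signs);

        if (signPos > decimalPos)
        {
            return string.Concat(text.Substring(0, signPos), "E", text.Substring(signPos));
        }

        return text;
    }

    // Parses a value written without the 'E', using culture-independent rules.
    public static float Parse(string text)
    {
        return float.Parse(
            InsertExponentMarker(text),
            NumberStyles.Float,
            CultureInfo.InvariantCulture);
    }
}
```

NumberStyles.Float allows leading white space, so a padded field such as " 1.85783+16" can be passed in without trimming it first.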

In terms of speed, your original solution is the fastest I've tried so far (@Godeke's is a very close second). @Godeke's is much more readable, at only a minor cost in performance. Add some robustness checks, and his may be the better long-term choice. In terms of robustness, you can add checks to yours like so:

    static char[] signChars = new char[] { '+', '-' };

    static float ParseFloatingPoint(string data)
    {
        if (data.Length != EntryWidth)
        {
            throw new ArgumentException("data is not the correct size", "data");
        }
        else if (data[0] != ' ' && data[0] != '+' && data[0] != '-')
        {
            throw new ArgumentException("unexpected leading character", "data");
        }

        int signPos = data.LastIndexOfAny(signChars);

        // Found either a '+' or '-'
        if (signPos > 0)
        {
            // Create a new char array with one extra slot to accommodate the 'e'
            char[] newData = new char[EntryWidth + 1];

            // Copy from string up to the sign
            for (int ii = 0; ii < signPos; ++ii)
            {
                newData[ii] = data[ii];
            }

            // Replace the sign with an 'e + sign'
            newData[signPos] = 'e';
            newData[signPos + 1] = data[signPos];

            // Copy the rest of the string
            for (int ii = signPos + 2; ii < EntryWidth + 1; ++ii)
            {
                newData[ii] = data[ii - 1];
            }

            return Single.Parse(
                new string(newData),
                NumberStyles.Float,
                CultureInfo.InvariantCulture);
        }
        else
        {
            Debug.Assert(false, "data does not have an exponential? This is odd.");
            return Single.Parse(data, NumberStyles.Float, CultureInfo.InvariantCulture);
        }
    }

Benchmarks on my X5260 (including the time needed just to extract the individual data points):

Code                Average Runtime  Values Parsed
--------------------------------------------------
Nothing (Overhead)            13 ms              0
Original                      50 ms         150000
Godeke                        60 ms         150000
Original Robust               56 ms         150000

Thanks, Godeke, for your continually improving edits.

I ended up changing the parsing function to take a char[] rather than a string, and used your basic premise to come up with the following:

    protected static float ParseFloatingPoint(char[] data)
    {
        int decimalPos = Array.IndexOf<char>(data, '.');
        int posSignPos = Array.LastIndexOf<char>(data, '+');
        int negSignPos = Array.LastIndexOf<char>(data, '-');

        int signPos = (posSignPos > negSignPos) ? posSignPos : negSignPos;

        string result;
        if (signPos > decimalPos)
        {
            char[] newData = new char[data.Length + 1];
            Array.Copy(data, newData, signPos);
            newData[signPos] = 'E';
            Array.Copy(data, signPos, newData, signPos + 1, data.Length - signPos);
            result = new string(newData);
        }
        else
        {
            result = new string(data);
        }

        return float.Parse(result, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

I changed the function's input from string to char[] because I wanted to move away from ReadLine(). I'm assuming this performs better than creating lots of strings. Instead, I read a fixed number of bytes from the data file (since each value is ALWAYS 11 characters wide), convert the byte[] to char[], and then run the processing above to get a float.
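On newer runtimes the same fixed-width idea can be taken a step further so that no intermediate string is created per value. This is a sketch under stated assumptions rather than the code actually used: it assumes .NET Core 2.1 or later (for the float.Parse overload that accepts a ReadOnlySpan<char>), it slices an in-memory line instead of reading bytes from the file, and the names FixedWidthFloats, ParseField and ParseLine are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper; assumes .NET Core 2.1+ for span-based parsing.
public static class FixedWidthFloats
{
    // Every value occupies exactly 11 characters, as described in the question.
    public const int EntryWidth = 11;

    // Parses one field such as " 1.900000+6" or "-3.855418-4".
    public static float ParseField(ReadOnlySpan<char> field)
    {
        int decimalPos = field.IndexOf('.');
        int signPos = field.LastIndexOfAny('+', '-');

        // No sign after the decimal point: there is no exponent to fix up.
        if (signPos <= decimalPos)
        {
            return float.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture);
        }

        // Build "mantissa" + 'E' + "signed exponent" in a small stack buffer.
        // Fields are short (11 characters), so stackalloc is reasonable here.
        Span<char> buffer = stackalloc char[field.Length + 1];
        field.Slice(0, signPos).CopyTo(buffer);
        buffer[signPos] = 'E';
        field.Slice(signPos).CopyTo(buffer.Slice(signPos + 1));

        return float.Parse(buffer, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    // Splits a line into consecutive EntryWidth-character fields and parses each one.
    // Trailing characters that do not fill a whole field are ignored.
    public static List<float> ParseLine(string line)
    {
        var values = new List<float>();
        for (int start = 0; start + EntryWidth <= line.Length; start += EntryWidth)
        {
            values.Add(ParseField(line.AsSpan(start, EntryWidth)));
        }
        return values;
    }
}
```

For example, ParseLine(" 1.900000+6-3.855418-4") yields two values even though there is no separator between them, because the field boundaries come from the fixed width rather than from white space.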

Could you possibly use a regular expression to pick out each occurrence?

There is some information on suitable expressions here:

http://www.regular-expressions.info/floatingpoint.html
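As a rough illustration of the regular-expression idea, the sketch below matches a mantissa followed by an optional signed exponent and rebuilds each match with an 'E' in between. The pattern and the names RegexFloatReader and ParseLine are assumptions made for this example. It relies on every mantissa containing a decimal point, and on any value written without an exponent being followed by a space rather than directly by the next signed value.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the regular-expression approach.
public static class RegexFloatReader
{
    // Group 1: optionally signed mantissa with a decimal point.
    // Group 2: signed exponent written without the 'E' (optional).
    private static readonly Regex Entry = new Regex(
        @"([+-]?\d+\.\d+)([+-]\d+)?",
        RegexOptions.Compiled);

    public static List<float> ParseLine(string line)
    {
        var values = new List<float>();
        foreach (Match match in Entry.Matches(line))
        {
            string mantissa = match.Groups[1].Value;

            // Re-insert the 'E' only when an exponent was actually matched.
            string text = match.Groups[2].Success
                ? mantissa + "E" + match.Groups[2].Value
                : mantissa;

            values.Add(float.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture));
        }
        return values;
    }
}
```

Matching whole values matters here: a simpler pattern that replaces any sign sitting between two digits would also hit the leading minus of a value that directly follows an exponent, as in "1.900000+6-3.855418-4".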

Why not just write a simple script to reformat the data file once and then use float.Parse()?

You said "thousands" of floating-point numbers, so even a terribly naive approach will finish pretty quickly (if you had said "trillions" I would be more hesitant), and code that you only need to run once will (almost) never be performance critical. Certainly it would take less time to run than posting the question to SO took, and there's much less opportunity for error.
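A one-time conversion along these lines might look like the sketch below. It is deliberately naive, in the spirit of the suggestion: it assumes the 11-character field width mentioned in the question, that no value already contains an 'E', and that space-separated output is acceptable. The names OneTimeConverter and Reformat are made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical one-off converter; clarity matters more than speed here.
public static class OneTimeConverter
{
    // Assumed field width, taken from the question's description of the data.
    private const int EntryWidth = 11;

    private static readonly char[] Signs = { '+', '-' };

    // Reads fixed-width lines and writes the same values with the 'E' restored,
    // separated by single spaces.
    public static void Reformat(TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            var fields = new List<string>();
            for (int start = 0; start + EntryWidth <= line.Length; start += EntryWidth)
            {
                string field = line.Substring(start, EntryWidth).Trim();

                // After trimming, a sign at index 0 belongs to the mantissa,
                // so only a sign further along can be the exponent sign.
                int signPos = field.LastIndexOfAny(Signs);
                if (signPos > 0)
                {
                    field = field.Insert(signPos, "E");
                }

                fields.Add(field);
            }
            output.WriteLine(string.Join(" ", fields));
        }
    }
}
```

To convert a real file, the method could be called with File.OpenText and File.CreateText on input and output paths of your choosing. The converted file can then be split on spaces and each token passed straight to float.Parse() with CultureInfo.InvariantCulture.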
