简体   繁体   中英

Reading double values from a text file

Trying to read data from a text file using a C# application. There're multiple lines of data and each of them start with an integer and then followed by bunch of double values. A part of the text file looks like this,

   33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03
   34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03
   35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03

Here 33, 34, 35 are integer values and it's followed by 6 double values. And these double values are not guaranteed to have space or some other delimiter between them. ie, if a double is negative then it will have a "-" before it and this will take up the space. So basically, it's possible that all 6 double values will be together.

Now the challenge is, how to extract this gracefully?

What I tried:

String.Split(' ');

This will not work as a space is not guaranteed between the initial integer values and then the rest of double values.

This can be easily solved in C++ using sscanf .

double a, b, c, d, e, f;

sscanf(string, "%d %lf%lf%lf%lf%lf%lf", &a, &b, &c, &d, &e, &f);
// here string contains a line of data from text file.

The text file containing double values are generated by a 3rd party tool and I have no control over its output.

Is there a way the integer and double values can be gracefully extracted line by line?

If I am seeing that right, you have a "Fixed Width Data" format. Than you can simply parse on that fact.

ie assuming the values are in a file d:\\temp\\doubles.txt :

void Main()
{
    var filename = @"d:\temp\doubles.txt";
    Func<string, string[]> split = (s) =>
    {
        string[] res = new string[7];
        res[0] = s.Substring(0, 2);
        for (int i = 0; i < 6; i++)
        {
            res[i + 1] = s.Substring(2 + (i * 19), 19);
        }
        return res;
    };
    var result = from l in File.ReadAllLines(filename)
                 let la = split(l)
                 select new
                 {
                    i = int.Parse(la[0]),
                     d1 = double.Parse(la[1]),
                     d2 = double.Parse(la[2]),
                     d3 = double.Parse(la[3]),
                     d4 = double.Parse(la[4]),
                     d5 = double.Parse(la[5]),
                     d6 = double.Parse(la[6])

                 };
    foreach (var e in result)
    {
        Console.WriteLine($"{e.i}, {e.d1}, {e.d2}, {e.d3}, {e.d4}, {e.d5}, {e.d6}");
    }
}

Outputs:

33, 0.0573140941467, 0.00011291426239, 0.00255553577735, 4.97192659486E-05, 0.0141869181079, -0.000147813598922
34, 0.0570076593453, 0.000100112550891, 0.00256427138318, -8.68691490164E-06, 0.0142821920093, -0.000346011975369
35, 0.0715507714946, 0.000316132133031, -0.0106581466521, -9.205137369E-05, 0.0138018668842, -0.000212219497066

PS: With your exact data, int should be allocating more space.

Solve this with a regular expression. My first shot is:

"[\s-+]\d+\.\d+E[+-]\d\d"

I just tried it this way:

using System;
using System.Globalization;
using System.Text.RegularExpressions;

namespace ConsoleApp1 {

    class Program {
        static void Main(string[] args) {
            var fileContents =
                  "33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03"
                + "34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03"
                + "35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03";

            var rex = new Regex(@"[\s-+]\d+\.\d+E[+-]\d\d", RegexOptions.Multiline);
            foreach (Match match in rex.Matches(fileContents)) {
                double d = double.Parse(match.Value.TrimStart(), NumberFormatInfo.InvariantInfo);
                Console.WriteLine("found a match: " + match.Value.TrimStart() + " => " + d);
            }

            Console.ReadLine();
        }
    }
}

With this output (german localization, with comma as decimal separator):

found a match: 0.573140941467E-01 => 0,0573140941467
found a match: 0.112914262390E-03 => 0,00011291426239
found a match: 0.255553577735E-02 => 0,00255553577735
found a match: 0.497192659486E-04 => 4,97192659486E-05
found a match: 0.141869181079E-01 => 0,0141869181079
found a match: -0.147813598922E-03 => -0,000147813598922
found a match: 0.570076593453E-01 => 0,0570076593453
found a match: 0.100112550891E-03 => 0,000100112550891
found a match: 0.256427138318E-02 => 0,00256427138318
found a match: -0.868691490164E-05 => -8,68691490164E-06
found a match: 0.142821920093E-01 => 0,0142821920093
found a match: -0.346011975369E-03 => -0,000346011975369
found a match: 0.715507714946E-01 => 0,0715507714946
found a match: 0.316132133031E-03 => 0,000316132133031
found a match: -0.106581466521E-01 => -0,0106581466521
found a match: -0.920513736900E-04 => -9,205137369E-05
found a match: 0.138018668842E-01 => 0,0138018668842
found a match: -0.212219497066E-03 => -0,000212219497066

I just went non optimal and replaced the "E-" string to something else while I replaced all the negative sign with a space and a negative sign (" -") then reverted all the "E-" values.

Then I was able to use split to extract the values.

private static IEnumerable<double> ExtractValues(string values)
{
    return values.Replace("E-", "E*").Replace("-", " -").Replace("E*", "E-").Split(' ').Select(v => double.Parse(v));
}

You could do this:

public void ParseFile(string fileLocation)
{
   string[] lines = File.ReadAllLines(fileLocation);

   foreach(var line in lines)
   {
       string[] parts = var Regex.Split(line, "(?((?<!E)-)| )");

       if(parts.Any())
       {
          int first = int.Parse(parts[0]);

          double[] others = parts.Skip(1).Select(a => double.Parse(a)).ToArray();
       }
   }
}   

The answers i have seen so far are so complex. Here is a simple one without overthinking

According to @Veljko89's comment, i have updated the code with unlimited number support

    List<double> ParseLine(string line)
    {
        List<double> ret = new List<double>();

        ret.Add(double.Parse(line.Substring(0, line.IndexOf(' '))));
        line = line.Substring(line.IndexOf(' ') + 1);

        for (; !string.IsNullOrWhiteSpace(line); line = line.Substring(line.IndexOf('E') + 4))
        {
            ret.Add(double.Parse(line.Substring(0, line.IndexOf('E') + 4)));
        }

        return ret;
    }

If we can't use string.Split we can try to split by regular expressions with a help of Regex.Split ; for a given line

string line = @"  33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03";

We can try

// Split either
//   1. by space
//   2. zero length "char" which is just after a [0..9] digit and followed by "-" or "+"
var items = Regex
  .Split(line, @" |((?<=[0-9])(?=[+-]))")
  .Where(item => !string.IsNullOrEmpty(item)) // we don't want empty parts 
  .Skip(1)                                    // skip 1st 33
  .Select(item => double.Parse(item));        // we want double

Console.WriteLine(string.Join(Environment.NewLine, items));

and get

0.573140941467E-01
0.112914262390E-03
0.255553577735E-02
0.497192659486E-04
0.141869181079E-01
-0.147813598922E-03

In case of a Text file we should split each line:

Regex regex = new Regex(@" |((?<=[0-9])(?=[+-]))");

var records = File
  .ReadLines(@"c:\MyFile.txt") 
  .Select(line => regex
     .Split(line)
     .Where(item => !string.IsNullOrEmpty(item))
     .Skip(1)
     .Select(item => double.Parse(item))
     .ToArray());

Demo:

  string[] test = new string[] {
     // your examples
     "  33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03",
     "  34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03",
     " 35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03",

     // Some challenging cases (mine)
     "    36 123+456-789    123e+78 9.9e-95 0.0001", 
  };

  Regex regex = new Regex(@" |((?<=[0-9])(?=[+-]))");

  var records = test
    .Select(line => regex
      .Split(line)
      .Where(item => !string.IsNullOrEmpty(item))
      .Skip(1)
      .Select(item => double.Parse(item))
      .ToArray());

  string testReport = string.Join(Environment.NewLine, records
    .Select(record => $"[{string.Join(", ", record)}]"));

  Console.WriteLine(testReport);

Outcome:

[0.0573140941467, 0.00011291426239, 0.00255553577735, 4.97192659486E-05, 0.0141869181079, -0.000147813598922]
[0.0570076593453, 0.000100112550891, 0.00256427138318, -8.68691490164E-06, 0.0142821920093, -0.000346011975369]
[0.0715507714946, 0.000316132133031, -0.0106581466521, -9.205137369E-05, 0.0138018668842, -0.000212219497066]
[123, 456, -789, 1.23E+80, 9.9E-95, 0.0001]

Yet another solution, processing each line by itself and including the int value:

    static void Main(string[] args) {
        string[] fileLines = {
            "33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03",
            "34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03",
            "35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03"
        };

        var rex = new Regex(@"\b([-+]?\d+(?:\.\d+(?:E[+-]\d+)?)?)\b", RegexOptions.Compiled);
        foreach (var line in fileLines) {

            var dblValues = new List<double>();
            foreach (Match match in rex.Matches(line)) {
                string strVal = match.Groups[1].Value;
                double number = Double.Parse(strVal, NumberFormatInfo.InvariantInfo);
                dblValues.Add(number);
            }

            Console.WriteLine(string.Join("; ", dblValues));
        }

        Console.ReadLine();
    }
}

The result/output is:

33; 0,0573140941467; 0,00011291426239; 0,00255553577735; 4,97192659486E-05; 0,0141869181079; -0,000147813598922
34; 0,0570076593453; 0,000100112550891; 0,00256427138318; -8,68691490164E-06; 0,0142821920093; -0,000346011975369
35; 0,0715507714946; 0,000316132133031; -0,0106581466521; -9,205137369E-05; 0,0138018668842; -0,000212219497066

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM