Extract array from text file

I have a huge (~500K) text file, which looks like this:

{   // H-20e180a.wav 
    {-7,5,-4,-1,-9,2,-5,-1,2,-12,12,-33,34,-48,41,-40,16,20,730,4751,3861},
    {-7,5,-4,-1,-9,2,-5,-1,2,-12,12,-33,34,-48,41,-40,16,20,}
}

(NOTE: in the actual file there is some extra clutter, and the array pairs are much longer, about 140 elements each.)

I am looking to create a C# / .NET routine that allows me to extract a pair of arrays:

int [] [] elev_neg20__azi_180 = ArraysForLocation( -20, 180 );

What would be my basic strategy?

From my days of coding BASIC, I would read in one line at a time, looking for '// H', then extract the 2 numbers, and if they match I would process the next two lines. But things have probably moved on since then!

I'm guessing that there is no shortcut to reading through the entire file...

"From my days of coding BASIC, I would read in one line at a time, looking for '// H', then extract the 2 numbers, and if they match I would process the next two lines."

Approach it the same way. Using System.IO.StreamReader you can call ReadLine repeatedly until you find the desired section, read the next two lines of data, and Close. Then String.Split the comma-separated values and convert each one with Convert.ToInt32.

Actually, you probably wouldn't call Close explicitly. The StreamReader class implements IDisposable, so a best practice is to wrap it in a using statement (which automatically calls Dispose, which closes the stream).

using (var reader = new StreamReader("somefile.txt"))
{
   string line = reader.ReadLine();
}
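As a rough sketch of the scan-and-stop loop described above, the following reads lines until it meets a header comment and then returns the next two lines unparsed. The names SectionScanner and ReadSectionLines, the header argument, and the choice of a TextReader (so that a StringReader can stand in for the file) are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.IO;

static class SectionScanner
{
    // Returns the two lines that follow the first line containing `header`,
    // or null if the header is missing or the block is cut short.
    public static string[] ReadSectionLines(TextReader reader, string header)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (!line.Contains(header))
                continue;

            string first = reader.ReadLine();
            string second = reader.ReadLine();
            if (first == null || second == null)
                return null; // block truncated at end of input
            return new[] { first, second };
        }
        return null; // reached end of input without a match
    }
}
```

With a real file you would pass a new StreamReader("somefile.txt") created inside a using block, together with a header such as "// H-20e180a". Reading stops as soon as the two data lines have been returned, so the rest of the file is never touched.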

Parsing a string containing a line of your data could be done like this:

string line = "{-7,5,-4,-1,-9,2,-5,-1,2,-12,12,-33,34,-48,41,-40,16,20,730,4751,3861},";

var regex = new Regex("[{},]");
int[] ints = regex.Replace(line, " ").Trim().
                   Split(new char[] { ' ' }).Select(int.Parse).ToArray();

An option for returning the arrays from a method is to use out values. That way your normal return value could be used to indicate success. A method signature like this:

public bool ArraysForLocation(int x, int y, out int[] array1, out int[] array2)

could be called like this:

int[] a1;
int[] a2;
bool ok = ArraysForLocation(-20, 180, out a1, out a2);
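One possible shape for the body of such a method, as a sketch only. It assumes the header format "// H<x>e<y>a" seen in the sample (with the '-' read as the sign of x), takes a TextReader instead of a file name so it can be exercised with a StringReader, and is named TryArraysForLocation after the usual .NET Try pattern. None of these details come from the original answer.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class ArrayFile
{
    // Assumed header format, e.g. "// H-20e180a.wav" -> x = -20, y = 180.
    static readonly Regex Header = new Regex(@"//\s*H(-?\d+)e(\d+)a");

    public static bool TryArraysForLocation(TextReader reader, int x, int y,
                                            out int[] array1, out int[] array2)
    {
        // out parameters must be assigned on every return path.
        array1 = null;
        array2 = null;

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match m = Header.Match(line);
            if (!m.Success
                || int.Parse(m.Groups[1].Value) != x
                || int.Parse(m.Groups[2].Value) != y)
                continue;

            string row1 = reader.ReadLine();
            string row2 = reader.ReadLine();
            if (row1 == null || row2 == null)
                return false; // truncated block

            array1 = ParseRow(row1);
            array2 = ParseRow(row2);
            return true;
        }
        return false; // no matching header
    }

    // Splits on braces, commas and whitespace; empty pieces are dropped,
    // so a trailing comma before the closing brace is harmless.
    static int[] ParseRow(string row)
    {
        return row.Split(new[] { '{', '}', ',', ' ', '\t' },
                         StringSplitOptions.RemoveEmptyEntries)
                  .Select(s => int.Parse(s))
                  .ToArray();
    }
}
```

When the method returns false both out arrays are null, so the caller only needs to check the return value before using them.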

"I'm guessing that there is no shortcut to reading through the entire file..."

You won't read the entire file unless what you are seeking is at the tail. You are streaming the data, so only one line is read at a time. Unless the file content is sorted so that you could do a binary search with FileStream.Seek, you'll need to read through the file until you find the data you are looking for.

You can use the string.Split(Char[]) method: http://msdn.microsoft.com/en-us/library/b873y76a.aspx

This method returns an array of strings.

The char parameter is the delimiter you want to split on. So you would call it once to split your long string into the two arrays you want, and then split each of those on the comma to get the individual values. After that you can convert the strings to int if needed.
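The two-step split described above might look like the sketch below. The names SplitParser and ParsePair are made up for illustration, and the sketch uses the Split(char[], StringSplitOptions) overload rather than the plain Split(char[]) linked above, so that empty pieces (for example from a trailing comma) are discarded.

```csharp
using System;
using System.Linq;

static class SplitParser
{
    // Turns "{1,2,3},\n{4,5,}" into two int arrays using only string.Split.
    public static int[][] ParsePair(string block)
    {
        return block
            // First split: each '}' ends one array.
            .Split(new[] { '}' }, StringSplitOptions.RemoveEmptyEntries)
            // Second split: commas (plus '{' and whitespace) separate the values.
            .Select(part => part.Split(new[] { '{', ',', ' ', '\t', '\r', '\n' },
                                       StringSplitOptions.RemoveEmptyEntries))
            // Skip pieces with no numbers, such as the text after the last '}'.
            .Where(tokens => tokens.Length > 0)
            .Select(tokens => tokens.Select(t => int.Parse(t)).ToArray())
            .ToArray();
    }
}
```

The input is expected to be just the two data lines of one block; passing a whole file would return every array in it as one flat list.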

If you are doing many lookups and speed is more important than memory, you might want to process the file once and put the information into a dictionary. That way lookup is very fast and you only have to read the file once.

Here's some code that will parse data like the example you gave:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    // Header comment such as "// H-20e180a.wav". The '-' is captured as the sign
    // of the first number so the key matches ArraysForLocation(-20, 180).
    static readonly Regex HeaderRegex = new Regex(@"//\s*H(-?\d+)\w(\d+)");
    // A line that starts with '}' closes the current block.
    static readonly Regex BlockEndRegex = new Regex(@"^\s*\}");
    // Characters that separate the numbers on a data line.
    static readonly Regex SeparatorRegex = new Regex("[{},]");

    static void Main(string[] args)
    {
        string filename = "example.txt";

        Dictionary<string, int[][]> myDictionary = new Dictionary<string, int[][]>();

        BuildMyDataDictionary(filename, myDictionary);

        // look up via key
        int x = -20;
        int y = 180;
        string key = string.Format("{0}.{1}", x, y);
        int[][] values = myDictionary[key];

        // print the values to check
        foreach (int[] array in values)
            foreach (int i in array)
                Console.Write(i + ", ");
        Console.WriteLine();

        Console.ReadKey();
    }

    private static void BuildMyDataDictionary(string filename, Dictionary<string, int[][]> myDictionary)
    {
        using (StreamReader r = new StreamReader(filename))
        {
            string line;
            // read through the file line by line and build the dictionary
            while ((line = r.ReadLine()) != null)
            {
                Match m = HeaderRegex.Match(line);
                if (!m.Success)
                    continue;

                // make a key of the two numbers separated by a "."
                string key = string.Format("{0}.{1}", m.Groups[1].Value, m.Groups[2].Value);

                // read the rows of the block until its closing brace (or end of file)
                List<int[]> intList = new List<int[]>();
                while ((line = r.ReadLine()) != null && !BlockEndRegex.IsMatch(line))
                {
                    intList.Add(SeparatorRegex.Replace(line, " ")
                        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(int.Parse)
                        .ToArray());
                }
                myDictionary.Add(key, intList.ToArray());
            }
        }
    }
}

The example file I tested with was:

{   // H-20e180a.wav 
    {-7,5,-4,-1,-9,2,-5,-1,2,-12,12,-33,34,-48,41,-40,16,20,730,4751,3861},
    {-7,5,-4,-1,-9,2,-5,-1,2,-12,12,-33,34,-48,41,-40,16,20,}
}
{   // H-21e181a.wav 
    {-7,5,-4,-1,-9,2,-5,-1,2,-12,12,-33,34,-48,41,-40,16,20,730,4751,3861},
    {-7,5,-4,-1,-9,2,-5,-1,2,-12,12,-33,34,-48,41,-40,16,20,}
    {-7,5,-4,-1,-9,2,-5,-1,2,-12,12,-33,34,-48,41,-40,16,20,730,4751,3861},
}

I borrowed the int[] parsing and creation from jltrem above.
