
Fastest way of reading txt files in C#

I'm working on a project and I'm a little confused. My teacher gave me some txt files (from his site: wt40.txt, wt50.txt, wt100.txt).

Every file's structure looks similar:

26    24    79    46    32    35    73    74    14    67    86    46    78    40    29    94    64    27    90    55
35    52    36    69    85    95    14    78    37    86    44    28    39    12    30    68    70     9    49    50
 1    10     9    10    10     4     3     2    10     3     7     3     1     3    10     4     7     7     4     7
 5     3     5     4     9     5     2     8    10     4     7     4     9     5     7     7     5    10     1     3
  • Every number takes up 6 chars, padded with leading spaces instead of leading zeros
  • Every line contains 20 numbers
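As an illustration of the layout described above, here is a minimal sketch that slices one line into 6-character fields and parses each field. The class and method names (FixedWidth, ParseLine) are hypothetical, and the sketch assumes every field really is exactly 6 characters wide, with any stray line-ending characters sitting only at the end of the line.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidth
{
    // Assumed field width, taken from the description of the files.
    public const int FieldWidth = 6;

    // Slices a line into 6-character fields and parses each one.
    // int.Parse accepts the leading spaces used as padding.
    public static List<int> ParseLine(string line)
    {
        // Drop a trailing '\r' or other trailing whitespace first.
        string trimmed = line.TrimEnd();
        var numbers = new List<int>();
        for (int start = 0; start < trimmed.Length; start += FieldWidth)
        {
            // Guard against a short final field.
            int length = Math.Min(FieldWidth, trimmed.Length - start);
            numbers.Add(int.Parse(trimmed.Substring(start, length)));
        }
        return numbers;
    }
}
```

Because the numbers are right-aligned, trimming the end of the line removes only line-ending debris and leaves the field boundaries intact.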

File wt40.txt should be read as: first two lines to the first List, next two lines to the next List, and the third pair of lines to the third List. The following lines should again be put in pairs into those Lists.
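One possible reading of this rule, sketched under the assumption that lines are consumed in pairs that cycle through the three lists (the helper name ListIndexForLine is hypothetical): for wt40.txt, where each group of 40 numbers spans two lines of 20, a 0-based line index maps to a list index as follows.

```csharp
public static class LineGrouping
{
    // Hypothetical helper: returns which of the three lists (0, 1 or 2)
    // a 0-based line index feeds, assuming two lines per list and the
    // lists being visited in a repeating cycle.
    public static int ListIndexForLine(int lineIndex)
    {
        return (lineIndex / 2) % 3;
    }
}
```

Under this reading, lines 0 and 1 go to the first list, lines 2 and 3 to the second, lines 4 and 5 to the third, and lines 6 and 7 go to the first list again.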

In C++ I'm doing it in this simple way:

for(int ins=0; ins<125; ins++) //125 instances in file
{
    for(int i=0; i<N; i++)  file>>tasks[i].p; //N elements at two first lines
    for(int i=0; i<N; i++)  file>>tasks[i].w;
    for(int i=0; i<N; i++)  file>>tasks[i].d;
    for(int i=0; i<N; i++)  tasks[i].putToLists(); //i is not in scope outside the loops above
}

But when I write this in C# I have to open a StreamReader, read every line, split it with a regexp, convert the pieces to int and add them to lists. That's a lot of loops. I cannot read 6 chars at a time and add them in three loops, because those text files have messed-up end-of-line chars: sometimes it's just '\n', sometimes something more.

Isn't there a simpler way?

There is essentially a 20 by n table of 6-character numbers padded with leading spaces.

26    24    79    46    32    35    73    74    14    67    86    46    78    40    29    94    64    27    90    55
35    52    36    69    85    95    14    78    37    86    44    28    39    12    30    68    70     9    49    50
 1    10     9    10    10     4     3     2    10     3     7     3     1     3    10     4     7     7     4     7
 5     3     5     4     9     5     2     8    10     4     7     4     9     5     7     7     5    10     1     3

I don't understand the last sentence:

File wt40.txt should be read as: first two lines to the first List, next two lines to the next List, and the third pair of lines to the third List. The following lines should again be put in pairs into those Lists.

Say you want to get the first 6 rows and create 3 lists, each with 2 rows. You could do something like the code below. It is eager in that it reads everything into memory and then does its work.

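using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

const int maxNumberDigitLength = 6;
const int numbersPerRow = 20;
const int rowsPerList = 2;
const int numberOfLists = 3;
const int rowLengthInChars = maxNumberDigitLength * numbersPerRow;
const int listLengthInChars = rowLengthInChars * rowsPerList;
const int totalNumberOfCharsToRead = listLengthInChars * numberOfLists;

// Note: the offsets below assume the buffer contains only the fixed-width fields;
// line-ending characters in the file would shift them.
char[] buffer = new char[totalNumberOfCharsToRead];
using (StreamReader reader = new StreamReader("wt40.txt"))
{
    int numberOfCharsRead = reader.Read(buffer, 0, totalNumberOfCharsToRead);
}

// put them in your lists (two rows each)
IEnumerable<char> l1 = buffer.Take(listLengthInChars);
IEnumerable<char> l2 = buffer.Skip(listLengthInChars).Take(listLengthInChars);
IEnumerable<char> l3 = buffer.Skip(listLengthInChars * 2).Take(listLengthInChars);

// Get the list of strings from the list of chars using non LINQ method.
List<string> list1 = new List<string>();
StringBuilder sb = new StringBuilder();
foreach (char c in l1)
{
    sb.Append(c);
    if (sb.Length == maxNumberDigitLength) // a full 6-char field has been collected
    {
        list1.Add(sb.ToString());
        sb.Clear();
    }
}

// LINQ method (an alternative to the loop above, used here for the other two lists)
string s2 = string.Concat(l2);
List<string> list2 = Enumerable
                   .Range(0, s2.Length / maxNumberDigitLength)
                   .Select(k => s2.Substring(k * maxNumberDigitLength, maxNumberDigitLength))
                   .ToList();
string s3 = string.Concat(l3);
List<string> list3 = Enumerable
                   .Range(0, s3.Length / maxNumberDigitLength)
                   .Select(k => s3.Substring(k * maxNumberDigitLength, maxNumberDigitLength))
                   .ToList();

// Parse to ints using LINQ projection (int.Parse accepts the leading spaces)
List<int> numbers1 = list1.Select(int.Parse).ToList();
List<int> numbers2 = list2.Select(int.Parse).ToList();
List<int> numbers3 = list3.Select(int.Parse).ToList();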

Isn't there a simpler way?

I don't know if it's simpler, but there is only one loop and a bit of LINQ:

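using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

List<List<int>> lists = new List<List<int>>();
using (StreamReader reader = new StreamReader("wt40.txt"))
{
    string line;
    int count = 0; // number of non-empty lines seen so far
    while ((line = reader.ReadLine()) != null)
    {
        List<int> currentList =
            Regex.Split(line, "\\s")
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(int.Parse).ToList();
        if (currentList.Count > 0) // skip empty lines
        {
            if (count % 2 == 0) // first line of a pair starts a new list
            {
                lists.Add(currentList);
            }
            else // second line of a pair is appended to the previous list
            {
                lists[count / 2].AddRange(currentList);
            }
            count++; // count only non-empty lines so the pairing stays aligned
        }
    }
}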

In total you end up with 375 lists, each containing 40 numbers (at least for the wt40.txt input).
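For comparison with the C++ loop in the question, here is a rough token-based sketch that ignores line boundaries altogether, much as `file >> x` does. It assumes the file contains nothing but whitespace-separated integers; the names TokenReader, ParseAllNumbers and Chunk are hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TokenReader
{
    // Splits text on any whitespace (spaces, tabs, '\r', '\n') and parses the tokens.
    // Passing a null separator array makes Split use whitespace as the delimiter.
    public static List<int> ParseAllNumbers(string text)
    {
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(int.Parse)
            .ToList();
    }

    // Cuts the flat sequence into consecutive chunks of n numbers.
    // A trailing partial chunk, if any, is ignored.
    public static List<List<int>> Chunk(List<int> numbers, int n)
    {
        var chunks = new List<List<int>>();
        for (int start = 0; start + n <= numbers.Count; start += n)
        {
            chunks.Add(numbers.GetRange(start, n));
        }
        return chunks;
    }
}

// Possible usage (file name and N = 40 taken from the question):
// var all = TokenReader.ParseAllNumbers(File.ReadAllText("wt40.txt"));
// var chunks = TokenReader.Chunk(all, 40);
```

Since the split treats every run of whitespace alike, inconsistent line endings should not matter here. Following the C++ loop, chunk k would belong to instance k / 3, with k % 3 selecting the p, w or d values.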
