Reading all lines in a file and splitting on multiple strings in C#

I am attempting to read all files in a directory and write lines to an external file, depending on whether the last ';'-separated field of each line contains a specific string.

foreach (string line in File.ReadAllLines(pendingFile).Where(line => line.Split(';').Last().Contains("Test1")))
{
    File.AppendAllText(path, line + Environment.NewLine);
}

How do I specify multiple strings here, for example "Test1", "Test2", "Test3"? Something like this:

foreach (string line in File.ReadAllLines(pendingFile).Where(line => line.Split(';').Last().Contains("Test1", "Test2", "Test3")))

You do it the other way round. Contains takes a single search value, so you don't ask "does the last part of the line contain any of these strings"; you ask "are any of these strings contained in the last part of the line":

var interestrings = new []{"Test1", "Test2", "Test3"};

File.ReadAllLines(pendingFile)
    .Where(line => 
        interestrings.Any(interestring => 
            line.Split(';').Last().Contains(interestring)
        )
    )
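
To try the predicate without touching the file system, it can be pulled out into a small helper. This is only a sketch: the LineFilter class and its method names are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; class and method names are illustrative only.
public static class LineFilter
{
    // True when the text after the last ';' contains at least one candidate.
    // string.Contains(string) is an ordinal, case-sensitive comparison.
    public static bool LastFieldContainsAny(string line, IReadOnlyList<string> candidates)
    {
        string lastField = line.Split(';').Last();
        return candidates.Any(candidate => lastField.Contains(candidate));
    }

    // Lazily yields only the lines whose last field matches a candidate.
    public static IEnumerable<string> Filter(IEnumerable<string> lines, IReadOnlyList<string> candidates)
    {
        return lines.Where(line => LastFieldContainsAny(line, candidates));
    }
}
```

With this in place, LineFilter.Filter(File.ReadAllLines(pendingFile), interestrings) would yield the same lines as the query above.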

It's probably worth pointing out that your code would be a lot more readable if you didn't try to do it all in the foreach header:

var interestrings = new[] { "Test1", "Test2", "Test3" };
foreach (string line in File.ReadAllLines(pendingFile))
{
    // Only the text after the last ';' is checked.
    var lastOne = line.Split(';').Last();

    // Skip lines whose last field contains none of the strings.
    if (!interestrings.Any(interestring => lastOne.Contains(interestring)))
        continue;

    File.AppendAllText(path, line + Environment.NewLine);
}

It won't perform significantly differently, because LINQ will (behind the scenes) enumerate all the lines, skipping those where the condition doesn't match and giving you only those that do. This loop does essentially the same thing without the chained enumeration.

You could get a useful performance boost by not using Split (take the substring after the last index of ';' instead), and also consider collecting your strings into a StringBuilder rather than repeatedly appending them to a file. Also, if you use File.ReadLines rather than File.ReadAllLines, you'll read the file incrementally rather than buffering it all into memory:

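// Needs: using System.IO; using System.Linq; using System.Text;
var sb = new StringBuilder(10000); // initial capacity; grows as needed

var interestrings = new[] { "Test1", "Test2", "Test3" };
foreach (string line in File.ReadLines(pendingFile))
{
    // With no ';' in the line, the whole line counts as the last field.
    var lastOne = line;

    var idx = line.LastIndexOf(';');
    if (idx != -1)
        lastOne = line.Substring(idx + 1); // text after the last ';'

    if (!interestrings.Any(interestring => lastOne.Contains(interestring)))
        continue;

    sb.AppendLine(line);
}

// Write everything once instead of appending per line.
File.AppendAllText(path, sb.ToString());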
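
The substring step is easy to get wrong at the edges (no ';' at all, or a trailing ';'), so it may help to isolate it. The following is a minimal sketch; FieldHelper and LastField are hypothetical names, and the intent is to return the same value as line.Split(';').Last() without allocating the intermediate array.

```csharp
using System;

// Hypothetical helper; the names are illustrative only.
public static class FieldHelper
{
    // Returns the text after the last ';', or the whole line when there is none.
    public static string LastField(string line)
    {
        int idx = line.LastIndexOf(';');
        // idx + 1 may equal line.Length (trailing ';'); Substring then returns "".
        return idx == -1 ? line : line.Substring(idx + 1);
    }
}
```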

If the file is huge, consider opening a stream and writing the output line by line too, rather than buffering much of it into a StringBuilder.
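
A streaming version could look roughly like the sketch below. StreamingFilter, AppendMatchingLines and the keep predicate are assumptions made for the example, not something from the original answer; the StreamWriter is opened in append mode to mirror File.AppendAllText.

```csharp
using System;
using System.IO;

// Hypothetical helper; class, method and parameter names are illustrative only.
public static class StreamingFilter
{
    // Appends every line of sourcePath accepted by keep to targetPath,
    // holding only one line in memory at a time.
    // Returns the number of lines written.
    public static int AppendMatchingLines(string sourcePath, string targetPath, Func<string, bool> keep)
    {
        int written = 0;

        // The second argument (true) opens the target file in append mode.
        using (var writer = new StreamWriter(targetPath, true))
        {
            foreach (string line in File.ReadLines(sourcePath))
            {
                if (!keep(line))
                    continue;

                writer.WriteLine(line);
                written++;
            }
        }

        return written;
    }
}
```

The predicate is passed in so the same loop works with either the Split-based or the LastIndexOf-based check.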

Use a regular expression instead:

.Where(line => Regex.IsMatch(line, @"Test\d+$")) 

(I haven't tested this exact piece of code; it's just to give an idea.)
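
Note that the pattern above accepts any "Test" followed by digits, and only at the very end of the line, so it is not quite the same check as Contains on a fixed list. One possible way to keep the regex closer to the question is to build the alternation from the list itself. This is a sketch under that assumption; LastFieldRegex and Build are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class LastFieldRegex
{
    // Builds a regex that matches when one of the candidates occurs
    // somewhere after the last ';' of the input line.
    public static Regex Build(IEnumerable<string> candidates)
    {
        // Escape each candidate so it is matched literally.
        string alternatives = string.Join("|", candidates.Select(c => Regex.Escape(c)));

        // After the candidate, only non-';' characters may follow up to the end,
        // which means the candidate sits in the last field.
        return new Regex("(?:" + alternatives + ")[^;]*$", RegexOptions.Compiled);
    }
}
```

It could then be used as .Where(line => regex.IsMatch(line)), with regex built once outside the query.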
