
How to copy a specific part of a text file

My goal is to copy specific contents of .txt files into one big text file. I've searched this site and found a way to merge my files:

using System.IO;

using (var output = File.Create("output"))
{
    foreach (var file in new[] { "file1", "file2" })
    {
        // Append the raw bytes of each input file to the output file.
        using (var input = File.OpenRead(file))
        {
            input.CopyTo(output);
        }
    }
}

This answer was posted by n8wrl.

The structure of my text looks like this:

...
Sentence A
Important stuff
Sentence B
...

So I need a way to search the document for "Sentence A" and "Sentence B" and copy the lines between them.

Thanks for your help!

Consider the options from the post "Fastest way to search string in large text file" to locate the start and end sentences, then use those positions (the start of the first sentence and the end of the second) to take a substring.

Make sure you test the situations where the second sentence appears before the first, where it appears twice (do you want the text between the first sentence and the second occurrence of the second sentence?), and where there is no second sentence at all. Then consider similar scenarios for the first sentence (e.g. if it appears after the second sentence, if it appears more than once, and if it doesn't appear at all while the second sentence is present).
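A minimal C# sketch of this substring approach, using the scenarios listed above as inputs. `ExtractBetween` is a hypothetical helper, and the policy it encodes (start at the first "Sentence A", stop at the first "Sentence B" that follows it, return null otherwise) is only one possible answer to those questions; adjust it to your needs.

```csharp
#nullable enable
using System;

// Hypothetical helper: returns the text from the start of `first` to the end of
// the first `second` that follows it, or null when no such pair exists.
static string? ExtractBetween(string text, string first, string second)
{
    int start = text.IndexOf(first, StringComparison.Ordinal);
    if (start < 0)
        return null; // first sentence missing

    // Search for the second sentence only after the first one, so a copy
    // of it that appears earlier in the text is ignored.
    int end = text.IndexOf(second, start + first.Length, StringComparison.Ordinal);
    if (end < 0)
        return null; // second sentence missing, or only present before the first

    // Substring takes (startIndex, length); this keeps both sentences.
    return text.Substring(start, end + second.Length - start);
}

// The scenarios from the answer above, each printed with its result.
var cases = new (string Name, string Text)[]
{
    ("normal",     "x Sentence A stuff Sentence B y"),
    ("B before A", "Sentence B x Sentence A y"),
    ("B twice",    "Sentence A one Sentence B two Sentence B"),
    ("B missing",  "Sentence A stuff"),
    ("A twice",    "Sentence A one Sentence A two Sentence B"),
    ("A missing",  "stuff Sentence B"),
};

foreach (var (name, text) in cases)
{
    string result = ExtractBetween(text, "Sentence A", "Sentence B") ?? "<no match>";
    Console.WriteLine(name + ": " + result);
}
```

With this policy, "B twice" stops at the first "Sentence B", and "A twice" keeps everything from the first "Sentence A" onward.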

Assuming that both "Sentence A" and "Sentence B" occupy whole lines, you can try a simple LINQ query. Let's extract the "important stuff" first:

// Needs: using System.Collections.Generic; using System.IO; using System.Linq;
private static IEnumerable<string> Staff(string file) 
{
    return File
        .ReadLines(file)
        .SkipWhile(line => line != "Sentence A")  // Skip until Sentence A found 
        .Skip(1)                                  // Skip Sentence A itself
        .TakeWhile(line => line != "Sentence B"); // Take until Sentence B found 
}
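To see what the SkipWhile/Skip/TakeWhile chain does without creating any files, here is the same chain applied to an in-memory array (a sketch; `Between` and the sample lines are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same chain as Staff(), but over an in-memory sequence instead of File.ReadLines.
static IEnumerable<string> Between(IEnumerable<string> lines) =>
    lines
        .SkipWhile(line => line != "Sentence A")  // drop everything before Sentence A
        .Skip(1)                                  // drop Sentence A itself
        .TakeWhile(line => line != "Sentence B"); // stop at Sentence B

var sample = new[] { "header", "Sentence A", "Important stuff", "more stuff", "Sentence B", "footer" };

Console.WriteLine(string.Join(" | ", Between(sample)));
// prints "Important stuff | more stuff"
```

If "Sentence B" never appears, TakeWhile keeps every line after "Sentence A" up to the end; if "Sentence A" never appears, the result is empty.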

Then combine all the files into one:

string[] files = new[] 
{
    "file1", "file2", "file3"
};

var extracts = files.SelectMany(file => Staff(file));

Finally, let's write all the extracts into the output file:

File.WriteAllLines("output", extracts);
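Assembled into one runnable program, the steps look roughly like this. The temporary folder, file names and file contents are invented for the demo; in practice the array of paths would list your real files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Staff() as in the answer: the lines strictly between "Sentence A" and "Sentence B".
static IEnumerable<string> Staff(string file) =>
    File.ReadLines(file)
        .SkipWhile(line => line != "Sentence A")
        .Skip(1)
        .TakeWhile(line => line != "Sentence B");

// Demo setup (hypothetical): two small input files in a fresh temp folder.
string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(dir);
string file1 = Path.Combine(dir, "file1.txt");
string file2 = Path.Combine(dir, "file2.txt");
File.WriteAllLines(file1, new[] { "intro", "Sentence A", "stuff 1", "Sentence B", "outro" });
File.WriteAllLines(file2, new[] { "Sentence A", "stuff 2a", "stuff 2b", "Sentence B" });

// Extract from every file, then write all extracts into one output file.
string output = Path.Combine(dir, "output.txt");
var extracts = new[] { file1, file2 }.SelectMany(file => Staff(file));
File.WriteAllLines(output, extracts);

Console.WriteLine(File.ReadAllText(output));
```

Because File.ReadLines is lazy, each input is read only up to its "Sentence B", and the extracts are streamed straight into the output file.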

Edit: if you have merged files (so the "important stuff" can appear several times), we have to implement an FSM (finite state machine):

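// Needs: using System.Collections.Generic; using System.IO;
private static IEnumerable<string> Staff(string file) 
{
    // true while we are between "Sentence A" and "Sentence B"
    bool important = false;

    foreach (string line in File.ReadLines(file)) 
    {
        if (important)
        {
            if (line == "Sentence B")
                important = false;   // end of the current block
            else
                yield return line;   // inside a block: emit the line
        }
        else
        {
            important = line == "Sentence A"; // start of a new block
        }
    }
} 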
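A sketch showing why the state machine is needed for merged input: it runs the same logic over an in-memory list containing two blocks and compares the result with the SkipWhile/TakeWhile chain, which stops after the first block. `Blocks` and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// State machine over an in-memory sequence: `important` is true while we are
// inside a "Sentence A" ... "Sentence B" block.
static IEnumerable<string> Blocks(IEnumerable<string> lines)
{
    bool important = false;
    foreach (string line in lines)
    {
        if (important)
        {
            if (line == "Sentence B")
                important = false;   // block ends; Sentence B is not emitted
            else
                yield return line;   // inside a block: emit the line
        }
        else if (line == "Sentence A")
        {
            important = true;        // block starts; Sentence A is not emitted
        }
    }
}

// Simulates two files that were merged, so the block appears twice.
var merged = new[]
{
    "header 1", "Sentence A", "stuff 1", "Sentence B", "footer 1",
    "header 2", "Sentence A", "stuff 2", "Sentence B", "footer 2",
};

Console.WriteLine(string.Join(" | ", Blocks(merged)));
// prints "stuff 1 | stuff 2"

// For comparison, the SkipWhile/TakeWhile chain stops after the first block.
var firstOnly = merged.SkipWhile(l => l != "Sentence A").Skip(1).TakeWhile(l => l != "Sentence B");
Console.WriteLine(string.Join(" | ", firstOnly));
// prints "stuff 1"
```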

Please notice that the state machine has to scan the entire file (the TakeWhile version stops reading at the first "Sentence B"); that's why we should avoid merging the files first.

You need something like this:

using System;
using System.IO;

var sentenceA = "Sentence A";
var sentenceB = "Sentence B";
using (var output = new StreamWriter("output"))
{
    foreach (var file in new[] { "file1", "file2" })
    {
        // Read the whole file as one string; no need to split it into lines.
        var text = File.ReadAllText(file);
        int start = text.IndexOf(sentenceA, StringComparison.Ordinal);
        // Look for Sentence B only after Sentence A.
        int end = start < 0 ? -1 : text.IndexOf(sentenceB, start + sentenceA.Length, StringComparison.Ordinal);
        if (start >= 0 && end >= 0)
        {
            // Substring takes (startIndex, length), not (startIndex, endIndex).
            output.WriteLine(text.Substring(start, end + sentenceB.Length - start));
        }
    }
}
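The subtle point in this approach is that `String.Substring(startIndex, length)` takes a length, not an end index. A small worked example (the sample string is made up):

```csharp
using System;

string text = "intro Sentence A stuff Sentence B outro";
string a = "Sentence A", b = "Sentence B";

int start = text.IndexOf(a, StringComparison.Ordinal);                 // 6
int end = text.IndexOf(b, start + a.Length, StringComparison.Ordinal); // 23

// Substring(startIndex, length): the length is (end of B) - (start of A).
string inclusive = text.Substring(start, end + b.Length - start);
// Only the text between the sentences (both excluded), trimmed of spaces.
string between = text.Substring(start + a.Length, end - (start + a.Length)).Trim();
// Passing the end index as the length overshoots: here it also grabs " outro".
string overshoot = text.Substring(start, end + b.Length);

Console.WriteLine(inclusive); // Sentence A stuff Sentence B
Console.WriteLine(between);   // stuff
Console.WriteLine(overshoot); // Sentence A stuff Sentence B outro
```

Passing the end index as the length only gives the right answer when the start is 0; otherwise it takes too many characters, and it throws ArgumentOutOfRangeException whenever startIndex + length runs past the end of the string.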
