
C#: merging multiple .txt files into one .txt file with regex

I have a big problem. I need to merge two .txt files into a third one, line by line. Before that, I need to trim the last two characters from each line of the first file and the first two characters from each line of the second file. Before merging, I have to find where the end of a line in the first file matches the beginning of the corresponding line in the second file. That is, the first file holds the left part of each sentence and the second file holds the right part. Example:

file 1 content (first line):

A very long ago, our Milky Way had a truly eventful life: between

file 2 content (first line):

eventful life: between about 13 and 8 billion years ago, it lived hard and fast, merging with other galaxies and consuming a lot of hydrogen to form stars.

file 3 content should be (first line):

A very long ago, our Milky Way had a truly eventful life: between about 13 and 8 billion years ago, it lived hard and fast, merging with other galaxies and consuming a lot of hydrogen to form stars.

So, in short, for this example I have to trim "eventful life: between " from the second file, trim the last two characters of the first file and the last two of the second, and finally merge the text from the first and second files into the third. Thanks in advance!

using System;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main()
        {
            string[] readleft = File.ReadAllLines(@"C:\Users\J\Desktop\files\left.txt");
            string[] readright = File.ReadAllLines(@"C:\Users\J\Desktop\files\right.txt");
            // The left file comes first, the right file second; the third file is the output.
            using (StreamReader swo = new StreamReader(@"C:\Users\J\Desktop\files\left.txt"))
            using (StreamReader swot = new StreamReader(@"C:\Users\J\Desktop\files\right.txt"))
            {
                for (int x = 0; x < readleft.Length || x < readright.Length; x++)
                {
                    Console.WriteLine("{0}{1}", swo.ReadLine(), swot.ReadLine());
                    Match m = Regex.Match(swo.ToString(), swot.ToString());
                    if (m.Success)
                        Console.WriteLine("Found '{0}' at position '{1}'", m.Value, m.Index);
                }
            }
        }
    }
}

BTW, why does it show me "System.IO.StreamReader" instead of the match?

Here's a simple approach that is easy to understand.

using System;

var left = "A very long ago, our Milky Way had a truly eventful life: between";
var right = "eventful life: between about 13 and 8 billion years ago, it lived hard and fast, merging with other galaxies and consuming a lot of hydrogen to form stars.";

// Length of the overlap between the end of left and the start of right.
int common = 0;
// Try each prefix of right, shortest first; <= so that a full-length overlap is also checked.
for (int i = 1; i <= Math.Min(left.Length, right.Length); i++)
{
    var partToCheck = right.Substring(0, i);
    if (left.EndsWith(partToCheck, StringComparison.Ordinal))
    {
        common = i;
        Console.WriteLine($"Common part: '{partToCheck}'");
        break;
    }
}
// Append only the part of right that left does not already end with.
var merged = left + right.Substring(common);

This prints:

Common part: 'eventful life: between'
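
To apply the same idea to the two files from the question, the overlap check can be wrapped in a helper and run on each pair of lines. The following is only a minimal sketch under a few assumptions: the class name `OverlapMerger` and its methods are hypothetical names made up for illustration, the overlap is compared on the line strings themselves (not on the reader objects), and no extra trimming of two characters is done because the question does not say which characters those are. Unlike the loop above, which stops at the first (shortest) match, this sketch scans from the longest candidate down and therefore prefers the longest overlap.

```csharp
using System;
using System.IO;

static class OverlapMerger
{
    // Length of the longest prefix of `right` that is also a suffix of `left`
    // (0 if the two strings do not overlap at all).
    public static int OverlapLength(string left, string right)
    {
        for (int len = Math.Min(left.Length, right.Length); len > 0; len--)
        {
            // Ordinal comparison of the last `len` chars of left with the first `len` chars of right.
            if (string.CompareOrdinal(left, left.Length - len, right, 0, len) == 0)
                return len;
        }
        return 0;
    }

    // Joins one pair of lines, keeping the overlapping text only once.
    public static string MergeLine(string left, string right) =>
        left + right.Substring(OverlapLength(left, right));

    // Merges the two files pairwise, line by line, into outputPath.
    // A line missing from the shorter file is treated as empty (an assumption).
    public static void MergeFiles(string leftPath, string rightPath, string outputPath)
    {
        string[] leftLines = File.ReadAllLines(leftPath);
        string[] rightLines = File.ReadAllLines(rightPath);
        int count = Math.Max(leftLines.Length, rightLines.Length);

        var merged = new string[count];
        for (int i = 0; i < count; i++)
        {
            string l = i < leftLines.Length ? leftLines[i] : "";
            string r = i < rightLines.Length ? rightLines[i] : "";
            merged[i] = MergeLine(l, r);
        }
        File.WriteAllLines(outputPath, merged);
    }
}
```

A call such as OverlapMerger.MergeFiles("left.txt", "right.txt", "merged.txt") would then write the merged lines to the third file; the file names here are placeholders. If the lines carry stray trailing or leading characters, they would need to be trimmed before calling MergeLine, otherwise no overlap is found and the two lines are simply concatenated.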
