C# comparing two files regex problem

What I'm trying to do is open a large list of file names (about 40k records) and match each one against the lines of a file that contains 2 million records. If a line from file A matches a line in file B, I want to write that line out.

File A contains a list of file names without extensions, and file B contains full file paths including extensions.

I'm using this, but I can't get it to work:

            string alphaFilePath = (@"C:\\Documents and Settings\\g\\Desktop\\Arrp\\Find\\natst_ready.txt");

            List<string> alphaFileContent = new List<string>();

            using (FileStream fs = new FileStream(alphaFilePath, FileMode.Open))
            using (StreamReader rdr = new StreamReader(fs))
            {
                while (!rdr.EndOfStream)
                {
                    alphaFileContent.Add(rdr.ReadLine());
                }
            }

            string betaFilePath = @"C:\Documents and Settings\g\Desktop\Arryup\Find\eble.txt";

            StringBuilder sb = new StringBuilder();

            using (FileStream fs = new FileStream(betaFilePath, FileMode.Open))
            using (StreamReader rdr = new StreamReader(fs))
            {
                while (!rdr.EndOfStream)
                {
                    string betaFileLine = rdr.ReadLine();
                    string matchup = Regex.Match(alphaFileContent, @"(\\)(\\)(\\)(\\)(\\)(\\)(\\)(\\)(.*)(\.)").Groups[9].Value;
                    if (alphaFileContent.Equals(matchup))
                    {
                        File.AppendAllText(@"C:\array_tech.txt", betaFileLine);
                    }
                }
            }

This doesn't work because alphaFileContent is a single line only, and I'm having a hard time figuring out how to get my regex to work on the file that contains all the file paths (betaFilePath).

Here is a sample line from the beta file:

C:\\arres_i\\Grn\\Ora\\SEC\\DBZ_EX1\\Nes\\001\\DZO-EX00001.txt

Here is the line I'm trying to compare from my alpha file: DZO-EX00001

Use System.IO.Path.GetFileNameWithoutExtension instead of a regular expression.

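    // Requires: using System; using System.Collections.Generic; using System.IO;
    static void Compare(string alpha, string beta)
    {
        // Load the names from the alpha file into a set for fast lookups.
        HashSet<string> alphaContent = new HashSet<string>();
        using (StreamReader reader = new StreamReader(alpha))
        {
            while (!reader.EndOfStream)
                alphaContent.Add(reader.ReadLine());
        }

        // Check the file name of each path in the beta file against the set.
        using (StreamReader reader = new StreamReader(beta))
        {
            while (!reader.EndOfStream)
            {
                string fullpath = reader.ReadLine();
                string filename = Path.GetFileNameWithoutExtension(fullpath);
                if (alphaContent.Contains(filename))
                {
                    // Append a line break so the matches do not run together.
                    File.AppendAllText(@"C:\array_tech.txt", fullpath + Environment.NewLine);
                }
            }
        }
    }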
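A variation on the same idea, as a rough sketch rather than a drop-in replacement. It opens the output file once instead of once per match, and it assumes the names should be compared case-insensitively, which the question does not state. `FileMatcher`, `BaseName` and `WriteMatches` are made-up names for illustration. `BaseName` splits on backslashes by hand because `Path` only treats `\` as a separator on Windows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FileMatcher
{
    // Hypothetical helper: strips the directory and the extension by hand so
    // backslash-separated paths behave the same on every platform.
    public static string BaseName(string path)
    {
        int slash = path.LastIndexOfAny(new[] { '\\', '/' });
        string name = path.Substring(slash + 1);
        int dot = name.LastIndexOf('.');
        return dot < 0 ? name : name.Substring(0, dot);
    }

    // Writes every line of betaPath whose base name is listed in alphaPath
    // and returns the number of lines written.
    public static int WriteMatches(string alphaPath, string betaPath, string outputPath)
    {
        // Assumption: case-insensitive comparison, as Windows file names are.
        var names = new HashSet<string>(File.ReadLines(alphaPath),
                                        StringComparer.OrdinalIgnoreCase);
        int count = 0;
        using (var writer = new StreamWriter(outputPath))
        {
            // File.ReadLines streams the file, so the 2 million lines are
            // never all held in memory at once.
            foreach (string fullPath in File.ReadLines(betaPath))
            {
                if (names.Contains(BaseName(fullPath)))
                {
                    writer.WriteLine(fullPath);
                    count++;
                }
            }
        }
        return count;
    }
}
```

It would be called as `FileMatcher.WriteMatches(alphaFilePath, betaFilePath, @"C:\array_tech.txt")`, with the paths from the question.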

So, you read in all the lines in the beta file and store the whole thing in a string, beta.

Next, you read a line from the alpha file and store it (for example, DZO-EX00001) in a string, alpha.

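// Escape the name so that characters such as '.' are matched literally.
var pattern = @"^.*" + Regex.Escape(alpha) + ".*$";
var match = Regex.Match(beta, pattern, RegexOptions.Multiline);

if (match.Success)
{
   // With CRLF line endings the value can end in '\r', so trim it off.
   string filepath = match.Value.TrimEnd('\r');
   // do stuff
}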

You need to load all of the lines from beta first. Then you can check each line from alpha against it.

You must specify RegexOptions.Multiline to check against all the lines in beta (so that ^ and $ match at each line instead of at the beginning and end of the whole string).

The pattern can be expanded if you need to be more specific; as is, it just gets the first line that contains the filename.
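If every matching line is wanted rather than just the first, something along these lines might do. It is only a sketch: `LineFinder` and `LinesContaining` are illustrative names, and the pattern uses `[^\r\n]*` in place of `.*` so that a trailing `\r` from Windows line endings is left out of the match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LineFinder
{
    // Returns every line of haystack that contains name as literal text.
    public static List<string> LinesContaining(string haystack, string name)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the name literal.
        // [^\r\n]* never crosses a line break; with Multiline, ^ anchors the
        // match to the start of a line.
        string pattern = @"^[^\r\n]*" + Regex.Escape(name) + @"[^\r\n]*";
        var result = new List<string>();
        foreach (Match m in Regex.Matches(haystack, pattern, RegexOptions.Multiline))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Bear in mind that this rescans the whole beta string for each of the roughly 40k names, so it will probably be far slower than a set lookup on files of the size described in the question.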
