
How to check whether a text file contains a multi-line string using C#?

I have a text file that contains some HTML source code. I want to check whether the text file contains a given script. For example:

Suppose this is the script:

_siteid = "bac29411-930d-43b2-8aab-0ec92fb7ab24";    
_subscriberId = "03aab4ac-8f05-42d6-b51b-55f7abcdc092";

function comCC24StartFunctioning(scriptSource) 
{     
    var scrDynamicHeadAttr = document.createElement('script');    
    scrDynamicHeadAttr.setAttribute('src', scriptSource);    
    scrDynamicHeadAttr.setAttribute('type', 'text/javascript');    
    scrHeadAttr = document.getElementsByTagName('head')[0];    
    scrHeadAttr.insertBefore(scrDynamicHeadAttr, scrHeadAttr.firstChild);    
}

I wouldn't say this is a job for a regular expression. The problem is either too simple or too complex for one, depending on what you mean by 'contains a given script'.

If you mean a verbatim, character-for-character match, you can just use String.IndexOf.

If the script may be formatted differently, you may be able to get away with removing all white space from both strings and then calling String.IndexOf.

But if you mean a script with the same behavior that could have a different structure, variable names, etc., then you'd need to parse the HTML and JavaScript and analyse the syntax tree, which would be immensely complicated.

An alternative may be to look for a smaller invariant part of the script, for example by searching for just 'comCC24StartFunctioning', again with String.IndexOf.
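As a minimal sketch of the whitespace-stripping idea, assuming both the file contents and the script are already loaded as strings: the class and method names below are hypothetical, chosen only for illustration. Note that this also ignores whitespace inside string literals and between tokens, so it can report a match for text that differs in ways that matter.

```csharp
using System;
using System.Linq;

static class WhitespaceInsensitiveSearch
{
    // Hypothetical helper: drops every character that char.IsWhiteSpace reports as whitespace.
    public static string StripWhitespace(string text) =>
        new string(text.Where(c => !char.IsWhiteSpace(c)).ToArray());

    // True if 'script' occurs in 'html' once whitespace is removed from both strings.
    public static bool ContainsIgnoringWhitespace(string html, string script) =>
        StripWhitespace(html).IndexOf(StripWhitespace(script), StringComparison.Ordinal) >= 0;
}
```

It could be called as `WhitespaceInsensitiveSearch.ContainsIgnoringWhitespace(System.IO.File.ReadAllText(fileName), script)`, where `fileName` and `script` are your own values.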

If I understand correctly, you just want to search the whole file for the exact snippet? Then the following should work:

string fileName = "your HTML file location";
string textToSearch = "your-script-snippet";
bool fileContainsScript = System.IO.File.ReadAllText(fileName).Contains(textToSearch);

The easiest way would be to use the Contains() method of the String class. If there may be some extra spaces or line breaks, you could build a regex pattern based on your string and look for a match. To do that, you'd have to escape all the characters that the regex engine considers "special", such as ()[].* etc., that could appear in scripts, and replace each run of whitespace characters in your pattern string with \s*.

// requires: using System.Text.RegularExpressions;
string scriptToFind = ...
string fileToSearchText = ...

// Escape regex metacharacters in the script (this list is not exhaustive).
string patternToFind = Regex.Replace(scriptToFind, @"(\*|\.|\\|\(|\)|\[|\]|\{|\}|\+)", @"\$1");
// Let every run of whitespace in the script match any amount of whitespace in the file.
patternToFind = Regex.Replace(patternToFind, @"\s+", @"\s*");

bool isMatch = Regex.IsMatch(fileToSearchText, patternToFind);
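A possible variant of the same idea, sketched here with the built-in Regex.Escape instead of a hand-written list of special characters. Regex.Escape also escapes whitespace, so this sketch splits the script on whitespace first, escapes each piece, and joins the pieces with \s*. The class and method names are hypothetical, and the script is assumed to be small enough to turn into a single pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class FlexibleScriptMatcher
{
    // Builds a pattern in which every whitespace run of the script
    // matches any amount of whitespace (including none).
    public static string BuildPattern(string script)
    {
        // Split on whitespace runs, escape each remaining piece, rejoin with \s*.
        var pieces = Regex.Split(script.Trim(), @"\s+").Select(p => Regex.Escape(p));
        return string.Join(@"\s*", pieces);
    }

    // True if the file text contains the script, ignoring whitespace differences
    // at the positions where the script itself has whitespace.
    public static bool ContainsScript(string fileText, string script) =>
        Regex.IsMatch(fileText, BuildPattern(script));
}
```

Because \s* also matches zero characters, "var x" in the script would match "varx" in the file; joining with \s+ instead is stricter if that matters.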

For testing purposes: http://gskinner.com/RegExr/

One way could be to remove the line breaks from both the script and the HTML source code. Then you basically have two strings and need to check whether one is part of the other.
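A rough sketch of that approach, assuming both texts are already in memory; the names are hypothetical. Only carriage returns and line feeds are removed, so a file saved with Windows line endings can still match a script written with Unix ones, but differences in indentation or spacing will still prevent a match.

```csharp
using System;

static class LineBreakInsensitiveSearch
{
    // Hypothetical helper: removes carriage returns and line feeds, leaving everything else intact.
    public static string RemoveLineBreaks(string text) =>
        text.Replace("\r", "").Replace("\n", "");

    // True if 'script' is part of 'html' once line breaks are removed from both.
    public static bool ContainsIgnoringLineBreaks(string html, string script) =>
        RemoveLineBreaks(html).Contains(RemoveLineBreaks(script));
}
```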

