How to check whether a text file contains a multi-line string using C#?

I have a text file that contains some HTML source code. I want to check whether the text file contains a given script. As an example:

Let's say this is the script:

_siteid = "bac29411-930d-43b2-8aab-0ec92fb7ab24";    
_subscriberId = "03aab4ac-8f05-42d6-b51b-55f7abcdc092";

function comCC24StartFunctioning(scriptSource) 
{     
    var scrDynamicHeadAttr = document.createElement('script');    
    scrDynamicHeadAttr.setAttribute('src', scriptSource);    
    scrDynamicHeadAttr.setAttribute('type', 'text/javascript');    
    scrHeadAttr = document.getElementsByTagName('head')[0];    
    scrHeadAttr.insertBefore(scrDynamicHeadAttr, scrHeadAttr.firstChild);    
}

I wouldn't say this is a job for a regular expression; the problem is either too simple or too complex, depending on what you mean by 'contains a given script'.

If you mean a verbatim, character-for-character match, you can just use String.IndexOf.
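A minimal sketch of the verbatim check, written as a C# 9+ top-level program with made-up sample strings standing in for the real file contents. FindScript is a hypothetical helper name, not a library API. It passes StringComparison.Ordinal explicitly: the IndexOf(string) overload without it is culture-sensitive, and on .NET 5 and later (which use ICU) that can give surprising results around newline characters such as "\r\n", which matters for a multi-line snippet.

```csharp
using System;

// Hypothetical helper: returns the index of the first verbatim occurrence of
// `script` in `html`, or -1 if it does not occur. StringComparison.Ordinal makes
// this a plain character-by-character comparison with no culture rules.
static int FindScript(string html, string script) =>
    html.IndexOf(script, StringComparison.Ordinal);

// Sample data (assumed): in practice `html` would come from the file.
string html = "<head><script>_siteid = \"bac29411\";\nfoo();</script></head>";
string script = "_siteid = \"bac29411\";\nfoo();";

int index = FindScript(html, script);
Console.WriteLine(index >= 0 ? $"Found at index {index}" : "Not found"); // Found at index 14
```

Any difference in spacing, line endings or letter case makes this return -1, which is exactly what "verbatim" means here.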

If the script may be formatted differently, you may be able to get away with removing all whitespace and then doing a String.IndexOf.
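A rough sketch of the whitespace-insensitive variant, assuming it is acceptable to drop every character for which char.IsWhiteSpace returns true. The helper names are hypothetical. Because both strings are stripped the same way, differences in indentation and line breaks no longer matter. The trade-off is possible false positives: `var x` and `varx` become identical, and whitespace inside string literals is ignored too.

```csharp
using System;
using System.Linq;

// Hypothetical helper: removes every whitespace character (spaces, tabs, CR, LF, ...).
static string StripWhitespace(string s) =>
    new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());

// Strips both inputs identically, then does an ordinal substring search.
static bool ContainsIgnoringWhitespace(string html, string script) =>
    StripWhitespace(html).IndexOf(StripWhitespace(script), StringComparison.Ordinal) >= 0;

// Sample data (assumed): same code, formatted differently.
string script = "function comCC24StartFunctioning(scriptSource)\n{\n    var x = 1;\n}";
string html = "<script>function comCC24StartFunctioning( scriptSource ) { var x=1; }</script>";

Console.WriteLine(ContainsIgnoringWhitespace(html, script)); // True
```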

But if you mean a script with the same behavior that could have a different structure, different variable names and so on, then you'd need to parse the HTML and JavaScript and analyse the syntax tree, which would be immensely complicated.

An alternative may be to look for just a smaller, invariant part of the script, for example 'comCC24StartFunctioning', again with String.IndexOf.
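A sketch of that idea, assuming it is good enough for a few distinctive fragments of the script to be present. The marker list is an illustrative assumption built from identifiers in the question's script (the function name and the _siteid value), and ContainsAllMarkers is a hypothetical helper. Requiring more than one marker lowers the chance that an unrelated page matches by coincidence.

```csharp
using System;
using System.Linq;

// Assumed markers: fragments expected to stay the same however the script is formatted.
string[] markers =
{
    "comCC24StartFunctioning",
    "bac29411-930d-43b2-8aab-0ec92fb7ab24",
};

// True only if every required fragment occurs in the text (ordinal, case-sensitive).
static bool ContainsAllMarkers(string text, string[] required) =>
    required.All(m => text.IndexOf(m, StringComparison.Ordinal) >= 0);

// Sample data (assumed): a reformatted page that still contains both markers.
string html = "<script>_siteid = 'bac29411-930d-43b2-8aab-0ec92fb7ab24'; comCC24StartFunctioning(src);</script>";

Console.WriteLine(ContainsAllMarkers(html, markers)); // True
```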

If I understand correctly, you just want to search the whole file for the exact snippet? Then the following should work:

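string fileName = "your HTML file location";
string textToSearch = "your-script-snippet";
// String.Contains is an ordinal, case-sensitive comparison, so line endings
// (\r\n vs \n) and indentation must match the file exactly.
bool fileContainsScript = System.IO.File.ReadAllText(fileName).Contains(textToSearch);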

The easiest way would be to use the Contains() method of the String class. If there may be some extra spaces or line breaks, you could build a regex pattern from your string and look for a match. To do that, you'd have to escape all characters that the regex engine considers "special", such as ()[].* etc., which could appear in scripts, and replace the whitespace in your pattern string with \s*.

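using System.Linq;
using System.Text.RegularExpressions;

string scriptToFind = ...
string fileToSearchText = ...

// Split the script on runs of whitespace and escape each piece with Regex.Escape,
// which escapes the regex metacharacters such as ( ) [ . * + ? { $ ^ | \
string[] tokens = Regex.Split(scriptToFind.Trim(), @"\s+");
// Rejoin the pieces with \s* so any amount of whitespace (or none) may appear between them.
string patternToFind = string.Join(@"\s*", tokens.Select(Regex.Escape));

bool isMatch = Regex.IsMatch(fileToSearchText, patternToFind);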

For testing purposes: http://gskinner.com/RegExr/

One way would be to remove the line breaks from both the script and the HTML source code. Then you basically have two strings and just need to check whether one is part of the other.
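A minimal sketch of this approach with hypothetical sample strings: the script uses \n line endings while the HTML uses \r\n, so a plain Contains would fail, but after removing both \r and \n from each string the snippet is found. RemoveLineBreaks is a hypothetical helper. Indentation and other spaces still have to match exactly, and joining adjacent lines could in principle create a false positive.

```csharp
using System;

// Hypothetical helper: removes CR and LF, so both "\r\n" (Windows) and "\n" (Unix)
// line endings disappear; all other whitespace, including indentation, is kept.
static string RemoveLineBreaks(string s) => s.Replace("\r", "").Replace("\n", "");

// Sample data (assumed): the same two lines with different line endings.
string script = "_siteid = \"bac29411\";\n_subscriberId = \"03aab4ac\";";
string html = "<script>\r\n_siteid = \"bac29411\";\r\n_subscriberId = \"03aab4ac\";\r\n</script>";

bool found = RemoveLineBreaks(html).Contains(RemoveLineBreaks(script));
Console.WriteLine(found); // True
```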
