Extracting only the tags from an HTML text file

I'm working on a steganography method that hides text within HTML tags.
For example, take this tag: <heEAd>. I have to extract every character within the tag and then
check the case of each letter: if it is a capital, the bit is set to 1, otherwise 0. I also want to detect the end of the element by checking for the matching closing </head> tag.


Here is the code:

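    WebClient client = new WebClient();
    String htmlCode = client.DownloadString("url");
    String Tags = "";
    for (int i = 0; i < htmlCode.Length; i++)
    {
        if (htmlCode[i] == '<')
        {
            // htmlCode[i] is already known to be '<' here, so this test is never true
            if (htmlCode[i] == '>')
                continue;
            else
            {
                // only ever appends the '<' character itself, not the tag contents
                Tags += htmlCode[i];
            }
        }
    }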

That logic is terrible, but how do I use IndexOf and LastIndexOf to get the desired
substring? I tried to use them, but I'm missing something because I don't know C# well.
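
For the IndexOf part of the question, one possible approach is to find each '<', then search for the next '>' starting just after it, and take the substring in between. The sketch below is only an illustration: the names TagScanner and ExtractTags are hypothetical, and it assumes that no '>' appears inside an attribute value, comment, or script, which real HTML does not guarantee.

```csharp
using System;
using System.Collections.Generic;

static class TagScanner
{
    // Returns the raw text between each '<' and the next '>',
    // e.g. "heEAd" for "<heEAd>" and "/heEAd" for "</heEAd>".
    public static List<string> ExtractTags(string html)
    {
        var tags = new List<string>();
        int open = html.IndexOf('<');
        while (open >= 0)
        {
            // Search for the closing bracket only after the opening one.
            int close = html.IndexOf('>', open + 1);
            if (close < 0)
            {
                break; // unterminated tag: nothing more to extract
            }
            tags.Add(html.Substring(open + 1, close - open - 1));
            // Continue scanning after the tag that was just consumed.
            open = html.IndexOf('<', close + 1);
        }
        return tags;
    }
}
```

Calling TagScanner.ExtractTags("<heEAd><title>x</title></heEAd>") yields the four entries "heEAd", "title", "/title" and "/heEAd". LastIndexOf is not needed here, because it searches from the end of the string and would pair the first '<' with the last '>' in the whole document.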

I think you need to use a regex.

I once tried to do this with Substring and it was a lot of work. Later I switched to a regex, which was much easier.

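// Matches the text between <head> and </head>.
// IgnoreCase is needed for mixed-case tags such as <heEAd>; Singleline lets '.' span line breaks.
var regex = new Regex(@"(?<=<head>).*?(?=</head>)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
return regex.Matches(strInput);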
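
Note that the pattern above returns the text between the head tags, not the tag names that carry the hidden bits. As a rough sketch of the decoding step described in the question, the code below captures each tag name with a regex and maps uppercase letters to 1 and lowercase letters to 0. The names TagCaseDecoder, BitsOf and Decode are hypothetical. It also assumes that closing tags are matched by name while ignoring case and that digits in a tag name (as in h1) carry no bit; the question does not specify either point.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class TagCaseDecoder
{
    // Group 1: optional '/' of a closing tag. Group 2: the tag name after '<'.
    private static readonly Regex TagName =
        new Regex(@"<(/?)([A-Za-z][A-Za-z0-9]*)");

    // Uppercase letter -> '1', lowercase letter -> '0'; non-letters are skipped.
    public static string BitsOf(string tagName)
    {
        var bits = new StringBuilder();
        foreach (char c in tagName)
        {
            if (char.IsLetter(c))
            {
                bits.Append(char.IsUpper(c) ? '1' : '0');
            }
        }
        return bits.ToString();
    }

    // Decodes opening tags only, and reports whether a closing tag with the
    // same name (compared ignoring case) appears later in the input.
    public static List<(string Name, string Bits, bool Closed)> Decode(string html)
    {
        var result = new List<(string Name, string Bits, bool Closed)>();
        MatchCollection matches = TagName.Matches(html);
        for (int i = 0; i < matches.Count; i++)
        {
            if (matches[i].Groups[1].Value == "/")
            {
                continue; // closing tags are used only for the Closed check
            }
            string name = matches[i].Groups[2].Value;
            bool closed = false;
            for (int j = i + 1; j < matches.Count && !closed; j++)
            {
                closed = matches[j].Groups[1].Value == "/" &&
                         string.Equals(matches[j].Groups[2].Value, name,
                                       StringComparison.OrdinalIgnoreCase);
            }
            result.Add((name, BitsOf(name), closed));
        }
        return result;
    }
}
```

For example, TagCaseDecoder.BitsOf("heEAd") returns "00110". A regex like this will also match things that merely look like tags inside comments or scripts, so for arbitrary pages an HTML parser may be a safer way to enumerate the tags first.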
