
Extracting only tags from html text file

I'm working on a steganography method that hides text within HTML tags. For example, given the tag <heEAd>, I have to extract every character inside the tag and then examine the case of each letter: if it is uppercase, the bit is set to 1, otherwise 0. I also want to check whether the matching closing </head> tag appears at the end.


Here is the code:

    // Download the page source.
    WebClient client = new WebClient();
    String htmlCode = client.DownloadString("url");
    String Tags = "";

    // Attempt to collect the characters of each tag.
    for (int i = 0; i < htmlCode.Length; i++)
    {
        if (htmlCode[i] == '<')
        {
            if (htmlCode[i] == '>')
                continue;
            else
            {
                Tags += htmlCode[i];
            }
        }
    }

That logic is terrible, but how do I use IndexOf and LastIndexOf to get the desired substring? I tried to use them, but I'm missing something because of my limited knowledge of C#.
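
For reference, one way the IndexOf approach asked about here could look is sketched below. It is only an illustration under simplifying assumptions: the class and method names (TagScanner, ExtractTags) are hypothetical, and the scan treats every '<' ... '>' pair as a tag, so comments, scripts, and attribute values containing '>' are not handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original question.
public static class TagScanner
{
    // Returns the raw text between each '<' and the next '>',
    // e.g. "heEAd" for <heEAd> and "/head" for </head>.
    public static List<string> ExtractTags(string html)
    {
        var tags = new List<string>();
        int open = html.IndexOf('<');
        while (open >= 0)
        {
            // Look for the closing bracket after the opening one.
            int close = html.IndexOf('>', open + 1);
            if (close < 0)
                break; // unterminated tag: stop scanning

            tags.Add(html.Substring(open + 1, close - open - 1));

            // Continue the search after this tag.
            open = html.IndexOf('<', close + 1);
        }
        return tags;
    }
}
```

Calling TagScanner.ExtractTags(htmlCode) on the downloaded string would give one entry per tag, which can then be examined letter by letter.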

I think you need to use a regex.

I tried to do this once with Substring and it was a lot of work. Later I decided to use a regex, and it was easier than the first approach.

var regex = new Regex(@"(?<=<head>).*(?=</head>)");
return regex.Matches(strInput);
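
Note that the pattern above captures the content between a literal lowercase <head> and </head>, so as written it would not match a mixed-case tag such as <heEAd>. A rough sketch of a regex that captures the tag names themselves and turns letter case into bits follows. The names TagCaseDecoder, BitsFromTagName, DecodeOpeningTags, and HasClosingTag are hypothetical, and it is an assumption here that only opening tags carry bits and that non-letter characters in a tag name are skipped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TagCaseDecoder
{
    // Matches an opening or closing tag and captures its name,
    // e.g. "heEAd" in <heEAd> or </heEAd>. Comments and <!DOCTYPE> are not matched.
    private static readonly Regex TagPattern =
        new Regex(@"<(?<slash>/?)(?<name>[A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Uppercase letter -> '1', lowercase letter -> '0'; digits are skipped.
    public static string BitsFromTagName(string name)
    {
        return new string(name
            .Where(c => char.IsLetter(c))
            .Select(c => char.IsUpper(c) ? '1' : '0')
            .ToArray());
    }

    // Concatenates the bits of every opening tag, in document order.
    public static string DecodeOpeningTags(string html)
    {
        return string.Concat(TagPattern.Matches(html).Cast<Match>()
            .Where(m => m.Groups["slash"].Value.Length == 0)
            .Select(m => BitsFromTagName(m.Groups["name"].Value)));
    }

    // Checks, ignoring case, whether a closing tag for the given name exists.
    public static bool HasClosingTag(string html, string name)
    {
        return Regex.IsMatch(html, "</" + Regex.Escape(name) + @"\s*>",
            RegexOptions.IgnoreCase);
    }
}
```

For instance, TagCaseDecoder.BitsFromTagName("heEAd") gives "00110". Text inside scripts or comments that looks like a tag would still be matched, so a real HTML parser may be a better fit for arbitrary pages.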
