Stripping HTML tags from source code

HTML = EntityUtils.toString(response.getEntity());
ResponseHandler<String> responseHandler = new BasicResponseHandler();
String ResponseBody = httpclient.execute(httppost, responseHandler);
table = ResponseBody.substring(ResponseBody.indexOf("<table border=\"1\" cellpadding=\"0\" width=\"100%\" cellspacing=\"0\">"));
table = table.substring(0, table.indexOf("</table>"));  

String htmlString = table;
String noHTMLString = htmlString.replaceAll("\\<.*?\\>", "");
noHTMLString = noHTMLString.replaceAll("\r", "<br/>");
noHTMLString = noHTMLString.replaceAll("\n", " ");
noHTMLString = noHTMLString.replaceAll("\'", "&#39;");
noHTMLString = noHTMLString.replaceAll("\"", "&quot;");

TextView WORK = (TextView) findViewById(R.id.HTML);
WORK.setText(htmlString); 

I am using regular expressions to strip the HTML tags from the extracted table. This is my code. It seems correct, but the TextView shows the table substring, markup included, rather than the extracted text. Does anybody know why?

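You have to use the new String object as the source for your TextView. replaceAll returns a new string and leaves htmlString unchanged, so htmlString still holds the table markup while the stripped text is in noHTMLString. Change this: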

WORK.setText(htmlString);

to the following:

WORK.setText(noHTMLString);
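
The same point can be checked outside Android with a minimal sketch. The class name TagStripper, the helper stripTags, and the sample markup are all hypothetical and not part of the original code. The pattern used here, <[^>]*>, is a swapped-in variant of the question's \\<.*?\\>: it also matches tags that span a line break, which the original pattern does not, since . does not match line terminators by default.

```java
// Illustrative only: TagStripper and stripTags are hypothetical names.
final class TagStripper {

    // Returns a NEW string with everything between '<' and '>' removed.
    // The input string itself is never modified (Java strings are immutable).
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String htmlString = "<table border=\"1\"><tr><td>Hello</td> <td>World</td></tr></table>";
        String noHTMLString = stripTags(htmlString);

        // htmlString still contains the markup; only noHTMLString is stripped.
        System.out.println(htmlString);
        System.out.println(noHTMLString); // prints "Hello World"
    }
}
```

Passing htmlString to setText therefore shows the markup, while passing noHTMLString shows the stripped text. A regular expression of this kind is only a rough approach and is not a full HTML parser.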

