I need to remove HTTP headers from the parsed web pages in Java.
HTTP/1.1 404 Not Found
Date: Wed, 28 Oct 2009 14:10:05 GMT
Server: Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8i DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Last-Modified: Tue, 02 Jun 2009 17:40:52 GMT
ETag: "18ac11-d16-46b610b465100"
Accept-Ranges: bytes
Content-Length: 3350
Connection: close
Content-Type: text/html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head profile="http://gmpg.org/xfn/11">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
As shown above, the first few lines are HTTP headers. I need to get rid of them before processing the parsed pages, but I'm not sure how to do it, since the headers vary in length and content.
Could anyone please help me with this?
You can simply find the index of, for example, <html and take the substring starting from that index:
text.substring(text.indexOf("<html"))
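Note that the one-liner above throws a StringIndexOutOfBoundsException when "<html" is not present (indexOf returns -1), is case-sensitive, and also drops the DOCTYPE line. A slightly more defensive variant is sketched below. It assumes the raw response is already held in a String; the class and method names (HeaderStripper, stripHeaders) are made up for illustration. It first tries the empty line that HTTP places between headers and body, and if none is found (as in the dump in the question, where the blank line seems to be missing) it falls back to the first "<!DOCTYPE" or "<html".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class HeaderStripper {
    // An empty line (CRLF CRLF, or LF LF in lenient dumps) ends the header block.
    private static final Pattern BLANK_LINE = Pattern.compile("\r?\n\r?\n");
    // Fallback: the first DOCTYPE declaration or <html tag, in any letter case.
    private static final Pattern MARKUP_START =
            Pattern.compile("<!DOCTYPE|<html", Pattern.CASE_INSENSITIVE);

    static String stripHeaders(String text) {
        // Only look for the header/body separator if the text begins with a status line.
        if (text.startsWith("HTTP/")) {
            Matcher blank = BLANK_LINE.matcher(text);
            if (blank.find()) {
                return text.substring(blank.end());
            }
        }
        // No separator found: cut at the first piece of markup instead.
        Matcher markup = MARKUP_START.matcher(text);
        if (markup.find()) {
            return text.substring(markup.start());
        }
        // Nothing recognizable: return the input unchanged rather than throwing.
        return text;
    }
}
```

Calling HeaderStripper.stripHeaders(text) on the page from the question should return everything from the <!DOCTYPE line onward. The fallback is a heuristic: it would misfire if a header value happened to contain "<html", so the blank-line split is the more reliable path whenever the separator is preserved.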