
Making question marks on PDF file readable

I've parsed a webpage whose URL force-downloads the PDF that is on the page. With the ignoreContentType() method from Jsoup I managed to display a whole bunch of text, but it contains question marks in black ovals (the output is shown after the code). Here is my code:

org.jsoup.nodes.Document document1 = null;
// Step 1: GET the login page to obtain the session cookies.
Connection.Response downloadPopUp = Jsoup.connect("https://www.capitaliq.com/ciqdotnet/login.aspx?redirect=%2fCIQDotNet%2fFilings%2fDocumentRedirector.axd%3fversionId%3d" + ID + "%26type%3dpdf%26forcedownload%3dtrue/login.php")
     .userAgent("Chrome/44.0.2403.125")
     .method(Connection.Method.GET)
     .timeout(1000000)
     .ignoreContentType(true)
     .execute();
// Step 2: submit the login form, reusing the cookies from step 1.
document1 = Jsoup.connect("https://www.capitaliq.com/ciqdotnet/login.aspx?redirect=%2fCIQDotNet%2fFilings%2fDocumentRedirector.axd%3fversionId%3d" + ID + "%26type%3dpdf%26forcedownload%3dtrue")
     .userAgent("Chrome/44.0.2403.125")
     .data("cookieexists", "false")
     .data("myLogin$myUsername", "MyEmail")
     .data("myLogin$myPassword", "MyPassword")
     .data("myLogin$myLoginButton.x", "22")
     .data("myLogin$myLoginButton.y", "8")
     .data("__VIEWSTATE", viewState)
     .data("__EVENTVALIDATION", eventValidation)
     .data("myLogin$myEnableAutoLogin", "on")
     .timeout(1000000)
     .cookies(downloadPopUp.cookies())
     .post(); // the terminating call is cut off in the post; .post() is assumed because form data is sent

<html>
<head>
</head>
<body>

%PDF-1.3 % 1 0 obj<>endobj 2 0 obj<>endobj 3 0 obj<>stream x ctem 6۶mWR mgǶmWl vŶ m Gݧ{ }O\\ s J ƶ 1['zf D¶ ; 9 F H L$0"ba!b ! sw075s" RQT / ?"D t47 ! &gt; l 6N cE% @dbn א ' U! ̍ ͍6 j"[ o ?"#[c Bsd vBБȀ d p3 â# 8X ;:~ L l s dKdncd l t } 9 ~KX m 휈 ʋ NfN v4 fٚ|K 9 o, N 6 DN o - ! 7 pv4 1 / VG o o _q Y K _R 郹 # ʄ ۦ ӷmSs D Ė v s8 +AT ƶ6V D FY[ Q Ϫ @ V k _#K 9 C 9[Y X7 / " #H:|w b n Q

Does anyone know how to make this HTML/PDF combination readable?

Send 'Content-Type: application/pdf' as an HTTP response header (before sending any data), and output no HTML tags at all.
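
On the client side of the question, the same idea can be sketched as follows: a PDF is binary data, so it should be kept as bytes rather than parsed as an HTML document. With Jsoup, the raw bytes of a response are available from Connection.Response.bodyAsBytes(). The PdfBytes class and its method names below are hypothetical helpers written for illustration, not part of Jsoup, and they use only the Java standard library. This is a minimal sketch, assuming the login succeeded and the server really returned the PDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of Jsoup): treats a response body as raw PDF bytes. */
final class PdfBytes {

    // Every PDF file begins with this signature, as in the "%PDF-1.3" output above.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the body starts with the "%PDF-" signature. */
    static boolean looksLikePdf(byte[] body) {
        if (body == null || body.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (body[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Writes the bytes unchanged; no charset decoding happens, so no byte is replaced. */
    static Path save(byte[] body, Path target) throws IOException {
        if (!looksLikePdf(body)) {
            throw new IOException("Body does not start with %PDF-; it may be an HTML login page");
        }
        return Files.write(target, body);
    }

    /** Shows the failure mode: decoding binary data as UTF-8 text and back can lose bytes. */
    static boolean survivesTextRoundTrip(byte[] body) {
        String asText = new String(body, StandardCharsets.UTF_8); // invalid bytes become U+FFFD
        return Arrays.equals(body, asText.getBytes(StandardCharsets.UTF_8));
    }
}
```

Assuming a Connection.Response named response obtained from .execute(), a call such as PdfBytes.save(response.bodyAsBytes(), Paths.get("filing.pdf")) would write a file that a PDF viewer can open (the file name is only an example). The question marks appear because the compressed streams inside a PDF are not valid text in any charset, so each undecodable byte is shown as the replacement character.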
