Convert PDF to HTML with Aspose.Pdf for Cloud

I'm having trouble converting a PDF to HTML with Aspose.Pdf-Cloud v1.0.9.

Code:

public byte[] ConvertPdfToHtml(byte[] doc, string fileName)
        {
            var pdfApi = new PdfApi(ConfigurationManager.AppSettings["AsposeKey"],
                ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);

            try
            {
                var apiResponse = pdfApi.PutConvertDocument("html", null,
                    Path.GetFileNameWithoutExtension(fileName) + ".html", doc);

                if (apiResponse != null && apiResponse.Status.Equals("Ok"))
                {
                    return apiResponse.ResponseStream;
                }

                throw new Exception("Couldn't convert pdf - " + fileName + " to HTML...");
            }
            catch (Exception ex)
            {
                NLogger.LogError("ConvertPdfToHtml - " + ex);
                throw;
            }
        }

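It seems that no matter which PDF I upload (generated by Adobe or SelectPdf), I get a 400 Bad Request. Has anyone had any luck getting this to work?

So far, Aspose.Words has worked very well for me for doc/docx to HTML.

Update: After logging in to my account, it looks like an error is generated on the back end:

Error: The method or operation is not implemented. Method: Convert document to specified format online. Parameters: format 'html', url '', outPath 'testadobe.html'

This may be an Aspose SDK issue. I'll try to contact them, since the method is exposed in the SDK and does exactly what I need for documents, and I need it to work with PDFs as well.

Updated code: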

public byte[] ConvertPdfToHtml(byte[] doc, string fileName)
        {
            var pdfApi = new PdfApi(ConfigurationManager.AppSettings["AsposeKey"],
                ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);
            var storageApi = new StorageApi(ConfigurationManager.AppSettings["AsposeKey"],
                ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);

            try
            {
                storageApi.PutCreate(fileName, "", "", doc);

                var apiResponse = pdfApi.GetDocumentWithFormat(fileName, "html", "", "", Path.GetFileNameWithoutExtension(fileName) + ".html");

                if (apiResponse != null && apiResponse.Status.Equals("Ok"))
                {
                    var storageRes = storageApi.GetDownload(Path.GetFileNameWithoutExtension(fileName) + ".html", null, "");

                    var htmlDoc = ZipExtractor.ExtractHtmlFromZip(storageRes.ResponseStream,
                        Path.GetFileNameWithoutExtension(fileName) + ".html");

                    return htmlDoc;
                }

                throw new Exception("Couldn't convert pdf - " + fileName + " to HTML...");
            }
            catch (Exception ex)
            {
                NLogger.LogError("ConvertPdfToHtml - " + ex);
                throw;
            }
        }

The unzip function, for posterity:

// Requires: using System; using System.IO; using System.IO.Compression;
public static byte[] ExtractHtmlFromZip(byte[] zipBytes, string fileName)
        {
            if (zipBytes == null || zipBytes.Length == 0)
                throw new ArgumentException("zipBytes doesn't contain any bytes...", nameof(zipBytes));

            using (var zipStream = new MemoryStream(zipBytes))
            using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
            {
                foreach (var zipEntry in archive.Entries)
                {
                    // Skip the other entries (.css, images) until the requested file is found.
                    if (zipEntry.FullName != fileName) continue;

                    using (var fileStream = zipEntry.Open())
                    using (var ms = new MemoryStream())
                    {
                        fileStream.CopyTo(ms);
                        return ms.ToArray();
                    }
                }
            }

            // Only reached after every entry has been checked without a match.
            throw new FileNotFoundException("Couldn't find " + fileName + " in zip archive...");
        }

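We have two APIs for converting a PDF document to HTML.

  1. GET /v{version}/pdf/{name}
  2. PUT /v{version}/pdf/convert

I recommend using the first one. The following cURL example will help you understand the API.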

curl -v "http://api.aspose.cloud/v1.1/pdf/Sample.pdf?format=html&appSID=B01A15E5-1B83-4B9A-8EB3-0F2BFA6AC766&signature=hHUw2HKmLY6tQFEevDg52uOLKak" \
-X GET \
-H "Content-Type: application/json" \
-H "Accept: multipart/form-data" \
-o Sample_out.zip 

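As you can see, I set the output (-o) file extension to .zip rather than .html. The converted result consists of multiple files (.html, .css, image files), so the API zips the output files.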
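Since the response is a zip archive rather than a single HTML file, here is a minimal C# sketch of reading every file out of it, so the .css and image files stay available alongside the .html. The class and method names (`ConvertedZipReader`, `ReadAllEntries`) are hypothetical and not part of any Aspose SDK; only `System.IO.Compression` is used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the Aspose SDK.
public static class ConvertedZipReader
{
    // Reads every file in the archive into memory, keyed by its path inside the zip.
    public static Dictionary<string, byte[]> ReadAllEntries(byte[] zipBytes)
    {
        var files = new Dictionary<string, byte[]>(StringComparer.OrdinalIgnoreCase);

        using (var zipStream = new MemoryStream(zipBytes))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            foreach (var entry in archive.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (string.IsNullOrEmpty(entry.Name)) continue;

                using (var entryStream = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    entryStream.CopyTo(buffer);
                    files[entry.FullName] = buffer.ToArray();
                }
            }
        }

        return files;
    }
}
```

The HTML file can then be picked out of the returned dictionary by its name, while the remaining entries can be written next to it so that relative links in the page keep working.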

This cURL example uses Sample.pdf as the resource file.
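For anyone calling the first API from C# without the SDK, the same GET request could be sketched with `HttpClient` roughly as follows. This is an illustration under assumptions, not the SDK's implementation: `PdfToHtmlRequest`, `Build` and `DownloadZipAsync` are hypothetical names, and the appSID and signature are passed in as-is (how the signature is obtained is not covered here). The `Content-Type` header from the cURL call is left out because the request has no body.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Aspose SDK.
public static class PdfToHtmlRequest
{
    // Builds the same GET request as the cURL example:
    // {baseUrl}/pdf/{pdfName}?format=html&appSID=...&signature=...
    public static HttpRequestMessage Build(string baseUrl, string pdfName, string appSid, string signature)
    {
        var url = baseUrl.TrimEnd('/') + "/pdf/" + Uri.EscapeDataString(pdfName)
                  + "?format=html"
                  + "&appSID=" + Uri.EscapeDataString(appSid)
                  + "&signature=" + Uri.EscapeDataString(signature);

        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("multipart/form-data"));
        return request;
    }

    // Sends the request and saves the zipped conversion result to outputPath.
    public static async Task DownloadZipAsync(HttpClient client, HttpRequestMessage request, string outputPath)
    {
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            var bytes = await response.Content.ReadAsByteArrayAsync();
            File.WriteAllBytes(outputPath, bytes);
        }
    }
}
```

Calling `Build("http://api.aspose.cloud/v1.1", "Sample.pdf", appSid, signature)` and passing the result to `DownloadZipAsync` with the output path `Sample_out.zip` would mirror the cURL command, assuming Sample.pdf has already been uploaded to storage.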

P.S. I work at Aspose as a developer.
