Generate PDF based on HTML code (iTextSharp, PDFSharp?)

Can the PDFSharp library, like iTextSharp, generate PDF files that take HTML formatting into account (bold (strong), spacing (br), etc.)?

Previously I used iTextSharp and handled it roughly like this:

 using System.IO;
 using System.Web;
 using iTextSharp.text;
 using iTextSharp.text.html.simpleparser;
 using iTextSharp.text.pdf;

 string encodingMetaTag = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />";
 string htmlCode = "text <div> <b> bold </b> or <u> underlined </u> </div>";

 var sr = new StringReader(encodingMetaTag + htmlCode);
 var pdfDoc = new Document(PageSize.A4, 10f, 10f, 10f, 0f);
 var htmlparser = new HTMLWorker(pdfDoc); // iTextSharp's simple HTML parser
 PdfWriter.GetInstance(pdfDoc, HttpContext.Current.Response.OutputStream); // stream the PDF to the browser
 pdfDoc.Open();
 htmlparser.Parse(sr);
 pdfDoc.Close();

The HTMLWorker object took care of turning the HTML markup into the PDF document. So what about PDFSharp? Does PDFSharp have a similar solution?

I know this question is old, but here's a clean way to do it...

You can use HtmlRenderer combined with PDFSharp to accomplish this:

using System.Drawing;
using PdfSharp.Drawing;
using PdfSharp.Pdf;

// Render the HTML into a GDI+ bitmap with HtmlRenderer...
Bitmap bitmap = new Bitmap(1200, 1800);
Graphics g = Graphics.FromImage(bitmap);
HtmlRenderer.HtmlContainer c = new HtmlRenderer.HtmlContainer();
c.SetHtml("<html><body style='font-size:20px'>Whatever</body></html>");
c.PerformPaint(g);

// ...then draw that bitmap onto the first page of a new PDF document
PdfDocument doc = new PdfDocument();
PdfPage page = new PdfPage();
XImage img = XImage.FromGdiPlusImage(bitmap);
doc.Pages.Add(page);
XGraphics xgr = XGraphics.FromPdfPage(doc.Pages[0]);
xgr.DrawImage(img, 0, 0);
doc.Save(@"C:\test.pdf");
doc.Close();

Some people report that the final image looks a bit blurry, apparently due to automatic anti-aliasing. Here's a forum post on how to fix that: http://forum.pdfsharp.com/viewtopic.php?f=2&t=1811&start=0

No, PDFsharp does not currently include code for parsing HTML files.

Old question, but none of the above worked for me. Then I tried the GeneratePdf method of HtmlRenderer in combination with PdfSharp. Hope it helps. You must install the NuGet package named HtmlRenderer.PdfSharp.

using PdfSharp;
using PdfSharp.Pdf;

// GeneratePdf lays out the HTML and returns a finished PdfDocument,
// so no bitmap or extra page is needed
PdfDocument doc = TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf("Your html in a string", PageSize.A4);
doc.Save(Server.MapPath("test.pdf")); // Server.MapPath is available in an ASP.NET page/controller
doc.Close();

In a project I developed last year, I used wkhtmltopdf (http://wkhtmltopdf.org/) to generate a PDF from HTML; then I read the file and returned it to the user.

It works fine for me, and it could be an option for you...
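A minimal C# sketch of that workflow, assuming the wkhtmltopdf executable is installed and either on the PATH or passed in as `exePath`. It writes the HTML to a temporary file, runs `wkhtmltopdf <input.html> <output.pdf>` through System.Diagnostics.Process, and returns the PDF bytes. The `WkHtmlToPdf` class and its method names are hypothetical, not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class WkHtmlToPdf
{
    // Builds the command line: wkhtmltopdf "<input.html>" "<output.pdf>"
    // (assumption: the executable is reachable as exePath).
    public static ProcessStartInfo BuildStartInfo(string exePath, string htmlPath, string pdfPath)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = $"\"{htmlPath}\" \"{pdfPath}\"",
            UseShellExecute = false,      // required in order to redirect output
            RedirectStandardError = true, // wkhtmltopdf reports progress and errors on stderr
            CreateNoWindow = true,
        };
    }

    // Writes the HTML to a temp file, converts it and returns the PDF bytes,
    // e.g. to send back to the user as a download.
    public static byte[] HtmlToPdfBytes(string html, string exePath = "wkhtmltopdf")
    {
        string htmlPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
        string pdfPath = Path.ChangeExtension(htmlPath, ".pdf");
        try
        {
            File.WriteAllText(htmlPath, html);
            using (Process process = Process.Start(BuildStartInfo(exePath, htmlPath, pdfPath)))
            {
                // Read stderr before waiting so a full pipe buffer cannot block the child process.
                string errors = process.StandardError.ReadToEnd();
                process.WaitForExit();
                if (process.ExitCode != 0)
                    throw new InvalidOperationException("wkhtmltopdf failed: " + errors);
            }
            return File.ReadAllBytes(pdfPath);
        }
        finally
        {
            // File.Delete does not throw if the file was never created.
            File.Delete(htmlPath);
            File.Delete(pdfPath);
        }
    }
}
```

In an ASP.NET WebForms page, the returned bytes could then be sent with `Response.BinaryWrite(pdfBytes)` after setting the content type to `application/pdf`.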

I know this is a really old question, but I noticed that nobody has actually given an accurate method for rendering HTML into a PDF. Based on my tests, I found that you need the following code to do it successfully.

using System.Drawing;
using PdfSharp.Drawing;
using PdfSharp.Pdf;

Bitmap bitmap = new Bitmap(790, 1800);
Graphics g = Graphics.FromImage(bitmap);
XGraphics xg = XGraphics.FromGraphics(g, new XSize(bitmap.Width, bitmap.Height));
TheArtOfDev.HtmlRenderer.PdfSharp.HtmlContainer c = new TheArtOfDev.HtmlRenderer.PdfSharp.HtmlContainer();
c.SetHtml("Your html in a string here");

PdfDocument pdf = new PdfDocument();
PdfPage page = new PdfPage();
XImage img = XImage.FromGdiPlusImage(bitmap);
pdf.Pages.Add(page);
XGraphics xgr = XGraphics.FromPdfPage(pdf.Pages[0]);
// Lay out and paint the HTML directly onto the PDF page's graphics
c.PerformLayout(xgr);
c.PerformPaint(xgr);
xgr.DrawImage(img, 0, 0);
pdf.Save("test.pdf");

There is another way to do it, but you might run into problems with the page size.

PdfDocument pdf = PdfGenerator.GeneratePdf(text, PageSize.A4);
pdf.Save("test.pdf");

Have you heard of NReco.PdfGenerator? I might be answering very late, but I thought it would help. It is very simple and works well.

var htmlContent = String.Format("<body>Hello world: {0}</body>", 
        DateTime.Now);
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
var pdfBytes = htmlToPdf.GeneratePdf(htmlContent);

Edit: I came here wanting to convert HTML code to PDF using PDFSharp and found out that PDFSharp cannot do it. Then I found NReco, and it worked for me, so I thought it might help someone like me.

If you want only a certain HTML string written to the PDF, rather than the whole document, you can use the HtmlContainer from TheArtOfDev HtmlRenderer. This snippet uses v1.5.1.

using System.IO;
using PdfSharp.Pdf;
using PdfSharp;
using PdfSharp.Drawing;
using TheArtOfDev.HtmlRenderer.PdfSharp;

//create a pdf document
using (PdfDocument doc = new PdfDocument())
{
    doc.Info.Title = "StackOverflow Demo PDF";

    //add a page
    PdfPage page = doc.AddPage();
    page.Size = PageSize.A4;

    //fonts and styles
    XFont font = new XFont("Arial", 10, XFontStyle.Regular);
    XSolidBrush brush = new XSolidBrush(XColor.FromArgb(0, 0, 0));

    using (XGraphics gfx = XGraphics.FromPdfPage(page))
    {
        //write a normal string
        gfx.DrawString("A normal string written to the PDF.", font, brush, new XRect(15, 15, page.Width, page.Height), XStringFormats.TopLeft);

        //write the html string to the pdf
        using (var container = new HtmlContainer())
        {
            var pageSize = new XSize(page.Width, page.Height);

            container.Location = new XPoint(15,  45);
            container.MaxSize = pageSize;
            container.PageSize = pageSize;
            container.SetHtml("This is a <b>HTML</b> string <u>written</u> to the <font color=\"red\">PDF</font>.<br><br><a href=\"http://www.google.nl\">www.google.nl</a>");

            using (var measure = XGraphics.CreateMeasureContext(pageSize, XGraphicsUnit.Point, XPageDirection.Downwards))
            {
                container.PerformLayout(measure);
            }

            gfx.IntersectClip(new XRect(0, 0, page.Width, page.Height));

            container.PerformPaint(gfx);
        }
    }

    //write the pdf to a byte array to serve as download, attach to an email etc.
    byte[] bin;
    using (MemoryStream stream = new MemoryStream())
    {
        doc.Save(stream, false);
        bin = stream.ToArray();
    }
}

If you need simple parsing in your app and you control the HTML input, you can write your own library for this.

I created one in one of my projects, but unfortunately I can't share it yet because it contains custom features specific to that application.

Basically, you need to follow this logic to implement basic HTML-to-PDF conversion:

  1. Simple HTML parsing of tags
  2. Create logic to recognize common styles (bold, italic, left, center, etc.), which are added as style attributes in the HTML, then create a PDFSharp class with these properties and assign it to the paragraph
  3. Handle table tags and add rows and columns to the PDF
  4. Use paragraph tags to add paragraphs

I have given a very broad overview of the logic here, based on my implementation.

You may well have a better idea :)
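A minimal C# sketch of a simplified version of steps 1 and 2, assuming the input only uses formatting tags (b/strong, i/em, u) and line-breaking tags (br, p, div) rather than style attributes. `SimpleHtmlParser` and `TextRun` are hypothetical names, and the positional record needs C# 9 or later. It does no PDF output itself: it turns the HTML into styled text runs that you would then map to PDFSharp fonts (for example one XFontStyle per run) when drawing. It is regex-based, not a real HTML parser, and ignores attributes, tables and unknown tags.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// One stretch of text sharing the same inline styles; LineBreak marks <br>, <p> or <div>.
public sealed record TextRun(string Text, bool Bold, bool Italic, bool Underline, bool LineBreak);

public static class SimpleHtmlParser
{
    // Matches a tag such as <b>, </ b> or <br/> (group 1 = "/" when closing, group 2 = tag name),
    // or a run of text between tags.
    private static readonly Regex Token =
        new Regex(@"<\s*(/?)\s*([a-zA-Z]+)[^>]*>|[^<]+", RegexOptions.Compiled);

    public static List<TextRun> Parse(string html)
    {
        var runs = new List<TextRun>();
        int bold = 0, italic = 0, underline = 0; // nesting depth of each style

        foreach (Match m in Token.Matches(html))
        {
            if (!m.Groups[2].Success)
            {
                // Text: decode entities (&amp; -> &) and collapse whitespace as a browser would.
                string text = Regex.Replace(WebUtility.HtmlDecode(m.Value), @"\s+", " ");
                runs.Add(new TextRun(text, bold > 0, italic > 0, underline > 0, false));
                continue;
            }

            int delta = m.Groups[1].Value == "/" ? -1 : 1;
            switch (m.Groups[2].Value.ToLowerInvariant())
            {
                case "b":
                case "strong":
                    bold = Math.Max(0, bold + delta);
                    break;
                case "i":
                case "em":
                    italic = Math.Max(0, italic + delta);
                    break;
                case "u":
                    underline = Math.Max(0, underline + delta);
                    break;
                case "br":
                case "p":
                case "div":
                    // Opening or closing block tags simply start a new line here.
                    runs.Add(new TextRun("", false, false, false, true));
                    break;
                // Any other tag is ignored; the text inside it is still emitted.
            }
        }
        return runs;
    }
}
```

Counting nesting depth instead of using a plain flag keeps `<b><b>x</b> y</b>` bold for " y", and clamping at zero stops a stray closing tag from turning a style "negative".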

You can also refer to: Writing content of HTML table into PDF doc using iTextSharp in asp.net

HTML Renderer for PDF using PdfSharp can generate a PDF from HTML by rendering the HTML

  1. as an image, or
  2. as text

before inserting it into the PDF.

To render as an image, please refer to the code in Diego's answer.

To render as text, please refer to the code below:

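using System.Drawing;
using System.IO;
using PdfSharp;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using TheArtOfDev.HtmlRenderer.Core.Entities;
using TheArtOfDev.HtmlRenderer.PdfSharp;

static void Main(string[] args)
{
    string html = File.ReadAllText(@"C:\Temp\Test.html");
    // A4 page, 20pt margin, no extra CSS (null), plus handlers for stylesheet and image loading
    PdfDocument pdf = PdfGenerator.GeneratePdf(html, PageSize.A4, 20, null, OnStylesheetLoad, OnImageLoadPdfSharp);
    pdf.Save(@"C:\Temp\Test.pdf");
}

// Called for each image in the HTML; this sample returns the same local file for every image
public static void OnImageLoadPdfSharp(object sender, HtmlImageLoadEventArgs e)
{
    var imgObj = Image.FromFile(@"C:\Temp\Test.png");
    e.Callback(XImage.FromGdiPlusImage(imgObj));
}

// Called for each linked stylesheet; supplies the CSS as a string
public static void OnStylesheetLoad(object sender, HtmlStylesheetLoadEventArgs e)
{
    e.SetStyleSheet = @"h1, h2, h3 { color: navy; font-weight:normal; }";
}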

HTML code:

<html>
    <head>
        <title></title>
        <link rel="Stylesheet" href="StyleSheet" />      
    </head>
    <body>
        <h1>Images
            <img src="ImageIcon" />
        </h1>
    </body>
</html>

Unfortunately, HtmlRenderer is not a suitable library for a project based on .NET 5.0:

System.IO.FileLoadException: 'Could not load file or assembly 'HtmlRenderer,
Version=1.5.0.6, Culture=neutral, PublicKeyToken=null'. The located assembly's 
manifest definition does not match the assembly reference. (0x80131040)'

Also, I found that the dependency package HtmlRenderer.PdfSharp produces the following warning:

Package 'HtmlRenderer.PdfSharp 1.5.0.6' was restored using 
'.NETFramework,Version=v4.6.1, .NETFramework,Version=v4.6.2, 
.NETFramework,Version=v4.7, .NETFramework,Version=v4.7.1, 
.NETFramework,Version=v4.7.2, .NETFramework,Version=v4.8' instead of the project 
target framework 'net5.0'. This package may not be fully compatible with your project.

By the way, I managed to render HTML as PDF using another library, IronPDF:

using IronPdf;

License.LicenseKey = "license key";
var renderer = new ChromePdfRenderer();
PdfDocument pdf = await renderer.RenderHtmlAsPdfAsync(yourHtml); // must run inside an async method
pdf.SaveAs("your html as pdf.pdf");

The License.LicenseKey line is not required and you can remove it, but then your PDF will be generated with an IronPDF watermark at the end of each page. However, IronPDF offers a trial license key.

I recommend NReco.PdfGenerator because it has both free and paid licenses and it's easy to install from NuGet.

Main page: https://www.nrecosite.com/pdf_generator_net.aspx

Documentation: https://www.nrecosite.com/doc/NReco.PdfGenerator/

If you want to create a PDF from an HTML file, try:

String html = File.ReadAllText("main.html");
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
htmlToPdf.GeneratePdf(html, null, "C:/Users/Tmp/Desktop/mapa.pdf");

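Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.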

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM