
Generate PDF based on HTML code (iTextSharp, PDFSharp?)

Can the PDFSharp library, like iTextSharp, generate PDF files that take HTML formatting into account (bold (strong), line breaks (br), etc.)?

Previously I used iTextSharp and handled it roughly like this (code below):

 using System.IO;
 using System.Web;
 using iTextSharp.text;
 using iTextSharp.text.pdf;
 using iTextSharp.text.html.simpleparser;

 string encodingMetaTag = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />";
 string htmlCode = "text <div> <b> bold </b> or <u> underlined </u> </div>";

 var sr = new StringReader(encodingMetaTag + htmlCode);
 var pdfDoc = new Document(PageSize.A4, 10f, 10f, 10f, 0f);
 var htmlparser = new HTMLWorker(pdfDoc);
 // Attach the writer before opening the document so the output goes to the response stream
 PdfWriter.GetInstance(pdfDoc, HttpContext.Current.Response.OutputStream);
 pdfDoc.Open();
 htmlparser.Parse(sr);
 pdfDoc.Close();

Converting the HTML into the appropriate form in the PDF document was handled by the HTMLWorker class. So what about PDFSharp? Does PDFSharp have a similar solution?

I know this question is old, but here's a clean way to do it...

You can use HtmlRenderer combined with PDFSharp to accomplish this:

Bitmap bitmap = new Bitmap(1200, 1800);
Graphics g = Graphics.FromImage(bitmap);
HtmlRenderer.HtmlContainer c = new HtmlRenderer.HtmlContainer();
c.SetHtml("<html><body style='font-size:20px'>Whatever</body></html>");
c.PerformPaint(g);
PdfDocument doc = new PdfDocument();
PdfPage page = new PdfPage();
XImage img = XImage.FromGdiPlusImage(bitmap);
doc.Pages.Add(page);
XGraphics xgr = XGraphics.FromPdfPage(doc.Pages[0]);
xgr.DrawImage(img, 0, 0);
doc.Save(@"C:\test.pdf");
doc.Close();
        

Some people report that the final image looks a bit blurry, apparently due to automatic anti-aliasing. Here's a forum post on how to fix that: http://forum.pdfsharp.com/viewtopic.php?f=2&t=1811&start=0

No, PDFsharp currently does not include code for parsing HTML files.

This is an old question, but none of the above worked for me. I then tried the GeneratePdf method of HtmlRenderer in combination with PDFsharp. Hope it helps. You must install the NuGet package named HtmlRenderer.PdfSharp.

// GeneratePdf returns a PdfDocument that already contains the rendered pages,
// so no bitmap or extra page is needed here.
var doc = TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf("Your html in a string", PageSize.A4);
doc.Save(Server.MapPath("test.pdf"));
doc.Close();

In a project that I developed last year I used wkhtmltopdf ( http://wkhtmltopdf.org/ ) to generate a PDF from HTML; I then read the generated file and returned it to the user.

It works fine for me and it could be an idea for you...
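
The answer above does not show its code, so what follows is only a minimal sketch of one way to drive wkhtmltopdf from C#. It assumes the wkhtmltopdf executable is installed and on the PATH, and it uses the basic command form `wkhtmltopdf <input.html> <output.pdf>`. The class and method names (WkHtmlToPdfSketch, BuildArguments, Convert) are hypothetical and not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class WkHtmlToPdfSketch
{
    // Builds the command-line arguments: "<input.html>" "<output.pdf>"
    public static string BuildArguments(string inputHtmlPath, string outputPdfPath)
    {
        return "\"" + inputHtmlPath + "\" \"" + outputPdfPath + "\"";
    }

    // Writes the HTML to a temp file, runs wkhtmltopdf on it, and returns the PDF bytes.
    // exePath is an assumption: it defaults to an executable found on the PATH.
    public static byte[] Convert(string html, string exePath = "wkhtmltopdf")
    {
        string input = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".html");
        string output = Path.ChangeExtension(input, ".pdf");
        File.WriteAllText(input, html);
        try
        {
            var psi = new ProcessStartInfo(exePath, BuildArguments(input, output))
            {
                UseShellExecute = false,
                CreateNoWindow = true
            };
            using (var process = Process.Start(psi))
            {
                process.WaitForExit();
                if (process.ExitCode != 0)
                    throw new InvalidOperationException(
                        "wkhtmltopdf exited with code " + process.ExitCode);
            }
            return File.ReadAllBytes(output);
        }
        finally
        {
            // Clean up the temporary files whether or not the conversion succeeded
            File.Delete(input);
            if (File.Exists(output)) File.Delete(output);
        }
    }
}
```

The returned byte array can then be written to the HTTP response or saved with File.WriteAllBytes, which matches the "read the file and return it to the user" step described above.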

I know this is a really old question, but I realized that nobody has actually given an accurate method for rendering HTML into a PDF. Based on my tests, I found that you need the following code to do it successfully.

Bitmap bitmap = new Bitmap(790, 1800);
Graphics g = Graphics.FromImage(bitmap);
XGraphics xg = XGraphics.FromGraphics(g, new XSize(bitmap.Width, bitmap.Height));
TheArtOfDev.HtmlRenderer.PdfSharp.HtmlContainer c = new TheArtOfDev.HtmlRenderer.PdfSharp.HtmlContainer();
c.SetHtml("Your html in a string here");

PdfDocument pdf = new PdfDocument();
PdfPage page = new PdfPage();
XImage img = XImage.FromGdiPlusImage(bitmap);
pdf.Pages.Add(page);
XGraphics xgr = XGraphics.FromPdfPage(pdf.Pages[0]);
c.PerformLayout(xgr);
c.PerformPaint(xgr);
xgr.DrawImage(img, 0, 0);
pdf.Save("test.pdf");

There is another way to do it, but you might have problems with the size.

PdfDocument pdf = PdfGenerator.GeneratePdf(text, PageSize.A4);
pdf.Save("test.pdf");

Have you heard of NReco.PdfGenerator? I might be answering very late, but I thought it might help. It is very simple and works well.

var htmlContent = String.Format("<body>Hello world: {0}</body>", 
        DateTime.Now);
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
var pdfBytes = htmlToPdf.GeneratePdf(htmlContent);

Edit: I came here with the question of how to convert HTML code to PDF using PDFSharp and found out that PDFSharp cannot do it. I then found NReco, and it worked for me, so I thought it might help someone in the same situation.

If you only want a certain HTML string written to the PDF but not the rest, you can use the HtmlContainer from TheArtOfDev HtmlRenderer . This snippet uses V 1.5.1

using System.IO;
using PdfSharp.Pdf;
using PdfSharp;
using PdfSharp.Drawing;
using TheArtOfDev.HtmlRenderer.PdfSharp;

//create a pdf document
using (PdfDocument doc = new PdfDocument())
{
    doc.Info.Title = "StackOverflow Demo PDF";

    //add a page
    PdfPage page = doc.AddPage();
    page.Size = PageSize.A4;

    //fonts and styles
    XFont font = new XFont("Arial", 10, XFontStyle.Regular);
    XSolidBrush brush = new XSolidBrush(XColor.FromArgb(0, 0, 0));

    using (XGraphics gfx = XGraphics.FromPdfPage(page))
    {
        //write a normal string
        gfx.DrawString("A normal string written to the PDF.", font, brush, new XRect(15, 15, page.Width, page.Height), XStringFormats.TopLeft);

        //write the html string to the pdf
        using (var container = new HtmlContainer())
        {
            var pageSize = new XSize(page.Width, page.Height);

            container.Location = new XPoint(15,  45);
            container.MaxSize = pageSize;
            container.PageSize = pageSize;
            container.SetHtml("This is a <b>HTML</b> string <u>written</u> to the <font color=\"red\">PDF</font>.<br><br><a href=\"http://www.google.nl\">www.google.nl</a>");

            using (var measure = XGraphics.CreateMeasureContext(pageSize, XGraphicsUnit.Point, XPageDirection.Downwards))
            {
                container.PerformLayout(measure);
            }

            gfx.IntersectClip(new XRect(0, 0, page.Width, page.Height));

            container.PerformPaint(gfx);
        }
    }

    //write the pdf to a byte array to serve as download, attach to an email etc.
    byte[] bin;
    using (MemoryStream stream = new MemoryStream())
    {
        doc.Save(stream, false);
        bin = stream.ToArray();
    }
}

If you need only simple parsing in your app and you have control over the HTML input, you can write your own library for this.

I created one in one of my projects, but unfortunately it cannot be shared yet because of custom features tied to that specific application.

Basically, you need to follow this logic to implement basic HTML to PDF conversion:

  1. Do simple HTML parsing of tags.
  2. Create logic to recognize common styles (bold, italic, left, centre, etc.), which arrive as tags or style attributes in the HTML, and create a PDFSharp class with these properties to assign to each paragraph.
  3. Handle table tags and add rows and columns to the PDF.
  4. Handle paragraph tags to add paragraphs.

This is only a very broad overview of the logic, based on my implementation.

You may well have a much better idea :)

You can also refer to: Writing content of HTML table into PDF doc using iTextSharp in asp.net
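
Since the implementation mentioned above was not shared, here is a minimal sketch of steps 1 and 2 only, using nothing but the .NET standard library. It splits an HTML fragment into text runs and records which inline styles (bold, italic, underline) are active for each run. All names here (TextRun, SimpleHtmlRuns, Parse) are hypothetical, and the regular expression is an assumption that only suits simple, well-formed input that you control; it is not a general HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// A run of text plus the inline styles that were active when it was read.
public sealed class TextRun
{
    public string Text;
    public bool Bold, Italic, Underline;
}

public static class SimpleHtmlRuns
{
    // Matches either a tag such as <b>, </b>, <br/> or a stretch of plain text.
    // A stray '<' that does not start a tag is silently skipped.
    private static readonly Regex Token = new Regex(
        @"<(?<close>/)?\s*(?<name>[a-zA-Z0-9]+)[^>]*>|(?<text>[^<]+)");

    public static List<TextRun> Parse(string html)
    {
        var runs = new List<TextRun>();
        // Counters instead of flags so nested tags like <b><b>x</b></b> stay balanced
        int bold = 0, italic = 0, underline = 0;

        foreach (Match m in Token.Matches(html))
        {
            if (m.Groups["text"].Success)
            {
                runs.Add(new TextRun
                {
                    Text = m.Groups["text"].Value,
                    Bold = bold > 0,
                    Italic = italic > 0,
                    Underline = underline > 0
                });
                continue;
            }

            int delta = m.Groups["close"].Success ? -1 : 1;
            switch (m.Groups["name"].Value.ToLowerInvariant())
            {
                case "b":
                case "strong":
                    bold = Math.Max(0, bold + delta);
                    break;
                case "i":
                case "em":
                    italic = Math.Max(0, italic + delta);
                    break;
                case "u":
                    underline = Math.Max(0, underline + delta);
                    break;
                case "br":
                    // Represent a line break as its own unstyled run
                    if (delta > 0) runs.Add(new TextRun { Text = "\n" });
                    break;
                // Any other tag is ignored in this sketch
            }
        }
        return runs;
    }
}
```

Each run could then be drawn with PDFsharp by choosing a font whose XFontStyle matches the run's flags; tables and paragraphs (steps 3 and 4) would need additional handling that this sketch does not attempt.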

HTML Renderer for PDF using PdfSharp can generate a PDF from HTML by rendering it

  1. as an image, or
  2. as text

before inserting it into the PDF.

To render as an image, please refer to the code in Diego's answer.

To render as text, please refer to the code below:

static void Main(string[] args)
{
    string html = File.ReadAllText(@"C:\Temp\Test.html");
    PdfDocument pdf = PdfGenerator.GeneratePdf(html, PageSize.A4, 20, null, OnStylesheetLoad, OnImageLoadPdfSharp);
    pdf.Save(@"C:\Temp\Test.pdf");
}

public static void OnImageLoadPdfSharp(object sender, HtmlImageLoadEventArgs e)
{
    var imgObj = Image.FromFile(@"C:\Temp\Test.png");
    e.Callback(XImage.FromGdiPlusImage(imgObj));    
}

public static void OnStylesheetLoad(object sender, HtmlStylesheetLoadEventArgs e)
{
    e.SetStyleSheet = @"h1, h2, h3 { color: navy; font-weight:normal; }";
}

HTML code

<html>
    <head>
        <title></title>
        <link rel="Stylesheet" href="StyleSheet" />      
    </head>
    <body>
        <h1>Images
            <img src="ImageIcon" />
        </h1>
    </body>
</html>

Unfortunately, HtmlRenderer is not an appropriate library to be used in a project based on .NET 5.0:

System.IO.FileLoadException: 'Could not load file or assembly 'HtmlRenderer,
Version=1.5.0.6, Culture=neutral, PublicKeyToken=null'. The located assembly's 
manifest definition does not match the assembly reference. (0x80131040)'

I also found that the dependency package HtmlRenderer.PdfSharp produces the following warning message:

Package 'HtmlRenderer.PdfSharp 1.5.0.6' was restored using 
'.NETFramework,Version=v4.6.1, .NETFramework,Version=v4.6.2, 
.NETFramework,Version=v4.7, .NETFramework,Version=v4.7.1, 
.NETFramework,Version=v4.7.2, .NETFramework,Version=v4.8' instead of the project 
target framework 'net5.0'. This package may not be fully compatible with your project.

By the way, I managed to render HTML as PDF using another library IronPDF :

License.LicenseKey = "license key";
var renderer = new ChromePdfRenderer();
PdfDocument pdf = await renderer.RenderHtmlAsPdfAsync(yourHtml);
pdf.SaveAs("your html as pdf.pdf");

The line with License.LicenseKey is not necessary and you can remove it, but your PDF will then be generated with the IronPDF watermark at the end of each page. IronPDF does, however, offer a trial license key.

I recommend NReco.PdfGenerator because it has both free and paid licenses and is easy to install from NuGet.

Main page: https://www.nrecosite.com/pdf_generator_net.aspx

Documentation: https://www.nrecosite.com/doc/NReco.PdfGenerator/

If you want to create a PDF from an HTML file, try:

String html = File.ReadAllText("main.html");
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
htmlToPdf.GeneratePdf(html, null, "C:/Users/Tmp/Desktop/mapa.pdf");
