
Remove Javascript from PDF document using .NET Core

I have PDFs sent in from an external source that I want users to be able to view via a web service.

The PDFs are retrieved via a .NET Core service that gets them from the DB and outputs them as PDF files.

The problem is that malicious users can put JS in PDFs. Because the PDFs appear to the browser to come from the same origin, the JS can execute XSS attacks on the rest of the application.

I don't need to retain any of the JS functionality, but I also want to keep the PDFs as unchanged as possible.

Is there a way, using .NET Core, to strip JS out of PDFs and leave them otherwise unchanged?

Alternatively, is there any way to prevent JS from executing when PDF files are opened embedded in webpages (for instance using <iframe src="file.pdf"> or <object type="application/pdf" data="file.pdf">)? I can't rely on users having additional PDF extensions; it would need to work with the vanilla browser.

To remove all the JavaScript from a PDF, you could start by removing all shared JavaScript. This is a special document-level collection of scripts, often used to define JavaScript functions that other scripts in the document can call.

Then you could find all actions in the document and check the type of each one. For JavaScript actions, you could replace the associated code with an empty string.

This task is definitely not an easy one. I recommend using a PDF library for it.

My company develops Docotic.Pdf library that can be used in .NET Standard / .NET Core and can help with your task.
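As a rough sanity check before or after stripping, it can help to scan the raw bytes of a file for the `/JavaScript` and `/JS` name tokens that the PDF format uses for script actions and for the document-level script collection. The following is a minimal sketch, not a sanitizer: the class and method names (`PdfScriptHeuristic`, `MayContainJavaScript`) are hypothetical, it can report false positives for unrelated names that start with `/JS`, and it will miss tokens hidden inside compressed object streams or written with `#xx` name escapes.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only. It does not parse the PDF;
// it only looks for script-related name tokens in the uncompressed bytes.
public static class PdfScriptHeuristic
{
    // Name tokens used by JavaScript actions and the document-level script collection.
    private static readonly string[] Markers = { "/JavaScript", "/JS" };

    public static bool MayContainJavaScript(byte[] pdfBytes)
    {
        if (pdfBytes == null) throw new ArgumentNullException(nameof(pdfBytes));

        foreach (string marker in Markers)
        {
            byte[] needle = Encoding.ASCII.GetBytes(marker);
            if (IndexOf(pdfBytes, needle) >= 0)
            {
                return true;
            }
        }
        return false;
    }

    // Naive byte-sequence search; returns the first match index or -1.
    private static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j])
            {
                j++;
            }
            if (j == needle.Length)
            {
                return i;
            }
        }
        return -1;
    }
}
```

A call such as `PdfScriptHeuristic.MayContainJavaScript(File.ReadAllBytes(path))` returning `true` only means the file deserves a closer look with a real PDF library; `false` does not prove the file is clean.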

The code below shows how to remove the JavaScript code from a PDF file using XFINIUM.PDF library:

public void RemoveDocumentJavascript(Stream inputStream, Stream outputStream)
{
    PdfFixedDocument doc = new PdfFixedDocument(inputStream);
    // Remove document level JS code
    doc.JavaScriptBlocks.Clear();

    RemoveDocumentActions(doc);

    // Remove JavaScript from annotations.
    for (int i = 0; i < doc.Pages.Count; i++)
    {
        for (int j = 0; j < doc.Pages[i].Annotations.Count; j++)
        {
            RemoveAnnotationActions(doc.Pages[i].Annotations[j]);
        }
    }

    // Remove Javascript from fields
    for (int i = 0; i < doc.Form.Fields.Count; i++)
    {
        RemoveFieldActions(doc.Form.Fields[i]);
    }

    doc.Save(outputStream);
}

private void RemoveDocumentActions(PdfFixedDocument doc)
{
    if (doc.OpenAction is PdfJavaScriptAction)
    {
        doc.OpenAction = null;
    }
    if (doc.BeforeCloseAction is PdfJavaScriptAction)
    {
        doc.BeforeCloseAction = null;
    }
    if (doc.BeforeSaveAction is PdfJavaScriptAction)
    {
        doc.BeforeSaveAction = null;
    }
    if (doc.AfterSaveAction is PdfJavaScriptAction)
    {
        doc.AfterSaveAction = null;
    }
    if (doc.BeforePrintAction is PdfJavaScriptAction)
    {
        doc.BeforePrintAction = null;
    }
    if (doc.AfterPrintAction is PdfJavaScriptAction)
    {
        doc.AfterPrintAction = null;
    }
}

private void RemoveAnnotationActions(PdfAnnotation annotation)
{
    if (annotation.PageOpen is PdfJavaScriptAction)
    {
        annotation.PageOpen = null;
    }
    if (annotation.PageClose is PdfJavaScriptAction)
    {
        annotation.PageClose = null;
    }
    if (annotation.PageVisible is PdfJavaScriptAction)
    {
        annotation.PageVisible = null;
    }
    if (annotation.PageInvisible is PdfJavaScriptAction)
    {
        annotation.PageInvisible = null;
    }
    if (annotation.MouseDown is PdfJavaScriptAction)
    {
        annotation.MouseDown = null;
    }
    if (annotation.MouseUp is PdfJavaScriptAction)
    {
        annotation.MouseUp = null;
    }
    if (annotation.MouseEnter is PdfJavaScriptAction)
    {
        annotation.MouseEnter = null;
    }
    if (annotation.MouseLeave is PdfJavaScriptAction)
    {
        annotation.MouseLeave = null;
    }
    PdfLinkAnnotation link = annotation as PdfLinkAnnotation;
    if ((link != null) && (link.Action is PdfJavaScriptAction))
    {
        link.Action = null;
    }
}

private void RemoveFieldActions(PdfField field)
{
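    // The PDF specification defines the calculate, format, keystroke and
    // validate field actions as JavaScript actions, so they are cleared
    // without a type check.
    field.CalculateAction = null;
    field.FormatAction = null;
    field.KeyPressAction = null;
    field.ValidateAction = null;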

    for (int i = 0; i < field.Widgets.Count; i++)
    {
        if (field.Widgets[i].Focus is PdfJavaScriptAction)
        {
            field.Widgets[i].Focus = null;
        }
        if (field.Widgets[i].Blur is PdfJavaScriptAction)
        {
            field.Widgets[i].Blur = null;
        }
    }
}

The library supports .NET Core and is available on nuget.org (id: xfinium.pdf.netcore).
Unless you implement your own PDF parsing and saving code, you cannot accomplish this task without a third-party library.

Disclaimer: I work for the company that develops XFINIUM.PDF library.
