
Is there any way to "sanitize" PDFs in C#?

What options are there for removing viruses and other malicious content from a PDF? How is code hidden within PDFs? Can I remove at least the majority of it? I expect there will still be holes, but does it help to remove metadata like below?

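using System;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Open the uploaded bytes for modification instead of creating an empty document.
using (var input = new MemoryStream(_fileBytes))
using (var output = new MemoryStream())
{
    PdfDocument document = PdfReader.Open(input, PdfDocumentOpenMode.Modify);
    // Blank the Info fields first, then save; Elements and Internals are read-only.
    document.Info.Author = "";
    document.Info.CreationDate = new DateTime();
    document.Info.Creator = "";
    document.Info.Keywords = "";
    document.Save(output, false); // false keeps the output stream open
}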

My concern is that if a registered user uploads a bad PDF, other users who download it from the server will get infected. Instead of trying to clean a PDF, is there a better way?

One thing you should do is install antivirus software on your server and have it scan uploaded files, as dman2306 suggested.

In addition to that security necessity, you can remove elements from PDF files which could theoretically be vectors for attacks. The classic example is that PDF files allow JavaScript to be embedded and executed. This feature has been exploited by multiple types of malware.

So you can remove the PdfObjects that contain JavaScript from the document. Annotations also allow for things like execution of programs, if I recall, so you could remove those as well.

Many different types of objects in a PDF could be potential vectors for attacks. A few that come to mind for me are automatic execution objects and, like I said, JavaScript objects. Removing these elements should help minimize risk in addition to the virus scan that was recommended.
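As a rough illustration of what "JavaScript objects" and "automatic execution objects" look like inside a file, here is a minimal sketch that only detects them rather than removing them, so an upload could be rejected or quarantined. It uses nothing beyond the base class library. The class and method names (PdfRiskScanner, FindRiskyNames) are made up for this example and are not part of PdfSharp. The list of names is an assumption about what is worth flagging: /JavaScript and /JS for scripts, /OpenAction and /AA for automatic actions, /Launch for starting programs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of PdfSharp: flags PDF name tokens that are
// commonly associated with scripts and automatic or launch actions.
public static class PdfRiskScanner
{
    // Assumed watch list; extend or trim it to suit your own policy.
    private static readonly string[] RiskyNames =
        { "/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch" };

    // Characters that end a PDF name token: whitespace and delimiters.
    private const string Terminators = "\0\t\n\f\r ()<>[]{}/%";

    public static List<string> FindRiskyNames(byte[] fileBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // inside the file does not break the text search.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(fileBytes);
        var found = new List<string>();
        foreach (string name in RiskyNames)
        {
            if (ContainsName(text, name))
            {
                found.Add(name);
            }
        }
        return found;
    }

    // True if the name occurs as a whole token, so "/JS" does not match "/JSON".
    private static bool ContainsName(string text, string name)
    {
        int index = text.IndexOf(name, StringComparison.Ordinal);
        while (index >= 0)
        {
            int end = index + name.Length;
            if (end == text.Length || Terminators.IndexOf(text[end]) >= 0)
            {
                return true;
            }
            index = text.IndexOf(name, index + 1, StringComparison.Ordinal);
        }
        return false;
    }
}
```

Something like `PdfRiskScanner.FindRiskyNames(_fileBytes).Count > 0` could then gate the upload. This is a heuristic only: names can be written with #xx hex escapes, objects can sit inside compressed object streams where a raw byte scan will not see them, and short names such as /AA can show up by chance in binary stream data. A clean result does not prove the file is safe, and a hit does not prove it is malicious, so treat it as a supplement to the virus scan and not a replacement.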

I've been to a few places that had odd requirements for this type of scenario. What they did was attach the file to an email and send it to a certain mailbox. Exchange Server already runs virus checks on attachments, so this was the easiest cost-effective way to accomplish that task. Then you programmatically have something pick up the emails that passed the check and download the attachments.

I'm not saying this is the best solution, because it is ugly, but it is one that has been used successfully to accomplish PDF virus checking.


 