[英]Removing Watermark from a PDF using iTextSharp
I added a watermark on pdf using Pdfstamper. 我使用Pdfstamper在pdf上添加了水印。 Here is the code:
这是代码:
for (int pageIndex = 1; pageIndex <= pageCount; pageIndex++)
{
iTextSharp.text.Rectangle pageRectangle = reader.GetPageSizeWithRotation(pageIndex);
PdfContentByte pdfData = stamper.GetUnderContent(pageIndex);
pdfData.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252,
BaseFont.NOT_EMBEDDED), watermarkFontSize);
PdfGState graphicsState = new PdfGState();
graphicsState.FillOpacity = watermarkFontOpacity;
pdfData.SetGState(graphicsState);
pdfData.SetColorFill(iTextSharp.text.BaseColor.BLACK);
pdfData.BeginText();
pdfData.ShowTextAligned(PdfContentByte.ALIGN_CENTER, "LipikaChatterjee",
pageRectangle.Width / 2, pageRectangle.Height / 2, watermarkRotation);
pdfData.EndText();
}
This works fine. 这很好。 Now I want to remove this watermark from my pdf.
现在,我想从pdf中删除该水印。 I looked into iTextSharp but was not able to get any help.
我调查了iTextSharp,但无法获得任何帮助。 I even tried to add watermark as layer and then delete the layer but was not able to delete the content of layer from the pdf.
我什至尝试将水印添加为图层,然后删除该图层,但无法从pdf中删除图层的内容。 I looked into iText for layer removal and found a class OCGRemover but I was not able to get an equivalent class in iTextsharp.
我调查了iText的图层删除过程,发现了一个OCGRemover类,但无法在iTextsharp中获得一个等效的类。
I'm going to give you the benefit of the doubt based on the statement "I even tried to add watermark as layer" and assume that you are working on content that you are creating and not trying to unwatermark someone else's content. 我将基于“我什至试图将水印添加为图层”的陈述,并假设您正在处理自己创建的内容,而不是尝试对其他人的内容取消加水印,来为您带来疑问的好处。
PDFs use Optional Content Groups (OCG) to store objects as layers. PDF使用可选内容组(OCG)将对象存储为图层。 If you add your watermark text to a layer you can fairly easily remove it later.
如果将水印文本添加到图层,则以后可以很轻松地将其删除。
The code below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. 下面的代码是一个针对iTextSharp 5.1.1.0的功能全面的C#2010 WinForms应用程序。 It uses code based on Bruno's original Java code found here .
它使用基于此处找到的Bruno原始Java代码的代码 。 The code is in three sections.
该代码分为三个部分。 Section 1 creates a sample PDF for us to work with.
第1部分创建了供我们使用的样本PDF。 Section 2 creates a new PDF from the first and applies a watermark to each page on a separate layer.
第2节从头开始创建新的PDF,并将水印应用于单独图层上的每个页面。 Section 3 creates a final PDF from the second but removes the layer with our watermark text.
第三部分从第二部分创建了最终的PDF,但是删除了带有水印文本的图层。 See the code comments for additional details.
有关其他详细信息,请参见代码注释。
When you create a PdfLayer
object you can assign it a name to appear within a PDF reader. 创建
PdfLayer
对象时,可以为其指定一个名称,使其出现在PDF阅读器中。 Unfortunately I can't find a way to access this name so the code below looks for the actual watermark text within the layer. 不幸的是,我找不到访问此名称的方法,因此下面的代码在图层中查找实际的水印文本。 If you aren't using additional PDF layers I would recommend only looking for
/OC
within the content stream and not wasting time looking for your actual watermark text. 如果您不使用其他PDF层,建议您仅在内容流中查找
/OC
,而不要浪费时间查找实际的水印文本。 If you find a way to look for /OC
groups by name please let me kwow! 如果您找到一种按名称查找
/OC
组的方法,请让我知道!
using System;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string startFile = Path.Combine(workingFolder, "StartFile.pdf");
string watermarkedFile = Path.Combine(workingFolder, "Watermarked.pdf");
string unwatermarkedFile = Path.Combine(workingFolder, "Un-watermarked.pdf");
string watermarkText = "This is a test";
//SECTION 1
//Create a 5 page PDF, nothing special here
using (FileStream fs = new FileStream(startFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter witier = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
for (int i = 1; i <= 5; i++) {
doc.NewPage();
doc.Add(new Paragraph(String.Format("This is page {0}", i)));
}
doc.Close();
}
}
}
//SECTION 2
//Create our watermark on a separate layer. The only different here is that we are adding the watermark to a PdfLayer which is an OCG or Optional Content Group
PdfReader reader1 = new PdfReader(startFile);
using (FileStream fs = new FileStream(watermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (PdfStamper stamper = new PdfStamper(reader1, fs)) {
int pageCount1 = reader1.NumberOfPages;
//Create a new layer
PdfLayer layer = new PdfLayer("WatermarkLayer", stamper.Writer);
for (int i = 1; i <= pageCount1; i++) {
iTextSharp.text.Rectangle rect = reader1.GetPageSize(i);
//Get the ContentByte object
PdfContentByte cb = stamper.GetUnderContent(i);
//Tell the CB that the next commands should be "bound" to this new layer
cb.BeginLayer(layer);
cb.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED), 50);
PdfGState gState = new PdfGState();
gState.FillOpacity = 0.25f;
cb.SetGState(gState);
cb.SetColorFill(BaseColor.BLACK);
cb.BeginText();
cb.ShowTextAligned(PdfContentByte.ALIGN_CENTER, watermarkText, rect.Width / 2, rect.Height / 2, 45f);
cb.EndText();
//"Close" the layer
cb.EndLayer();
}
}
}
//SECTION 3
//Remove the layer created above
//First we bind a reader to the watermarked file, then strip out a bunch of things, and finally use a simple stamper to write out the edited reader
PdfReader reader2 = new PdfReader(watermarkedFile);
//NOTE, This will destroy all layers in the document, only use if you don't have additional layers
//Remove the OCG group completely from the document.
//reader2.Catalog.Remove(PdfName.OCPROPERTIES);
//Clean up the reader, optional
reader2.RemoveUnusedObjects();
//Placeholder variables
PRStream stream;
String content;
PdfDictionary page;
PdfArray contentarray;
//Get the page count
int pageCount2 = reader2.NumberOfPages;
//Loop through each page
for (int i = 1; i <= pageCount2; i++) {
//Get the page
page = reader2.GetPageN(i);
//Get the raw content
contentarray = page.GetAsArray(PdfName.CONTENTS);
if (contentarray != null) {
//Loop through content
for (int j = 0; j < contentarray.Size; j++) {
//Get the raw byte stream
stream = (PRStream)contentarray.GetAsStream(j);
//Convert to a string. NOTE, you might need a different encoding here
content = System.Text.Encoding.ASCII.GetString(PdfReader.GetStreamBytes(stream));
//Look for the OCG token in the stream as well as our watermarked text
if (content.IndexOf("/OC") >= 0 && content.IndexOf(watermarkText) >= 0) {
//Remove it by giving it zero length and zero data
stream.Put(PdfName.LENGTH, new PdfNumber(0));
stream.SetData(new byte[0]);
}
}
}
}
//Write the content out
using (FileStream fs = new FileStream(unwatermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (PdfStamper stamper = new PdfStamper(reader2, fs)) {
}
}
this.Close();
}
}
}
As an extension to Chris's answer , a VB.Net class for removing a layer is included at the bottom of this post which should be a bit more precise. 作为Chris答案的扩展,在本文的底部包含一个VB.Net类,用于删除层,该类应该更精确一些。
OCGs
array in the OCProperties
dictionary in the file's catalog). OCGs
在阵列OCProperties
在文件的目录字典)。 This array contains indirect references to objects in the PDF file which contain the name /OC /{PagePropertyReference} BDC {Actual Content} EMC
so it can remove just these segments as appropriate /OC /{PagePropertyReference} BDC {Actual Content} EMC
实例,因此可以适当地删除这些段 The code then cleans up all the references as much as it can. 然后,代码将尽可能多地清理所有引用。 Calling the code might work as shown:
调用代码可能如下所示工作:
Public Shared Sub RemoveWatermark(path As String, savePath As String)
Using reader = New PdfReader(path)
Using fs As New FileStream(savePath, FileMode.Create, FileAccess.Write, FileShare.None)
Using stamper As New PdfStamper(reader, fs)
Using remover As New PdfLayerRemover(reader)
remover.RemoveByName("WatermarkLayer")
End Using
End Using
End Using
End Using
End Sub
Full class: 全班:
Imports iTextSharp.text
Imports iTextSharp.text.io
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser
Public Class PdfLayerRemover
Implements IDisposable
Private _reader As PdfReader
Private _layerNames As New List(Of String)
Public Sub New(reader As PdfReader)
_reader = reader
End Sub
Public Sub RemoveByName(name As String)
_layerNames.Add(name)
End Sub
Private Sub RemoveLayers()
Dim ocProps = _reader.Catalog.GetAsDict(PdfName.OCPROPERTIES)
If ocProps Is Nothing Then Return
Dim ocgs = ocProps.GetAsArray(PdfName.OCGS)
If ocgs Is Nothing Then Return
'Get a list of indirect references to the layer information
Dim layerRefs = (From l In (From i In ocgs
Select Obj = DirectCast(PdfReader.GetPdfObject(i), PdfDictionary),
Ref = DirectCast(i, PdfIndirectReference))
Where _layerNames.Contains(l.Obj.GetAsString(PdfName.NAME).ToString)
Select l.Ref).ToList
'Get a list of numbers for these layer references
Dim layerRefNumbers = (From l In layerRefs Select l.Number).ToList
'Loop through the pages
Dim page As PdfDictionary
Dim propsToRemove As IEnumerable(Of PdfName)
For i As Integer = 1 To _reader.NumberOfPages
'Get the page
page = _reader.GetPageN(i)
'Get the page properties which reference the layers to remove
Dim props = _reader.GetPageResources(i).GetAsDict(PdfName.PROPERTIES)
propsToRemove = (From k In props.Keys Where layerRefNumbers.Contains(props.GetAsIndirectObject(k).Number) Select k).ToList
'Get the raw content
Dim contentarray = page.GetAsArray(PdfName.CONTENTS)
If contentarray IsNot Nothing Then
For j As Integer = 0 To contentarray.Size - 1
'Parse the stream data looking for references to a property pointing to the layer.
Dim stream = DirectCast(contentarray.GetAsStream(j), PRStream)
Dim streamData = PdfReader.GetStreamBytes(stream)
Dim newData = GetNewStream(streamData, (From p In propsToRemove Select p.ToString.Substring(1)))
'Store data without the stream references in the stream
If newData.Length <> streamData.Length Then
stream.SetData(newData)
stream.Put(PdfName.LENGTH, New PdfNumber(newData.Length))
End If
Next
End If
'Remove the properties from the page data
For Each prop In propsToRemove
props.Remove(prop)
Next
Next
'Remove references to the layer in the master catalog
RemoveIndirectReferences(ocProps, layerRefNumbers)
'Clean up unused objects
_reader.RemoveUnusedObjects()
End Sub
Private Shared Function GetNewStream(data As Byte(), propsToRemove As IEnumerable(Of String)) As Byte()
Dim item As PdfLayer = Nothing
Dim positions As New List(Of Integer)
positions.Add(0)
Dim pos As Integer
Dim inGroup As Boolean = False
Dim tokenizer As New PRTokeniser(New RandomAccessFileOrArray(New RandomAccessSourceFactory().CreateSource(data)))
While tokenizer.NextToken
If tokenizer.TokenType = PRTokeniser.TokType.NAME AndAlso tokenizer.StringValue = "OC" Then
pos = CInt(tokenizer.FilePointer - 3)
If tokenizer.NextToken() AndAlso tokenizer.TokenType = PRTokeniser.TokType.NAME Then
If Not inGroup AndAlso propsToRemove.Contains(tokenizer.StringValue) Then
inGroup = True
positions.Add(pos)
End If
End If
ElseIf tokenizer.TokenType = PRTokeniser.TokType.OTHER AndAlso tokenizer.StringValue = "EMC" AndAlso inGroup Then
positions.Add(CInt(tokenizer.FilePointer))
inGroup = False
End If
End While
positions.Add(data.Length)
If positions.Count > 2 Then
Dim length As Integer = 0
For i As Integer = 0 To positions.Count - 1 Step 2
length += positions(i + 1) - positions(i)
Next
Dim newData(length) As Byte
length = 0
For i As Integer = 0 To positions.Count - 1 Step 2
Array.Copy(data, positions(i), newData, length, positions(i + 1) - positions(i))
length += positions(i + 1) - positions(i)
Next
Dim origStr = System.Text.Encoding.UTF8.GetString(data)
Dim newStr = System.Text.Encoding.UTF8.GetString(newData)
Return newData
Else
Return data
End If
End Function
Private Shared Sub RemoveIndirectReferences(dict As PdfDictionary, refNumbers As IEnumerable(Of Integer))
Dim newDict As PdfDictionary
Dim arrayData As PdfArray
Dim indirect As PdfIndirectReference
Dim i As Integer
For Each key In dict.Keys
newDict = dict.GetAsDict(key)
arrayData = dict.GetAsArray(key)
If newDict IsNot Nothing Then
RemoveIndirectReferences(newDict, refNumbers)
ElseIf arrayData IsNot Nothing Then
i = 0
While i < arrayData.Size
indirect = arrayData.GetAsIndirectObject(i)
If refNumbers.Contains(indirect.Number) Then
arrayData.Remove(i)
Else
i += 1
End If
End While
End If
Next
End Sub
#Region "IDisposable Support"
Private disposedValue As Boolean ' To detect redundant calls
' IDisposable
Protected Overridable Sub Dispose(disposing As Boolean)
If Not Me.disposedValue Then
If disposing Then
RemoveLayers()
End If
' TODO: free unmanaged resources (unmanaged objects) and override Finalize() below.
' TODO: set large fields to null.
End If
Me.disposedValue = True
End Sub
' TODO: override Finalize() only if Dispose(ByVal disposing As Boolean) above has code to free unmanaged resources.
'Protected Overrides Sub Finalize()
' ' Do not change this code. Put cleanup code in Dispose(ByVal disposing As Boolean) above.
' Dispose(False)
' MyBase.Finalize()
'End Sub
' This code added by Visual Basic to correctly implement the disposable pattern.
Public Sub Dispose() Implements IDisposable.Dispose
' Do not change this code. Put cleanup code in Dispose(ByVal disposing As Boolean) above.
Dispose(True)
GC.SuppressFinalize(Me)
End Sub
#End Region
End Class
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.