Unable to extract CMYK colorspaces from PDF

I'm trying to extract colorspace data from a PDF. I have a file with Pantone and CMYK colorspaces. When I extract the colorspaces using a PDF library (I tried pdfclown, pdfbox and icePdf), the output contains only the Pantone colorspaces and no information at all about the CMYK colorspace. I examined the file in CorelDraw: when I click on a color, it shows the exact value, e.g. PANTONE 3735 C or C 0 M 50 Y 50 K 0. How can I extract all the colorspaces present in a PDF (Pantone/CMYK)?

// Needs: using System.Linq; (for ToList) and, assuming the usual PDF Clown namespaces,
// using org.pdfclown.documents.contents; (for ContentScanner)
using (var file = new org.pdfclown.files.File(filePath))
{
    org.pdfclown.documents.Document document = file.Document;

    foreach (org.pdfclown.documents.Page page in document.Pages)
    {
        ContentScanner cs = new ContentScanner(page); // Wraps the page contents into the scanner.

        // Color spaces declared in the page's resource dictionary.
        System.Collections.Generic.List<org.pdfclown.documents.contents.colorSpaces.ColorSpace> list = cs.Contents.ContentContext.Resources.ColorSpaces.Values.ToList();
        for (int i = 0; i < list.Count; i++)
        {
            // Print list of colorspaces available
        }
    }
}

Sample PDF document containing CMYK and PANTONE colors

Output from 'pdfclown' showing PANTONE and its alternative colorspaces:

[screenshot]

Original answer

Unfortunately you don't show your code. But your screenshot looks like you merely look at the ColorSpace section of the page Resources. This does not suffice, for a number of reasons:

  • First of all, the colorspace resources are referenced by name from the content streams (cf. the Contents entry on your screenshot) to select colorspaces for stroking or filling. But there are some predefined names that do not need to be described in the resources, cf. the documentation of the CS operator:

    Set the current colour space to use for stroking operations. The operand name shall be a name object. If the colour space is one that can be specified by a name and no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, and certain cases of Pattern), the name may be specified directly. Otherwise, it shall be a name defined in the ColorSpace subdictionary of the current resource dictionary.

    (ISO 32000-1, Table 74 – Colour Operators)

    Thus, to check whether DeviceGray, DeviceRGB, or DeviceCMYK are used, you have to scan the content stream for color space selection operations (CS or cs) using these names.

    Furthermore, there are even shortcut color selection operations which set one of those colorspaces and immediately select a color in it (g, G, rg, RG, k, K); you have to scan the content stream for these, too.

    E.g. in your page content stream you can find:

     0.3 0 1 0 k 

    and

     0.9 g 

    and multiple other occurrences of these operators. Thus, at least DeviceGray and DeviceCMYK are in use (in addition to the resources you found).

  • Furthermore, not all of the colorspaces you find in the ColorSpace resource dictionary are necessarily actually used in the content. Thus, while scanning the content as above for uses of undeclared colorspaces, you also have to scan for the declared colorspaces to ensure that they actually are used.

  • You also have to look at other resources used from your content streams:

    • The bitmap images (XObjects with Subtype value Image), e.g. Im1 has ColorSpace DeviceCMYK and Im5 has ColorSpace DeviceRGB.

      Again you have to make sure that the bitmaps actually are used in your content stream.

      Beware, JPEG2000 bitmaps may bring along their own colorspace definition in their own format!

    • Shadings: all Shadings in your PDF have ColorSpace DeviceCMYK. Again make sure they're actually used.

    • Form XObjects and Patterns have content streams and resources of their own. Don't forget to search deep into their structure. In your case, though, there are none.

    • Type 3 font glyphs are defined via content streams and resources, so they may also have their own colorspaces. None are used in your file.

    • Transparency groups also may have a colorspace setting specifying, among other things, the colour space of the group as a whole when it in turn is painted as an object onto its backdrop.

  • ...

Maybe I forgot 1 or 20 other places to look for relevant colorspace settings...

For your file, though, the places mentioned above already show that, in addition to your ColorSpace resources, DeviceGray, DeviceRGB, and DeviceCMYK are also used in your PDF.
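To illustrate the first two points independently of any PDF library, here is a minimal sketch that works on already-decoded content stream text. It is only a sketch under assumptions: FindUsedColorSpaces is a hypothetical helper (not a PDF Clown or PDFBox API), the resource names CS0 and CS1 are made up, and the naive whitespace tokenizer ignores string literals, comments and inline images, which a real scanner must handle. Only the two operator lines "0.3 0 1 0 k" and "0.9 g" come from your file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any PDF library: reports which color
// spaces the color operators in decoded content stream text select.
static SortedSet<string> FindUsedColorSpaces(string contentStream)
{
    // Shortcut operators that imply a device color space (ISO 32000-1, Table 74).
    var shortcuts = new Dictionary<string, string>
    {
        ["g"] = "DeviceGray", ["G"] = "DeviceGray",
        ["rg"] = "DeviceRGB", ["RG"] = "DeviceRGB",
        ["k"] = "DeviceCMYK", ["K"] = "DeviceCMYK",
    };
    var found = new SortedSet<string>(StringComparer.Ordinal);
    string[] tokens = contentStream.Split(
        new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < tokens.Length; i++)
    {
        if (shortcuts.TryGetValue(tokens[i], out var device))
        {
            found.Add(device);
        }
        else if ((tokens[i] == "cs" || tokens[i] == "CS")
                 && i > 0 && tokens[i - 1].StartsWith("/"))
        {
            // The operand is a name such as /DeviceCMYK or a resource name like /CS0.
            found.Add(tokens[i - 1].Substring(1));
        }
    }
    return found;
}

// Sample text: two operator lines quoted from the file plus made-up ones.
string sample = "0.3 0 1 0 k\n0 0 10 10 re f\n0.9 g\n/CS0 cs 1 scn\n/DeviceRGB CS";
// Made-up names standing in for the page's ColorSpace resource dictionary.
var declared = new SortedSet<string>(StringComparer.Ordinal) { "CS0", "CS1" };

SortedSet<string> used = FindUsedColorSpaces(sample);
Console.WriteLine("Used: " + string.Join(", ", used));

// Declared in the resources but never selected in this stream.
var unused = new SortedSet<string>(declared, StringComparer.Ordinal);
unused.ExceptWith(used);
Console.WriteLine("Declared but unused: " + string.Join(", ", unused));
```

For this sample the sketch reports CS0, DeviceCMYK, DeviceGray and DeviceRGB as used and CS1 as declared but unused. In a real document you would run the same comparison per content stream, including those of form XObjects, patterns and Type 3 glyphs.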

On the comments

As you have meanwhile provided code and this code uses PDF Clown, I'll use it here, too. You can do equivalent stuff with PDF Box.

Scan through a content stream

A. How to scan through a ContentStream (I checked the BaseDataObject of the 'Contents'; it looks like this: '[0] {cm [1, 0, 0, 1, 0, 0]}, [1] {gs [GS11]}')

With PDF Clown you usually scan through a content stream using a ContentScanner. And in your code you already have a ContentScanner cs. Thus, simply call ScanForColorspaceUsage(cs) in your loop with ScanForColorspaceUsage defined like this:

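// Needs (assuming the usual PDF Clown namespaces): using System;
// using org.pdfclown.documents.contents; using org.pdfclown.documents.contents.objects;
// Walks the content stream recursively and reports every operation that selects a color space.
void ScanForColorspaceUsage(ContentScanner cs)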
{
    while (cs.MoveNext())
    {
        ContentObject content = cs.Current;
        if (content is CompositeObject)
        {
            ScanForColorspaceUsage(cs.ChildLevel);
        }
        else if (content is SetFillColorSpace _cs)
        {
            Console.WriteLine("Used as fill color space: {0}", _cs.Name);
        }
        else if (content is SetDeviceCMYKFillColor _k)
        {
            Console.WriteLine("Used as fill color space: DeviceCMYK");
        }
        else if (content is SetDeviceGrayFillColor _g)
        {
            Console.WriteLine("Used as fill color space: DeviceGray");
        }
        else if (content is SetDeviceRGBFillColor _rg)
        {
            Console.WriteLine("Used as fill color space: DeviceRGB");
        }
        else if (content is SetStrokeColorSpace _CS)
        {
            Console.WriteLine("Used as stroke color space: {0}", _CS.Name);
        }
        else if (content is SetDeviceCMYKStrokeColor _K)
        {
            Console.WriteLine("Used as stroke color space: DeviceCMYK");
        }
        else if (content is SetDeviceGrayStrokeColor _G)
        {
            Console.WriteLine("Used as stroke color space: DeviceGray");
        }
        else if (content is SetDeviceRGBStrokeColor _RG)
        {
            Console.WriteLine("Used as stroke color space: DeviceRGB");
        }
    }
}

All colorspaces

B. Whether the colorspace is used or not, I want to display all the colorspaces available in the PDF. When I checked the above document in CorelDraw, it displayed around 30-35 colorspaces as CMYK (in the second line of the horizontal array of colorspaces).

Going through your document, whenever a CMYK color is used, it is used via the DeviceCMYK color space, not via a special ICCBased one. Thus, only one CMYK colorspace is used in your PDF.

I don't have CorelDraw, so I cannot tell what exactly it shows you. Or do you mean individual CMYK colors?
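If individual colors are what you are after, the operands of the k and K operators are the place to look. The following is a minimal sketch under the same assumptions as any plain-text scan: FindCmykColors is a hypothetical helper (not a library API), it expects already-decoded content stream text, it tokenizes naively on whitespace, and it does not cover CMYK colors set via scn / SCN after DeviceCMYK has been selected with cs / CS. The sample values are partly made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects the distinct operand tuples of the k / K
// operators from decoded content stream text.
static SortedSet<string> FindCmykColors(string contentStream)
{
    var colors = new SortedSet<string>(StringComparer.Ordinal);
    string[] tokens = contentStream.Split(
        new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    // Start at index 4 so that four operands can precede the operator.
    for (int i = 4; i < tokens.Length; i++)
    {
        if (tokens[i] == "k" || tokens[i] == "K")
        {
            // The four preceding operands are C, M, Y and K, each in the range 0..1.
            colors.Add(string.Join(" ", tokens, i - 4, 4));
        }
    }
    return colors;
}

// "0 0.5 0.5 0" corresponds to what CorelDraw would show as C 0 M 50 Y 50 K 0.
string cmykSample = "0.3 0 1 0 k\n0 0.5 0.5 0 K\n0.3 0 1 0 k";
SortedSet<string> cmykColors = FindCmykColors(cmykSample);
foreach (string color in cmykColors)
{
    Console.WriteLine("CMYK " + color);
}
```

The repeated "0.3 0 1 0 k" is counted once, so this sample yields two distinct colors. Such a list of distinct values may be closer to the 30-35 entries CorelDraw displays than a list of colorspaces, but that is a guess on my part.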

Learn deeper

C. Where can I learn deeper about these things to understand better?

If by these things you mean how this all is represented in PDFs, the PDF specification might be a good reference. The most current one, ISO 32000-2, is only available for money, e.g. from the ISO store, but the older one, ISO 32000-1, is also shared by Adobe for download as PDF32000_2008.pdf.
