
Read image IPTC data

I'm having some trouble reading the IPTC data out of some images. I want to do this because my client already has all the keywords in the IPTC data and doesn't want to re-enter them on the site.

So I created this simple script to read them out:

$size = getimagesize($image, $info);

if(isset($info['APP13'])) {
    $iptc = iptcparse($info['APP13']);

    print '<pre>';
        var_dump($iptc['2#025']);
    print '</pre>';
}

This works perfectly in most cases, but it has trouble with some images.

Notice: Undefined index: 2#025

Yet I can clearly see the keywords in Photoshop.

Are there any decent small libraries that could read the keywords in every image? Or am I doing something wrong here?

I have found that IPTC is almost always embedded as XML using the XMP format, and is often not in the APP13 slot. You can sometimes get the IPTC info by using iptcparse($info['APP1']), but the most reliable way to get it without a third-party library is to simply search through the image file for the relevant XML string (I got this from another answer, but I haven't been able to find it again, otherwise I would link!):

The XML for the keywords has the form "<dc:subject>...<rdf:Seq><rdf:li>Keyword 1</rdf:li><rdf:li>Keyword 2</rdf:li>...<rdf:li>Keyword N</rdf:li></rdf:Seq>...</dc:subject>"

So you can just get the file as a string using file_get_contents(get_attached_file($attachment_id)), use strpos() to find each opening (<rdf:li>) and closing (</rdf:li>) XML tag, and grab the keyword between them using substr().

The following snippet works for all JPEGs I have tested it on. It fills the array $keys with the IPTC tags taken from a WordPress image with id $attachment_id:

$content = file_get_contents(get_attached_file($attachment_id));

// Initialize empty array to store keywords
$keys = array();

// Look for XMP data: the XML tag "dc:subject" is where keywords are stored.
// strpos() returns false when nothing is found and can return 0 for a match
// at the very start, so compare with === / !== before doing any arithmetic.
$subject_start = ($content === false) ? false : strpos($content, '<dc:subject>');
$subject_end   = ($subject_start === false) ? false : strpos($content, '</dc:subject>', $subject_start);

// Only proceed if able to find the complete dc:subject element
if ($subject_end !== false) {
    $xmp_data_start = $subject_start + 12;
    $xmp_data       = substr($content, $xmp_data_start, $subject_end - $xmp_data_start);

    // Look for tag "rdf:Seq" where individual keywords are listed
    $seq_start = strpos($xmp_data, '<rdf:Seq>');
    $seq_end   = ($seq_start === false) ? false : strpos($xmp_data, '</rdf:Seq>', $seq_start);

    // Only proceed if able to find the complete rdf:Seq element
    if ($seq_end !== false) {
        $key_data_start = $seq_start + 9;
        $key_data       = substr($xmp_data, $key_data_start, $seq_end - $key_data_start);

        // $ctr will track position of each <rdf:li> tag, starting with first
        $ctr = strpos($key_data, '<rdf:li>');

        // While loop stores each keyword and searches for next xml keyword tag
        while ($ctr !== false) {
            // Skip past the tag to get the keyword itself
            $key_begin = $ctr + 8;

            // Keyword ends where closing tag begins
            $key_end = strpos($key_data, '</rdf:li>', $key_begin);

            // Make sure keyword has a closing tag
            if ($key_end === false) break;

            // Make sure keyword is not too long (not sure what WP can handle)
            $key_length = min(100, $key_end - $key_begin);

            // Add keyword to keyword array
            $keys[] = substr($key_data, $key_begin, $key_length);

            // Find next keyword open tag
            $ctr = strpos($key_data, '<rdf:li>', $key_end);
        }
    }
}

I have this implemented in a plugin that puts IPTC keywords into WP's "Description" field.
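
The same search can be written more compactly with regular expressions. The following is only a sketch under a few assumptions: extractXmpKeywords() is a made-up helper name, the XMP packet is stored uncompressed in the file, and the dc:subject element is written with exactly that prefix. It also accepts an rdf:Bag list, since XMP writers commonly store dc:subject as a Bag rather than a Seq, and it decodes XML entities such as &amp;.

```php
<?php
/**
 * Extract dc:subject keywords from a raw JPEG/XMP byte string.
 * extractXmpKeywords() is a hypothetical helper name, not a PHP built-in.
 */
function extractXmpKeywords(string $content): array
{
    // Grab everything inside the first <dc:subject> element.
    if (!preg_match('~<dc:subject>(.*?)</dc:subject>~s', $content, $subject)) {
        return [];
    }

    // Collect each <rdf:li> item, whether the list is an rdf:Bag or an rdf:Seq.
    preg_match_all('~<rdf:li(?:\s[^>]*)?>(.*?)</rdf:li>~s', $subject[1], $items);

    $keys = [];
    foreach ($items[1] as $item) {
        // XMP is XML, so entities such as "&amp;" have to be decoded.
        $key = trim(html_entity_decode($item, ENT_QUOTES | ENT_XML1, 'UTF-8'));
        if ($key !== '') {
            $keys[] = $key;
        }
    }
    return $keys;
}

// Small self-contained demonstration on a hand-written XMP fragment.
$sample = '<dc:subject><rdf:Bag><rdf:li>Sunset</rdf:li>'
        . '<rdf:li>Sea &amp; sky</rdf:li></rdf:Bag></dc:subject>';
print_r(extractXmpKeywords($sample));
```

In WordPress this could be called as extractXmpKeywords(file_get_contents(get_attached_file($attachment_id))), after checking that file_get_contents() did not return false.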

I've seen a lot of weird IPTC problems. It could be that you have two APP13 segments. I have noticed that, for some reason, some JPEGs have multiple IPTC blocks. This is possibly caused by using several photo-editing programs or by some manual file manipulation.

It could be that PHP is trying to read an empty APP13 segment, or even embedded "thumbnail metadata".

It could also be a problem with segment lengths: APP13 and 8BIM have length marker bytes that might hold wrong values.

Try a hex editor and check the file "manually".

ExifTool is very powerful, if calling out to it is an option for you (from PHP, by the look of it?).
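
Calling ExifTool from PHP could look roughly like this. It is a sketch that assumes the exiftool binary is installed and on the PATH and that shell_exec() is enabled; both function names are made up for the example. The -json option and the IPTC:Keywords and XMP-dc:Subject tag names are standard ExifTool usage.

```php
<?php
/**
 * Turn ExifTool's JSON output into a flat keyword list.
 * keywordsFromExifToolJson() is a hypothetical helper name.
 */
function keywordsFromExifToolJson(string $json): array
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded) || !isset($decoded[0]) || !is_array($decoded[0])) {
        return [];
    }

    $keywords = [];
    foreach (['Keywords', 'Subject'] as $tag) {
        // A single value arrives as a scalar, several values as a list.
        foreach ((array) ($decoded[0][$tag] ?? []) as $value) {
            $keywords[] = (string) $value;
        }
    }
    return array_values(array_unique($keywords));
}

/** Run ExifTool on one file and return its keywords (hypothetical helper). */
function readKeywordsWithExifTool(string $path): array
{
    // Assumption: "exiftool" is installed and reachable through the PATH.
    $cmd  = 'exiftool -json -IPTC:Keywords -XMP-dc:Subject ' . escapeshellarg($path);
    $json = shell_exec($cmd);
    return is_string($json) ? keywordsFromExifToolJson($json) : [];
}
```

Keeping the JSON handling separate from the shell call makes it possible to test the parsing without ExifTool installed, and reading both tags covers files whose keywords exist only in IPTC or only in XMP.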
