UTF-8 characters not displaying properly from JPEG IPTC data in PHP

When I read the IPTC data from an image via PHP, UTF-8 accented characters are not displayed properly.

For example: é, ø and ü

With the Content-Type header's charset set to UTF-8, I get a question mark in a black diamond instead of the character. If no charset is set, I get a dash character: —

The following code is used to read the IPTC block:

$file = '/path/to/image.jpg';
getimagesize($file, $info);
$iptc = iptcparse($info['APP13']);

I have also tried uploading the exact same image to a WordPress installation on the same server, and it properly strips the accented character and replaces it with its basic Latin equivalent. I don't mind if this is the end result; I would just like to read the characters properly.

Any ideas on how to get the complete and correct data from the image?

Answering a bit late, but since I had the same problem displaying special characters such as č, š and ž (which appear in the Slovenian alphabet), I may as well answer for future reference.

The solution to this problem is not actually related to PHP, but to the encoding of the IPTC data. By default, most software that can write IPTC data stores it in plain ASCII. At first I used Adobe Bridge, which displays all special characters as it should while you tag your images, but once you parse that data in PHP you do not see the special characters. (I would have to check this part again, but the main catch is that two different encodings are involved: one used to store the IPTC data in the image, and one used to display that data in a program that can handle IPTC data, or something along those lines.)

To solve the problem I used a program called ExifTool, which is an amazing piece of software and lets you manage almost any metadata in your image.

I then used it to convert all IPTC encodings to UTF-8, and from then on I only had to retag the images that had corrupt characters (which Adobe Bridge displays correctly but evidently does not save in the correct encoding).

The command to accomplish this on all images in a folder is shown below; /path/to/folder is a placeholder for your own image folder:

exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8 /path/to/folder

You may also want to download ExifTool GUI if you are not comfortable working from the command line.

I haven't found any better program that could accomplish this same task faster.
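If rewriting the files is not an option, the same idea can be applied at read time in PHP. The following is only a sketch: iptc_values_to_utf8() is a hypothetical helper, not a PHP built-in. It assumes the mbstring extension is available and that values without a UTF-8 marker are ISO-8859-1, which is a guess the file itself does not confirm. It takes the array returned by iptcparse() and checks the 1#090 (CodedCharacterSet) entry for the UTF-8 escape sequence.

```php
<?php
// Hypothetical helper (not part of PHP): returns the values of one IPTC tag as UTF-8.
// $iptc is the array returned by iptcparse(); $tag is a key such as '2#120'.
// $fallback is an assumed source encoding for data that is not marked as UTF-8.
function iptc_values_to_utf8(array $iptc, string $tag, string $fallback = 'ISO-8859-1'): array
{
    $values = $iptc[$tag] ?? [];

    // 1#090 is CodedCharacterSet; the escape sequence ESC % G declares UTF-8.
    $declaredUtf8 = ($iptc['1#090'][0] ?? '') === "\x1B%G";

    $out = [];
    foreach ($values as $value) {
        if ($declaredUtf8 || mb_check_encoding($value, 'UTF-8')) {
            // Already UTF-8 (or plain ASCII): keep the bytes unchanged.
            $out[] = $value;
        } else {
            // Not valid UTF-8: convert from the assumed legacy encoding.
            $out[] = mb_convert_encoding($value, 'UTF-8', $fallback);
        }
    }

    return $out;
}
```

With the code from the question, this could be called as iptc_values_to_utf8($iptc, '2#120') to read the caption field. If the images were tagged on a system using a different legacy encoding, the fallback argument would need to change accordingly.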

To set the character set to UTF-8 when writing IPTC data from PHP, use this code:

$iptc = array(
  '1#090' => "\x1B%G" // 1#090 is CodedCharacterSet; ESC % G selects UTF-8
);

Then change the part of the code that builds the binary IPTC block so that the record number is taken from the array key, like this:

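// Convert the IPTC tags into binary code
$data = '';

foreach ($iptc as $tag => $string)
{
  $rec = (int) substr($tag, 0, 1); // record number, e.g. 1 for '1#090'
  $tag = (int) substr($tag, 2);    // dataset number, e.g. 90 for '1#090'
  $data .= iptc_make_tag($rec, $tag, $string);
}

// Embed the IPTC data; $path is the JPEG file, and the new file contents are returned
$content = iptcembed($data, $path);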
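Note that iptc_make_tag() is not a built-in PHP function. It appears to be the helper from the iptcembed() example in the PHP manual, which is an assumption, since the answer does not say where it comes from. A sketch of such a helper, which packs one IPTC dataset into its binary form:

```php
<?php
// Builds one binary IPTC dataset: tag marker, record number, dataset number,
// length, then the value. Modelled on the helper in the PHP manual's iptcembed() example.
function iptc_make_tag($rec, $data, $value)
{
    $length = strlen($value);
    $retval = chr(0x1C) . chr((int) $rec) . chr((int) $data);

    if ($length < 0x8000) {
        // Standard dataset: two-byte big-endian length.
        $retval .= chr($length >> 8) . chr($length & 0xFF);
    } else {
        // Extended dataset: flag 0x8004 followed by a four-byte length.
        $retval .= chr(0x80) .
                   chr(0x04) .
                   chr(($length >> 24) & 0xFF) .
                   chr(($length >> 16) & 0xFF) .
                   chr(($length >> 8) & 0xFF) .
                   chr($length & 0xFF);
    }

    return $retval . $value;
}
```

For the '1#090' entry, the call is iptc_make_tag(1, 90, "\x1B%G"), which produces the dataset that declares UTF-8. The string returned by iptcembed() still has to be written back to a file (for example with file_put_contents()) for the change to persist.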
