Converting hex string to binary file makes it corrupt and unable to open

When I convert the hex-encoded content of a PDF file back to binary, the resulting file is corrupted.
This is part of the hex content of a simple PDF file I want to convert:

0x255044462D312E370D0A25B5B5B5B50D0A312030206F626A0D0A3C3C2F547970652F436174

Full string: jsfiddle, pastebin

This question continues an earlier question of mine, where I explained that I have to do a data migration between two programs that handle files differently. The source program stores the files hex-encoded in the database.

I could successfully extract text files and convert them to binary files with the following code:

file_put_contents(
    'document.pdf', 
    hex2bin(str_replace('0x', '', $hexPdfString))
);

But when I run this code on a PDF or another binary file, the output is corrupted.
My question is pretty much the same as another one, but the discussion over there was unfortunately discontinued.

The result of hex decoding your string is corrupted because your string is incomplete: it contains only the first 65535 characters. After hex decoding, one can see that the PDF is cut off inside a metadata stream:

20 0 obj
<</Type/Metadata/Subtype/XML/Length 3064>>
stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-701">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""  xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer>Microsoft® Word 2019</pdf:Producer></rdf:Description>
<rdf:Description rdf:about=""  xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:creator><rdf:Seq><rdf:li>Samuel Gfeller</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description rdf:about=""  xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:CreatorTool>Microsoft® Word 2019</xmp:CreatorTool><xmp:CreateDate>2021-06-17T13:00:19+02:00</xmp:CreateDate><xmp:ModifyDate>2021-06-17T13:00:19+02:00</xmp:ModifyDate></rdf:Description>
<rdf:Description rdf:about=""  xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>uuid:C29344F5-3E78-414A-B4E3-775A853B1A0C</xmpMM:DocumentID><xmpMM:InstanceID>uuid:C29344F5-3E78-414A-B4E3-775A853B1A0C</xmpMM:InstanceID></rdf:Description>
                                                                                                    
                                                                                                    
                                                                          

The length 65535 is of course special: it is 0xFFFF, the largest value of an unsigned 16-bit integer. Apparently some mechanism you used to retrieve that string cannot handle strings longer than 65535 characters. Thus, you have to investigate the source of that string.
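To check other exported values for the same symptom, a small inspection helper may be useful. The following is only a sketch: `inspectHexPdf` and its result keys are hypothetical names, and the check relies on the fact that a complete PDF carries an `%%EOF` marker near its end.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not from the question): collect signs that a
 * hex-encoded PDF string was cut off before it reached PHP.
 */
function inspectHexPdf(string $hexString): array
{
    $raw = trim($hexString);
    // Remove only a leading "0x" prefix.
    $hex = (stripos($raw, '0x') === 0) ? substr($raw, 2) : $raw;

    // hex2bin() needs a non-empty string with an even number of hex digits.
    $validHex = $hex !== '' && ctype_xdigit($hex) && strlen($hex) % 2 === 0;

    $hasEofMarker = false;
    if ($validHex) {
        $binary = hex2bin($hex);
        // Look for the "%%EOF" marker within the last 1024 bytes.
        $hasEofMarker = strpos(substr($binary, -1024), '%%EOF') !== false;
    }

    return [
        'length' => strlen($raw),
        'atSuspiciousLimit' => strlen($raw) === 0xFFFF,
        'validHex' => $validHex,
        'hasEofMarker' => $hasEofMarker,
    ];
}
```

Calling `print_r(inspectHexPdf($hexPdfString));` on the retrieved value would show at a glance whether its length sits exactly at 0xFFFF and whether the decoded data ends like a PDF.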

Given the question this one continues, I'd assume that either the field in the MS SQL database you retrieve the data from is limited to 65535 bytes, or your database retrieval code cuts the value down.

In the former case there would be nothing you can do; the database contents would simply be incomplete. In the latter case you would simply have to enable your database access code to handle long strings.
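One way to tell the two cases apart is to ask the database how long the stored value is and compare that with what PHP received. The sketch below assumes the column holds the hex text as character data, so that a query such as `SELECT LEN(file_column) FROM files WHERE id = ?` returns a comparable length; the table name, column name, and the helper `locateTruncation` are placeholders, not taken from your setup. If the stored value turns out to be longer, the T-SQL statement `SET TEXTSIZE` and the long-field limits of your PHP driver would be among the things to check, though which of them applies depends on the driver you use.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: given the length reported by the database and the
 * value PHP actually received, say where a known truncation happened.
 * Only meaningful once the decoded file is known to be incomplete.
 */
function locateTruncation(int $storedLength, string $retrieved): string
{
    if (strlen($retrieved) < $storedLength) {
        // The database holds more than PHP received.
        return 'retrieval';
    }

    // PHP received everything that is stored, so the stored value itself
    // must already be incomplete.
    return 'database';
}
```

A result of `'retrieval'` corresponds to the latter case above, `'database'` to the former.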
