Encoding issue: special characters from CSV to PHP

So I've got this file ( http://mountainmarathon.ch/components/com_chronoconnectivity6/chronoconnectivity/uploads/20190814194827_classifica-cat-standard-3.csv ) which "should" be encoded in UTF-8. When I try to read the contents via fgetcsv or file_get_contents, I get black diamonds with question marks in place of each ä, ö, ü character.

I already know that this is an encoding issue, but as far as I can see everything is (or should be) UTF-8, and UTF-8 should be able to display ä, ö, ü, right?

I have already checked a lot of possible solutions here but did not find one that works. When I open the file with Notepad++ I get the same strange diamonds (and when I try to change the encoding, they change to rectangles). So is it the file?

Nope: when I open the CSV file on my iPhone (inside the Mail app), the special chars ä, ö, ü are displayed correctly.

What I have tried so far are various mb_convert_encoding solutions from different Stack Overflow answers, but none of them worked.

I really think something is wrong with this file, but then why is the iPhone able to render the content correctly?

Can someone with more know-how please check the file and tell me what I can do to import and use its content with PHP and get rid of this encoding issue?

The header is set to UTF-8 via header('Content-Type: text/html; charset=utf-8');

In the terminal, "file -I file" returns UTF-8.

I've tried two servers (my MAMP with PHP 7.3.1 and a web server with PHP 7.x).

I'm sorry, but I won't post every link to every question I've checked here and on other platforms over the past three hours. And yes, of course I have already checked plenty of info and comments in the PHP manual (fgetcsv, mb_encode / check, utf8_encode / decode, and so on) but did not find the needle that solves my issue.

Lastly, I checked my string (from file_get_contents) against this function: https://www.php.net/manual/de/function.mb-check-encoding.php#95289 which returns FALSE.

And now nothing makes sense anymore.

The code to reproduce is very simple:

$content = file_get_contents($url);
var_dump($content);

How can we display the special chars as ä, ö, ü and not as black diamonds with question marks?

Update

Based on your analysis I have checked what exactly happens when the file is saved.

First: I receive the CSV by email, and as far as I can see it is in ISO-8859-1.

The iOS scenario looks like this: I open the mail in the Mail app and display the CSV directly inside the app --> all fine. Next I exported the file from the Mail app into my OneDrive and opened the file on the phone --> all fine. Now I am able to check the charset on my Mac via file -I, and it is iso-8859-1.

When I now use this file with PHP's utf8_encode --> all is good.
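A minimal sketch of that conversion, assuming the file really is ISO-8859-1 as file -I reports and that the mbstring extension is available. It uses mb_convert_encoding instead of utf8_encode, because utf8_encode is deprecated as of PHP 8.2. The helper name toUtf8 and the sample bytes are illustrative and not part of the original post.

```php
<?php
// Hypothetical helper: returns the given text as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function toUtf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

// "Brändli" as ISO-8859-1 bytes: ä is the single byte 0xE4.
$latin1 = "Br\xE4ndli";
$utf8 = toUtf8($latin1);
echo bin2hex($utf8), "\n"; // prints 4272c3a46e646c69
```

The same helper could be applied to each line before it is parsed as CSV. It cannot repair a file in which the umlauts have already been replaced: such a file is valid UTF-8 and passes through unchanged.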

So now I had to understand what went wrong before. Here is the macOS scenario:

I open the (same) mail and save the same source file onto my hard drive; a quick check with file -I now gives me UTF-8 as the charset.

On a Windows machine with Outlook, after saving the file and opening it in Notepad, the characters are replaced: ä=>d, ü=>|, ...

Right now I think that the person who sends us this CSV has to export the file as UTF-8. To me it looks like the file is ISO-8859-1 and the computers do some weird stuff while saving it. Is that possible?

This response may be a bit meandering, but I hope it provides useful info. I'm running these commands on an Ubuntu workstation in a terminal window.

I downloaded the file using Firefox. The response headers did not specify any charset:

$ curl -sSL -D - http://mountainmarathon.ch/components/com_chronoconnectivity6/chronoconnectivity/uploads/20190814194827_classifica-cat-standard-3.csv -o /dev/null
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Wed, 14 Aug 2019 21:24:00 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
Keep-Alive: timeout=60
Location: http://www.mountainmarathon.ch/components/com_chronoconnectivity6/chronoconnectivity/uploads/20190814194827_classifica-cat-standard-3.csv
Strict-Transport-Security: max-age=63072000

HTTP/1.1 200 OK
Server: nginx
Date: Wed, 14 Aug 2019 21:24:00 GMT
Content-Type: text/csv
Content-Length: 39626
Connection: keep-alive
Keep-Alive: timeout=60
X-Content-Type-Options: nosniff
Last-Modified: Wed, 14 Aug 2019 19:48:27 GMT
ETag: "9aca-590190a7aa557"
Accept-Ranges: bytes
Strict-Transport-Security: max-age=63072000

If I inspect the beginning of the file, I do indeed see the weird characters you are talking about:

$ head -c 30 20190814194827_classifica-cat-standard-3.csv
11;1;102;Claudio;Br�ndli;198

That first weird character is represented by 3 bytes, ef bf bd:

$ head -c 30 20190814194827_classifica-cat-standard-3.csv | xxd
00000000: 3131 3b31 3b31 3032 3b43 6c61 7564 696f  11;1;102;Claudio
00000010: 3b42 72ef bfbd 6e64 6c69 3b31 3938       ;Br...ndli;198

That byte sequence is the UTF-8 encoding of the Unicode replacement character (U+FFFD), i.e., the character used to replace problematic byte sequences. This strongly suggests that the file itself does not contain the umlaut characters you want, but contains the replacement character instead.
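The same check can be done from PHP. This is only a sketch: the helper name countReplacementChars is made up for illustration, and the sample string reproduces the first bytes from the xxd output above.

```php
<?php
// Hypothetical helper: counts U+FFFD replacement characters,
// which are the three bytes EF BF BD in UTF-8.
function countReplacementChars(string $text): int
{
    return substr_count($text, "\xEF\xBF\xBD");
}

// The first bytes of the downloaded file, as shown by xxd above.
$sample = "11;1;102;Claudio;Br\xEF\xBF\xBDndli;198";

echo countReplacementChars($sample), "\n";      // prints 1
var_dump(mb_check_encoding($sample, 'UTF-8'));  // bool(true)
```

Because the replacement character is itself valid UTF-8, a validity check passes, and no encoding conversion can bring back the original umlaut once it has been replaced.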

I've tried opening this file in a text editor (gedit) and in LibreOffice Calc using numerous different encodings, and the characters do not appear correctly in any combination of app and encoding that I've tried.

I put those 3 umlaut characters in a string, and none of their byte sequences matches the 3-byte sequence that is in your file:

$ echo "äöü" | xxd
00000000: c3a4 c3b6 c3bc 0a                        .......

To clarify, I believe the UTF-8 encoding of these characters maps as follows:

ä = c3a4
ö = c3b6
ü = c3bc
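For comparison, here is a small sketch that prints the bytes of the same three characters in both UTF-8 and ISO-8859-1 (the encoding the asker's update mentions). The helper name encodingsOf is hypothetical, and the mbstring extension is assumed to be installed.

```php
<?php
// Hypothetical helper: hex bytes of one character in both encodings.
function encodingsOf(string $utf8Char): array
{
    return [
        'utf8'   => bin2hex($utf8Char),
        'latin1' => bin2hex(mb_convert_encoding($utf8Char, 'ISO-8859-1', 'UTF-8')),
    ];
}

// ä, ö, ü written as code point escapes, so the result does not
// depend on how this script file itself is saved.
foreach (["\u{E4}", "\u{F6}", "\u{FC}"] as $char) {
    $hex = encodingsOf($char);
    echo $char, ' UTF-8: ', $hex['utf8'], ' ISO-8859-1: ', $hex['latin1'], "\n";
}
```

Neither encoding produces ef bf bd. A lone e4, f6, or fc byte followed by an ASCII letter is not a valid UTF-8 sequence, which is the kind of input a decoder typically replaces with U+FFFD.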

I could be wrong here, but I think the file on that remote website might actually contain the UTF-8 replacement character? I wonder if the nginx server that's coughing up the file might be attempting to interpret this file's contents and failing? I tried setting up a PHP script to send Accept-Charset headers, and it still gets the broken chars.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.mountainmarathon.ch/components/com_chronoconnectivity6/chronoconnectivity/uploads/20190814194827_classifica-cat-standard-3.csv");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Request headers imitating a browser (here: Safari on an iPhone).
// Note: with Accept-Encoding sent as a raw header, cURL does not decompress
// the body; set CURLOPT_ENCODING if the server compresses the response.
$headers = [
    'Accept-Charset: utf-8',
    'Accept-Encoding: gzip, deflate',
    'Accept-Language: en-US,en;q=0.5',
    'Cache-Control: no-cache',
//  'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
    'User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 12_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Mobile/15E148 Safari/604.1',
];
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

// Save the raw response body so it can be inspected with xxd.
$server_output = curl_exec($ch);
file_put_contents("server-output.csv", $server_output);

curl_close($ch);
echo "DONE\n";

To summarize, I think your original source file has already replaced the chars you want (ä, ö, ü, etc.) with the generic UTF-8 character used to signify a misunderstood byte sequence (�). Either that, or the CSV file is getting munged for some reason by the server that is coughing it up. Can you tell me more about viewing this file on your iPhone? Are you requesting it from that exact URL with your iPhone?
