
Character encoding when reading/writing a file with JavaScript

I'm currently having some issues with character encoding in client-side JavaScript. My basic program flow is this: client-side JavaScript reads a local text file using FileReader. I then make a number of edits to the string and try to offer the user a way to download the new, altered file. Here's my issue: the file I'm reading from is (according to Notepad++) encoded in ANSI, and the file I want to write also needs to be encoded in ANSI.

When I try reading from the file like this:

reader.readAsText(this.files[0], "ANSI");
...
cachedFile = e.target.result.split("\n");
console.log(cachedFile[179544]);

My result is something along these lines (the Î character isn't read properly):

name="�le-de-France" 

However, when I use ISO-8859-1 as the encoding parameter (a completely random choice), for some reason the result is correct:

name="Île-de-France" 

So there's a good chance I have no idea what's happening. I left it with ISO-8859-1 encoding, did my various edits, and then tried to prepare it for downloading. I can't simply POST this to my server uncompressed and prepare a file for download because the file is, frankly, rather large (~14 MB). It does, however, compress very nicely since it's plain text. The issue is that every JavaScript compression library I've found (like jszip, which nicely lets you generate a file and stick it in a .zip) seems to maintain JavaScript's internal string encoding, which I believe is UTF-16. The .zip file is also encoded as base64 (which I just decode on my PHP server). Doing this, of course, gives a final result of something like this:

name="ÃŽle-de-France"

So here's my issue: I have a file encoded in ANSI, I parse it using ISO-8859-1, I edit it in UTF-16, and I need to find a way to get it back into ANSI and onto a person's desktop. Is there a standard way to convert the JavaScript string to ANSI before compressing it, so I can just offer the compressed file to my user to download? Or is there a way to uncompress the string on the server side using PHP, convert it to ANSI, and then offer it for download? Just for reference, my current PHP code is simply this:

<?php 

 $res = $_POST["saveString"];
 $maybe = base64_decode($res);
 header('Content-Type: application/download');
 header('Content-Disposition: attachment; filename="genSave.zip"');
 header("Content-Length: " . strlen($maybe));
 echo $maybe;

?>

My guess is [I will delete the answer if incorrect] that your encoding value is not valid. What Notepad calls "ANSI" is Windows-1252, nearly the same as ISO-8859-1. What do you expect "ANSI" to be other than ISO-8859-1? Either of these should work:

reader.readAsText(this.files[0], "iso-8859-1");
reader.readAsText(this.files[0], "windows-1252");

See the W3C spec for reference.
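To illustrate the point about labels, here is a minimal sketch using `TextDecoder` rather than `FileReader`, so it can run outside a page. The byte array is a hand-written stand-in for the line in the file (an assumption, not data from the original file). As far as I understand the File API, `readAsText` quietly falls back to UTF-8 when the label is not recognized, whereas `TextDecoder` throws, which makes the invalid label easier to see.

```javascript
// Assumed sample: the bytes of "Île-de-France" as stored in a Windows-1252 ("ANSI") file.
const bytes = new Uint8Array([
  0xCE, 0x6C, 0x65, 0x2D, 0x64, 0x65, 0x2D,
  0x46, 0x72, 0x61, 0x6E, 0x63, 0x65,
]);

// "ANSI" is not a registered encoding label, so TextDecoder rejects it.
let ansiError = null;
try {
  new TextDecoder("ANSI");
} catch (err) {
  ansiError = err;
}

// The label "iso-8859-1" is an alias: it resolves to the windows-1252 decoder.
const resolvedLabel = new TextDecoder("iso-8859-1").encoding;

// Decoding as UTF-8 turns the lone 0xCE byte into U+FFFD, as in the question.
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

// Decoding as windows-1252 recovers the intended text.
const asCp1252 = new TextDecoder("windows-1252").decode(bytes);

console.log(ansiError && ansiError.name, resolvedLabel, asUtf8, asCp1252);
```

This is consistent with what the question observed: the unrecognized label produced the replacement character, and the "random" choice of ISO-8859-1 happened to select the right decoder.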

I ended up with a slightly roundabout solution that is probably not nearly as efficient as it could be. I put a UTF-16 encoded string of 12 million characters in a .zip file using JavaScript, POSTed it to my server encoded in base64, converted it back into a string, put it into a temporary file, opened that temporary file as a .zip file, unpacked it, converted it to ISO-8859-1, repacked it, then downloaded it to the client.
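The garbled `ÃŽle-de-France` from the question is what you would expect if the zip entry held UTF-8 bytes that were later viewed as ANSI, which would also explain why a UTF-8 to ISO-8859-1 conversion on the server repairs it. A small sketch of that effect, assuming the zip library wrote the string out as UTF-8 (I have not verified this against any particular library version):

```javascript
const original = "Île-de-France";

// TextEncoder always emits UTF-8; Î (U+00CE) becomes two bytes, 0xC3 0x8E.
const utf8Bytes = new TextEncoder().encode(original);
const firstTwo = Array.from(utf8Bytes.slice(0, 2));

// Viewing those UTF-8 bytes as windows-1252 yields one character per byte,
// so the single Î shows up as two unrelated characters.
const misread = new TextDecoder("windows-1252").decode(utf8Bytes);

// Reading the same bytes as UTF-8 gives the original text back.
const repaired = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(firstTwo.map((b) => b.toString(16)), misread, repaired);
```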

The final server-side code was pretty simple, but unfortunately slow:

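<?php

   // The base64-encoded zip built by the client-side JavaScript.
   $res = $_POST["saveString"];
   $zipInMem = base64_decode($res);

   // Write it to a temporary file so the zip functions can open it.
   $file = tempnam("tmp", "zip");
   file_put_contents($file, $zipInMem);

   // Read the first entry. Note: the zip_* functions are deprecated as of PHP 8.0.
   $zip = zip_open($file);
   $zip_entry = zip_read($zip);
   zip_entry_open($zip, $zip_entry);

   // Convert the UTF-8 contents to ISO-8859-1 (utf8_decode is deprecated as of PHP 8.2).
   $contents = utf8_decode(zip_entry_read($zip_entry, zip_entry_filesize($zip_entry)));
   zip_entry_close($zip_entry);
   zip_close($zip);

   // Rebuild the archive in place with the converted contents.
   $zip = new ZipArchive();
   $zip->open($file, ZipArchive::OVERWRITE);
   $zip->addFromString('genFile.eu4', $contents);
   $zip->close();

   // Send the repacked zip to the client, then remove the temporary file.
   header('Content-Type: application/zip');
   header('Content-Disposition: attachment; filename="genSave.zip"');
   header("Content-Length: " . filesize($file));
   readfile($file);
   unlink($file);

?>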
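For comparison, the conversion could probably be done on the client before compressing, which would avoid the unpack and repack on the server. The sketch below is not what I ended up using; `encodeLatin1` is a hypothetical helper name, and it covers only ISO-8859-1 (code points up to U+00FF). Characters that Windows-1252 places in the 0x80 to 0x9F range, such as €, have code points above U+00FF and would need a lookup table that is not shown here.

```javascript
// Hypothetical helper: encode a JavaScript string as ISO-8859-1 bytes.
// Each UTF-16 code unit up to 0xFF maps directly to one byte.
function encodeLatin1(text) {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xFF) {
      // Not representable in ISO-8859-1; fail loudly rather than corrupt the file.
      throw new RangeError(
        "Character at index " + i + " (U+" +
        code.toString(16).toUpperCase().padStart(4, "0") +
        ") is outside ISO-8859-1"
      );
    }
    out[i] = code;
  }
  return out;
}

const encoded = encodeLatin1('name="Île-de-France"');
console.log(encoded.length, encoded[6].toString(16));
```

In a browser, the resulting `Uint8Array` could be wrapped in `new Blob([encoded])` and offered through an object URL, or handed to a zip library that accepts binary input instead of a string.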
