
Character encoding when reading/writing a file with JavaScript

I'm currently having some issues with character encoding in client-side JavaScript. My basic program flow is this: client-side JavaScript reads a local text file using the nifty FileReader. I then do a bunch of edits on the string and try to offer the user a way to download the new, altered file. Here's my issue: the file I'm reading from is (according to Notepad++) encoded in ANSI, and the file I want to write also needs to be encoded in ANSI.

When I try reading from the file like this:

reader.readAsText(this.files[0], "ANSI");
...
cachedFile = e.target.result.split("\n");
console.log(cachedFile[179544]);

My result is something along the lines of this (the Î character isn't read properly):

name="�le-de-France" 

However, when I use ISO-8859-1 as the encoding parameter (a completely random choice), for some reason the result is correct:

name="Île-de-France" 

So there's a good chance I have no idea what's happening. I left it with ISO-8859-1 encoding, did my various edits, and then tried to prepare the file for downloading. I can't simply POST it to my server uncompressed and prepare a file for download, because the file is, frankly, rather large (~14 MB). It does, however, compress very nicely since it's plain text. The issue is that every JavaScript compression library I've found (like JSZip, which nicely lets you generate a file and stick it in a .zip) seems to maintain JavaScript's internal string encoding, which I believe is UTF-16. The .zip file is also encoded as base64 (which I just decode on my PHP server). Doing this, of course, gives a final result of something like this:

name="ÃŽle-de-France"

So here's my issue: I have a file encoded in ANSI, I parse it using ISO-8859-1, I edit it in UTF-16, and I need to find a way to get it back into ANSI and onto a person's desktop. Is there a standard way to convert the JavaScript string to ANSI before compressing it, so I can just offer the compressed file to my user to download? Or is there a way to uncompress the string on the server side using PHP, convert it to ANSI, and then offer it for download? Just for reference, my current PHP code is simply this:

<?php 

 // The client POSTs the .zip archive as a base64 string.
 $res = $_POST["saveString"];
 $maybe = base64_decode($res);

 // Send the decoded archive straight back as a download.
 header('Content-Type: application/zip');
 header('Content-Disposition: attachment; filename="genSave.zip"');
 header("Content-Length: " . strlen($maybe));
 echo $maybe;

?>

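My guess is [I will delete the answer if incorrect] that your encoding value is not valid. "ANSI" is not an encoding label that browsers recognize, so the reader most likely falls back to UTF-8, which would explain the � character. What Notepad++ calls "ANSI" is the system's default Windows code page, which on Western systems is Windows-1252, nearly the same as ISO-8859-1. Browsers in fact treat the label "iso-8859-1" as Windows-1252, which is why your random choice worked. Either of these should work: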

reader.readAsText(this.files[0], "iso-8859-1");
reader.readAsText(this.files[0], "windows-1252");

See the W3C spec for reference.
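To make the label issue concrete, here is a minimal sketch using TextDecoder, which accepts the same encoding labels. It assumes a modern browser or a Node.js build with full ICU data; the helper name isKnownLabel is made up for illustration. Note that TextDecoder throws on an unknown label, whereas FileReader (as far as I understand the File API) silently falls back to UTF-8.

```javascript
// "Île" as stored in the ANSI file: byte 0xCE is "Î" in both
// ISO-8859-1 and Windows-1252.
const fileBytes = new Uint8Array([0xCE, 0x6C, 0x65]);

// Hypothetical helper: report whether a label is a recognized encoding name.
function isKnownLabel(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch (err) {
    return false; // TextDecoder throws a RangeError for unknown labels
  }
}

// Decoding the same bytes as UTF-8 turns the lone 0xCE into U+FFFD,
// which is the "�" seen in the question.
const asUtf8 = new TextDecoder("utf-8").decode(fileBytes);

// Decoding as Windows-1252 recovers the intended character.
const asCp1252 = new TextDecoder("windows-1252").decode(fileBytes);

console.log(isKnownLabel("ANSI"));       // false
console.log(isKnownLabel("iso-8859-1")); // true
console.log(asUtf8 === "\uFFFDle");      // true
console.log(asCp1252);                   // Île
```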

I ended up with a slightly roundabout solution that is probably not nearly as efficient as it could be. I put a string of 12 million characters in a .zip file using JavaScript (the string is UTF-16 in memory, but judging by the fact that utf8_decode fixes it, the text ends up as UTF-8 inside the archive), POSTed it to my server encoded in base64, converted it back into a string, put it into a temporary file, opened that temporary file as a .zip file, unpacked it, converted it to ISO-8859-1, repacked it, then downloaded it to the client.
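A quick way to see why the extra conversion step is needed: the garbled "ÃŽ" from the question is what the two UTF-8 bytes of "Î" look like when they are displayed as Windows-1252, which suggests the text inside the archive was UTF-8. A minimal sketch, assuming an environment with the standard TextEncoder (modern browsers, Node.js):

```javascript
// TextEncoder always produces UTF-8.
const utf8Bytes = new TextEncoder().encode("Île-de-France");

// Show the first three bytes in hex.
const firstBytesHex = Array.from(utf8Bytes.slice(0, 3), (b) =>
  b.toString(16).padStart(2, "0")
).join(" ");

// "Î" (U+00CE) takes two bytes in UTF-8: C3 8E.
// In Windows-1252, byte C3 is "Ã" and byte 8E is "Ž",
// so an ANSI viewer shows "ÃŽle-de-France".
console.log(firstBytesHex);    // c3 8e 6c

// 13 characters, but 14 bytes, because of the two-byte "Î".
console.log(utf8Bytes.length); // 14
```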

The final server side code was pretty simple, but unfortunately slow:

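<?php 

   // The client POSTs the .zip archive as a base64 string.
   $res = $_POST["saveString"];
   $zipInMem = base64_decode($res);

   // The zip functions need a real file, so write the archive to a temporary one.
   $file = tempnam("tmp", "zip"); 
   file_put_contents ($file, $zipInMem);

   // Read the first (and only) entry of the uploaded archive.
   // Note: zip_open() and friends are deprecated as of PHP 8.0.
   $zip = zip_open($file);

   $zip_entry = zip_read($zip);

   zip_entry_open($zip, $zip_entry);

   // utf8_decode() converts the entry's text from UTF-8 to ISO-8859-1
   // (deprecated as of PHP 8.2).
   $contents = utf8_decode(zip_entry_read($zip_entry, zip_entry_filesize($zip_entry)));

   // Release the read handles before overwriting the same file.
   zip_entry_close($zip_entry);
   zip_close($zip);

   // Repack the converted text into a fresh archive at the same path.
   $zip = new ZipArchive();
   $zip->open($file, ZipArchive::OVERWRITE);

   $zip->addFromString('genFile.eu4', $contents);
   $zip->close();

   // Send the new archive as a download, then clean up the temporary file.
   header('Content-Type: application/zip');
   header('Content-Disposition: attachment; filename="genSave.zip"');
   header("Content-Length: " . filesize($file));

   readfile($file);

   unlink($file);

?>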
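For what it's worth, the conversion could probably also be done on the client before compressing, which would avoid the unpack/repack step on the server. TextEncoder only produces UTF-8, so the sketch below encodes by hand. The helper name encodeLatin1 is hypothetical (not part of any library), and it assumes ISO-8859-1 is close enough to the file's "ANSI" encoding: Windows-1252 differs for bytes 0x80 to 0x9F (for example the euro sign and curly quotes), which would need a lookup table instead.

```javascript
// Hypothetical helper: encode a JavaScript string as ISO-8859-1 bytes.
// Characters outside U+0000..U+00FF cannot be represented and are
// replaced with the fallback byte ("?" by default).
function encodeLatin1(text, fallback = 0x3f) {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // In ISO-8859-1 the byte value equals the code point.
    out[i] = code <= 0xff ? code : fallback;
  }
  return out;
}

const latin1Bytes = encodeLatin1('name="Île-de-France"');

console.log(latin1Bytes.length);          // 20, one byte per character
console.log(latin1Bytes[6].toString(16)); // ce, the single byte for "Î"
```

The resulting Uint8Array is raw binary, so it would have to be handed to the zip library as binary data rather than as a string (assuming the library accepts binary input), or wrapped in a Blob and offered for download directly.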
