I'm having some issues with character encoding in client-side JavaScript. My basic program flow is this: client-side JavaScript reads a local text file using the nifty FileReader. I then do a bunch of edits on the string and try to offer the user a way to download the new, altered file. Here's my issue: the file I'm reading from is (according to Notepad++) encoded in ANSI, and the file I want to write also needs to be encoded in ANSI.
When I try reading from the file like this:
reader.readAsText(this.files[0], "ANSI");
...
cachedFile = e.target.result.split("\n");
console.log(cachedFile[179544]);
My result is something along the lines of this (the Î character isn't read properly):
name="�le-de-France"
However, when I use ISO-8859-1 as the encoding parameter (a completely random choice), for some reason the result is correct:
name="Île-de-France"
So there's a large chance I have no idea what's happening. I left it with ISO-8859-1 encoding, did my various edits, and then tried to prepare it for downloading. I can't simply POST this to my server uncompressed and prepare a file for download because the file is, frankly, rather large (~14 MB). It does, however, compress very nicely since it's plain text. The issue is that every JavaScript compression library I've found (like jszip, which nicely lets you generate a file and stick it in a .zip) seems to maintain JavaScript's internal string encoding, which I believe is UTF-16. The .zip file is also encoded as base64 (which I just decode on my PHP server). Doing this, of course, gives a final result of something like this:
name="ÃŽle-de-France"
So here's my issue: I have a file encoded in ANSI, I parse it using ISO-8859-1, I edit it in UTF-16, and I need to find a way to get it back into ANSI and onto a person's desktop. Is there a standard way to convert the JavaScript string to ANSI before compressing it, so I can just offer the compressed file to my user to download? Or is there a way to uncompress the string on the server side using PHP, convert it to ANSI, and then offer it for download? Just for reference, my current PHP code is simply this:
<?php
$res = $_POST["saveString"];
$maybe = base64_decode($res);
header('Content-Type: application/download');
header('Content-Disposition: attachment; filename="genSave.zip"');
header("Content-Length: " . strlen($maybe));
echo $maybe;
?>
My guess is [I will delete the answer if incorrect] that your encoding value is not valid: "ANSI" is not an encoding label that browsers recognize. What Notepad++ calls "ANSI" is Windows-1252 (on a Western European system), which is nearly the same as ISO-8859-1. Either of these should work:
reader.readAsText(this.files[0], "iso-8859-1");
reader.readAsText(this.files[0], "windows-1252");
See the W3C spec for reference.
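To see why the label matters, here is a minimal sketch using TextDecoder, which accepts the same encoding labels as readAsText. The byte values and the helper name decodeWith are illustrative assumptions, not taken from the asker's file. As far as I know, an unrecognized label such as "ANSI" makes readAsText fall back to UTF-8, which would explain the � in the output.

```javascript
// Bytes of "Île" as they would be stored in a Windows-1252 ("ANSI") file.
const ansiBytes = new Uint8Array([0xce, 0x6c, 0x65]);

// Hypothetical helper: decode the same bytes under a given encoding label.
function decodeWith(label, bytes) {
  return new TextDecoder(label).decode(bytes);
}

// 0xCE is not a complete UTF-8 sequence, so it becomes U+FFFD (�).
const asUtf8 = decodeWith("utf-8", ansiBytes);
// Both single-byte labels map 0xCE to "Î".
const asLatin1 = decodeWith("iso-8859-1", ansiBytes);
const asCp1252 = decodeWith("windows-1252", ansiBytes);

// The "iso-8859-1" label is treated as an alias of windows-1252.
const resolved = new TextDecoder("iso-8859-1").encoding;

// "ANSI" is not a known label, so TextDecoder rejects it with a RangeError.
let ansiIsValidLabel = true;
try {
  new TextDecoder("ANSI");
} catch (e) {
  ansiIsValidLabel = false;
}

console.log(asUtf8, asLatin1, asCp1252, resolved, ansiIsValidLabel);
```

This also suggests why ISO-8859-1 was not such a random choice after all: for a label-based API it ends up selecting the same decoder as windows-1252.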
I ended up with a slightly roundabout solution that is probably not nearly as efficient as it could be. I put a UTF-16 encoded string of 12 million characters in a .zip file using JavaScript, POSTed it to my server encoded in base64, converted it back into a string, put it into a temporary file, opened that temporary file as a .zip file, unpacked it, converted it to ISO-8859-1, repacked it, then downloaded it to the client.
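The "ÃŽ" from the question is consistent with the zipped text having been written as UTF-8 rather than UTF-16, which is presumably why a conversion step is needed on the server. A small sketch using only the standard TextEncoder (the variable names are mine):

```javascript
// TextEncoder always produces UTF-8. "\u00CE" is the character "Î".
const utf8Bytes = Array.from(new TextEncoder().encode("\u00CE"));

// In UTF-8, U+00CE takes two bytes: 0xC3 0x8E.
const hex = utf8Bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0"));
console.log(hex.join(" ")); // C3 8E

// Windows-1252 displays 0xC3 as "Ã" and 0x8E as "Ž", hence "ÃŽle-de-France".
// The single byte the ANSI file needs instead is the code point itself.
const ansiByte = "\u00CE".charCodeAt(0); // 0xCE
```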
The final server side code was pretty simple, but unfortunately slow:
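<?php
// The client POSTs the .zip as a base64-encoded string.
$res = $_POST["saveString"];
$zipInMem = base64_decode($res);
// Write the archive to a temporary file so the zip functions can open it.
$file = tempnam("tmp", "zip");
file_put_contents($file, $zipInMem);
// Read the first (and only) entry and convert its UTF-8 text to ISO-8859-1.
// Note: the zip_* functions are deprecated as of PHP 8.0 and utf8_decode() as of PHP 8.2.
$zip = zip_open($file);
$zip_entry = zip_read($zip);
zip_entry_open($zip, $zip_entry);
$contents = utf8_decode(zip_entry_read($zip_entry, zip_entry_filesize($zip_entry)));
// Release the read handles before the same file is overwritten below.
zip_entry_close($zip_entry);
zip_close($zip);
// Repack the converted text into the same temporary file.
$zip = new ZipArchive();
$zip->open($file, ZipArchive::OVERWRITE);
$zip->addFromString('genFile.eu4', $contents);
$zip->close();
// Send the archive to the client, then remove the temporary file.
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="genSave.zip"');
header("Content-Length: " . filesize($file));
readfile($file);
unlink($file);
?>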
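A possible way to avoid the server round trip, sketched under assumptions rather than tested against the original 14 MB file: convert the edited string to single-byte values in the browser and hand the bytes, not the string, to the zip library. The function name encodeLatin1 is hypothetical, and it mirrors utf8_decode by writing "?" for characters that have no ISO-8859-1 byte. Note that this is ISO-8859-1, which differs from Windows-1252 for bytes 0x80 to 0x9F, so characters such as "€" would be lost.

```javascript
// Hypothetical helper: encode a JavaScript string as ISO-8859-1 bytes.
function encodeLatin1(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // Code units 0x00-0xFF map one-to-one onto ISO-8859-1 bytes;
    // anything else is replaced with 0x3F ("?").
    bytes[i] = code <= 0xff ? code : 0x3f;
  }
  return bytes;
}

// "\u00CE" is "Î"; it sits at index 6 of this sample line.
const latin1Bytes = encodeLatin1('name="\u00CEle-de-France"');
console.log(latin1Bytes[6].toString(16)); // ce
```

The resulting Uint8Array could then be passed to a zip library that accepts binary input, or wrapped in a Blob for a direct download.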