I have an endpoint which receives a MultipartFile.
Resource upload(@PathVariable Integer id, @RequestParam MultipartFile file) throws IOException {
This file is usually a .csv, and I need to process every line and save the data.
Recently, however, a user sent a file encoded as UTF-16 LE, which put a lot of strange characters into the data.
I'd like to accept a file in any encoding and always convert it to an encoding I can handle, for example UTF-8, before processing it.
How can I do this?
After some testing and searching, I found a solution.
To change a file's encoding, I need to read it using its source charset and write it back using the target charset. To make this generic, so that it accepts a file in any charset, I first need to detect the source charset.
For that I added the juniversalchardet dependency, which provides the UniversalDetector class:
<dependency>
<groupId>com.github.albfernandez</groupId>
<artifactId>juniversalchardet</artifactId>
<version>2.3.1</version>
</dependency>
Using it I could do this:
String encoding = UniversalDetector.detectCharset(file.getInputStream());
if (encoding == null) {
    // no charset could be detected: throw an exception
}
And the method that transforms the file:
private static void encodeFileInLatinAlphabet(InputStream source, String fromEncoding, File target) throws IOException {
    // Decode the input with the detected charset and re-encode it as ISO-8859-1.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(source, fromEncoding));
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target),
                 StandardCharsets.ISO_8859_1))) {
        char[] buffer = new char[16384];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, read);
        }
    }
}
This way I can receive a file in any charset and re-encode it in the desired one.
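One thing that may still leave a stray character at the start of the output: Java's UTF-16LE and UTF-8 decoders keep a leading byte order mark as the character U+FEFF instead of dropping it, and ISO-8859-1 cannot represent that character, so the writer would replace it with a question mark. Whether this applies depends on whether the uploaded file starts with a BOM and on which charset name the detector reports. A minimal sketch of one way to drop it, assuming the reader is wrapped before the copy loop (BomSkipper and skipLeadingBom are hypothetical names, not part of juniversalchardet):

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical helper, not part of the original answer or of juniversalchardet.
final class BomSkipper {
    private static final int BOM = 0xFEFF;

    /** Returns a reader that yields the same characters minus one leading U+FEFF, if present. */
    static Reader skipLeadingBom(Reader decoded) throws IOException {
        PushbackReader reader = new PushbackReader(decoded, 1);
        int first = reader.read();
        if (first != -1 && first != BOM) {
            reader.unread(first); // not a BOM: put the character back
        }
        return reader;
    }
}
```

In the method above, this would mean wrapping the InputStreamReader with BomSkipper.skipLeadingBom(...) before handing it to the BufferedReader.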
Note: in my case I always need the file in ISO_8859_1, which is why it is hard-coded in the method, but you could pass the target charset as a parameter.
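As a rough sketch of that variation, the target charset can simply become a parameter. The names CharsetConverter and transcode are made up for illustration, and writing to an OutputStream instead of a File is an assumption chosen to keep the example self-contained:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper: same copy loop as above, with both charsets passed in.
final class CharsetConverter {

    /** Decodes source as fromCharset and re-encodes it into target as toCharset. */
    static void transcode(InputStream source, Charset fromCharset,
                          OutputStream target, Charset toCharset) throws IOException {
        // try-with-resources closes both streams once the copy is done
        try (Reader reader = new BufferedReader(new InputStreamReader(source, fromCharset));
             Writer writer = new BufferedWriter(new OutputStreamWriter(target, toCharset))) {
            char[] buffer = new char[16384];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }
}
```

A call for the case in the question might look like CharsetConverter.transcode(file.getInputStream(), Charset.forName(encoding), new FileOutputStream(target), StandardCharsets.UTF_8). Keep in mind that characters the target charset cannot represent are replaced by the writer rather than reported, which matters more for ISO_8859_1 than for UTF-8.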