Reading from BigQuery and storing data to Google Cloud Storage (special character issue)
Reference: Can Google Dataflow use existing VMs and not temporarily created ones?
The code works, but when it saves the response from BigQuery to Google Cloud Storage, all the Japanese characters are corrupted.
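Here is a generic sketch of what "corrupted" usually means in this situation. It is not code from the pipeline. Japanese text becomes unreadable when bytes are written with one charset and read back with another, for example when UTF-8 bytes are decoded as ISO-8859-1. The class `MojibakeDemo` and the method `reencode` are hypothetical names. The sample text uses `\u` escapes so the source compiles regardless of the compiler's default encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how text is garbled ("mojibake") when the
// charset used to decode bytes differs from the one used to encode them.
class MojibakeDemo {

    // Encodes the text with one charset, then decodes the bytes with another.
    static String reencode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // "日本語"
        // Matching charsets: the text survives the round trip unchanged.
        String ok = reencode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Mismatched charsets: each UTF-8 byte becomes a separate Latin-1 character.
        String broken = reencode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(ok.equals(original));
        System.out.println(broken.equals(original));
    }
}
```

On Dataflow, a mismatch like this can come from any conversion that relies on the JVM default charset, such as `new String(bytes)` or `getBytes()` called without an argument. The workers' default charset may differ from the local machine's, so these calls are worth checking in the DoFns.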
```java
PCollectionTuple QVCollections = rows
        .apply("FilterEmptyRows", ParDo.of(new FilterEmptyRowDoFn("TransactionId", "TransactionDateTime")))
        .apply("CreateQVFiles", ParDo.of(new TransactionToQVFilesDoFnJP())
                .withOutputTags(BobShare.QVHeaders, TupleTagList.of(BobShare.QVEvents).and(BobShare.QVPayments)));

QVCollections.get(BobShare.QVEvents).apply("WriteQVEvents", TextIO.write().to(storagePath + CSV_OUTPUT_FOLDER + "events_" + timeSuffix)
        .withoutSharding().withHeader(CSV_HEADER_EVENTS).withSuffix(".csv"));
QVCollections.get(BobShare.QVPayments).apply("WriteQVPayments", TextIO.write().to(storagePath + CSV_OUTPUT_FOLDER + "payments_" + timeSuffix)
        .withoutSharding().withHeader(CSV_HEADER_PAYMENTS).withSuffix(".csv"));
QVCollections.get(BobShare.QVHeaders).apply("WriteQVHeaders", TextIO.write().to(storagePath + CSV_OUTPUT_FOLDER + "header_" + timeSuffix)
        .withoutSharding().withHeader(CSV_HEADER_TRANSACTION).withSuffix(".csv"));
```
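As far as I know, `TextIO.write()` encodes each `String` element as UTF-8. That means the files written above should already be UTF-8, provided the strings coming out of `TransactionToQVFilesDoFnJP` are intact. The JDK-only sketch below is not Beam's implementation. It reproduces the shape of a `withoutSharding()` + `withHeader(...)` output (a header line followed by one line per element) with an explicit UTF-8 encoding. This can help when checking locally what the expected bytes look like. `SingleCsvWriter`, `writeCsv` and the sample column names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical JDK-only sketch (not Beam code): writes a header line followed by
// one line per element into a single file, UTF-8 encoded, lines ending in '\n'.
class SingleCsvWriter {

    static void writeCsv(Path target, String header, List<String> rows) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append(header).append('\n');
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        // The charset is explicit, so the bytes do not depend on the JVM default charset.
        Files.write(target, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("events_", ".csv");
        // Hypothetical header and row; the row contains "日本語" as \u escapes.
        writeCsv(tmp, "TransactionId,Name", Arrays.asList("1,\u65e5\u672c\u8a9e"));
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8).size());
        Files.delete(tmp);
    }
}
```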
Based on what I have found, I need to use `.withCoder(StringUtf8Coder.of())`.
In addition, this is what I have tried (but it only works locally, with the DirectRunner):
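```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.charset.StandardCharsets;

// Reads a local CSV file and uploads its contents to gs://<bucket>/<filename>.
private static void uploadBlob(String project, String bucket, String filename, String localfile) {
    String listFromCsv = readCsvFromLocalStorage(localfile);
    Storage storage = StorageOptions.newBuilder().setProjectId(project).build().getService();
    BlobId blobId = BlobId.of(bucket, filename);
    // The charset belongs in Content-Type; Content-Encoding is for compression such as gzip.
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/csv; charset=utf-8").build();
    // getBytes(Charset) throws no checked exception, so no try/catch is needed.
    storage.create(blobInfo, listFromCsv.getBytes(StandardCharsets.UTF_8));
}
```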
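If the corruption only appears when the downloaded CSV is opened in a spreadsheet tool such as Microsoft Excel, one common cause is that the tool does not assume UTF-8 for CSV files without a byte order mark. Under that assumption, here is a minimal sketch that prefixes the UTF-8 BOM (bytes EF BB BF) to the content before it is uploaded. `Utf8Bom` and `withUtf8Bom` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes CSV text as UTF-8 and prefixes the UTF-8 byte
// order mark (EF BB BF), which some spreadsheet tools use to detect the encoding.
class Utf8Bom {

    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static byte[] withUtf8Bom(String csv) {
        byte[] body = csv.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[BOM.length + body.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(body, 0, out, BOM.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        // "id,name\n1,日本\n" with the Japanese text written as \u escapes.
        byte[] bytes = withUtf8Bom("id,name\n1,\u65e5\u672c\n");
        System.out.println(bytes.length);
    }
}
```

In `uploadBlob`, the result could replace `listFromCsv.getBytes(...)` as the second argument of `storage.create`. Programs that are not BOM-aware will see an extra invisible character (U+FEFF) at the start of the first header cell, so this is only worth doing when spreadsheet users are the target.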
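```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Reads a local text file as UTF-8 and returns its lines, each terminated by '\n'.
private static String readCsvFromLocalStorage(String fileName) {
    StringBuilder builder = new StringBuilder();
    Path pathToFile = Paths.get(fileName);
    try (BufferedReader br = Files.newBufferedReader(pathToFile, StandardCharsets.UTF_8)) {
        // read the first line from the text file
        String line = br.readLine();
        // loop until all lines are read
        while (line != null) {
            builder.append(line).append("\n");
            line = br.readLine();
        }
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
    return builder.toString();
}
```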
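The same read can be written more compactly by decoding all bytes with an explicit charset. Here is a minimal sketch, assuming the file fits in memory. `ReadCsvUtf8` and `readCsv` are hypothetical names. Unlike the loop above, it keeps the file's original line endings instead of normalizing them to `\n`. It also propagates `IOException` rather than returning a partial string.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical compact alternative to readCsvFromLocalStorage: reads the whole
// file and decodes it explicitly as UTF-8, independent of the JVM default charset.
class ReadCsvUtf8 {

    static String readCsv(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }
}
```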
```java
import java.io.File;

// Deletes a local file and logs whether the deletion succeeded.
private static void deleteLocalFile(String fileName) {
    try {
        if (new File(fileName).delete()) {
            System.out.println(fileName + " deleted.");
        } else {
            System.out.println(fileName + " could not be deleted.");
        }
    } catch (Exception e) {
        System.out.println(fileName + " could not be deleted.");
        e.printStackTrace();
    }
}
```
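For the cleanup step, `java.nio.file.Files.deleteIfExists` can tell "file did not exist" apart from a real failure, which `File.delete()` reports only as `false`. Here is a sketch using hypothetical names (`DeleteLocal`). It returns a boolean instead of the original's `void` so callers can react to the result.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical NIO-based variant of deleteLocalFile: a missing file is reported
// via the return value, and other failures keep their IOException cause.
class DeleteLocal {

    static boolean deleteLocalFile(String fileName) {
        try {
            boolean deleted = Files.deleteIfExists(Paths.get(fileName));
            System.out.println(fileName + (deleted ? " deleted." : " did not exist."));
            return deleted;
        } catch (IOException e) {
            System.out.println(fileName + " could not be deleted: " + e);
            return false;
        }
    }
}
```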
This is how the data looks (corrupted): [image: corrupted Japanese characters]
Any suggestions?
You need to replace

```java
BufferedReader br = Files.newBufferedReader(pathToFile, StandardCharsets.UTF_8)
```

with

```java
BufferedReader br = Files.newBufferedReader(pathToFile, Charset.forName("UTF-8"))
```
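For comparison, here is a small standalone check of the two expressions in the JDK. `StandardCharsets.UTF_8` is the JDK's constant for the UTF-8 charset, and `Charset.forName("UTF-8")` looks up that same charset by name, so both should decode identical bytes the same way. If the output is still corrupted after the change, the cause may lie elsewhere, for example in a byte/string conversion inside the pipeline or in the tool used to open the file. `CharsetCheck` is a hypothetical class name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical check: compares the two ways of obtaining the UTF-8 charset
// and decodes the same bytes with each of them.
class CharsetCheck {

    static boolean sameCharset() {
        return StandardCharsets.UTF_8.equals(Charset.forName("UTF-8"));
    }

    static boolean sameDecoding(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).equals(new String(bytes, Charset.forName("UTF-8")));
    }

    public static void main(String[] args) {
        byte[] bytes = "\u65e5\u672c\u8a9e".getBytes(StandardCharsets.UTF_8); // "日本語"
        System.out.println(sameCharset());
        System.out.println(sameDecoding(bytes));
    }
}
```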