I have a script that scrapes tweets and saves the results to Google BigQuery. When I look at the stored data in BigQuery, special characters like ➕, ♂️, Ñ, áéíóú appear correctly, but when I import the data back into R they are replaced by garbled characters. Here's an example:
# Create df
id_tweet <- 1023985670224785408
tweet <- "◉ Neuroeducación y entornos digitales de aprendizaje: un paso obligado para educadores, pedagogos y psicólogos"
descripcion <- "Desde las alturas se ve todo de otra manera... ️ ➕ ♂️"
data <- data.frame(id_tweet, tweet, descripcion)
# Save to Google BQ
library(bigrquery)
insert_upload_job("project-id", "dataset", "table", data , write_disposition = "WRITE_APPEND")
# Load from Google BQ
sql <- paste("SELECT *", "FROM", "`project-id.dataset.table`")
data <- query_exec(sql, project = "project-id", use_legacy_sql = FALSE)
My output is the following:
> data
id_tweet
283 1023985670224785408
tweet
283 ◉ Neuroeducación y entornos digitales de aprendizaje: un paso obligado para educadores, pedagogos y psicólogos
descripcion
283 Desde las alturas se ve todo de otra manera... ï¿½ï¿½ï¸ âž• ��<U+200D>â™‚ï¸ ï¿½ï¿½ ��
I want the characters to come back exactly as they were stored.
What should I do?
Thanks,
I tested a few things which may help.
First, I saved a blank R script with UTF-8 encoding (File -> Save with Encoding -> UTF-8). Then I saved just the special characters from your question, in double quotes, as a .csv file (i.e. "➕, ♂️, Ñ, áéíóú"), and read the CSV back in with fileEncoding = "UTF-8", i.e.:
test <- read.csv("test.csv", fileEncoding = "UTF-8", header=FALSE, stringsAsFactors = FALSE)
Inside RStudio, test returns:
# > test
# V1
# 1 \u2795, ♂️, Ñ, áéíóú
So all but the ➕ display correctly in RStudio. However, many characters, even common ones such as line breaks and tabs, are shown oddly (escaped) in the RStudio console but are written normally to a file. These are no different.
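A quick way to confirm that an escaped display like \u2795 is only a console rendering issue is to inspect the string itself rather than its printed form. A minimal base-R sketch (the variable names are illustrative):

```r
# The console may print this as "\u2795" (escaped) or as the glyph itself,
# depending on locale and font; the string content is the same either way.
plus <- "\u2795"               # U+2795 HEAVY PLUS SIGN (➕)
n_chars <- nchar(plus)         # 1: a single character, not the literal text "\u2795"
code_point <- utf8ToInt(plus)  # 10133, i.e. hexadecimal 0x2795
cat(plus, "\n")                # in a UTF-8 locale, cat() writes the glyph, not an escape
```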
When the CSV is written back out (just using write.csv(test, 'test2.csv', row.names=FALSE)), it displays exactly as the original CSV did when opened in Sublime Text.
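The same write-then-read check can be scripted so it does not depend on opening files in an editor. This is a sketch assuming a UTF-8 locale (the default on Linux and macOS, and on Windows from R 4.2); the characters are written as \u escapes so the snippet does not depend on the script file's own encoding, and a temporary file stands in for test2.csv:

```r
# Same characters as in the test above, written with \u escapes
x <- data.frame(
  V1 = "\u2795, \u2642\ufe0f, \u00d1, \u00e1\u00e9\u00ed\u00f3\u00fa",
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")  # temporary file instead of test2.csv

# Write and read back, declaring UTF-8 explicitly in both directions
write.csv(x, path, row.names = FALSE, fileEncoding = "UTF-8")
y <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

round_trip_ok <- identical(x$V1, y$V1)  # TRUE when no character was lost or altered
```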
Given all this, I would suggest first making sure your encoding is UTF-8, then (if possible) saving the BQ output as a CSV and inspecting it to see whether the problem comes from BQ or from R. If the data comes out of BQ correctly, it should simply be a matter of fixing the encoding on the R side. If it does not come out of BQ as intended, the problem lies in how the data is stored in BQ, and that is what you would need to change (note that BigQuery STRING columns always hold UTF-8 text, so there is no separate UTF-8 type to switch to).
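One way to tell whether R is misdecoding the text is to look at the Unicode code points of what comes back. The sketch below reproduces the ➕ → âž• pattern visible in the question's output by decoding the UTF-8 bytes of ➕ as Windows-1252; show_codepoints() is a hypothetical helper, and Windows-1252 is an assumption about which wrong encoding was applied:

```r
# Hypothetical helper: list the Unicode code points of a single string
show_codepoints <- function(s) {
  sprintf("U+%04X", utf8ToInt(enc2utf8(s)))
}

original <- "\u2795"  # ➕ as stored in BigQuery

# Simulate the bug: take the UTF-8 bytes (E2 9E 95) and decode them as
# Windows-1252 (assumed), which yields the three characters "âž•"
misread <- iconv(rawToChar(charToRaw(enc2utf8(original))),
                 from = "CP1252", to = "UTF-8")

show_codepoints(original)  # "U+2795"
show_codepoints(misread)   # "U+00E2" "U+017E" "U+2022"
```

If a value downloaded from BigQuery shows code points like these where U+2795 should be, the bytes were decoded with the wrong encoding somewhere between BQ and R.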
After six months, I finally managed to solve this problem. Instead of query_exec, I used bq_table_download from the same package (bigrquery), and the special characters now come through correctly.
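Since bq_table_download takes a table rather than a SQL string, the query is typically run first with bq_project_query. A minimal sketch, assuming bigrquery is installed and authenticated; download_query() is a hypothetical helper, and bigint = "character" is an added assumption to keep 64-bit IDs such as id_tweet exact (bigrquery's default integer mapping cannot hold them):

```r
# Hypothetical helper wrapping the two bigrquery calls
download_query <- function(project, sql) {
  # Run the query (standard SQL by default); returns a bq_table
  # that refers to the table holding the query results
  tb <- bigrquery::bq_project_query(project, sql)
  # Download the result rows into a data frame; bigint = "character"
  # (an assumption) returns INT64 columns as exact strings
  bigrquery::bq_table_download(tb, bigint = "character")
}

# Usage, with the placeholder names from the question:
# data <- download_query("project-id", "SELECT * FROM `project-id.dataset.table`")
```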