Encoding issue with write.xlsx (openxlsx)
I use the write.xlsx() function (from the openxlsx package) to turn a list object into an Excel spreadsheet, where each element of the list becomes a sheet of the Excel file. In the past, this function has been incredibly useful, and I have never encountered any issues. It is my understanding that this package, and this function in particular, does not need any particular Java update on the computer in order to work.
However, I recently discovered that the function is producing an error. This is what the console shows when I run write.xlsx() on the list:
Error in gsub("&", "&", v, fixed = TRUE) :
input string 5107 is invalid UTF-8
I've identified the data frames that cause the issue, but I am not sure how to identify which part of a data frame is causing the error.
I even went ahead and used the enc2utf8() function on all of the columns in this data frame, but I still encounter the error. I've also used the substr() function on the data frame itself, which shows me the first n characters of each column, though I do not see any obvious issues in the output.
I also used the install.packages() function to re-install the openxlsx package, in case there were any updates.
Does anyone know how I would go about identifying the cause of the error? Is it the function as it is written in the package? If the problem is in the encoding of the data itself, does enc2utf8() not suffice to resolve the issue?
Thanks!
I just had this same problem. Building on this question, you could strip all bad characters (here, everything outside the printable ASCII range, space through ~) from the data frame with either:
library(dplyr)
# Remove every character outside printable ASCII from character columns
df %>%
  mutate_if(is.character, ~gsub('[^ -~]', '', .))
for only character columns, or:
# Same replacement on every column; gsub() returns character,
# so all columns end up as character vectors
df %>%
  mutate_all(~gsub('[^ -~]', '', .))
for all columns, and then export to XLSX with write.xlsx().
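One drawback of the pattern above is that it also deletes valid non-ASCII text such as accented letters. A gentler alternative, which is not part of the original answer, is to re-encode each character column from UTF-8 to UTF-8 with base R's iconv() and drop only the bytes that fail to convert. The sketch below uses a made-up data frame (df, txt and the stray byte are all hypothetical), and iconv() behaviour can vary slightly between platforms, so treat it as a starting point:

```r
# Hypothetical example data: the second string ends in a lone Latin-1 byte
# (0xE9), which is not a valid UTF-8 sequence on its own.
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))  # "caf" + invalid byte
df <- data.frame(id = 1:2, txt = c("fine", bad), stringsAsFactors = FALSE)

# Re-encode every character column from UTF-8 to UTF-8. Bytes that cannot be
# converted are dropped because sub = ""; sub = "byte" would keep them
# visible as hex escapes instead.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], iconv, from = "UTF-8", to = "UTF-8", sub = "")

# Check that every remaining string is valid UTF-8 before exporting.
all(validUTF8(df$txt))
```

If the check returns TRUE for every character column, the data frame should no longer trigger the invalid UTF-8 error when passed to write.xlsx().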
As for finding the error, the number in the message points you to the problem (in your case, 5107). It appears to count the strings that are written to the file. To find the particular data point that's the issue, this approach worked for me:
Let's assume our data frame has 20 variables and 10 of them are character type.
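Independently of counting strings, a more direct way to locate the offending cells is to test each character column with base R's validUTF8() (available since R 3.3.0). This is a minimal sketch rather than the method described in the answer; the data frame and its column names (id, name, note) are invented for illustration:

```r
# Hypothetical data frame: row 2 of `name` contains an invalid UTF-8 byte.
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
df <- data.frame(
  id   = 1:3,
  name = c("ok", bad, "also ok"),
  note = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# For each character column, record the rows whose value is not valid UTF-8.
is_chr <- vapply(df, is.character, logical(1))
bad_rows <- lapply(df[is_chr], function(col) which(!validUTF8(col)))

# Keep only the columns that actually contain a problem.
bad_rows <- bad_rows[lengths(bad_rows) > 0]
bad_rows
```

The result is a named list with one entry per affected column, each holding the row numbers to inspect or repair.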