
'embedded nul in string' error trying to import multiple .csv files en masse from different subdirectories in R

I have a large number of CSV data files located in many different subdirectories. The files all have the same name and are differentiated by the subdirectory name.

I'm trying to find a way to import them all into R so that the subdirectory name for each file populates a column in the resulting data.

I have generated a list of the files using list.files(), which I've called tto_refs.

head(tto_refs)
[1] "210119/210115 2021-01-19 16-28-14/REF TTO-210119.D/REPORT01.CSV" "210122/210115 2021-01-22 14-49-41/REF TTO-210122.D/REPORT01.CSV"
[3] "210127/210127 2021-01-27 09-39-15/REF TTO-210127_1.D/REPORT01.CSV" "210127/210127 2021-01-27 09-39-15/REF TTO-210127_2.D/REPORT01.CSV"
[5] "210127A/210127 2021-01-28 15-57-40/REF TTO-210127A_1.D/REPORT01.CSV" "210127A/210127 2021-01-28 15-57-40/REF TTO-210127A_2.D/REPORT01.CSV"

I tried a few different methods to import the data into R, but they all produced errors or warnings related to 'embedded nul(s)'.

For example:

tbl <- tto_refs %>% map_df(~read.csv(.))

There were 50 or more warnings (use warnings() to see the first 50)

warnings()
Warning messages:
1: In read.table(file = file, header = header, sep = sep, ... :
  line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, ... :
  line 2 appears to contain embedded nulls

etc.

How can I get this data into R?

Edit: the .csv files are generated by Agilent Chemstation analytical software.

The data looks like this: [image]

This means that your files are not plain CSV text files but contain binary data.

You need to open the files in question with a binary or hex editor and work out how the binary data got there, so that you can decide how it should be processed.

On a Linux machine I would do this at the terminal with a command like:

od -Ax -txCz -w32 FILENAME
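
A rough equivalent from within R, in case a terminal is not handy: the sketch below reads a file as raw bytes with readBin() and counts how many of them are NUL (0x00). The helper name count_nuls and the temporary demo file are hypothetical, made up for illustration; point the helper at one of your own REPORT01.CSV files instead.

```r
# Hypothetical helper: count NUL bytes in a file and show its first bytes.
count_nuls <- function(path, n_preview = 16) {
  size  <- file.info(path)$size
  bytes <- readBin(path, what = "raw", n = size)
  list(
    size    = length(bytes),
    nuls    = sum(bytes == as.raw(0)),
    preview = head(bytes, n_preview)
  )
}

# Demo on a made-up file: "a,b" stored as two bytes per character,
# so every second byte is NUL.
demo <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x61, 0x00, 0x2c, 0x00, 0x62, 0x00)), demo)

res <- count_nuls(demo)
print(res$preview)
cat(res$nuls, "of", res$size, "bytes are NUL\n")
```

Where the NUL bytes sit (after every character, or only at the end of each line) is the useful clue for deciding how to handle them.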

EDIT: I suspect that the illegal bytes are whatever Excel is converting to "?" at the end of the line. Do you expect this field to contain anything useful? Since the rest of the data is CSV, I suspect this is a bug in the tool that generated it.
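
One other possibility worth ruling out: if a byte dump shows 0x00 after nearly every character rather than only at the end of each line, the file may be UTF-16 text, which stores each ASCII character as two bytes, one of them NUL. That is only an assumption about these files, not something established above. Under that assumption, a base R sketch of the import, with the top-level subdirectory added as a column, might look like the following. The helper read_with_dir, the column name subdir, the header = FALSE choice and the "UTF-16LE" encoding are all illustrative guesses to adjust after inspecting the bytes, and the demo files are made up.

```r
# Hypothetical helper: read one file ASSUMING it is UTF-16LE text, and tag
# every row with the first component of its relative path.
read_with_dir <- function(path, root = ".", encoding = "UTF-16LE") {
  df <- read.csv(file.path(root, path), header = FALSE,
                 fileEncoding = encoding, stringsAsFactors = FALSE)
  df$subdir <- strsplit(path, "/", fixed = TRUE)[[1]][1]
  df
}

# Demo with made-up data standing in for the real REPORT01.CSV files.
root <- tempfile("tto_demo")
for (d in c("210119", "210122")) {
  dir.create(file.path(root, d), recursive = TRUE)
  con <- file(file.path(root, d, "REPORT01.CSV"), open = "w",
              encoding = "UTF-16LE")
  writeLines(c("1,0.5", "2,0.7"), con)
  close(con)
}

# Relative paths, in the same shape as tto_refs in the question.
files <- list.files(root, pattern = "REPORT01\\.CSV$", recursive = TRUE)
tbl <- do.call(rbind, lapply(files, read_with_dir, root = root))
print(tbl)
```

With the real data, files would be tto_refs and root the directory that was passed to list.files(). If the NUL bytes turn out to be stray garbage rather than an encoding, this sketch does not apply.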
