
Read tab delimited text file in Spark R

I have a tab-delimited file saved as a .txt, with quotation marks (" ") around the string variables. The file can be found here.

I am trying to read it into Spark-R (version 3.1.2), but cannot successfully bring it into the environment. I've tried variations of the read.df code, like this:

df <- read.df(path = "FILE.txt", header="True", inferSchema="True", delimiter = "\\t", encoding="ISO-8859-15")

df <- read.df(path = "FILE.txt", source = "txt", header="True", inferSchema="True", delimiter = "\\t", encoding="ISO-8859-15")

I have had success bringing in CSVs with read.csv, but many of the files I have are over 10 GB, and it is not practical to convert them to CSV before bringing them into Spark-R.

EDIT: When I run read.df I get a laundry list of errors, starting with this:

[Screenshot of the error output]

I am able to bring in the csv files used in a previous project with both read.df and read.csv, so I don't think it's a Java issue.

If you don't need to specifically use Spark R, then base R's read.table should work just fine for the .txt you provided. Note that the file is tab-delimited, so this should be specified.

Something like this should work:

dat <- read.table("FILE.TXT",  
                  sep="\t",
                  header=TRUE)
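
If you do need SparkR for the larger files, a minimal sketch worth trying (untested against your file, and assuming Spark's built-in csv data source): when source is omitted, read.df falls back to the default source (Parquet), and "txt" is not a registered source name, so point it at "csv" and pass the tab as the delimiter. Note that in R, "\t" is already a literal tab character, so the extra backslash in "\\t" is unnecessary.

library(SparkR)
sparkR.session()

# Read the tab-delimited .txt through Spark's csv data source,
# which handles arbitrary delimiters; "\t" is a literal tab in R.
df <- read.df(path = "FILE.txt",
              source = "csv",
              header = "true",
              inferSchema = "true",
              delimiter = "\t",
              encoding = "ISO-8859-15")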
