Read tab delimited text file in Spark R
I have a tab-delimited file that is saved as a .txt with " " around the string variables. The file can be found here.
I am trying to read it into Spark-R (version 3.1.2), but cannot successfully bring it into the environment. I've tried variations of the read.df code, like this:
df <- read.df(path = "FILE.txt", header="True", inferSchema="True", delimiter = "\\t", encoding="ISO-8859-15")
df <- read.df(path = "FILE.txt", source = "txt", header="True", inferSchema="True", delimiter = "\\t", encoding="ISO-8859-15")
I have had success with bringing in CSVs with read.csv, but many of the files I have are over 10GB, and it is not practical to convert them to CSV before bringing them into Spark-R.
EDIT: When I run read.df I get a laundry list of errors, starting with this:
I am able to bring in csv files used in a previous project with both read.df and read.csv, so I don't think it's a Java issue.
If you don't need to specifically use Spark R, then base R read.table should work just fine for the .txt you provided. Note that it is tab-delimited, so this should be specified. Something like this should work:
dat <- read.table("FILE.TXT",
sep="\t",
header=TRUE)
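If Spark is still needed for the 10GB files, the likely problem with the attempts above is that SparkR has no "txt" source for delimited data: tab-separated text is read through the "csv" source, with the tab passed as the separator option. A sketch (untested here; assumes a running SparkR session, and that the file uses the options shown):

```r
library(SparkR)
sparkR.session()

# Tab-delimited text goes through the "csv" data source in Spark;
# header/inferSchema option values are passed as strings.
df <- read.df(path = "FILE.txt",
              source = "csv",
              header = "true",
              inferSchema = "true",
              sep = "\t",
              encoding = "ISO-8859-15")
```

The CSV source's default quote character is `"`, so the quotation marks around the string variables should be stripped automatically on read.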