Spark: Silently execute sc.wholeTextFiles
I am loading about 200k text files in Spark using input = sc.wholeTextFiles(hdfs://path/*)
I then run println(input.count)
It turns out that my Spark shell outputs a ton of text (the path of every file), and after a while it just hangs without returning my result.
I believe this may be due to the amount of text outputted by wholeTextFiles. Do you know of any way to run this command silently? Or is there a better workaround?
Thanks!
How large are your files? From the wholeTextFiles API:

Small files are preferred, large files are also allowable, but may cause bad performance.
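As a side note (not part of the original answer): with ~200k small files, the number of partitions can matter as much as logging. A minimal sketch, assuming the same path from the question; the minPartitions argument is an optional parameter of wholeTextFiles:

```scala
// wholeTextFiles returns an RDD of (filePath, fileContent) pairs.
// For very many small files, passing minPartitions explicitly can
// reduce per-task scheduling overhead (value here is illustrative).
val input = sc.wholeTextFiles("hdfs://path/*", minPartitions = 100)
println(input.count)
```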
In conf/log4j.properties, you can suppress excessive logging, like this:
# Set everything to be logged to the console
log4j.rootCategory=ERROR, console
That way, you'll get back only res in the REPL, just like in the Scala (the language) REPL.

Here are all the other logging levels you can play with: log4j API.
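If you'd rather not edit conf/log4j.properties, a common alternative (a sketch, not from the answer above) is to lower the log level from the shell session itself via the log4j API:

```scala
import org.apache.log4j.{Level, Logger}

// Silence Spark's and Akka's chatty loggers for this session only;
// the "org" and "akka" logger names cover most Spark-internal output.
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)
```

This only affects the current driver process, so it is handy for experimenting in spark-shell without touching cluster-wide configuration.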