
Spark: Silently execute sc.wholeTextFiles

I am loading about 200k text files in Spark using input = sc.wholeTextFiles("hdfs://path/*"), and then I run println(input.count). My Spark shell outputs a ton of text (the path of every file), and after a while it just hangs without returning my result.
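For reference, here is a minimal sketch of the shell session (assuming the Scala Spark shell, where sc is the pre-built SparkContext; the HDFS path is a placeholder):

// Read every file under the path into an RDD of (filename, content) pairs
val input = sc.wholeTextFiles("hdfs://path/*")
// count triggers the job; println reports how many files were loaded
println(input.count)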

I believe this may be due to the amount of text output by wholeTextFiles. Do you know of any way to run this command silently? Or is there a better workaround?

Thanks!

How large are your files? From the wholeTextFiles API docs:

Small files are preferred, large files are also allowable, but may cause bad performance.

In conf/log4j.properties, you can suppress excessive logging, like this:

# Set everything to be logged to the console
log4j.rootCategory=ERROR, console

That way, you'll get back only the res result in the REPL, just like in the Scala (the language) REPL.
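If you'd rather not edit any files, a minimal alternative sketch (assumes Spark 1.4+, where SparkContext.setLogLevel is available; sc is the shell's built-in context):

// Raise the log threshold for this shell session only (Spark 1.4+)
sc.setLogLevel("ERROR")
// The count now comes back without the wall of per-file INFO lines
println(sc.wholeTextFiles("hdfs://path/*").count)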

Here are all the other logging levels you can play with: the log4j API.
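For finer-grained control than a single root level, a sketch of per-logger overrides in the same conf/log4j.properties (standard log4j 1.x syntax; org.apache.spark is Spark's top-level package):

# Keep the root logger quiet on the console
log4j.rootCategory=ERROR, console
# Optionally let Spark's own warnings back through
log4j.logger.org.apache.spark=WARN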
