
How to dump Nutch 2.3 data into a WARC file?

I need to dump data from Nutch 2.3 into a WARC file. However, I couldn't find the necessary module. Nutch 1.x had this capability. I would like to know the proper way to do it in 2.3.

As you said, the WARC exporter module has not yet been ported to the 2.x branch of Nutch. Porting the https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/tools/warc/WARCExporter.java module shouldn't be that hard, though. As a general rule, the 1.x branch of Nutch is still more widely used and better equipped than the 2.x branch (at least for now).
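
To give a rough idea of what a port has to produce, here is a minimal sketch of serializing a single WARC/1.0 "response" record using only the Java standard library. It is not taken from WARCExporter.java and is not a Nutch API: the class name WarcRecordSketch and the method responseRecord are hypothetical, and the sketch assumes you have already read the URL, fetch time and raw HTTP response (status line, headers and body) out of the Nutch 2.x store yourself. That reading step is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

// Hypothetical helper, not part of Nutch: builds one WARC/1.0 record in memory.
public class WarcRecordSketch {
    private static final String CRLF = "\r\n";

    /**
     * Serializes one "response" record.
     *
     * @param targetUri    URL of the fetched page
     * @param fetchTime    time the page was fetched
     * @param httpResponse raw HTTP response bytes (status line + headers + body),
     *                     assumed to be supplied by the caller
     */
    public static byte[] responseRecord(String targetUri, Instant fetchTime, byte[] httpResponse) {
        // WARC header block: version line, named fields, then an empty line.
        String header = "WARC/1.0" + CRLF
                + "WARC-Type: response" + CRLF
                + "WARC-Target-URI: " + targetUri + CRLF
                + "WARC-Date: " + fetchTime.truncatedTo(ChronoUnit.SECONDS) + CRLF
                + "WARC-Record-ID: <urn:uuid:" + UUID.randomUUID() + ">" + CRLF
                + "Content-Type: application/http; msgtype=response" + CRLF
                + "Content-Length: " + httpResponse.length + CRLF
                + CRLF;

        byte[] headerBytes = header.getBytes(StandardCharsets.UTF_8);
        byte[] trailer = (CRLF + CRLF).getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headerBytes, 0, headerBytes.length);
        out.write(httpResponse, 0, httpResponse.length); // content block, copied verbatim
        out.write(trailer, 0, trailer.length);           // every record ends with two CRLFs
        return out.toByteArray();
    }
}
```

A port would call something like this once per stored page and append the resulting bytes to an output file. The 1.x exporter also runs as a MapReduce job, which this sketch leaves out entirely.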


 