
Pentaho Hadoop File Input

I am trying to retrieve data from a standalone Hadoop installation (version 2.7.2 with default properties) using Pentaho Kettle (version 6.0.1.0-386). Pentaho and Hadoop are not on the same machine, but I have access from one to the other.

I created a new "Hadoop File Input" step with the following properties:

File/folder: the URL to the file
Wildcard: (empty)
Required: N
Include subfolders: N

The URL to the file is built as follows: ${PROTOCOL}://${USER}:${PASSWORD}@${IP}:${PORT}${PATH_TO_FILE}

For example: hdfs://hadoop:@the_ip:50010/user/hadoop/red_libelium/Ikusi/libelium_waspmote_AC_2_libelium_waspmote/libelium_waspmote_AC_2_libelium_waspmote.txt

The password is empty.

I checked that the file exists in HDFS and that it downloads correctly, both through the web manager and with the hadoop command line.

Scenario A) When I use ${PROTOCOL} = hdfs and ${PORT} = 50010, I get errors in both the Pentaho and the Hadoop consoles:

Pentaho:

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016/04/05 15:23:46 - FileInputList - ERROR (version 6.0.1.0-386, build 1 from 2015-12-03 11.37.25 by buildguy) : org.apache.commons.vfs2.FileSystemException: Could not list the contents of folder "hdfs://hadoop@172.21.0.35:50010/user/hadoop/red_libelium/Ikusi/libelium_waspmote_AC_2_libelium_waspmote/libelium_waspmote_AC_2_libelium_waspmote.txt".
2016/04/05 15:23:46 - FileInputList -   at org.apache.commons.vfs2.provider.AbstractFileObject.getChildren(AbstractFileObject.java:1193)
2016/04/05 15:23:46 - FileInputList -   at org.pentaho.di.core.fileinput.FileInputList.createFileList(FileInputList.java:243)
2016/04/05 15:23:46 - FileInputList -   at org.pentaho.di.core.fileinput.FileInputList.createFileList(FileInputList.java:142)
2016/04/05 15:23:46 - FileInputList -   at org.pentaho.di.trans.steps.textfileinput.TextFileInputMeta.getTextFileList(TextFileInputMeta.java:1580)
2016/04/05 15:23:46 - FileInputList -   at org.pentaho.di.trans.steps.textfileinput.TextFileInput.init(TextFileInput.java:1513)
2016/04/05 15:23:46 - FileInputList -   at org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:69)
2016/04/05 15:23:46 - FileInputList -   at java.lang.Thread.run(Thread.java:745)
2016/04/05 15:23:46 - FileInputList - Caused by: java.io.EOFException: End of File Exception between local host is: "EI001115/192.168.231.248"; destination host is: "172.21.0.35":50010; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
2016/04/05 15:23:46 - FileInputList -   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2016/04/05 15:23:46 - FileInputList -   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
2016/04/05 15:23:46 - FileInputList -   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2016/04/05 15:23:46 - FileInputList -   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
2016/04/05 15:23:46 - FileInputList -   at com.sun.proxy.$Proxy70.getListing(Unknown Source)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:554)
2016/04/05 15:23:46 - FileInputList -   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2016/04/05 15:23:46 - FileInputList -   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
2016/04/05 15:23:46 - FileInputList -   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2016/04/05 15:23:46 - FileInputList -   at java.lang.reflect.Method.invoke(Method.java:606)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
2016/04/05 15:23:46 - FileInputList -   at com.sun.proxy.$Proxy71.getListing(Unknown Source)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
2016/04/05 15:23:46 - FileInputList -   at com.pentaho.big.data.bundles.impl.shim.hdfs.HadoopFileSystemImpl$9.call(HadoopFileSystemImpl.java:126)
2016/04/05 15:23:46 - FileInputList -   at com.pentaho.big.data.bundles.impl.shim.hdfs.HadoopFileSystemImpl$9.call(HadoopFileSystemImpl.java:124)
2016/04/05 15:23:46 - FileInputList -   at com.pentaho.big.data.bundles.impl.shim.hdfs.HadoopFileSystemImpl.callAndWrapExceptions(HadoopFileSystemImpl.java:200)
2016/04/05 15:23:46 - FileInputList -   at com.pentaho.big.data.bundles.impl.shim.hdfs.HadoopFileSystemImpl.listStatus(HadoopFileSystemImpl.java:124)
2016/04/05 15:23:46 - FileInputList -   at org.pentaho.big.data.impl.vfs.hdfs.HDFSFileObject.doListChildren(HDFSFileObject.java:115)
2016/04/05 15:23:46 - FileInputList -   at org.apache.commons.vfs2.provider.AbstractFileObject.getChildren(AbstractFileObject.java:1184)
2016/04/05 15:23:46 - FileInputList -   ... 6 more
2016/04/05 15:23:46 - FileInputList - Caused by: java.io.EOFException
2016/04/05 15:23:46 - FileInputList -   at java.io.DataInputStream.readInt(DataInputStream.java:392)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1071)
2016/04/05 15:23:46 - FileInputList -   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
2016/04/05 15:23:48 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp

Hadoop:

2016-04-05 14:22:56,045 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: fiware-hadoop:50010:DataXceiver error processing unknown operation  src: /192.168.231.248:62961 dst: /172.21.0.35:50010
java.io.IOException: Version Mismatch (Expected: 28, Received: 26738 )
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:60)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
        at java.lang.Thread.run(Thread.java:745)

Other scenarios) In the other cases, with different port numbers (50070, 9000, ...), I only get the error from Pentaho; the standalone Hadoop server does not seem to receive any request at all.

After reading some Pentaho documentation, it seems the Big Data plugin targets Hadoop v2.2.x, while I am trying to connect to 2.7.2. Could this be the source of the problem? Is there any plugin for newer versions? Or is it simply that my HDFS file URL is wrong?

Thank you all for your time; any hint is welcome.

I am answering the question myself, since I have solved it and the solution is too long for a simple comment.

The problem was solved by making a few changes to the Hadoop configuration:

  1. I changed the configuration in core-site.xml

From:

<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop:9000</value>
</property>

To:

<property>
    <name>fs.default.name</name>
    <value>hdfs://server_ip_address:8020</value>
</property>

Since I was having problems with port 9000, I ended up switching to port 8020 (related question). Port 8020 is the default NameNode RPC port, which is the endpoint an HDFS client needs to talk to; port 50010 is the DataNode data-transfer port, which would also explain the "Version Mismatch" error seen in scenario A.

  2. Open port 8020 (just in case you have firewall rules).
  3. The Pentaho Kettle transformation URL then looks like ${PROTOCOL}://${USER}:${PASSWORD}@${HOST}:${PORT}${FILE_PATH}, where ${PORT} is now 8020 (see the example below).
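
For instance, reusing the file path from the question (host shown as a placeholder), the resolved URL would be:

hdfs://hadoop:@the_ip:8020/user/hadoop/red_libelium/Ikusi/libelium_waspmote_AC_2_libelium_waspmote/libelium_waspmote_AC_2_libelium_waspmote.txt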

With this I am able to preview the data from HDFS through the Pentaho transformation.
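
As a sanity check outside of Kettle, the same listing can be reproduced with a minimal Java sketch against the Hadoop client API. This is not part of the original post; the host, user and class name below are placeholders. It performs the same FileSystem.listStatus call that appears in the Kettle stack trace, but pointed at the NameNode RPC port (8020) instead of the DataNode port (50010):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical standalone check: list an HDFS folder through the NameNode RPC endpoint.
public class HdfsListCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect as user "hadoop" to the NameNode on port 8020 (not the DataNode port 50010).
        FileSystem fs = FileSystem.get(URI.create("hdfs://server_ip_address:8020"), conf, "hadoop");
        Path folder = new Path("/user/hadoop/red_libelium/Ikusi/libelium_waspmote_AC_2_libelium_waspmote");
        // Same call the Kettle step makes internally (DistributedFileSystem.listStatus).
        for (FileStatus status : fs.listStatus(folder)) {
            System.out.println(status.getPath() + "  (" + status.getLen() + " bytes)");
        }
        fs.close();
    }
}

If this listing succeeds, the same host and port should also work in the Hadoop File Input URL.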

Thank you all for your time.
