Pentaho's “Hadoop File Input” (Spoon) always displays an error when trying to read a file from HDFS
I am new to Pentaho and Spoon, and I am trying to process a file from a local Hadoop node with a "Hadoop file input" step in Spoon (Pentaho). The problem is that every URI I have tried so far seems to be incorrect. I don't know how to properly connect to HDFS from Pentaho.
To make it clear, the correct URI is:
hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
I know it's the correct one because I tested it via the command line and it works perfectly:
hdfs dfs -ls hdfs://localhost:9001/user/data/prueba_concepto/ListadoProductos_2017_02_13-15_59_con_id.csv
So, setting the environment field to "static", here are some of the URIs I have tried in Spoon:
I even tried the solution Garci García gives here: Pentaho Hadoop File Input, which is setting the port to 8020 and using the following URI:

And then changed it back to 9001 and tried the same technique:
But still nothing worked for me... every time I press the Mostrar Fichero(s)... button (Show file(s)), an error pops up saying that the file cannot be found.
I added a "Hadoop File Input" image here.
Thank you.
Okay, so I actually solved this.
I had to add a new Hadoop Cluster from the "View" tab -> right-click on Hadoop Clusters -> New.
There I had to input my HDFS Hadoop configuration:
After that, if you hit the "Test" button, some of the tests will fail. I solved the second one by copying all the configuration properties I had in my local Hadoop configuration file ($LOCAL_HADOOP_HOME/etc/hadoop/core-site.xml) into Spoon's Hadoop configuration file:
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp25/core-site.xml
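For reference, a minimal sketch of the kind of property that has to match between the two files. The value below is taken from the working URI earlier in the question; assuming your local core-site.xml contains other properties as well, they would be copied over in the same way:

```xml
<!-- Sketch of core-site.xml: fs.defaultFS must point at the same
     NameNode address that works on the command line. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9001</value>
  </property>
</configuration>
```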
After that, I had to modify data-integration/plugins/pentaho-big-data-plugin/plugin.properties and set the property "active.hadoop.configuration" to hdp25:
active.hadoop.configuration=hdp25
Restart Spoon and you're good to go.