UnsupportedOperationException while creating a dataset manually using Java SparkSession
I am trying to create a dataset from strings like the ones below in a JUnit test.
SparkSession sparkSession = SparkSession.builder()
        .appName("Job Test")
        .master("local[*]")
        .getOrCreate();

String some1_json = readFileAsString("some1.json");
String some2_json = readFileAsString("some2.json");
String id = "some_id";

List<String[]> rowStrs = new ArrayList<>();
rowStrs.add(new String[] {id, some1_json, some2_json});

JavaSparkContext javaSparkContext = new JavaSparkContext(sparkSession.sparkContext());
JavaRDD<Row> rowRDD = javaSparkContext.parallelize(rowStrs).map(RowFactory::create);

StructType schema = new StructType(new StructField[]{
        DataTypes.createStructField("id", DataTypes.StringType, false),
        DataTypes.createStructField("some1_json", DataTypes.StringType, false),
        DataTypes.createStructField("some2_json", DataTypes.StringType, false)});

Dataset<Row> datasetUnderTest = sparkSession.sqlContext().createDataFrame(rowRDD, schema);
datasetUnderTest.show();
But I get the error below:
java.lang.ExceptionInInitializerError
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:103)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:133)
...
....
Caused by: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2284)
...
...
What am I missing here? My main method works fine, but this test fails. It looks like something is not being read correctly from the classpath.
Fixed it by adding the following exclusion to all of the Spark-related dependencies:
<exclusions>
    <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
    </exclusion>
</exclusions>
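For context, the exclusion goes inside each Spark dependency declared in `pom.xml`. A sketch of what one such entry might look like (the `spark-core_2.11` artifact and version number here are assumptions for illustration; apply the same `<exclusions>` block to every Spark-related dependency in your build):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
    <!-- hadoop-core is a legacy Hadoop 1.x artifact; exclude it so the
         newer hadoop-client classes that Spark pulls in win on the classpath -->
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

This works because the old `hadoop-core` jar contains a `FileSystem` class that predates the `getScheme()` method; when it shadows the newer Hadoop classes on the test classpath, Spark's session initialization fails with the `UnsupportedOperationException` shown in the stack trace.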