Histogram with Spark Dataframe in Java
Is it possible to generate a histogram dataframe from a Dataset<Row> table using Spark 2.1 in Java?
Example: I have a table in Spark named 'nation' with an Integer column 'n_nationkey'. This is how I did it:
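// Select the integer column to bucket.
String query = "select n_nationkey from nation";
Dataset<Row> df = spark.sql(query);
JavaRDD<Integer> jdf = df.toJavaRDD().map(row -> row.getInt(0));
// histogram() is defined on JavaDoubleRDD, so convert the values to double first.
JavaDoubleRDD example = jdf.mapToDouble(y -> y);
// 5 evenly spaced buckets between min and max: _1() holds the 6 bucket boundaries, _2() the 5 counts.
Tuple2<double[], long[]> resultsnew = example.histogram(5);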
If the column has a double type, only the row accessor changes:
JavaRDD<Double> jdf = df.toJavaRDD().map(row -> row.getDouble(0));
JavaDoubleRDD example = jdf.mapToDouble(y -> y);
Tuple2<double[], long[]> resultsnew = example.histogram(5);
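The result is a pair of arrays rather than a dataframe, so it usually needs one more step to become readable. Below is a minimal sketch of that step in plain Java, with no Spark dependency. The class HistogramRows and its describe method are hypothetical names of my own, not part of Spark, and the sample arrays stand in for resultsnew._1() and resultsnew._2(). It relies on Spark's documented bucket convention: every bucket is half-open except the last one, which also includes its upper boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: pairs each bucket's boundaries with its count.
final class HistogramRows {

    // edges must have exactly one more entry than counts,
    // as in the pair of arrays returned by histogram(n).
    static List<String> describe(double[] edges, long[] counts) {
        if (edges.length != counts.length + 1) {
            throw new IllegalArgumentException("expected counts.length + 1 edges");
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            // All buckets are half-open except the last, which includes its upper edge.
            String close = (i == counts.length - 1) ? "]" : ")";
            rows.add("[" + edges[i] + ", " + edges[i + 1] + close + ": " + counts[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for resultsnew._1() and resultsnew._2().
        double[] edges = {0.0, 5.0, 10.0};
        long[] counts = {3L, 7L};
        describe(edges, counts).forEach(System.out::println);
    }
}
```

With the real result you would call HistogramRows.describe(resultsnew._1(), resultsnew._2()). If you need an actual Dataset<Row>, the same boundary/count pairs could be wrapped in rows and passed to spark.createDataFrame with a matching schema, but I have not shown that here.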