
Histogram with Spark Dataframe in Java

Is it possible to generate a histogram DataFrame from a Dataset<Row> table using Spark 2.1 in Java?

  1. Convert the Dataset<Row> into a JavaRDD of the column's type (Integer, Double, etc.) with toJavaRDD().map().
  2. Convert that JavaRDD into a JavaDoubleRDD with mapToDouble().
  3. Call histogram(int bucketCount) on the JavaDoubleRDD. It returns a Tuple2<double[], long[]> holding the bucket boundaries (evenly spaced between the column's minimum and maximum) and the number of values in each bucket.
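To make step 3 concrete, here is a plain-Java sketch (no Spark needed) of the bucketing rule the Spark API docs describe for JavaDoubleRDD.histogram(int): bucketCount buckets evenly spaced between the minimum and the maximum, each half-open [lo, hi) except the last, which also includes the maximum, and a single bucket when all values are equal. The class and method names (HistogramSketch, boundaries, counts) are hypothetical, NaN/Infinity handling is omitted, and this is not Spark's actual implementation; it only mirrors the documented behavior so you can see what the two returned arrays contain.

```java
import java.util.Arrays;

public class HistogramSketch {

    // Evenly spaced boundaries between min and max, mirroring the behavior
    // documented for JavaDoubleRDD.histogram(int). NaN/Infinity checks omitted.
    public static double[] boundaries(double[] data, int bucketCount) {
        if (bucketCount < 1 || data.length == 0) {
            throw new IllegalArgumentException("need data and bucketCount >= 1");
        }
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        if (min == max) {
            // Values that do not vary give a single bucket [min, min]
            return new double[] {min, min};
        }
        double[] b = new double[bucketCount + 1];
        for (int i = 0; i < bucketCount; i++) {
            b[i] = min + (i * (max - min)) / bucketCount;
        }
        b[bucketCount] = max; // the last boundary is exactly the maximum
        return b;
    }

    // Counts per bucket: [b[i], b[i+1]) for every bucket except the last,
    // which is closed on the right so the maximum is counted.
    // Assumes b was built from the same data, so every value is >= b[0].
    public static long[] counts(double[] data, double[] b) {
        int n = b.length - 1;
        long[] c = new long[n];
        for (double x : data) {
            for (int i = 0; i < n; i++) {
                if (x < b[i + 1] || i == n - 1) {
                    c[i]++;
                    break;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] keys = new double[25];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i; // hypothetical key values 0..24
        }
        double[] b = boundaries(keys, 5);
        System.out.println(Arrays.toString(b));               // [0.0, 4.8, 9.6, 14.4, 19.2, 24.0]
        System.out.println(Arrays.toString(counts(keys, b))); // [5, 5, 5, 5, 5]
    }
}
```

With 5 buckets you get 6 boundaries and 5 counts; calling histogram(5) on a JavaDoubleRDD holding the same hypothetical values should yield the same two arrays as _1() and _2() of the result.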

Example: I have a table in Spark named 'nation' with an Integer column 'n_nationkey'. This is how I did it:

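```java
// 'spark' is an existing SparkSession; select the Integer column to bucket
String query = "select n_nationkey from nation";
Dataset<Row> df = spark.sql(query);
// Read column 0 of each Row as an Integer
JavaRDD<Integer> jdf = df.toJavaRDD().map(row -> row.getInt(0));
// Unbox to double so the JavaDoubleRDD histogram methods are available
JavaDoubleRDD example = jdf.mapToDouble(y -> y);
// _1(): bucket boundaries, _2(): number of values in each bucket
Tuple2<double[], long[]> resultsnew = example.histogram(5);
```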
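resultsnew is a scala.Tuple2, so the boundaries come from resultsnew._1() and the counts from resultsnew._2(), with exactly one more boundary than counts. The helper below (HistogramPrinter and formatBuckets are hypothetical names; it is plain Java so it runs without Spark) turns those two arrays into readable bucket labels. The arrays in main are made-up stand-ins for a real result.

```java
import java.util.ArrayList;
import java.util.List;

public class HistogramPrinter {

    // Render boundaries and counts (e.g. resultsnew._1() and resultsnew._2())
    // as one line per bucket: [lo, hi) for all buckets except the last, [lo, hi].
    public static List<String> formatBuckets(double[] bounds, long[] counts) {
        if (bounds.length != counts.length + 1) {
            throw new IllegalArgumentException("expected one more boundary than counts");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            String close = (i == counts.length - 1) ? "]" : ")";
            lines.add("[" + bounds[i] + ", " + bounds[i + 1] + close + ": " + counts[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for resultsnew._1() and resultsnew._2()
        double[] bounds = {0.0, 50.0, 100.0};
        long[] counts = {5L, 3L};
        formatBuckets(bounds, counts).forEach(System.out::println);
        // [0.0, 50.0): 5
        // [50.0, 100.0]: 3
    }
}
```

With the Spark result in hand, the call would be HistogramPrinter.formatBuckets(resultsnew._1(), resultsnew._2()).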

If the column is a Double instead, only the RDD type and the Row getter change:

```java
// Read column 0 of each Row as a Double instead of an Integer
JavaRDD<Double> jdf = df.toJavaRDD().map(row -> row.getDouble(0));
JavaDoubleRDD example = jdf.mapToDouble(y -> y);
Tuple2<double[], long[]> resultsnew = example.histogram(5);
```
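One difference from the Integer case: a Double column can contain NaN or ±Infinity, and the Spark docs state that histogram(int) throws an exception when the RDD contains such values. Below is a minimal plain-Java sketch of the filtering predicate (FiniteFilter and finiteOnly are hypothetical names). On the RDD itself, the same check could presumably be applied with something like example.filter(d -> Double.isFinite(d)) before calling histogram(5); treat that wiring as an untested assumption.

```java
import java.util.Arrays;

public class FiniteFilter {

    // Keep only finite values: histogram(int) is documented to throw on NaN or +/-Infinity
    public static double[] finiteOnly(double[] values) {
        return Arrays.stream(values).filter(Double::isFinite).toArray();
    }

    public static void main(String[] args) {
        double[] raw = {1.5, Double.NaN, 3.0, Double.POSITIVE_INFINITY, 2.25};
        System.out.println(Arrays.toString(finiteOnly(raw))); // [1.5, 3.0, 2.25]
    }
}
```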

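Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.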

© 2020-2024 STACKOOM.COM