
How to convert a JavaRDD to a Dataset

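I am trying to use Spark to read data from an Oracle database into a Dataset and then convert the Dataset to a JavaRDD for a map operation. My code can only save a Dataset, so the mapped result has to become a Dataset again. The official Spark documentation shows: http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection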


// Apply a schema to an RDD of JavaBeans to get a DataFrame
Dataset<Row> peopleDF = spark.createDataFrame(peopleRDD, Person.class);

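Our data is read from Oracle, so how should Person.class be defined in order to convert the RDD to a Dataset? Or how can I perform the map operation directly on the Dataset in Java? What should I do when my code looks like this: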

import java.math.BigDecimal;
import java.util.Properties;
import java.util.Random;
import org.apache.commons.lang.StringUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
public class FlatMapTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.set("spark.sql.warehouse.dir", "./spark-warehouse");
        SparkSession spark = SparkSession.builder().master("local[3]").config(conf)
                .appName("Java Spark SQL data sources example").getOrCreate();

        jdbcDataSource(spark);

    }

    public static void jdbcDataSource(SparkSession spark) {
        // Connect to the database and get a DataFrame that wraps the table's data
        Dataset<Row> jdbcDF = spark.read().format("jdbc")
                .option("driver", "oracle.jdbc.driver.OracleDriver")
                .option("url", "jdbc:oracle:thin:@192.168.101.207:1521:orcl")
                .option("dbtable", "datamask")
                .option("user", "scott")
                .option("password", "tiger").load();
        /*
         * Create the temporary view datamask
         */
        jdbcDF.createOrReplaceTempView("datamask");
        Dataset<Row> sqlDF = spark.sql("select * from datamask");


        JavaRDD<Object> resultRDD = sqlDF.toJavaRDD().map(
                new Function<Row, Object>() {

                    public String call(Row row) throws Exception {
                        Random ran = new Random(); 
                        int r = ran.nextInt(9001) + 1000;
                        /*
                         * Replace the last 4 characters of each field with a random number; the replacement rule can be customized
                         */
                        String userName = row.getAs("USER_NAME");
                        userName = StringUtils.replace(userName,
                                StringUtils.right(userName, 4), "" + r);

                        String loginName = row.getAs("LOGIN_NAME"); // LOGIN_NAME is the database column name; same for the fields below
                        loginName = StringUtils.replace(loginName,
                                StringUtils.right(loginName, 4), "" + r);
                        String countyCode = row.getAs("COUNTY_CODE");
                        countyCode = StringUtils.replace(countyCode,
                                StringUtils.right(countyCode, 4), "" + r);
                        String passwd = row.getAs("PASSWORD");
                        passwd = StringUtils.replace(passwd,
                                StringUtils.right(passwd, 4), "" + r);
                        String areaId = row.getAs("AREA_ID");
                        areaId = StringUtils.replace(areaId,
                                StringUtils.right(areaId, 4), "" + r);
                        String cityNo = row.getAs("CITY_NO");
                        cityNo = StringUtils.replace(cityNo,
                                StringUtils.right(cityNo, 4), "" + r);
                        String cardID = row.getAs("CARD_ID");
                        cardID = StringUtils.replace(cardID,
                                StringUtils.right(cardID, 4), "" + r);
                        String mobile = row.getAs("MOBILE");
                        mobile = StringUtils.replace(mobile,
                                StringUtils.right(mobile, 4), "" + r);
                        String email = row.getAs("EMAIL");
                        email = StringUtils.replace(email,
                                StringUtils.right(email, 4), "" + r);
                        BigDecimal big = row.getAs("QQ");
                        String qq = big.toString();
                        qq = StringUtils.replace(qq, StringUtils.right(qq, 4),
                                "" + r);
                        String addr = row.getAs("ADDR");
                        addr = StringUtils.replace(addr,
                                StringUtils.right(addr, 4), "" + r);
                        String birthday = row.getAs("BIRTHDAY");
                        birthday = StringUtils.replace(birthday,
                                StringUtils.right(birthday, 4), "" + r);
                        String birthday1 = row.getAs("BIRTHDAY1");
                        birthday1 = StringUtils.replace(birthday1,
                                StringUtils.right(birthday1, 4), "" + r);
                        String codeId = row.getAs("CODE_ID");
                        codeId = StringUtils.replace(codeId,
                                StringUtils.right(codeId, 4), "" + r);
                        String deptNo = row.getAs("DEPT_NO");
                        deptNo = StringUtils.replace(deptNo,
                                StringUtils.right(deptNo, 4), "" + r);
                        String newCode = row.getAs("NEW_CODE");
                        newCode = StringUtils.replace(newCode,
                                StringUtils.right(newCode, 4), "" + r);
                        String oldCode = row.getAs("OLD_CODE");
                        oldCode = StringUtils.replace(oldCode,
                                StringUtils.right(oldCode, 4), "" + r);
                        return loginName + "," + countyCode + "," + passwd
                                + "," + areaId + "," + cityNo + "," + cardID
                                + "," + mobile + "," + email + "," + qq + ","
                                + addr + "," + birthday + "," + "," + birthday1
                                + "," + codeId + "," + deptNo + "," + newCode
                                + "," + oldCode;
                    }

                });
        Dataset<Row> peopleDF = spark.createDataFrame(resultRDD, Object.class);

        String url2 = "jdbc:oracle:thin:@192.168.101.207:1521:orcl";  
        Properties connectionProperties2 = new Properties();  
        connectionProperties2.setProperty("user", "scott"); // set the user name
        connectionProperties2.setProperty("password", "tiger"); // set the password
        String table2 = "masked1";

        peopleDF.write().mode(SaveMode.Append)  
        .jdbc(url2, table2, connectionProperties2); 

    }
}

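Once you have converted the input DataFrame to a Dataset using an encoder/schema, you do not need to convert the Dataset back to an RDD to perform a map operation; you can map over the Dataset itself. The following code may be useful to you: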

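import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

// InputType and OutputType stand for your own JavaBean classes
Dataset<InputType> inputDataSet = someDataFrame.as(Encoders.bean(InputType.class));

Dataset<OutputType> outputDataSet = inputDataSet.map(new MapFunction<InputType, OutputType>() {
            @Override
            public OutputType call(InputType value) throws Exception {
                // TODO: build the OutputType result from value
                OutputType someOutputVal = null;
                return someOutputVal;
            }
        }, Encoders.bean(OutputType.class));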

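If you really want to convert using Person.class, you need to define a POJO: create a new Java class named Person with getters and setters (plus hashCode, equals, and toString() methods) for only the fields you want to include in your Person object. It is that simple; you can then apply spark.createDataFrame(peopleRDD, Person.class);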
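A minimal sketch of such a POJO follows. The field names (loginName, mobile) are assumptions picked from the column names in the question, and the wrapper class PersonBeanSketch is hypothetical, there only to keep the example in one file. The bean is public, has a no-argument constructor, and implements Serializable, as the Person bean in the Spark SQL guide does.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical wrapper so the sketch fits in one file; Person could equally
// be a public top-level class in its own Person.java.
class PersonBeanSketch {

    // JavaBean: public class, no-arg constructor, one getter/setter pair per field.
    public static class Person implements Serializable {
        private String loginName; // assumed to correspond to LOGIN_NAME
        private String mobile;    // assumed to correspond to MOBILE

        public Person() {}

        public String getLoginName() { return loginName; }
        public void setLoginName(String loginName) { this.loginName = loginName; }

        public String getMobile() { return mobile; }
        public void setMobile(String mobile) { this.mobile = mobile; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return Objects.equals(loginName, other.loginName)
                    && Objects.equals(mobile, other.mobile);
        }

        @Override
        public int hashCode() { return Objects.hash(loginName, mobile); }

        @Override
        public String toString() {
            return "Person{loginName=" + loginName + ", mobile=" + mobile + "}";
        }
    }

    public static void main(String[] args) {
        Person p = new Person();
        p.setLoginName("alice");
        p.setMobile("13800001234");
        System.out.println(p);
    }
}
```

An RDD of objects like this is what the spark.createDataFrame(peopleRDD, Person.class) call from the documentation snippet expects as its first argument.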

Alternatively: once you have obtained all the fields, pack them directly into a Person.

Instead of public String call(Row row) {} ..., create a Person object right there:

public Person call(Row row) {}  

Then you will have a JavaRDD<Person> resultRDD directly.
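To illustrate what the body of such a call method might do once the field values have been read from the Row, here is a Spark-free sketch. The names MaskToPersonSketch, maskTail and toPerson are hypothetical, and Person is cut down to two assumed fields. The sketch swaps in a plain substring approach for the question's StringUtils.replace call, which replaces every occurrence of the last four characters rather than only the tail. It also draws the random number from 1000 to 9999 so that it always has four digits.

```java
import java.io.Serializable;
import java.util.Random;

// Hypothetical wrapper class; all names here are illustrative only.
class MaskToPersonSketch {

    public static class Person implements Serializable {
        private String loginName;
        private String mobile;

        public String getLoginName() { return loginName; }
        public void setLoginName(String loginName) { this.loginName = loginName; }
        public String getMobile() { return mobile; }
        public void setMobile(String mobile) { this.mobile = mobile; }
    }

    // Replaces only the trailing (up to) 4 characters of value with r.
    // Null input is passed through unchanged.
    static String maskTail(String value, int r) {
        if (value == null) {
            return null;
        }
        int keep = Math.max(0, value.length() - 4);
        return value.substring(0, keep) + r;
    }

    // Packs the masked fields straight into a Person instead of a CSV string.
    static Person toPerson(String loginName, String mobile, int r) {
        Person p = new Person();
        p.setLoginName(maskTail(loginName, r));
        p.setMobile(maskTail(mobile, r));
        return p;
    }

    public static void main(String[] args) {
        int r = new Random().nextInt(9000) + 1000; // 1000..9999
        Person p = toPerson("alice_2024", "13800001234", r);
        System.out.println(p.getLoginName() + "," + p.getMobile());
    }
}
```

Inside the Spark job, the arguments to toPerson would come from row.getAs(...) as in the question, and the anonymous class would be declared as Function<Row, Person> so that map yields a JavaRDD<Person>.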

