Get the data type of a column in an Apache Spark Dataset

Is there a way to get the data type of a column in an Apache Spark Dataset using Java? I have a Dataset that contains a column called SSN, and I wrote this code to trim the data in that column:

Dataset<Row> trimmedOutput = trimInput.select(trim(trimInput.col("SSN")).as("SSN"));

I am trying to get the data type of the SSN column to validate it against the expected type.

Can someone please help me?

I came here looking for the same answer :) After looking at the API, this is one way I could work out:

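import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructField;

// Returns the type name of the given column (e.g. "string"),
// or null if the dataset has no column with that name.
public static String dataTypeString(Dataset<Row> dataset, String colName) {
    StructField[] fields = dataset.schema().fields();
    String dataType = null;
    for (StructField field : fields) {
        if (field.name().equals(colName)) {
            dataType = field.dataType().typeName();
            break;
        }
    }
    return dataType;
}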

To know the datatype of the SSN column in the trimmedOutput dataset, use it like below:

dataTypeString(trimmedOutput, "SSN");

There is also a similar method, simpleString(), that you can invoke instead of typeName(); the API docs describe the difference between the two.
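To make the difference concrete, here is a small sketch that prints both names for a few built-in types. The class name TypeNameDemo and the helper describe are made up for illustration; only typeName(), simpleString() and the DataTypes factory are Spark's own. It needs the spark-sql artifact on the classpath but no running Spark session.

```java
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

public class TypeNameDemo {
    // Hypothetical helper: formats "<typeName> / <simpleString>" for one type.
    static String describe(DataType type) {
        return type.typeName() + " / " + type.simpleString();
    }

    public static void main(String[] args) {
        // For strings the two names are the same.
        System.out.println(describe(DataTypes.StringType));   // string / string
        // For integral types simpleString() uses the SQL-style name.
        System.out.println(describe(DataTypes.IntegerType));  // integer / int
        System.out.println(describe(DataTypes.LongType));     // long / bigint
        // For arrays typeName() drops the element type, simpleString() keeps it.
        System.out.println(describe(DataTypes.createArrayType(DataTypes.StringType)));
        // array / array<string>
    }
}
```

So if you compare against a hard-coded string, pick one of the two methods and use it consistently; for a string column such as SSN either one returns "string".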

If your intention is to check if a column in a dataset is of a certain datatype and fail if that's not the case, the below code will help:

SchemaUtils.checkColumnType(holdoutResults.schema(),
                            "SSN",
                            DataTypes.StringType,
                            "Datatype Mismatch for column SSN");

The above invocation checks whether the 'SSN' column is of type String; if it is not, it fails with the message you passed as the last argument, "Datatype Mismatch for column SSN". This method is available only on the SchemaUtils class from the ml library.
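If you would rather not depend on the ml library for this, a similar check can be sketched with the plain schema API. The class ColumnTypeCheck and the method requireColumnType below are hypothetical names, not part of Spark; the choice of IllegalArgumentException for a mismatch is also just an assumption of this sketch.

```java
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ColumnTypeCheck {
    // Hypothetical helper (not a Spark API): throws if the column's type
    // differs from the expected one.
    static void requireColumnType(StructType schema, String colName,
                                  DataType expected, String message) {
        // fieldIndex throws IllegalArgumentException if the column does not exist.
        DataType actual = schema.fields()[schema.fieldIndex(colName)].dataType();
        if (!actual.equals(expected)) {
            throw new IllegalArgumentException(
                message + " (expected " + expected.simpleString()
                        + ", found " + actual.simpleString() + ")");
        }
    }

    public static void main(String[] args) {
        // Stand-in schema with a single string column named SSN.
        StructType schema = new StructType().add("SSN", DataTypes.StringType);
        // Passes silently because SSN is a string column.
        requireColumnType(schema, "SSN", DataTypes.StringType,
                          "Datatype Mismatch for column SSN");
    }
}
```

With a real dataset you would pass trimmedOutput.schema() as the first argument instead of the hand-built schema.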
