Check if a column exists in DF - Java Spark

I am trying to find out whether there is any method to check if a particular column exists in a DataFrame, using Java Spark. My searches only turned up suggestions related to Python, nothing related to Java.

I am extracting this data from MongoDB and trying to check whether certain columns exist or not; there is no schema validation available in MongoDB for this collection.

The following is my schema, and I would like to check whether the columns in my configuration exist in it.

 |-- _id: string (nullable = true)
 |-- value: struct (nullable = true)
 |    |-- acctId: string (nullable = true)
 |    |-- conId: string (nullable = true)
 |    |-- dimensions: struct (nullable = true)
 |    |    |-- device: struct (nullable = true)
 |    |    |    |-- accountId: long (nullable = true)
 |    |    |    |-- addFreeTitleTime: timestamp (nullable = true)
 |    |    |    |-- build: string (nullable = true)
 |    |    |    |-- country: string (nullable = true)
 |    |    |    |-- countryOfResidence: string (nullable = true)
 |    |    |    |-- createDate: timestamp (nullable = true)
 |    |    |    |-- number: string (nullable = true)
 |    |    |    |-- FamilyName: string (nullable = true)
 |    |    |    |-- did: long (nullable = true)
 |    |    |    |-- deviceToken: string (nullable = true)
 |    |    |    |-- initialBuildNumber: string (nullable = true)
 |    |    |    |-- language: string (nullable = true)
 |    |    |    |-- major: integer (nullable = true)
 |    |    |    |-- minor: integer (nullable = true)
 |    |    |    |-- model: string (nullable = true)
 |    |    |    |-- modelDesc: string (nullable = true)
 |    |    |    |-- modelId: string (nullable = true)
 |    |    |    |-- modifyDate: timestamp (nullable = true)
 |    |    |    |-- preReg: integer (nullable = true)
 |    |    |    |-- retailer: string (nullable = true)
 |    |    |    |-- serialNumber: string (nullable = true)
 |    |    |    |-- softwareUpdateDate: timestamp (nullable = true)
 |    |    |    |-- softwareVersion: string (nullable = true)
 |    |    |    |-- sourceId: string (nullable = true)
 |    |    |    |-- timeZone: string (nullable = true)
 |    |    |-- location: struct (nullable = true)

Your inputs and suggestions would be of great value.

Thanks in advance.
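For reference, here is a minimal Java sketch of the kind of check being asked for: it compares a list of required column names against the top-level columns of the DataFrame and reports which ones are missing. The missingColumns helper, the required list, and the df variable are illustrative assumptions, not part of the original question.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical helper: returns the names from `required` that are not present
// as top-level columns of `df`; an empty result means every required column exists.
static List<String> missingColumns(Dataset<Row> df, List<String> required) {
    Set<String> actual = new HashSet<>(Arrays.asList(df.columns()));
    return required.stream()
            .filter(name -> !actual.contains(name))
            .collect(Collectors.toList());
}

// Example: missingColumns(df, Arrays.asList("_id", "value")) returns an empty list
// when both top-level columns exist.

The answers below take the same column-list approach, first in Scala and then in Java.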

sourceDF.printSchema
//  root
//  |-- category: string (nullable = true)
//  |-- tags: string (nullable = true)
//  |-- datetime: string (nullable = true)
//  |-- date: string (nullable = true)

  val cols = sourceDF.columns
//  cols: Array[String] = Array(category, tags, datetime, date)

  val isFieldCategory = cols.filter(_ == "category")
//  isFieldCategory: Array[String] = Array(category)

or

val isFieldTags = sourceDF.columns.contains("tags")
//  isFieldTags: Boolean = true
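The snippet above is Scala. Since the question asks for Java, an equivalent check (assuming sourceDF is an org.apache.spark.sql.Dataset<Row> with the same columns) could look like this:

// requires import java.util.Arrays
String[] cols = sourceDF.columns(); // all column names, e.g. [category, tags, datetime, date]

boolean isFieldCategory = Arrays.asList(cols).contains("category");   // true
boolean isFieldTags = Arrays.stream(cols).anyMatch("tags"::equals);   // true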

Yes, you can achieve this in Java by fetching all the columns of a Dataset and checking whether the column you want exists. Here is a sample example:

Dataset<Object1> dataSet = spark.read().text("dataPath").as(Encoders.bean(Object1.class)); // load the data into a Dataset
String[] columns = dataSet.columns(); // fetch all column names
System.out.println(Arrays.toString(columns).contains("columnNameToCheckFor")); // check whether the column name we are looking for exists in the array of columns

Here I have used a very naive method to check whether the column name exists in the array of columns; you can use any other method to perform this check.
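Note that both answers only look at top-level column names, while most fields in the question's schema are nested under value.dimensions.device. The following is a hedged sketch of a path-aware check that walks the DataFrame's StructType one segment at a time; the hasNestedColumn helper and the dotted-path convention are assumptions for illustration, not a built-in Spark API.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper: returns true if a dotted path such as
// "value.dimensions.device.did" resolves to a field in df's schema.
static boolean hasNestedColumn(Dataset<Row> df, String dottedPath) {
    StructType current = df.schema();
    String[] parts = dottedPath.split("\\.");
    for (int i = 0; i < parts.length; i++) {
        StructField matched = null;
        for (StructField f : current.fields()) {
            if (f.name().equals(parts[i])) {
                matched = f;
                break;
            }
        }
        if (matched == null) {
            return false; // this segment does not exist at the current level
        }
        if (i < parts.length - 1) {
            if (!(matched.dataType() instanceof StructType)) {
                return false; // cannot descend into a non-struct field
            }
            current = (StructType) matched.dataType();
        }
    }
    return true;
}

// hasNestedColumn(df, "value.dimensions.device.did")     -> true for the schema above
// hasNestedColumn(df, "value.dimensions.device.missing") -> false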
