
Spark Scala DataFrame join and modification

I have an Employee table that holds employee details, and a Project table that holds project details along with the assigned employee id.

Employee

EmployeeName | Id  | Address | Assigned
Joan         | 101 | xxxx    | y

Project

ProjectCode | Number of days | Employee
XX1223      | 24             | 101

I have a CSV file that loads the employee details into the Employee table (a minimal load sketch follows the list below). While loading the employee details,

  1. I need to check whether the employee id is assigned in the Project table:
    • if the employee id is present in the Project table, insert y into Assigned in the Employee table;
    • if not, insert n into Assigned in the Employee table.
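
For reference, here is a minimal sketch of the load step. The path /path/to/employees.csv and the header option are assumptions, not from the original question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("EmployeeLoad").getOrCreate()

// Hypothetical path; assumes the CSV has a header row matching the Employee
// layout above (EmployeeName, Id, Address)
val employeeCsvDF = spark.read
  .option("header", "true")
  .csv("/path/to/employees.csv")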

I have a DataFrame for Employee, var employeeDF = Employee_TABLE, and one for the Employee-Project join, var employeeAssignedDF = Employee_Join_Project.

At the moment, I insert into Employee first, then do the join, and then update Employee again. Alternatively, I could compute employeeDF.except(employeeAssignedDF), which should contain only the few rows that differ.
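
Note that except returns the rows of the left DataFrame that are absent from the right one, and both sides must have identical schemas. A minimal sketch with the two DataFrames named above:

// Rows present in employeeDF but missing from employeeAssignedDF;
// except requires both DataFrames to share the same schema
val unassignedDF = employeeDF.except(employeeAssignedDF)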

  1. Is it possible to change only a few of the DataFrame's columns?
  2. I want to insert into the table only once, so after the join and the except I should have the full set of records ready to be inserted into the DB. Is that feasible?

Thanks

You could try this, but I am not sure whether it solves your problem -

import org.apache.spark.sql.functions.when

// Use string literals: Spark cannot build a column literal from a Scala Char ('Y')
val newDf = df.withColumn("Column", when(CONDITION, "Y").otherwise("N"))

You could also use any other expression in place of when(CONDITION, "Y").
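
Applied to the tables in the question, the flag can be derived in a single pass with a left join against the distinct employee ids in Project, then when/otherwise on the null side of the join. This is only a sketch: employeeCsvDF and projectDF are assumed names for the loaded CSV and the Project table:

import org.apache.spark.sql.functions.{col, when}

// Distinct assigned employee ids from the Project table
val projectEmployees = projectDF.select(col("Employee")).distinct()

// A left join keeps every employee; Employee stays null when there is no match
val joinedDF = employeeCsvDF.join(
  projectEmployees,
  employeeCsvDF("Id") === projectEmployees("Employee"),
  "left"
)

// Derive Assigned in one pass so the result can be written to the DB once
val resultDF = joinedDF
  .withColumn("Assigned", when(col("Employee").isNull, "n").otherwise("y"))
  .drop("Employee")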
