
Spark Scala DataFrame join and modification

I have an Employee table that holds employee details, and a Project table that holds project details along with the assigned employee id.

Employee

EmployeeName | Id  | Address | Assigned
Joan         | 101 | xxxx    | y

Project

ProjectCode | Number of days | Employee
XX1223      | 24             | 101

I have a CSV file that loads the employee details into the Employee table (a minimal load sketch follows the list below). While loading the employee details,

  1. I need to check whether the employee id is assigned in the Project table:
    • if the employee id is present in the Project table, insert y into Assigned in the Employee table;
    • if not, insert n into Assigned in the Employee table.
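
For reference, here is a minimal sketch of the load step. The path /path/to/employees.csv and the header option are assumptions, not from the original question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("EmployeeLoad").getOrCreate()

// Hypothetical path; assumes the CSV has a header row matching the Employee
// layout above (EmployeeName, Id, Address)
val employeeCsvDF = spark.read
  .option("header", "true")
  .csv("/path/to/employees.csv")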

I have a DataFrame for Employee, var employeeDF = Employee_TABLE, and one for the Employee-Project join, var employeeAssignedDF = Employee_Join_Project.

At the moment, I insert into Employee first, then do the join, and then update Employee again. Alternatively, I could compute employeeDF.except(employeeAssignedDF), which should contain only the few rows that differ.
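
Note that except returns the rows of the left DataFrame that are absent from the right one, and both sides must have identical schemas. A minimal sketch with the two DataFrames named above:

// Rows present in employeeDF but missing from employeeAssignedDF;
// except requires both DataFrames to share the same schema
val unassignedDF = employeeDF.except(employeeAssignedDF)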

  1. Is it possible to change only a few of the DataFrame's columns?
  2. I want to insert into the table only once, so after the join and the except I should have the full set of records ready to be inserted into the DB. Is that feasible?

Thanks

You could try this, but I am not sure whether it solves your problem -

import org.apache.spark.sql.functions.when

// Use string literals: Spark cannot build a column literal from a Scala Char ('Y')
val newDf = df.withColumn("Column", when(CONDITION, "Y").otherwise("N"))

You could also use any other expression in place of when(CONDITION, "Y").
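
Applied to the tables in the question, the flag can be derived in a single pass with a left join against the distinct employee ids in Project, then when/otherwise on the null side of the join. This is only a sketch: employeeCsvDF and projectDF are assumed names for the loaded CSV and the Project table:

import org.apache.spark.sql.functions.{col, when}

// Distinct assigned employee ids from the Project table
val projectEmployees = projectDF.select(col("Employee")).distinct()

// A left join keeps every employee; Employee stays null when there is no match
val joinedDF = employeeCsvDF.join(
  projectEmployees,
  employeeCsvDF("Id") === projectEmployees("Employee"),
  "left"
)

// Derive Assigned in one pass so the result can be written to the DB once
val resultDF = joinedDF
  .withColumn("Assigned", when(col("Employee").isNull, "n").otherwise("y"))
  .drop("Employee")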
