
Get distinct rows from RDD[type] in scala Spark

Say I have an RDD of type RDD[employee], with sample data as follows:

FName,LName,Department,Salary
dubert,tomasz ,paramedic i/c,91080.00,
edwards,tim p,lieutenant,114846.00,
edwards,tim p,lieutenant,234846.00,
edwards,tim p,lieutenant,354846.00,
elkins,eric j,police,104628.00,
estrada,luis f,police officer,96060.00,
ewing,marie a,clerk,53076.00,
ewing,marie a,clerk,13076.00,
ewing,marie a,clerk,63076.00,
finn,sean p,firefighter,87006.00,
fitch,jordan m,law clerk,14.51
fitch,jordan m,law clerk,14.51

Expected output:

dubert,tomasz ,paramedic i/c,91080.00,
edwards,tim p,lieutenant,354846.00,
elkins,eric j,police,104628.00,
estrada,luis f,police officer,96060.00,
ewing,marie a,clerk,63076.00,
finn,sean p,firefighter,87006.00,
fitch,jordan m,law clerk,14.51

I want a single row for each distinct FName.

I think you want to do something like this:

import org.apache.spark.sql.functions.first

df
  .groupBy("FName")
  .agg(
    first("LName"),
    first("Department"),
    first("Salary")
  )
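Two caveats: the question starts from an RDD[employee] rather than a DataFrame, and first() picks values from an arbitrary row per group unless the data is ordered, whereas the expected output keeps the highest salary per FName, which max("salary") matches exactly. Below is a minimal end-to-end sketch under those assumptions; the Employee case class, its field names, and the DistinctByFName object are hypothetical stand-ins, since the question only names the type RDD[employee].

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, max}

// Hypothetical case class; the question only names the type RDD[employee].
case class Employee(fName: String, lName: String, department: String, salary: Double)

object DistinctByFName {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("distinct-by-fname")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A small subset of the sample data from the question.
    val employees = spark.sparkContext.parallelize(Seq(
      Employee("edwards", "tim p", "lieutenant", 114846.00),
      Employee("edwards", "tim p", "lieutenant", 354846.00),
      Employee("ewing", "marie a", "clerk", 53076.00),
      Employee("ewing", "marie a", "clerk", 63076.00)
    ))

    // DataFrame route: one row per fName. max("salary") matches the
    // expected output, which keeps the highest salary per name.
    val viaDataFrame = employees.toDF()
      .groupBy("fName")
      .agg(
        first("lName").as("lName"),
        first("department").as("department"),
        max("salary").as("salary")
      )
    viaDataFrame.show()

    // Pure-RDD alternative: deterministically keep the employee with
    // the highest salary for each fName.
    val viaRdd = employees
      .keyBy(_.fName)
      .reduceByKey((a, b) => if (a.salary >= b.salary) a else b)
      .values
    viaRdd.collect().foreach(println)

    spark.stop()
  }
}

If every other column should come from the same winning row, the reduceByKey variant is the safer choice: it keeps whole Employee records, while per-column aggregates in groupBy/agg select values column by column.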
