
How to make a DataFrame into a case class?

Lots of documentation shows that it's possible to go from a case class to a DataFrame, but I haven't been able to find a good way of going from a DataFrame to a case class.

Let's say I have a DataFrame with 50 columns, but would like to select about 5 of them and make them into a new table. I could approach it this way:

sqlContext.sql("select [1, 2, 3, 4, 5] from test").registerTempTable("newTable")

But newTable also needs some other columns, such as 6 and 7, that hold custom values (0 for now); these columns simply don't exist in the test table. To solve this, I tried to create a case class that looks like this:

case class newTable(1, 2, 3, 4, 5, 6, 7)

In the end, I want to extract columns 1 through 5 from the test table, then fill in 6 and 7 with whatever values I like. I just haven't found a good way of doing this.

You can do it like this:

dataframe.select($"1".as("1"), $"2".as("2"), $"3".as("3"), $"4".as("4"), $"5".as("5")).as[newTable]

Note: the column names must match the field names in your case class.
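For the extra columns 6 and 7 from the question, one possible approach is to add them as literal columns before the conversion, so that every field of the case class has a matching column. Below is a minimal sketch, assuming Spark 2.x or later on the classpath. The names `NewTable`, `c1` to `c7`, `extra`, `toNewTable` and the column types are all hypothetical stand-ins, since the question only uses placeholders like 1 to 7.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.lit

// Hypothetical schema: field names and types are assumptions standing in
// for the placeholder columns 1..7 from the question.
case class NewTable(c1: Int, c2: Int, c3: Int, c4: Int, c5: Int, c6: Int, c7: String)

object DataFrameToCaseClass {

  // Keep five existing columns, add two literal columns, then convert to a typed Dataset.
  def toNewTable(df: DataFrame): Dataset[NewTable] = {
    import df.sparkSession.implicits._ // brings in the $"..." syntax and the NewTable encoder
    df.select($"c1", $"c2", $"c3", $"c4", $"c5")
      .withColumn("c6", lit(0))        // custom value for a column missing from the source
      .withColumn("c7", lit("custom")) // another custom value (assumed to be a string here)
      .as[NewTable]                    // column names must match the case class field names
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("df-to-case-class")
      .getOrCreate()
    import spark.implicits._

    // Small stand-in for the wide "test" table: one unused column instead of 45.
    val test = Seq((1, 2, 3, 4, 5, 99), (10, 20, 30, 40, 50, 99))
      .toDF("c1", "c2", "c3", "c4", "c5", "extra")

    val typed = toNewTable(test)
    typed.createOrReplaceTempView("newTable") // Spark 2.x+ replacement for registerTempTable
    typed.show()

    spark.stop()
  }
}
```

The case class is declared at the top level rather than inside a method, which is the usual way to let Spark derive its encoder. If the real columns have other types, the field types of the case class would need to change to match.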
