
column names not present in dataframe spark

I am currently working with Spark Streaming and getting data from Kafka in JSON. I convert my RDD to a DataFrame and register it as a table. After doing that, when I fire a query where a column name does not exist in the DataFrame, it throws an error like:

"'No such struct field currency in price, recipientId;'"

Here is my query:
val selectQuery = "lower(serials.brand) as brandname, lower(appname) as appname, lower(serials.pack) as packname, lower(serials.asset) as assetname, date_format(eventtime, 'yyyy-MM-dd HH:00:00') as eventtime, lower(eventname) as eventname, lower(client.OSName) as platform, lower(eventorigin) as eventorigin, meta.price as price, client.ip as ip, lower(meta.currency) as currency, cast(meta.total as int) as count"

Here is my DataFrame:
DataFrame[addedTime: bigint, appName: string, client: struct<ip:string>, eventName: string, eventOrigin: string, eventTime: string, geoLocation: string, location: string, meta: struct<period:string,total:string>, serials: struct<asset:string,brand:string,pack:string>, userId: string]

Now my JSON is not strict, and at times some keys may not be present. How can I safely bypass this exception when the keys or columns are not there in the DataFrame?

You can use df.columns to check the columns. There are also ways to get column names and data types from df.schema, and you can log the schema with df.printSchema().
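The idea above can be sketched without Spark: take the names that df.columns (and the nested fields df.schema reports for struct columns) would give you, and keep only the select paths that actually resolve. The schema map and helper names below are made up for illustration; they are not Spark APIs.

```scala
object SafeSelect {
  // A simplified stand-in for what df.schema exposes: top-level columns,
  // with the nested fields of struct columns listed explicitly (assumed here,
  // mirroring the DataFrame in the question - note meta has no price/currency).
  val schema: Map[String, Set[String]] = Map(
    "appName"   -> Set.empty,
    "eventTime" -> Set.empty,
    "client"    -> Set("ip"),
    "meta"      -> Set("period", "total"),
    "serials"   -> Set("asset", "brand", "pack")
  )

  // True if a dotted path like "meta.currency" resolves in the schema.
  def exists(path: String, schema: Map[String, Set[String]]): Boolean =
    path.split('.') match {
      case Array(col)        => schema.contains(col)
      case Array(col, field) => schema.get(col).exists(_.contains(field))
      case _                 => false
    }

  // Keep only the column paths that are really present, so the final
  // select never references a missing struct field.
  def safeSelect(paths: Seq[String], schema: Map[String, Set[String]]): Seq[String] =
    paths.filter(exists(_, schema))
}
```

For example, `SafeSelect.safeSelect(Seq("meta.total", "meta.currency", "appName"), SafeSelect.schema)` drops "meta.currency" (absent from the struct) and keeps the other two, so you could build the select string only from columns that exist.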

So the only way I found was to create a JSON schema for your JSON and then use that schema to parse your JSON into a DataFrame:

val df = sqlcontext.read.schema(schema).json(rdd)
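Why the explicit schema helps can be sketched outside Spark: when the reader is given a fixed schema, every declared field is produced for every record, and keys missing from the JSON come back as null rather than causing a "no such field" error. The object and field names below are hypothetical, with Option standing in for Spark's null.

```scala
object SchemaFill {
  // Fields the explicit schema declares (assumed; plays the role of the
  // schema passed to read.schema(schema) above).
  val declaredFields: Seq[String] = Seq("price", "currency", "total")

  // A parsed JSON object stands in as a Map; keys may be absent.
  // With a fixed schema, every declared field appears in the output;
  // missing keys simply become None (Spark would use null).
  def applySchema(record: Map[String, String],
                  fields: Seq[String]): Map[String, Option[String]] =
    fields.map(f => f -> record.get(f)).toMap
}
```

A record like `Map("price" -> "9.99", "total" -> "2")` then yields a row where "currency" is None instead of throwing, so a later query over "currency" can proceed safely.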
