Needs some help here. I am trying to read data from Hive/CSV. There is a column whose type is string and the value is json formatted string. It is something like this:
| Column Name A |
|----------------------------------------------------------|
|"{"key":{"data":{"key_1":{"key_A":[123]},"key_2":[456]}}}"|
How can I get the value of key_2 and insert it to a new column?
I tried to create a new function to the get value via Gson
private BigDecimal getValue(final String columnValue){
JsonObject jsonObject = JsonParser.parseString(columnValue).getAsJsonOBject();
return jsonObject.get("key").getAsJsonObject().get("key_1").getAsJsonObject().get("key_2").getAsJsonArray().get(0).getAsBigDecimal();
}
But how i can apply this method to the whole dataset?
I was trying to achieve something like this:
Dataset<Row> ds = souceDataSet.withColumn("New_column", getValue(sourceDataSet.col("Column Name A")));
But it cannot be done as the data types are different...
Could you please give any suggestions?
Thx! hx!
------------------Update---------------------
As @Mck suggested, I used get_json_object. As my value contains "
"{"key":{"data":{"key_1":{"key_A":[123]},"key_2":[456]}}}"
I used substring to removed "
and make the new string like this
{"key":{"data":{"key_1":{"key_A":[123]},"key_2":[456]}}}
Code for substring
DataSet<Row> dsA = sourceDataSet.withColumn("Column Name A",expr("substring(Column Name A, 2, length(Column Name A))"))
I used dsA.show()
and confirmed the dataset looks correct.
Then I used following code try to do it
Dataset<Row> ds = dsA.withColumn("New_column",get_json_object(dsA.col("Column Name A"), "$.key.data.key_2[0]"));
which returns null
.
However, if the data is this:
{"key":{"data":{"key_2":[456]}}}
I can get value 456.
Any suggestions why I get null? Thx for the help!
Use get_json_object
:
ds.withColumn(
"New_column",
get_json_object(
col("Column Name A").substr(lit(2), length(col("Column Name A")) - 2),
"$.key.data.key_2[0]")
).show(false)
+----------------------------------------------------------+----------+
|Column Name A |New_column|
+----------------------------------------------------------+----------+
|"{"key":{"data":{"key_1":{"key_A":[123]},"key_2":[456]}}}"|456 |
+----------------------------------------------------------+----------+
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.