I have a list of maps like this:
List<Map<String, Object>> myList = new ArrayList<>();
Map<String, Object> mp1 = new HashMap<>();
mp1.put("id", 1);
mp1.put("name", "John");
myList.add(mp1);
Map<String, Object> mp2 = new HashMap<>();
mp2.put("id", 2);
mp2.put("name", "Carte");
myList.add(mp2);
The key-value pairs that we put in the maps are not fixed; they can be any dynamic key-value pairs (dynamic schema).
I want to convert the list into a Spark DataFrame (Dataset<Row>):
+---+-----+
| id| name|
+---+-----+
|  1| John|
|  2|Carte|
+---+-----+
How can this be achieved?
Note: As I said, the key-value pairs are dynamic, so I cannot create a Java bean in advance and use the syntax below.
Dataset<Row> ds = spark.createDataFrame(myList, MyClass.class);
You can build the rows and the schema from the list of maps, then pass both to spark.createDataFrame(rows: java.util.List[Row], schema: StructType) to build your DataFrame:
import java.util.*;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalyst.expressions.GenericRow;
import org.apache.spark.sql.types.*;
...
public static Dataset<Row> buildDataframe(List<Map<String, Object>> listOfMaps, SparkSession spark) {
    // extract the list of column names (union of the keys of all maps);
    // LinkedHashSet keeps the columns in the order the keys are first encountered
    Set<String> columnSet = new LinkedHashSet<>();
    for (Map<String, Object> elem : listOfMaps) {
        columnSet.addAll(elem.keySet());
    }
    List<String> columns = new ArrayList<>(columnSet);

    // build rows; a key missing from a map yields null in that column
    List<Row> rows = new ArrayList<>();
    for (Map<String, Object> elem : listOfMaps) {
        List<Object> row = new ArrayList<>();
        for (String key : columns) {
            row.add(elem.get(key));
        }
        rows.add(new GenericRow(row.toArray()));
    }

    // build schema; every field is nullable
    List<StructField> fields = new ArrayList<>();
    for (String column : columns) {
        fields.add(new StructField(column, getDataType(column, listOfMaps), true, Metadata.empty()));
    }
    StructType schema = new StructType(fields.toArray(new StructField[0]));

    // build dataframe from rows and schema
    return spark.createDataFrame(rows, schema);
}

// infer the type of a column from its first non-null value
public static DataType getDataType(String column, List<Map<String, Object>> data) {
    for (Map<String, Object> elem : data) {
        if (elem.get(column) != null) {
            return getDataType(elem.get(column));
        }
    }
    // the column contains only nulls
    return DataTypes.NullType;
}

// map a Java value to the corresponding Spark type
public static DataType getDataType(Object value) {
    if (value instanceof Integer) {
        return DataTypes.IntegerType;
    } else if (value instanceof String) {
        return DataTypes.StringType;
        // TODO add all other spark types (Long, Timestamp, etc...)
    } else {
        throw new IllegalArgumentException("unknown type for value " + value);
    }
}
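The three steps above (collect the column names, align each map to those columns, pick a type from the first non-null value) can be tried without a Spark session. The following is only a minimal sketch in plain Java: the class name SchemaSketch and its method names are hypothetical, and it reports the Java class of each column rather than a Spark DataType. It also assumes the input maps are LinkedHashMap instances, because with HashMap the order in which keys are first encountered is unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of Spark: mirrors the logic of buildDataframe.
public class SchemaSketch {

    // Union of all keys, in the order they are first encountered.
    static List<String> columns(List<Map<String, Object>> data) {
        Set<String> seen = new LinkedHashSet<>();
        for (Map<String, Object> m : data) {
            seen.addAll(m.keySet());
        }
        return new ArrayList<>(seen);
    }

    // One Object[] per map, aligned to the column list; missing keys become null.
    static List<Object[]> rows(List<Map<String, Object>> data, List<String> columns) {
        List<Object[]> out = new ArrayList<>();
        for (Map<String, Object> m : data) {
            Object[] row = new Object[columns.size()];
            for (int i = 0; i < columns.size(); i++) {
                row[i] = m.get(columns.get(i));
            }
            out.add(row);
        }
        return out;
    }

    // Class of the first non-null value in each column; null if the column is all nulls.
    static Map<String, Class<?>> columnClasses(List<Map<String, Object>> data, List<String> columns) {
        Map<String, Class<?>> out = new LinkedHashMap<>();
        for (String column : columns) {
            Class<?> found = null;
            for (Map<String, Object> m : data) {
                Object value = m.get(column);
                if (value != null) {
                    found = value.getClass();
                    break;
                }
            }
            out.put(column, found);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> mp1 = new LinkedHashMap<>();
        mp1.put("id", 1);
        mp1.put("name", "John");
        Map<String, Object> mp2 = new LinkedHashMap<>();
        mp2.put("id", 2);
        mp2.put("age", 30L); // a key that the first map does not have

        List<Map<String, Object>> data = Arrays.asList(mp1, mp2);
        List<String> cols = columns(data);
        System.out.println(cols);
        for (Object[] row : rows(data, cols)) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println(columnClasses(data, cols));
    }
}
```

In the Spark version, the Object[] arrays correspond to the values passed to GenericRow, and each detected class is what getDataType(Object) translates into a Spark type.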