I have an RDD that looks like this:
[((0, Row(event_type_new=u'ALERT|VEHICLE_HEALTH_DATA|CHANGE_IN_HEALTH|DTC|B109F|', day=u'Fri')), 0),
((1, Row(event_type_new=u'ALERT|VEHICLE_HEALTH_DATA|CHANGE_IN_HEALTH|DTC|B1115|HIGH MOUNTED STOP LAMP CONTROL', day=u'Sat')), 2)]
Each element has an index, a Row object (event_type_new and day), followed by a prediction (integer). How can I create a DataFrame with 3 columns: event_type_new, day, and Prediction?
I am using Spark 1.6.2 with PySpark API.
Thanks!
Transform your list into an RDD first, then map each element to a Row. You can convert an RDD of Row objects to a DataFrame easily with the .toDF() method:
from pyspark.sql import Row
ls = [((0, Row(event_type_new=u'ALERT|VEHICLE_HEALTH_DATA|CHANGE_IN_HEALTH|DTC|B109F|', day=u'Fri')), 0),
((1, Row(event_type_new=u'ALERT|VEHICLE_HEALTH_DATA|CHANGE_IN_HEALTH|DTC|B1115|HIGH MOUNTED STOP LAMP CONTROL', day=u'Sat')), 2)]
ls_rdd = sc.parallelize(ls)
ls_row = ls_rdd.map(lambda x: Row(**{'day': str(x[0][1].day), 'event_type': str(x[0][1].event_type_new), 'prediction': int(x[1])}))
df = ls_row.toDF()
When you run df.show(), it will look like this:
+---+--------------------+----------+
|day|          event_type|prediction|
+---+--------------------+----------+
|Fri|ALERT|VEHICLE_HEA...|         0|
|Sat|ALERT|VEHICLE_HEA...|         2|
+---+--------------------+----------+
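The mapping step above can also be written as a small named function, which is easier to test on its own. This is only a sketch: to keep it runnable without a Spark context, a namedtuple called Event (a hypothetical stand-in, not part of PySpark) plays the role of Row, and flatten is a name introduced here for illustration.

```python
from collections import namedtuple

# Assumption: Event is a stand-in for pyspark.sql.Row so the sketch runs
# without Spark. Like Row, it exposes its fields as attributes.
Event = namedtuple("Event", ["event_type_new", "day"])


def flatten(record):
    """Turn ((index, row), prediction) into (event_type_new, day, prediction)."""
    (_, row), prediction = record  # the index is not needed in the result
    return (row.event_type_new, row.day, int(prediction))


records = [
    ((0, Event(event_type_new=u'ALERT|VEHICLE_HEALTH_DATA|CHANGE_IN_HEALTH|DTC|B109F|', day=u'Fri')), 0),
    ((1, Event(event_type_new=u'ALERT|VEHICLE_HEALTH_DATA|CHANGE_IN_HEALTH|DTC|B1115|HIGH MOUNTED STOP LAMP CONTROL', day=u'Sat')), 2),
]

# Plain-Python equivalent of rdd.map(flatten).collect()
flat = [flatten(r) for r in records]
```

With Spark available, the same function could presumably be applied to the real data with something like ls_rdd.map(flatten).toDF(['event_type_new', 'day', 'Prediction']), since Row fields can be read as attributes in the same way. Passing the column names explicitly also gives the column order asked for in the question.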
I assume that this is a collected RDD, because it looks like you have a list of tuples, each combining a Row and an int. You can get your desired output with the following:
from pyspark.sql import Row
lst = [((0, Row(event_type_new=u'ALERT|VEHICLE_HEALTH_DATA|CHANGE_IN_HEALTH|DTC|B109F|', day=u'Fri')), 0),
((1, Row(event_type_new=u'ALERT|VEHICLE_HEALTH_DATA|CHANGE_IN_HEALTH|DTC|B1115|HIGH MOUNTED STOP LAMP CONTROL', day=u'Sat')), 2)]
output = []
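for row in lst:
    # Append the prediction to the values of the original Row
    vals = tuple(row[0][1]) + (row[1],)
    # Extend the original field names with the new column name
    fields = row[0][1].__fields__ + ['prediction']
    row = Row(*vals)
    row.__fields__ = fields
    output.append(row)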
df = sc.parallelize(output).toDF()
df.show()
You should get something like the following:
+---+--------------------+----------+
|day|      event_type_new|prediction|
+---+--------------------+----------+
|Fri|ALERT|VEHICLE_HEA...|         0|
|Sat|ALERT|VEHICLE_HEA...|         2|
+---+--------------------+----------+
I hope this helps.