How to convert an RDD into a DataFrame with PySpark?

Question

I received the RDD below from a client. How can I convert it into a DataFrame?

["Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')"]

Answer 1

Note: This is not really an answer, because I don't fully understand what the OP is asking. It would not have fit in a comment, but maybe we can take it forward from here.

The OP says that he/she receives an RDD (purportedly with a single element) from a client:

["Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')"]

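Now the OP wants to turn that into a DataFrame. The element is a string that merely looks like a Row, so it first has to be de-stringed, that is, turned back into a real Row object, and the OP has to clarify what exactly is needed. Once the element is a real Row, the conversion is straightforward: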

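from pyspark.sql import Row, SparkSession
# sqlContext only exists in older Spark shells; in a script, create a session instead.
spark = SparkSession.builder.getOrCreate()
# The client's element with the outer quotes removed, so it is a real Row, not a string.
rdd_from_client = [Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')]
df = spark.createDataFrame(rdd_from_client)
df.show(truncate=False)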
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
|Moid|Tend                   |Tripid|Tstart                 |Xend   |Xstart |Yend   |Ystart |
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
|2   |2007-05-28 08:53:16.040|11    |2007-05-28 08:53:14.040|9743.73|9738.73|114.553|103.246|
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
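The snippet above starts from a real Row, whereas the client's element is a string. If the strings really do have the form "Row(name=literal, ...)", one possible way to de-string them is to parse each string with Python's ast module instead of calling eval() on untrusted input. This is only a sketch under that assumption; the helper name parse_row_string is made up for illustration and is not part of PySpark.

```python
import ast


def parse_row_string(text):
    """Turn "Row(a=1, b='x')" into {'a': 1, 'b': 'x'} without using eval()."""
    call = ast.parse(text.strip(), mode="eval").body
    # Accept only a call to the bare name Row with keyword arguments.
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "Row"):
        raise ValueError("not a Row(...) string: %r" % text)
    if call.args:
        raise ValueError("positional arguments are not supported: %r" % text)
    # literal_eval accepts AST nodes and only evaluates plain literals.
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


# The single element from the question, still in its string form.
raw = ["Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', "
       "Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', "
       "Xend='9743.73', Yend='114.553')"]

records = [parse_row_string(s) for s in raw]
print(records[0]["Moid"], records[0]["Tstart"])
```

With an actual RDD of such strings (called rdd here, an assumed name), the same helper could be applied as rdd.map(lambda s: Row(**parse_row_string(s))), and the result passed to createDataFrame as in the snippet above. Every field except Moid stays a string, exactly as the client sent it.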
