I need to create a nested list. My text data looks like this:
(telephone number, time, delta time, lat, long)
...
0544144,23,86,40.761650,29.940929
0544147,23,104,40.768749,29.968599
0545525,20,86,40.761650,29.940929
0538333,21,184,40.764679,29.929543
05477900,21,204,40.773071,29.975010
0561554,23,47,40.764694,29.927397
...
My code is:
from pyspark import SparkContext
sc = SparkContext()
rdd_data = sc.textFile("data2.txt")
rdd_data_1 = rdd_data.map(lambda line: line.split(","))
tel0 = rdd_data_1.map(lambda line: int(line[0]))
time1 = rdd_data_1.map(lambda line: int(line[1]))
deltaTime2 = rdd_data_1.map(lambda line: int(line[2]))
lat3 = rdd_data_1.map(lambda line: float(line[3]))
lon4 = rdd_data_1.map(lambda line: float(line[4]))
tel0_list = tel0.collect()
time1_list = time1.collect()
deltaTime2_list = deltaTime2.collect()
lat3_list = lat3.collect()
lon4_list = lon4.collect()
As you can see, each column has a meaning: telephone, time, delta time, etc. But each line must also be available as a list. If I want to see the first telephone number:
print tel0_list[0]
output:
0544144
That works. But I also need to build a list for each line.
For example:
The data list should hold one list per line. If I want to see data[1], my output has to be like
(0544147,23,104,40.768749,29.968599)
How can I do this?
Thanks
Since your text file is in CSV format, you can easily load it into a DataFrame if you use Spark 2.x:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType
spark = SparkSession.builder.getOrCreate()
schema = StructType([
StructField("tel", IntegerType(), True),
StructField("time", IntegerType(), True),
StructField("deltatime", IntegerType(), True),
StructField("lat", DoubleType(), True),
StructField("long", DoubleType(), True)
])
data = spark.read.csv("data2.txt", header=False, schema=schema)
Then you can access the data with:
>>> data.take(1)
[Row(tel=544144, time=23, deltatime=86, lat=40.76165, long=29.940929)]
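Note: accessing a row as data[1] does not make sense in Spark, since the data is distributed across partitions. Use an action such as take() or collect() to bring rows back to the driver first.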
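If a plain nested list on the driver is really what you want, and the data is small enough to collect, one option is to parse each line into a tuple. The following is only a minimal sketch: parse_line and sample_lines are hypothetical names introduced here, and the sample lines stand in for the contents of data2.txt. The telephone number is kept as a string on the assumption that the leading zero matters.

```python
def parse_line(line):
    """Turn one comma-separated line into a tuple of typed fields."""
    tel, time, delta_time, lat, lon = line.strip().split(",")
    # The telephone number stays a string so the leading zero is preserved.
    return (tel, int(time), int(delta_time), float(lat), float(lon))

# Sample lines copied from the question; they stand in for the file contents.
sample_lines = [
    "0544144,23,86,40.761650,29.940929",
    "0544147,23,104,40.768749,29.968599",
    "0545525,20,86,40.761650,29.940929",
]

# One tuple per line, so data[1] is the whole second record.
data = [parse_line(line) for line in sample_lines]
print(data[1])

# With Spark, the same function could be applied per line, for example:
# data = sc.textFile("data2.txt").map(parse_line).collect()
```

Here data[1] is the full second record, and data[1][0] is its telephone number. Keep in mind that collect() pulls every row to the driver, so this only suits data that fits in memory there.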