Create a DataFrame from Lists

I am trying to create a Spark DataFrame in which each of two lists becomes a column.

Code:

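import random
import string

def create_id(n):
    # note: n is ignored here; range(50) always yields 50-character IDs
    return ''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(50))

list_a = [create_id(25) for _ in range(100)]
list_b = [create_id(25) for _ in range(100)]

# sc is an existing SparkContext; each inner list becomes one row
df = sc.parallelize([["a", list_a], ["b", list_b]]).toDF()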

This results in two rows, with each entire list stored in a single array column _2:

    _1                                                _2
0   a   [dv2vtdl3sobadlw1svs39emp2n9ogwzzek8b6gvug7xkp...
1   b   [kdv6b9ehqx1t8kbxd77ha8435bhduyxp0ilv6e09wpejx..

Passing the two lists directly instead creates 2 rows with 100 columns each, not 100 rows:

df = sc.parallelize([list_a, list_b]).toDF()
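Both attempts fail for the same reason. When Spark builds a DataFrame from a local list of records, each outer element becomes one row and each item inside it becomes one field. The plain-Python sketch below needs no Spark. The shape helper is hypothetical and only mimics that row/field counting; it is not Spark's schema-inference code. It compares the two attempts with the zipped layout that the question is after:

```python
import random
import string


def create_id(n):
    # Random lowercase/digit ID of length n
    return ''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(n))


def shape(records):
    """Hypothetical helper: (rows, fields) if each outer element is a row
    and each inner item is a field."""
    n_rows = len(records)
    n_cols = len(records[0]) if records else 0
    return n_rows, n_cols


list_a = [create_id(25) for _ in range(100)]
list_b = [create_id(25) for _ in range(100)]

first_try = [["a", list_a], ["b", list_b]]  # 2 rows; 2nd field is a whole list
second_try = [list_a, list_b]               # 2 rows; 100 fields each
wanted = list(zip(list_a, list_b))          # 100 rows; 2 fields each

print(shape(first_try))   # (2, 2)
print(shape(second_try))  # (2, 100)
print(shape(wanted))      # (100, 2)
```

The data has to be transposed from "one list per column" into "one tuple per row" before Spark sees it.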

Does anyone know how I can create a DataFrame with two columns and 100 rows?

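Based on the post "Manually create a pyspark dataframe", zip the two lists into (a, b) row tuples and pass them to spark.createDataFrame together with the column names: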

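import random
import string
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def create_id(n):
    # random lowercase/digit ID of length n
    return ''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(n))

list_a = [create_id(25) for _ in range(100)]
list_b = [create_id(25) for _ in range(100)]
# zip pairs the i-th items of both lists into one row: 100 rows, 2 columns
df = spark.createDataFrame(list(zip(list_a, list_b)), ['a', 'b'])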

df.show()
+--------------------+--------------------+
|                   a|                   b|
+--------------------+--------------------+
|68blfnltq9fh81c4y...|3fl1wb5h2euy3sgd7...|
|ac37fb7qif71zzjpr...|xbqzzgiq9s6t5jiqm...|
|72rk28znzr6jjsi69...|5wvl528eg5y3p1lsk...|
|fioqnla3ijvl5769s...|1xvs2592uaxadv1o4...|
|7der8ld8fd6vl6g9d...|lrup85xitjz1uhsfl...|
|gycdap4hodaxxggw8...|h2oz370tzo6fnpke3...|
|ccvqcyzeynuks63pq...|iut82y2k1irfdvep1...|
|ngq29fnq2usghspgh...|z6j4mibrrjznoc9s8...|
|3qb6xyk5c1kbg0xq1...|l10ouv4w24d66e0ak...|
|u6dcvzede90xa7zz2...|hnh571t9szy0pwjrp...|
|3122g38k47jm24t7f...|tzbxlua574l88qtw1...|
|6pnva6ow83yxexqp1...|0nfj3v59b8jh0qv1g...|
|kl7xyftax3z32ot8o...|0sf6iyiyxpyvyd5kj...|
|36qwiiifgbzba4n8c...|xt4lpkjle8qynnlpo...|
|owsgb02rnov8qrhvw...|1zu4oisit25y2g14i...|
|bcmg0flh4d9tnvnjc...|7lfwx9kf7qens70p8...|
|6sdy1e8i3y1w0rtpr...|gw79bsrx8jlse6ixu...|
|83h5iq10clte1gcpr...|kblufuhlwabu7sv3u...|
|7g20ga0m756f0qsj7...|1fzo40vwtrp0kud8j...|
|07tw66i7dpcphczz1...|9a8c9ditp9dzomxh4...|
+--------------------+--------------------+
only showing top 20 rows
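One caveat with the zip approach: zip stops at the shortest input, so if list_a and list_b ever differ in length, the extra values are dropped without any error. A minimal guard, assuming you would rather fail loudly in that case (columns_to_rows is a hypothetical helper name, not part of PySpark):

```python
def columns_to_rows(*columns):
    """Hypothetical helper: turn equal-length column lists into row tuples."""
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        # plain zip() would silently drop the extra items of the longer lists
        raise ValueError(f"column lengths differ: {sorted(lengths)}")
    return list(zip(*columns))


rows = columns_to_rows(["x1", "x2", "x3"], ["y1", "y2", "y3"])
print(rows)  # [('x1', 'y1'), ('x2', 'y2'), ('x3', 'y3')]

# plain zip on mismatched lengths: truncated, no error
print(list(zip([1, 2, 3], ["a"])))  # [(1, 'a')]
```

The result can be passed straight to spark.createDataFrame(columns_to_rows(list_a, list_b), ['a', 'b']). On Python 3.10 and later, zip(list_a, list_b, strict=True) performs the same length check and raises ValueError on a mismatch.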
