
Create a dataframe in PySpark using random values from a list

I need to convert this code to its PySpark equivalent. I can't use pandas to create the dataframe.

This is how I create the dataframe using pandas:

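import numpy as np
import pandas as pd
df = pd.DataFrame()  # start from an empty frame; the first column assignment sets the 3 rows
df['Name'] = np.random.choice(["Alex","James","Michael","Peter","Harry"], size=3)
df['ID'] = np.random.randint(1, 10, 3)
df['Fruit'] = np.random.choice(["Apple","Grapes","Orange","Pear","Kiwi"], size=3)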

The dataframe in PySpark should look like this:

df

Name   ID  Fruit
Alex   3   Apple
James  6   Grapes
Harry  5   Pear

I tried the following for one column:

sdf1 = spark.createDataFrame([(k,) for k in ['Alex','James', 'Harry']]).orderBy(rand()).limit(6)
sdf1.show()  # show() returns None, so call it separately to keep sdf1 as a dataframe
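import numpy as np

# .tolist() converts NumPy scalars to plain Python str/int, since Spark's schema inference may reject NumPy types
names = np.random.choice(["Alex","James","Michael","Peter","Harry"], size=3).tolist()
ids = np.random.randint(1, 10, 3).tolist()  # renamed from `id` to avoid shadowing the built-in
fruits = np.random.choice(["Apple","Grapes","Orange","Pear","Kiwi"], size=3).tolist()
columns = ['Name', 'ID', "Fruit"]

dataframe = spark.createDataFrame(list(zip(names, ids, fruits)), columns)

dataframe.show()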

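You can create the pandas dataframe first and then convert it to a PySpark dataframe. Alternatively, you can zip the three random NumPy arrays and create a Spark dataframe like this: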

import numpy as np

names = [str(x) for x in np.random.choice(["Alex", "James", "Michael", "Peter", "Harry"], size=3)]
ids = [int(x) for x in np.random.randint(1, 10, 3)]
fruits = [str(x) for x in np.random.choice(["Apple", "Grapes", "Orange", "Pear", "Kiwi"], size=3)]

df = spark.createDataFrame(list(zip(names, ids, fruits)), ["Name", "ID", "Fruit"])

df.show()

#+-------+---+------+
#|   Name| ID| Fruit|
#+-------+---+------+
#|  Peter|  8|  Pear|
#|Michael|  7|  Kiwi|
#|  Harry|  4|Orange|
#+-------+---+------+
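
If NumPy is not wanted either, the same rows can probably be built with Python's standard `random` module, which returns plain `str` and `int` values, so no casting is needed before handing them to Spark. The following is only a sketch: `make_rows`, `NAMES`, `FRUITS`, `COLUMNS` and the `seed` parameter are hypothetical names introduced for illustration and are not part of the original answer.

```python
import random

# Hypothetical constants mirroring the lists used in the question.
NAMES = ["Alex", "James", "Michael", "Peter", "Harry"]
FRUITS = ["Apple", "Grapes", "Orange", "Pear", "Kiwi"]
COLUMNS = ["Name", "ID", "Fruit"]


def make_rows(n, seed=None):
    """Return n (name, id, fruit) tuples made of plain Python str/int values."""
    # A private generator, so seeding here does not touch the global random state.
    rng = random.Random(seed)
    return [
        # random.randint is inclusive on both ends, so (1, 9) covers the same
        # values as np.random.randint(1, 10), whose upper bound is exclusive.
        (rng.choice(NAMES), rng.randint(1, 9), rng.choice(FRUITS))
        for _ in range(n)
    ]


rows = make_rows(3, seed=42)
```

Passing the result to `spark.createDataFrame(rows, COLUMNS)`, with the same `spark` session assumed in the answer above, should then give a dataframe with the `Name`, `ID` and `Fruit` columns.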
