PySpark on Databricks: getting "Relative path in absolute URI" when trying to read in JSON files with date stamps
When trying to read a file in Databricks I get IllegalArgumentException: Path must be absolute
I am new to Databricks, so I tried to read the .txt files with spark.read.option as shown in the snippet below:
import os
import pandas as pd
from pyspark.sql.functions import lit

df = None
for category in filtred_file_list:
    data_files = os.listdir('HMP_Dataset/' + category)
    for data_file in data_files:
        print(data_file)
        temp_df = spark.read.option('header', 'false').option('delimiter', ' ') \
            .csv('HMP_Dataset/' + category + '/' + data_file, schema=scheme)
        temp_df = temp_df.withColumn('class', lit(category))
        temp_df = temp_df.withColumn('source', lit(data_file))
        if df is None:
            df = temp_df
        else:
            df = df.union(temp_df)
Unfortunately, I get the following error:
IllegalArgumentException: Path must be absolute: HMP_Dataset/Brush_teeth/Accelerometer-2011-04-11-13-28-18-brush_teeth-f1.txt
Use "file:/databricks/driver/HMP_Dataset/" + category and so on
instead of "HMP_Dataset/" + category + "/" and so on...
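On Databricks, a bare relative path is resolved against DBFS rather than the driver's local filesystem, which is why Spark rejects it; prefixing the driver-local absolute path with a file: URI fixes it, as the answer says. A minimal sketch of building such a path (the to_file_uri helper name is mine, and /databricks/driver is the driver working directory mentioned in the answer):

```python
import os

def to_file_uri(relative_path, driver_root='/databricks/driver'):
    """Turn a driver-local relative path into an absolute file: URI
    that Spark can read (hypothetical helper, not a Databricks API)."""
    return 'file:' + os.path.join(driver_root, relative_path)

path = to_file_uri('HMP_Dataset/Brush_teeth/data.txt')
print(path)  # file:/databricks/driver/HMP_Dataset/Brush_teeth/data.txt
```

The resulting URI would then be passed to spark.read...csv(...) in place of the relative path used in the question.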