I have some S3 files such as:

s3://test-shivi/blah1/blah1.parquet
s3://test-shivi/blah2/blah2.parquet
s3://test-shivi/blah3/NONE
Now I want to load all the parquet files via Spark, like so:
df = spark.read.parquet("s3a:///test-shivi/*.*.parquet", schema=spark_schema)
But as blah3 doesn't have a matching file, I am getting this error:
pyspark.sql.utils.AnalysisException: Path does not exist: s3:
How can I safeguard against, or skip, directories that don't have any matching files?
Looks like the problem is that your path / wildcard pattern is wrong. Use this instead:
df = spark.read.schema(spark_schema).parquet("s3a://test-shivi/*/*.parquet")

(The schema is applied via .schema() here, since PySpark's parquet() reader does not accept a schema keyword argument.)
If blah3 doesn't contain a parquet file, it simply won't match the pattern, so it won't cause any issue.
But be careful with leading slashes: s3a:/// is wrong; it has to be s3a://{bucket}/.
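
If you also want to safeguard against the case where the glob matches nothing at all, here is a minimal sketch of a defensive approach: it expands the pattern through the Hadoop FileSystem API (which Spark uses underneath) and only calls the reader when at least one path matched. The bucket name and the spark_schema variable are taken from the question; everything else is illustrative, not the only way to do it.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Expand the glob via Hadoop's FileSystem API before reading.
jvm = spark.sparkContext._jvm
pattern = jvm.org.apache.hadoop.fs.Path("s3a://test-shivi/*/*.parquet")
fs = pattern.getFileSystem(spark._jsc.hadoopConfiguration())

# globStatus returns null (None through Py4J) or an empty array when nothing matches.
statuses = fs.globStatus(pattern)
paths = [status.getPath().toString() for status in (statuses or [])]

if paths:
    # spark_schema as defined in the question
    df = spark.read.schema(spark_schema).parquet(*paths)
else:
    raise FileNotFoundError("No parquet files matched s3a://test-shivi/*/*.parquet")

This way an empty or mismatched directory like blah3 is skipped at listing time, and you get a clear error instead of an AnalysisException when nothing matches at all.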