I am quite new here, so I will try to be clear.
I created an HDF5 file with PyTables and filled it with data. Then I uploaded the file from the /tmp/ directory of my AWS cluster to an S3 bucket using this code:
s3_client.upload_file(local_file_key, aws_bucket_name, aws_file_key)
I then downloaded the same HDF5 file from S3 and stored it again in the /tmp/ directory of my AWS cluster using this code:
s3_client.download_file(aws_bucket_name, aws_file_key, another_local_file_key)
Up to that point there is no issue. The problem appears when I want to read the downloaded file:
tables.open_file(another_local_file_key)
File "H5F.c", line 604, in H5Fopen
unable to open file
File "H5Fint.c", line 1087, in H5F_open
unable to read superblock
File "H5Fsuper.c", line 277, in H5F_super_read
file signature not found
End of HDF5 error back trace
Unable to open/create file '/tmp/from_aws_dataset.hdf5'
Then I made some verifications in the shell of my cluster.
[user@cluster_ip_address tmp$] file my_dataset.hdf5
returns
my_dataset.hdf5: Hierarchical Data Format (version 5) data
But [user@cluster_ip_address tmp$] file from_aws_dataset.hdf5
returns
from_aws_dataset.hdf5: data
And in my Python code,
tables.is_pytables_file('/tmp/from_aws_dataset.hdf5')
returns None
boto3 version: 1.4.7, Python version: 2.7, tables version: 3.4.2, h5py version: 2.7.1
Could someone help me, please?
My first guess would be that the file was transferred in text mode. The HDF5 file signature was designed to detect that sort of munging.
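To see why the signature catches this, here is a minimal sketch (pure Python, no AWS involved) of what newline translation does to the 8-byte signature; the byte values are taken from the HDF5 specification:

```python
# The 8-byte HDF5 signature deliberately contains both \r\n and \n,
# so any newline translation (a text-mode transfer) corrupts it.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Simulate a transfer that converts CRLF to LF (text-mode munging):
mangled = HDF5_SIGNATURE.replace(b"\r\n", b"\n")

print(mangled != HDF5_SIGNATURE)  # True: the signature check now fails
```

Once a single byte of the signature changes, H5Fopen reports exactly the "file signature not found" error shown above.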
Have you tried using boto3's upload_fileobj() method instead of upload_file()? The former takes an open file-like object, so you control the mode it was opened in (binary, which HDF5 requires). It's unclear from the boto3 docs whether upload_file() implies a text-mode transfer.
import boto3

s3 = boto3.client("s3")
with open("myfile.h5", "rb") as f:
    s3.upload_fileobj(f, "bucket-name", "key-name")
It also looks like you can specify binary transfers explicitly using the put() method, like so:
s3 = boto3.resource('s3')  # put() lives on the resource API, not the client
s3.Object('mybucket', 'myfile.h5').put(Body=open('/tmp/myfile.h5', 'rb'))
The HDF5 file signature is documented here, if you are interested. Just scroll down a little to the first field of the superblock, where it says 'Format Signature'.
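If you want to check the signature yourself without the HDF5 library, a small sketch (the path is just an example):

```python
# Compare a file's first 8 bytes against the HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def has_hdf5_signature(path):
    """Return True if the file starts with the HDF5 signature bytes."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE

# e.g. has_hdf5_signature("/tmp/from_aws_dataset.hdf5")
```

This is essentially what the `file` command does when it labels a file "Hierarchical Data Format (version 5) data"; your downloaded copy failing that test is why it is reported as plain "data".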
Old post but in the spirit of trying to close out questions...
Can you try manually downloading the file via the AWS S3 console and reading it directly in Python? If that fails, then I would guess that you are uploading the file incorrectly. If it works, can you try to download the file using this code:
import boto
from boto.s3.key import Key

conn = boto.connect_s3('<<YOUR KEY ID>>', '<<YOUR SECRET ACCESS KEY>>')  # Make connection
bucket = conn.get_bucket(THE_NAME_OF_YOUR_BUCKET)  # Get bucket object
k = Key(bucket, FILE_AND_PATH)  # Get Key object of the file
k.get_contents_to_filename(LOCAL_PATH_TO_SAVE)  # Saves the file locally; should preserve everything
Have a look at this, it is quite useful: https://techietweak.wordpress.com/2016/05/16/file-handling-in-aws-s3-with-python-boto-library/
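Whichever way you transfer the file, you can also compare checksums before upload and after download to confirm the bytes survived intact (for single-part uploads the S3 ETag happens to equal the object's MD5, though that does not hold for multipart uploads). A minimal sketch:

```python
import hashlib

def md5_hex(path):
    """MD5 of a file's raw bytes, read in binary mode."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# If md5_hex(local_file) != md5_hex(downloaded_file),
# the transfer mangled the bytes somewhere along the way.
```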
For me this worked:

import boto3

s3 = boto3.resource('s3', region_name=region_name)  # region_name must be passed as a keyword
bucket = s3.Bucket(bucket_name)
with open(hdf5_file, 'rb') as f:  # open in binary mode
    bucket.Object(key).put(Body=f)