problem reading HDF5 on s3

20 views (last 30 days)
Benjamin Dichter
Benjamin Dichter on 22 Feb 2022
Commented: Benjamin Dichter on 23 Feb 2022
>> H5F.open('https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9')
Error using hdf5lib2
Unable to access
'https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9'.
The specified URL scheme is invalid.
Error in H5F.open (line 130)
file_id = H5ML.hdf5lib2('H5Fopen', filename, flags, fapl, is_remote);
This works when using the same URL with h5py in Python:
from h5py import File
file = File("https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9", "r", driver="ros3")
file.keys()
<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time', 'units']>

Answers (1)

Yongjian Feng
Yongjian Feng on 23 Feb 2022
In python, you seem to use read_only flag ("r"). Maybe you want to try:
H5F.open('https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9', 'H5F_ACC_RDONLY')
  1 Comment
Benjamin Dichter
Benjamin Dichter on 23 Feb 2022
That does not appear to be the issue. The documentation for H5F indicates that it should be possible to include only the s3 path:
file_id = H5F.open(URL) opens the hdf5 file at a remote location
for read-only access and returns the file identifier, file_id.
Also, this use-case is demonstrated in an example:
Example: Open a file in Amazon S3 in read-only mode with
default file access properties.
H5F.close(fid);
I tried some optional arguments, which did not help:
>> H5F.open('https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9', 'H5F_ACC_RDONLY')
Not enough input arguments.
Error in H5F.open (line 130)
file_id = H5ML.hdf5lib2('H5Fopen', filename, flags, fapl, is_remote);
>> H5F.open('https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9', 'H5F_ACC_RDONLY', 'H5P_DEFAULT')
Error using hdf5lib2
Unable to access
'https://dandiarchive.s3.amazonaws.com/blobs/58c/537/58c53789-eec4-4080-ad3b-207cf2a1cac9'. The
specified URL scheme is invalid.
Error in H5F.open (line 130)
file_id = H5ML.hdf5lib2('H5Fopen', filename, flags, fapl, is_remote);
It looks like MATLAB is doing validation on the input, ensuring that the path starts with s3://, which mine does not.

Sign in to comment.

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by