
S3 Object Storage

The object storage is internally hosted via MinIO. A web interface allows browsing through the store, and data can also be downloaded via the MinIO client.

Web interface: https://minio.strg1.lan/login

MinIO client setup

Install instructions: https://min.io/docs/minio/linux/reference/minio-mc.html

S3 API URL: https://s3.strg1.lan

The CLI tool first requires setting up an alias for the storage. For this you need an access key, which can be created via the web interface.

You can then either download a credentials.json directly, or create one yourself from the given accessKey and secretKey. NOTE: The URL provided by the web interface is wrong. You need to use the S3 API URL specified above.

Assuming the mc command is available on the command line, and the credentials are stored in credentials.json, you can set up an alias with the name “strg1” via: mc alias import strg1 credentials.json
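Alternatively, if you only have the access key and secret key at hand, the alias can be set manually (replace the placeholders with your own values):

mc alias set strg1 https://s3.strg1.lan YOURACCESSKEY YOURSECRETKEY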

If everything worked, you should be able to list the storage:

mc ls strg1
Failed to verify certificate

If you get the following error, you still need to add the certificate (see bottom of the page) to your system.

 mc: <ERROR> Unable to list folder. Get "https://s3.strg1.lan/": tls: failed to verify certificate: x509: certificate signed by unknown authority 

To add the certificate:

1. Write the certificate to a file.

2. Move it to the path where certificates are stored on your OS. For openSUSE Leap 15.6:

/etc/pki/trust/anchors/

For Ubuntu it should be:

/usr/local/share/ca-certificates/

3. Run the following command to update the certificates (same command on Ubuntu and openSUSE):

update-ca-certificates
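For example, on Ubuntu the whole procedure might look as follows (note that update-ca-certificates on Ubuntu only picks up files with a .crt extension, so the file is renamed on copy):

sudo cp ca.crt.cer /usr/local/share/ca-certificates/minio-ca.crt
sudo update-ca-certificates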

Example credentials.json file:

{"url":"https://s3.strg1.lan", 
 "accessKey":"YOURACCESSKEY",
 "secretKey":"YOURSECRETKEY",
 "api":"s3v4",
 "path":"auto"}

Accessing the Buckets from Python

For accessing the storage in Python you can either use the client provided by the `minio` package, or the `s3fs` package together with `boto3`. The latter has wider support and is, for example, required for using zarr.

An important point is that the TLS certificate of the storage is self-signed, so a certificate file has to be provided.

Note: NEVER store secrets directly in your code, as the code will be pushed to repositories. Instead, create a .env file in your project root where you store your credentials, and load them in Python using the `dotenv` package.

Example .env file:

STORAGE_ACCESS_KEY=...
STORAGE_SECRET_KEY=...
ENDPOINT=s3.strg1.lan
ENDPOINT_FULL=https://s3.strg1.lan

This assumes a ca.crt.cer file is available.

For this snippet, the following dependencies were used in the pyproject.toml file:

dotenv = "^0.9.9"
minio = "^7.2.15"

Code Snippet

import os
from pathlib import Path

import minio
import urllib3
from dotenv import load_dotenv

# project root; the CA certificate is expected to live there
_root = Path(__file__).parent.parent

# load credentials from .env file
load_dotenv()
access_key_id = str(os.getenv("STORAGE_ACCESS_KEY"))
secret_access_key = str(os.getenv("STORAGE_SECRET_KEY"))
endpoint_url = str(os.getenv("ENDPOINT"))

# path to the custom (self-signed) CA certificate
ca_cert_path = _root / "ca.crt.cer"
assert ca_cert_path.is_file()

# HTTP client that verifies requests against the self-signed certificate
http_client = urllib3.PoolManager(
    cert_reqs="CERT_REQUIRED",
    ca_certs=str(ca_cert_path),
)

# create MinIO client with the custom HTTP client
minio_client = minio.Minio(
    endpoint_url,
    secure=True,
    access_key=access_key_id,
    secret_key=secret_access_key,
    http_client=http_client,
)
print(minio_client.bucket_exists("rekonas-dataset-101-nights"))
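The client can then be used for further operations, e.g. listing the objects in a bucket. A minimal sketch, reusing the bucket from above:

# list all objects in the bucket recursively
for obj in minio_client.list_objects("rekonas-dataset-101-nights", recursive=True):
    print(obj.object_name, obj.size)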

This assumes a ca.crt.cer file is available.

For this snippet, the following dependencies were used in the pyproject.toml file. Note: there are dependency-resolution issues in poetry with s3fs and boto3; the pins below should however resolve them.

dotenv = "^0.9.9"
s3fs = {extras = ["boto3"], version = ">=2023.12.0"}

Code Snippet

import os
from pathlib import Path

import s3fs
from dotenv import load_dotenv

# load credentials from .env file
load_dotenv()
access_key_id = os.getenv("STORAGE_ACCESS_KEY")
secret_access_key = os.getenv("STORAGE_SECRET_KEY")
endpoint_url_full = os.getenv("ENDPOINT_FULL")

# path to the custom (self-signed) CA certificate
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()

# create s3fs filesystem that verifies against the custom certificate
fs = s3fs.S3FileSystem(
    client_kwargs={
        "endpoint_url": endpoint_url_full,
        "verify": str(ca_cert_path),
    },
    key=access_key_id,
    secret=secret_access_key,
    use_ssl=True,
)

# sanity check, exchange with some other bucket of interest
assert fs.exists("rekonas-dataset-nch-sleep-databank")

# Create zarr store and group within a bucket
# import zarr
# store = s3fs.S3Map(root='test-bucket-fabricio-zarr', s3=fs)
# z = zarr.group(store=store, path="test_group")
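Once the filesystem exists, objects can be read like local files. A minimal sketch (the object path is hypothetical):

# read a single object; the path format is "<bucket>/<key>"
with fs.open("rekonas-dataset-nch-sleep-databank/path/to/file.edf", "rb") as f:
    data = f.read()
print(len(data))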

The storage can also be accessed via `obstore`, which plugs into zarr (v3) through its ObjectStore wrapper. Dependencies:

dotenv = "^0.9.9"
obstore = "^0.6.0"
zarr = ">=3.0.8"

Code Snippet
import os
import ssl
from pathlib import Path

import obstore as obs
import zarr
from dotenv import load_dotenv
from obstore.store import S3Store
from zarr.storage import ObjectStore

# load credentials from .env file
load_dotenv()
access_key_id = str(os.getenv("STORAGE_ACCESS_KEY"))
secret_access_key = str(os.getenv("STORAGE_SECRET_KEY"))
endpoint_url_full = str(os.getenv("ENDPOINT_FULL"))

# path to the custom (self-signed) CA certificate
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()

# SSL context with the custom certificate; note that it is not passed to
# S3Store below, which instead skips verification via client_options
ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations(ca_cert_path)

ob_store = S3Store(
    "test-bucket-fabricio-zarr",
    endpoint=endpoint_url_full,
    access_key_id=access_key_id,
    secret_access_key=secret_access_key,
    virtual_hosted_style_request=False,
    region="ch-bsl-1",  # required region
    client_options={"allow_invalid_certificates": True},
)

# ls to see files that exist in the bucket
list_of_files = obs.list(ob_store).collect()

# create a small array for testing zarr
store = ObjectStore(store=ob_store)
zarr.create_array(store=store, shape=(2,), dtype="float64")
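To verify the store end to end, you can write values into the array and read them back. A minimal sketch (overwrite=True replaces the array created above):

# write values through the object store and read them back
z = zarr.create_array(store=store, shape=(2,), dtype="float64", overwrite=True)
z[:] = [1.0, 2.0]
print(z[:])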

Using the MinIO store as MLflow artifact storage

This assumes that you have already set up an MLflow server with a MinIO artifact store. In addition, the client also has to provide the credentials to the MinIO store, as it sends artifacts directly to S3, not via the MLflow server. You can do so by setting a few environment variables. Note that the code below assumes you have a local file called 'ca.crt.cer', and that the credentials are in a .env file loaded via python-dotenv.

print("\nLoading environment variables from .env file...")
load_dotenv()
 
# --- Configuration: Bridge your .env to what MLflow/boto3 expects ---
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.basel.lan"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://s3.strg1.lan"
os.environ["AWS_ACCESS_KEY_ID"] = os.getenv("STORAGE_ACCESS_KEY")
os.environ["AWS_SECRET_ACCESS_KEY"] = os.getenv("STORAGE_SECRET_KEY")
 
ca_cert_path = "ca.crt.cer"
if not os.path.exists(ca_cert_path):
    raise FileNotFoundError(f"Certificate file not found at: {ca_cert_path}")
os.environ["AWS_CA_BUNDLE"] = ca_cert_path
# uncomment this if you want to ignore an insecure TLS/unsigned certificate
#os.environ["MLFLOW_TRACKING_INSECURE_TLS"] = "true"
os.environ["MLFLOW_TRACKING_CLIENT_CERT_PATH"] = ca_cert_path

Below you can find the certificate as of May 06, 2025. Simply save it into a ca.crt.cer file to use as shown above.

-----BEGIN CERTIFICATE-----
MIIB+DCCAX+gAwIBAgIUNOXxe14mKQCbT9gKVouhzCD3TL0wCgYIKoZIzj0EAwQw
FTETMBEGA1UEAwwKUmVrb25hcyBDQTAeFw0yMzAyMjIxNDA0MzVaFw0zMzAyMTkx
NDA0MzVaMBUxEzARBgNVBAMMClJla29uYXMgQ0EwdjAQBgcqhkjOPQIBBgUrgQQA
IgNiAAR6Nija/wfPLwmX/KW2rsowfxbLIJ3JMTJmltFOqrul074ZQkVQWsyShp67
2GlehcDP+oLR7VJg8oCEIFDQYug00x2QWlnqDHMxkE0ZtN6vH5lq/RaUUf0hdYy3
eP6l+qijgY8wgYwwDAYDVR0TBAUwAwEB/zAdBgNVHQ4EFgQUWtnLStq5o/+O4B2Y
3Fsc11dadqwwUAYDVR0jBEkwR4AUWtnLStq5o/+O4B2Y3Fsc11dadqyhGaQXMBUx
EzARBgNVBAMMClJla29uYXMgQ0GCFDTl8XteJikAm0/YClaLocwg90y9MAsGA1Ud
DwQEAwIBBjAKBggqhkjOPQQDBANnADBkAjBVVoWkAHc2jQpkobopyGhS+bLDRjEm
3ZtGVo9Blvk0TNciDBSgeQ6onuAjorLP3/ICMD5G2CR4rmfCh6Ed+mag7wMlBQYf
1q5iT+kB7u9gG8lhIeB+1MT5JIeIK7ygmC6g/Q==
-----END CERTIFICATE-----

Issues with VPN: MTU Problems

In case you have issues accessing the MinIO store over VPN, check the MTU setting of your VPN interface:

 ip a | grep mtu

A working MTU for packets transmitted over the VPN via the docker bridge interface is 1360. If you see a different value, you can clamp the TCP MSS to match with the following command. Note that this will not change the value shown by `ip a`.

Replace `<your_vpn_interface>` with the name of your VPN interface.

sudo iptables -I FORWARD -i docker0 -o <your_vpn_interface> -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360

RKNS (Zarr v2) from MinIO ZIP

# /// script
# requires-python = ">=3.8"
# dependencies = [
#     "boto3>=1.40.49",
#     "python-dotenv>=0.9.9",
#     "packaging>=25.0",
#     "rkns==0.6.2",
#     "s3fs[boto3]>=2023.12.0",
#     "typing-extensions>=4.15.0",
# ]
# ///
import os
from pathlib import Path
from dotenv import load_dotenv
from fsspec.implementations.zip import ZipFileSystem
import s3fs
import zarr
import rkns

# load credentials from .env file
load_dotenv()
access_key_id = os.getenv("STORAGE_ACCESS_KEY")
secret_access_key = os.getenv("STORAGE_SECRET_KEY")
endpoint_url = os.getenv("ENDPOINT")
endpoint_url_full = os.getenv("ENDPOINT_FULL")


# Specify the path to your custom CA certificate
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()


# Create s3fs filesystem with custom cert
fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": endpoint_url_full, "verify": str(ca_cert_path)},
    key=access_key_id,
    secret=secret_access_key,
    use_ssl=True,
)

s3_path = "rekonas-dataset-shhs-rkns/sub-shhs200001_ses-01_task-sleep_eeg.rkns"

zip_fs = ZipFileSystem(fo=fs.open(s3_path, "rb"))
store = zarr.storage.FSStore(url='', fs=zip_fs)
rkns_obj = rkns.from_RKNS(store)
print(rkns_obj.tree)
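Since the ZIP contains a plain Zarr v2 hierarchy, you can also inspect it directly with zarr instead of rkns (a minimal sketch, reusing the store from above):

# open the zipped hierarchy read-only as a plain zarr group
root = zarr.open_group(store=store, mode="r")
print(root.tree())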