S3 Object Storage
The object storage is internally hosted via MinIO. There is a web interface that allows browsing through the store. Data can also be downloaded via the MinIO client.
Web-Interface: https://minio.strg1.lan/login
MinIO client setup
Install instructions: https://min.io/docs/minio/linux/reference/minio-mc.html
S3 API URL: https://s3.strg1.lan
The CLI tool first requires setting up an alias for the storage. To do so, you need an access key, which can be created via the web interface.
You can then either directly download a credentials.json, or create one yourself with the given accessKey and secretKey. NOTE: the URL provided by the web interface is wrong; you need to use the S3 API URL specified above.
Assuming the mc command is available on the command line and the credentials are in credentials.json, you can set up an alias named “strg1” via:
mc alias import strg1 credentials.json
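Alternatively, if you only have the accessKey and secretKey, you can set the alias manually with the standard mc syntax (a sketch with placeholder keys):

mc alias set strg1 https://s3.strg1.lan YOURACCESSKEY YOURSECRETKEY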
If everything worked out, you should be able to ls the storage:
mc ls strg1
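Downloading data then works with the usual mc commands; the bucket name below is just a placeholder:

mc cp --recursive strg1/some-bucket/ ./local-copy/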
Failed to verify certificate
If you get the following error, you still need to add the certificate (see bottom of the page) to your system.
mc: <ERROR> Unable to list folder. Get "https://s3.strg1.lan/": tls: failed to verify certificate: x509: certificate signed by unknown authority
To add the certificate:
1. Write the certificate to a file.
2. Move it to the path where certificates are stored on your OS. For openSUSE Leap 15.6:
/etc/pki/trust/anchors/
For Ubuntu it should be:
/usr/local/share/ca-certificates/
3. Run the following command to update the certificates (same command on Ubuntu and openSUSE; a combined sketch follows below):
update-ca-certificates
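Combined, the steps look as follows on openSUSE Leap 15.6 (a sketch, assuming the certificate was saved as ca.crt.cer in the current directory and you have sudo rights):

sudo cp ca.crt.cer /etc/pki/trust/anchors/
sudo update-ca-certificates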
Example ''credentials.json'':
{"url":"https://s3.strg1.lan",
"accessKey":"YOURACCESSKEY",
"secretKey":"YOURSECRETKEY",
"api":"s3v4",
"path":"auto"}
Accessing the Buckets from Python
For accessing the storage from Python, you can either use the client provided by the `minio` package, or the `s3fs` package together with `boto3`. The latter has wider support and is, for example, required for using zarr.
Importantly, the SSL certificate of the storage is self-signed, so a certificate file has to be provided.
Note: for storing secrets, NEVER save them directly in your code, as the code will be pushed to repositories.
Instead, create a .env file in your project root where you store your credentials, and load them in Python using the `dotenv` package.
Example .env file:
STORAGE_ACCESS_KEY=...
STORAGE_SECRET_KEY=...
ENDPOINT=s3.strg1.lan
ENDPOINT_FULL=https://s3.strg1.lan
Example using the minio package:
This assumes a ca.crt.cer file is available.
For this snippet, the following dependencies were used in the pyproject.toml file:
dotenv = "^0.9.9"
minio = "^7.2.15"
Code Snippet
import os
from pathlib import Path

import minio
import urllib3
from dotenv import load_dotenv

# project root; the CA certificate is expected to live here
_root = Path(__file__).parent.parent

# load credentials from .env file
load_dotenv()
access_key_id = str(os.getenv("STORAGE_ACCESS_KEY"))
secret_access_key = str(os.getenv("STORAGE_SECRET_KEY"))
endpoint_url = str(os.getenv("ENDPOINT"))
endpoint_url_full = str(os.getenv("ENDPOINT_FULL"))

# path to your custom CA certificate (needed for the self-signed cert)
ca_cert_path = _root / "ca.crt.cer"
assert Path(ca_cert_path).is_file()

# custom HTTP client that verifies requests against the self-signed cert
http_client = urllib3.PoolManager(
    cert_reqs="CERT_REQUIRED",
    ca_certs=str(ca_cert_path),
)

# create Minio client with the custom http client
# (replace with a boto3 client for AWS if needed)
minio_client = minio.Minio(
    endpoint_url,
    secure=True,
    access_key=access_key_id,
    secret_key=secret_access_key,
    http_client=http_client,
)
print(minio_client.bucket_exists("rekonas-dataset-101-nights"))
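As a quick usage sketch, listing the objects in a bucket works via list_objects (the bucket name is the same test bucket as above):

for obj in minio_client.list_objects("rekonas-dataset-101-nights", recursive=True):
    print(obj.object_name, obj.size)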
Example using the s3fs and boto3 packages (required for zarr v2):
This assumes a ca.crt.cer file is available.
For this snippet, the following dependencies were used in the pyproject.toml file.
Note: there are some dependency resolution issues in poetry with s3fs and boto3; the specification below should resolve them.
dotenv = "^0.9.9"
s3fs = {extras = ["boto3"], version = ">=2023.12.0"}
Code Snippet
import os
from pathlib import Path

import s3fs
from dotenv import load_dotenv

# load credentials from .env file
load_dotenv()
access_key_id = os.getenv("STORAGE_ACCESS_KEY")
secret_access_key = os.getenv("STORAGE_SECRET_KEY")
endpoint_url = os.getenv("ENDPOINT")
endpoint_url_full = os.getenv("ENDPOINT_FULL")

# Specify the path to your custom CA certificate
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()

# Create s3fs filesystem that verifies against the custom cert
fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": endpoint_url_full, "verify": str(ca_cert_path)},
    key=access_key_id,
    secret=secret_access_key,
    use_ssl=True,
)

# sanity check, exchange with some other bucket of interest
assert fs.exists("rekonas-dataset-nch-sleep-databank")

# Create a zarr store and group within a bucket:
# import zarr
# store = s3fs.S3Map(root="test-bucket-fabricio-zarr", s3=fs)
# z = zarr.group(store=store, path="test_group")
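Once the filesystem object exists, the bucket behaves like a regular fsspec filesystem. A small usage sketch (the object key below is a hypothetical example):

# list the bucket contents
print(fs.ls("rekonas-dataset-nch-sleep-databank"))
# read the first bytes of an object
with fs.open("rekonas-dataset-nch-sleep-databank/some-file.bin", "rb") as f:
    print(f.read(100))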
Example using the obstore package (required for zarr v3):
Dependencies:
dotenv = "^0.9.9" obstore = "^0.6.0" zarr = ">=3.0.8"
import os
import ssl
from pathlib import Path

import obstore as obs
import zarr
import zarr.storage
from dotenv import load_dotenv
from obstore.store import S3Store

# load credentials from .env file
load_dotenv()
access_key_id = str(os.getenv("STORAGE_ACCESS_KEY"))
secret_access_key = str(os.getenv("STORAGE_SECRET_KEY"))
endpoint_url = str(os.getenv("ENDPOINT"))
endpoint_url_full = str(os.getenv("ENDPOINT_FULL"))

# Specify the path to your custom CA certificate
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()

# Create SSL context for the custom certificate
ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations(ca_cert_path)
# alternatively, to skip verification entirely:
# ssl_context.check_hostname = False
# ssl_context.verify_mode = ssl.CERT_NONE

ob_store = S3Store(
    "test-bucket-fabricio-zarr",
    endpoint=endpoint_url_full,
    access_key_id=access_key_id,
    secret_access_key=secret_access_key,
    virtual_hosted_style_request=False,
    region="ch-bsl-1",  # the required region
    client_options={"allow_invalid_certificates": True},
)

# ls to see files that exist in the bucket
list_of_files = obs.list(ob_store).collect()

# create a small array for testing zarr
store = zarr.storage.ObjectStore(store=ob_store)
zarr.create_array(store=store, shape=(2,), dtype="float64")
Example using MLflow
This assumes that you have already set up an MLflow server with a MinIO artifact store. In addition, the client also has to provide the credentials for the MinIO store, as it sends artifacts directly to S3, not via MLflow. You can do so by setting a few environment variables. Note that the code below assumes you have a local file called 'ca.crt.cer', and that your credentials are in a .env file loaded via python-dotenv.
print("\nLoading environment variables from .env file...") load_dotenv() # --- Configuration: Bridge your .env to what MLflow/boto3 expects --- os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.basel.lan" os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://s3.strg1.lan" os.environ["AWS_ACCESS_KEY_ID"] = os.getenv("STORAGE_ACCESS_KEY") os.environ["AWS_SECRET_ACCESS_KEY"] = os.getenv("STORAGE_SECRET_KEY") ca_cert_path = "ca.crt.cer" if not os.path.exists(ca_cert_path): raise FileNotFoundError(f"Certificate file not found at: {ca_cert_path}") os.environ["AWS_CA_BUNDLE"] = ca_cert_path # uncomment this if you want to ignore an insecure TLS/unsigned certificate #os.environ["MLFLOW_TRACKING_INSECURE_TLS"] = "true" os.environ["MLFLOW_TRACKING_CLIENT_CERT_PATH"] = ca_cert_path
SSL Certificate
Below you can find the certificate as of May 06, 2025.
Simply save it into a ca.crt.cer file to use as shown above.
-----BEGIN CERTIFICATE-----
MIIB+DCCAX+gAwIBAgIUNOXxe14mKQCbT9gKVouhzCD3TL0wCgYIKoZIzj0EAwQw
FTETMBEGA1UEAwwKUmVrb25hcyBDQTAeFw0yMzAyMjIxNDA0MzVaFw0zMzAyMTkx
NDA0MzVaMBUxEzARBgNVBAMMClJla29uYXMgQ0EwdjAQBgcqhkjOPQIBBgUrgQQA
IgNiAAR6Nija/wfPLwmX/KW2rsowfxbLIJ3JMTJmltFOqrul074ZQkVQWsyShp67
2GlehcDP+oLR7VJg8oCEIFDQYug00x2QWlnqDHMxkE0ZtN6vH5lq/RaUUf0hdYy3
eP6l+qijgY8wgYwwDAYDVR0TBAUwAwEB/zAdBgNVHQ4EFgQUWtnLStq5o/+O4B2Y
3Fsc11dadqwwUAYDVR0jBEkwR4AUWtnLStq5o/+O4B2Y3Fsc11dadqyhGaQXMBUx
EzARBgNVBAMMClJla29uYXMgQ0GCFDTl8XteJikAm0/YClaLocwg90y9MAsGA1Ud
DwQEAwIBBjAKBggqhkjOPQQDBANnADBkAjBVVoWkAHc2jQpkobopyGhS+bLDRjEm
3ZtGVo9Blvk0TNciDBSgeQ6onuAjorLP3/ICMD5G2CR4rmfCh6Ed+mag7wMlBQYf
1q5iT+kB7u9gG8lhIeB+1MT5JIeIK7ygmC6g/Q==
-----END CERTIFICATE-----
Issues with VPN: MTU Problems
If you have issues accessing the MinIO store over the VPN, check the MTU settings of your VPN interface:
ip a | grep mtu
A working MTU for packets transmitted over the VPN via the docker bridge interface is 1360. If you see a different value, you can manually clamp it with the following command. Note that this will not change the value shown by the `ip a` command.
Replace `<your_vpn_interface>` with the name of your VPN interface.
sudo iptables -I FORWARD -i docker0 -o <your_vpn_interface> -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360
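To verify that a given MTU actually fits through the tunnel, you can ping with the don't-fragment flag set; for an MTU of 1360 the ICMP payload is 1360 - 28 = 1332 bytes (28 bytes for the IP and ICMP headers). A sketch, assuming the host answers pings:

ping -M do -s 1332 s3.strg1.lan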