====== S3 Object Storage ======

The object storage is internally hosted via MinIO. There is a web interface which allows browsing through the store. Data can also be downloaded via the MinIO client.

**Web-Interface**: [[https://minio.strg1.lan/login | https://minio.strg1.lan/login]]

====== MinIO client setup ======

**Install instructions**: [[https://min.io/docs/minio/linux/reference/minio-mc.html | https://min.io/docs/minio/linux/reference/minio-mc.html]]

**S3 API URL**: [[https://s3.strg1.lan | https://s3.strg1.lan]]

The CLI tool first requires setting up an alias for the storage. To do so, you need an access key, which can be created via the web interface. You can then either download a ''credentials.json'' directly, or write it yourself with the given accessKey and secretKey.

**NOTE**: The URL provided by the web interface is wrong. You need to use the S3 API URL specified above.

Assuming you have the ''mc'' command available on the command line and the credentials in ''credentials.json'', you can set up an alias with the name "strg1" via:

''mc alias import strg1 credentials.json''

If everything worked out, you should be able to ''ls'' the storage:

<code>mc ls strg1</code>

== Failed to verify certificate ==

If you get the following error, you still need to add the certificate (see bottom of the page) to your system:

<code>
mc: <ERROR> Unable to list folder. Get "https://s3.strg1.lan/": tls: failed to verify certificate: x509: certificate signed by unknown authority
</code>

For adding the certificate:

1. Write the certificate to a file.
2. Move it to the path where certificates are stored on your OS.
For openSUSE Leap 15.6: <code>/etc/pki/trust/anchors/</code>
For Ubuntu it should be: <code>/usr/local/share/ca-certificates/</code>
3. Run the following command to update the certificate store (same command on Ubuntu and openSUSE): <code>update-ca-certificates</code>
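Once ''mc ls'' succeeds, data can be browsed and downloaded with the usual ''mc'' subcommands. A minimal sketch, assuming the alias from above works; the bucket name ''my-bucket'' and the object paths are placeholders:

<code>
# list the contents of a single bucket
mc ls strg1/my-bucket

# download one object to the current directory
mc cp strg1/my-bucket/data.csv .

# download a whole prefix recursively
mc cp --recursive strg1/my-bucket/some-prefix/ ./some-prefix/
</code>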
===== Example ''credentials.json'': =====

<code>
{
  "url": "https://s3.strg1.lan",
  "accessKey": "YOURACCESSKEY",
  "secretKey": "YOURSECRETKEY",
  "api": "s3v4",
  "path": "auto"
}
</code>

====== Accessing the Buckets from Python ======

For accessing the storage in Python you can either use the client provided by the ''minio'' package, or the ''s3fs'' package together with ''boto3''. The latter has wider support and is, for example, required for using zarr v2; for zarr v3, the ''obstore'' package is used (see below). An important point is that the SSL certificate of the storage is self-signed, so a certificate file has to be provided.

**Note: For storing secrets, NEVER save them in your code directly, as the code will be pushed to repositories.** Instead, create a file ''.env'' in your project root, store your credentials there, and load them in Python using the ''dotenv'' package.

Example ''.env'' file:

<code>
STORAGE_ACCESS_KEY=...
STORAGE_SECRET_KEY=...
ENDPOINT=s3.strg1.lan
ENDPOINT_FULL=https://s3.strg1.lan
</code>

===== Example Using minio package: =====

This assumes a ''ca.crt.cer'' file is available. For this snippet, the following dependencies were used in the ''pyproject.toml'' file:

<code>
dotenv = "^0.9.9"
minio = "^7.2.15"
</code>

**Code Snippet**

<code Python [enable_line_numbers="true"]>
import os
from pathlib import Path

import minio
import urllib3
from dotenv import load_dotenv

_root = Path(__file__).parent.parent

# load credentials from .env file
load_dotenv()
access_key_id = str(os.getenv("STORAGE_ACCESS_KEY"))
secret_access_key = str(os.getenv("STORAGE_SECRET_KEY"))
endpoint_url = str(os.getenv("ENDPOINT"))  # host only, without the scheme
endpoint_url_full = str(os.getenv("ENDPOINT_FULL"))

# path to the custom CA certificate (self-signed)
ca_cert_path = _root / "ca.crt.cer"
assert ca_cert_path.is_file()

# custom HTTP client that verifies requests against the self-signed certificate
http_client = urllib3.PoolManager(
    cert_reqs="CERT_REQUIRED",
    ca_certs=str(ca_cert_path),
)

# create the MinIO client with the custom HTTP client
minio_client = minio.Minio(
    endpoint_url,
    secure=True,
    access_key=access_key_id,
    secret_key=secret_access_key,
    http_client=http_client,
)

print(minio_client.bucket_exists("rekonas-dataset-101-nights"))
</code>
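If the connection works, objects can be listed and downloaded with the same client. A minimal sketch reusing ''minio_client'' and the bucket from the snippet above; the object name in the download line is a placeholder:

<code Python>
# iterate over all objects in the bucket
for obj in minio_client.list_objects("rekonas-dataset-101-nights", recursive=True):
    print(obj.object_name, obj.size)

# download a single object to a local file
# minio_client.fget_object("rekonas-dataset-101-nights", "some/object.bin", "object.bin")
</code>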
===== Example Using s3fs and boto3 package (required for zarr v2): =====

This assumes a ''ca.crt.cer'' file is available. For this snippet, the following dependencies were used in the ''pyproject.toml'' file.

**Note**: There are dependency resolution issues in Poetry with ''s3fs'' and ''boto3''. The specification below should however resolve them.

<code>
dotenv = "^0.9.9"
s3fs = {extras = ["boto3"], version = ">=2023.12.0"}
</code>

**Code Snippet**

<code Python [enable_line_numbers="true"]>
import os
from pathlib import Path

import s3fs
from dotenv import load_dotenv

# load credentials from .env file
load_dotenv()
access_key_id = os.getenv("STORAGE_ACCESS_KEY")
secret_access_key = os.getenv("STORAGE_SECRET_KEY")
endpoint_url = os.getenv("ENDPOINT")
endpoint_url_full = os.getenv("ENDPOINT_FULL")

# path to the custom CA certificate (self-signed)
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()

# create the s3fs filesystem with the custom certificate
fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": endpoint_url_full, "verify": str(ca_cert_path)},
    key=access_key_id,
    secret=secret_access_key,
    use_ssl=True,
)

# sanity check, exchange with some other bucket of interest
assert fs.exists("rekonas-dataset-nch-sleep-databank")

# Create zarr store and group within a bucket
# import zarr
# store = s3fs.S3Map(root='test-bucket-fabricio-zarr', s3=fs)
# z = zarr.group(store=store, path="test_group")
</code>

===== Example Using obstore (required for zarr v3): =====

Dependencies:

<code>
dotenv = "^0.9.9"
obstore = "^0.6.0"
zarr = ">=3.0.8"
</code>

<code Python [enable_line_numbers="true"]>
import os

import obstore as obs
import zarr
from dotenv import load_dotenv
from obstore.store import S3Store
from zarr.storage import ObjectStore

# load credentials from .env file
load_dotenv()
access_key_id = str(os.getenv("STORAGE_ACCESS_KEY"))
secret_access_key = str(os.getenv("STORAGE_SECRET_KEY"))
endpoint_url = str(os.getenv("ENDPOINT"))
endpoint_url_full = str(os.getenv("ENDPOINT_FULL"))

ob_store = S3Store(
    "test-bucket-fabricio-zarr",
    endpoint=endpoint_url_full,
    access_key_id=access_key_id,
    secret_access_key=secret_access_key,
    virtual_hosted_style_request=False,
    region="ch-bsl-1",  # the region is required
    # the self-signed certificate is not in the system trust store,
    # so certificate verification is disabled for this client
    client_options={"allow_invalid_certificates": True},
)

# list the files that already exist in the bucket
list_of_files = obs.list(ob_store).collect()

# create a small array for testing zarr
store = ObjectStore(store=ob_store)
arr = zarr.create_array(store=store, shape=(2,), dtype="float64")
</code>
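As a quick end-to-end check, you can write to the array created above and read it back through the bucket. A minimal sketch reusing ''arr'' and ''store'' from the snippet above; ''zarr.open_array'' is only needed when re-opening an existing array later:

<code Python>
# round-trip some data through the bucket
arr[:] = [1.0, 2.0]
print(arr[:])  # should print [1. 2.]

# an existing array can be re-opened from the same store later
# arr = zarr.open_array(store=store, mode="r+")
</code>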
===== Example Using mlflow =====

This assumes that you have already set up an MLflow server with a MinIO artifact store. In addition, the client also has to provide the credentials for the MinIO store, as artifacts are sent directly to S3, not via the MLflow server. You can do so by setting a few environment variables. The code below assumes you have a local file called ''ca.crt.cer'', and that the credentials are in a ''.env'' file that is loaded via python-dotenv.

<code python>
import os

from dotenv import load_dotenv

print("\nLoading environment variables from .env file...")
load_dotenv()

# --- Configuration: Bridge your .env to what MLflow/boto3 expects ---
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.basel.lan"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://s3.strg1.lan"
os.environ["AWS_ACCESS_KEY_ID"] = os.getenv("STORAGE_ACCESS_KEY")
os.environ["AWS_SECRET_ACCESS_KEY"] = os.getenv("STORAGE_SECRET_KEY")

ca_cert_path = "ca.crt.cer"
if not os.path.exists(ca_cert_path):
    raise FileNotFoundError(f"Certificate file not found at: {ca_cert_path}")
os.environ["AWS_CA_BUNDLE"] = ca_cert_path

# uncomment this if you want to ignore an insecure TLS/unsigned certificate
# os.environ["MLFLOW_TRACKING_INSECURE_TLS"] = "true"
os.environ["MLFLOW_TRACKING_CLIENT_CERT_PATH"] = ca_cert_path
</code>

===== SSL Certificate =====

Below you can find the certificate as of May 06, 2025. Simply save it into a ''ca.crt.cer'' file to use as shown above.

<code>
-----BEGIN CERTIFICATE-----
MIIB+DCCAX+gAwIBAgIUNOXxe14mKQCbT9gKVouhzCD3TL0wCgYIKoZIzj0EAwQw
FTETMBEGA1UEAwwKUmVrb25hcyBDQTAeFw0yMzAyMjIxNDA0MzVaFw0zMzAyMTkx
NDA0MzVaMBUxEzARBgNVBAMMClJla29uYXMgQ0EwdjAQBgcqhkjOPQIBBgUrgQQA
IgNiAAR6Nija/wfPLwmX/KW2rsowfxbLIJ3JMTJmltFOqrul074ZQkVQWsyShp67
2GlehcDP+oLR7VJg8oCEIFDQYug00x2QWlnqDHMxkE0ZtN6vH5lq/RaUUf0hdYy3
eP6l+qijgY8wgYwwDAYDVR0TBAUwAwEB/zAdBgNVHQ4EFgQUWtnLStq5o/+O4B2Y
3Fsc11dadqwwUAYDVR0jBEkwR4AUWtnLStq5o/+O4B2Y3Fsc11dadqyhGaQXMBUx
EzARBgNVBAMMClJla29uYXMgQ0GCFDTl8XteJikAm0/YClaLocwg90y9MAsGA1Ud
DwQEAwIBBjAKBggqhkjOPQQDBANnADBkAjBVVoWkAHc2jQpkobopyGhS+bLDRjEm
3ZtGVo9Blvk0TNciDBSgeQ6onuAjorLP3/ICMD5G2CR4rmfCh6Ed+mag7wMlBQYf
1q5iT+kB7u9gG8lhIeB+1MT5JIeIK7ygmC6g/Q==
-----END CERTIFICATE-----
</code>

====== Issues with VPN: MTU Problems ======

In case you have issues accessing the MinIO store over the VPN, check the MTU settings of your VPN interface:

<code>
ip a | grep mtu
</code>

A working MSS (the MTU minus 40 bytes of TCP/IP headers) for packets transmitted over the VPN via the Docker bridge interface is 1360. If you see a different value, you can clamp the MSS manually with the following command. Note that this will **not** change the value shown by ''ip a''. Replace ''<your_vpn_interface>'' with the name of your VPN interface.

<code>
sudo iptables -I FORWARD -i docker0 -o <your_vpn_interface> -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360
</code>
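To verify that the MTU is actually the problem, you can probe the path with pings that forbid fragmentation. A minimal sketch; the payload sizes are derived from the MSS above (MSS 1360 + 40 bytes of TCP/IP headers = 1400-byte MTU; ICMP payload = MTU - 28 bytes of IP/ICMP headers), and the target host is just an example:

<code>
# '-M do' forbids fragmentation, so packets larger than the path MTU fail outright
ping -M do -s 1372 s3.strg1.lan   # should get replies if the path MTU is at least 1400
ping -M do -s 1400 s3.strg1.lan   # expected to fail with "message too long" on a 1400-byte path
</code>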