S3 Object Storage

The object storage is hosted internally with MinIO. A web interface allows browsing the store, and data can also be downloaded with the MinIO client (`mc`).

Web-Interface: https://minio.strg1.lan/login

MinIO client setup

Install instructions: https://min.io/docs/minio/linux/reference/minio-mc.html

S3 API URL: https://s3.strg1.lan

The CLI tool first needs an alias for the storage. To set one up, you need an access key, which you can create in the web interface.

After creating the key, you can either download a credentials.json directly or write the file yourself from the given accessKey and secretKey. NOTE: The URL provided by the web interface is wrong. Use the S3 API URL given above instead. The file should look like this:

{"url":"https://s3.strg1.lan",
 "accessKey":"YOURACCESSKEY",
 "secretKey":"YOURSECRETKEY",
 "api":"s3v4",
 "path":"auto"}

Assuming the mc command is available on the command line and the credentials are in credentials.json, you can set up an alias named “strg1” via:

mc alias import strg1 credentials.json

If everything worked, you should be able to list the storage:

mc ls strg1
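
Because the URL in a downloaded credentials file is wrong, the file can be patched before importing it. The following is a minimal sketch, assuming the file has the JSON layout shown above; the function name `fix_credentials_url` is made up for illustration and is not part of any tool.

```python
import json
from pathlib import Path

# S3 API URL as given on this page
S3_API_URL = "https://s3.strg1.lan"


def fix_credentials_url(path, url=S3_API_URL):
    """Overwrite the "url" field of a credentials.json in place.

    Returns the updated credentials as a dict. All other fields
    (accessKey, secretKey, api, path) are left untouched.
    """
    path = Path(path)
    credentials = json.loads(path.read_text(encoding="utf-8"))
    credentials["url"] = url
    path.write_text(json.dumps(credentials, indent=1) + "\n", encoding="utf-8")
    return credentials
```

Calling `fix_credentials_url("credentials.json")` once after the download should be enough; the alias import command stays the same.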

Accessing the buckets from Python

To access the storage from Python, you can use either the MinIO client provided by the `minio` package or the `s3fs` package together with `boto3`. The latter has wider support and is, for example, what zarr uses in the `s3fs` snippet below. A third snippet shows `obstore`, which zarr can use through `zarr.storage.ObjectStore`.

The storage uses a self-signed SSL certificate, so a certificate file has to be provided to the client.

Note: NEVER store secrets directly in your code, as the code will be pushed to repositories. Instead, create a .env file in your project root, store your credentials there, and load them in Python with the `dotenv` package. Keep the .env file itself out of version control, for example by listing it in .gitignore.

Example .env file:

STORAGE_ACCESS_KEY=...
STORAGE_SECRET_KEY=...
ENDPOINT=s3.strg1.lan
ENDPOINT_FULL=https://s3.strg1.lan
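
A missing entry in the .env file is easy to overlook, because `os.getenv` quietly returns `None`. The sketch below checks the four variables from the example file up front. It is only an illustration: `StorageSettings` and `read_storage_settings` are hypothetical names, and the code assumes `load_dotenv()` from the `dotenv` package has already been called so that the values are present in `os.environ`.

```python
import os
from dataclasses import dataclass

# Variable names taken from the example .env file
REQUIRED_KEYS = (
    "STORAGE_ACCESS_KEY",
    "STORAGE_SECRET_KEY",
    "ENDPOINT",
    "ENDPOINT_FULL",
)


@dataclass(frozen=True)
class StorageSettings:
    access_key: str
    secret_key: str
    endpoint: str        # host only, e.g. s3.strg1.lan
    endpoint_full: str   # with scheme, e.g. https://s3.strg1.lan


def read_storage_settings(env=os.environ):
    """Build StorageSettings from a mapping, failing early on missing or empty keys."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing storage settings: " + ", ".join(missing))
    return StorageSettings(
        access_key=env["STORAGE_ACCESS_KEY"],
        secret_key=env["STORAGE_SECRET_KEY"],
        endpoint=env["ENDPOINT"],
        endpoint_full=env["ENDPOINT_FULL"],
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests without touching real credentials.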

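The first snippet uses the `minio` package. It assumes a ca.crt.cer file is available one directory above the folder that contains the script (see `_root` in the code).

For this snippet, the following dependencies were used in the pyproject.toml file:

dotenv = "^0.9.9"
minio = "^7.2.15"
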

Code Snippet

import os
from pathlib import Path

import minio
import urllib3
from dotenv import load_dotenv

# Project root, assumed to be one level above this file's directory
_root = Path(__file__).parent.parent

# Load credentials from the .env file; os.environ[...] raises a KeyError
# if an entry is missing instead of silently producing the string "None"
load_dotenv()
access_key_id = os.environ["STORAGE_ACCESS_KEY"]
secret_access_key = os.environ["STORAGE_SECRET_KEY"]
endpoint_url = os.environ["ENDPOINT"]  # host only, without the https:// scheme

# Path to the custom CA certificate
ca_cert_path = _root / "ca.crt.cer"
assert ca_cert_path.is_file()

# HTTP client that verifies the server against the custom CA certificate
http_client = urllib3.PoolManager(
    cert_reqs="CERT_REQUIRED",
    ca_certs=str(ca_cert_path),
)

# Create the Minio client with the custom HTTP client
minio_client = minio.Minio(
    endpoint_url,
    secure=True,
    access_key=access_key_id,
    secret_key=secret_access_key,
    http_client=http_client,
)
print(minio_client.bucket_exists("rekonas-dataset-101-nights"))
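
Once the client connects, a common next step is to list and download objects. The helpers below are a rough sketch built on the `list_objects` and `fget_object` methods of the `minio` client; the helper names and the idea of mirroring a prefix into a local folder are illustrative assumptions, not an established workflow for this storage.

```python
from pathlib import Path


def list_object_names(client, bucket, prefix=""):
    """Return the sorted names of all objects in `bucket` below `prefix`."""
    objects = client.list_objects(bucket, prefix=prefix, recursive=True)
    # Skip placeholder "folder" entries, whose names end with a slash
    return sorted(
        obj.object_name for obj in objects if not obj.object_name.endswith("/")
    )


def download_prefix(client, bucket, prefix, target_dir):
    """Download every object below `prefix` into `target_dir`, keeping the key layout."""
    target_dir = Path(target_dir)
    downloaded = []
    for name in list_object_names(client, bucket, prefix):
        destination = target_dir / name
        destination.parent.mkdir(parents=True, exist_ok=True)
        client.fget_object(bucket, name, str(destination))
        downloaded.append(destination)
    return downloaded
```

With the `minio_client` from the snippet, a call could look like `download_prefix(minio_client, "rekonas-dataset-101-nights", "some/prefix/", "data")`, where the prefix and the target folder are placeholders.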

The second snippet uses `s3fs`. It assumes a ca.crt.cer file is available in the current working directory.

For this snippet, the following dependencies were used in the pyproject.toml file. Note: Poetry can run into dependency conflicts between s3fs and boto3. The dependency specification below should resolve them.

dotenv = "^0.9.9"
s3fs = {extras = ["boto3"], version = ">=2023.12.0"}

Code Snippet

import os
from pathlib import Path

import s3fs
from dotenv import load_dotenv

# Load credentials from the .env file
load_dotenv()
access_key_id = os.environ["STORAGE_ACCESS_KEY"]
secret_access_key = os.environ["STORAGE_SECRET_KEY"]
endpoint_url_full = os.environ["ENDPOINT_FULL"]  # including the https:// scheme

# Path to the custom CA certificate
ca_cert_path = Path("ca.crt.cer")
assert ca_cert_path.is_file()

# Create the s3fs filesystem with the custom certificate
fs = s3fs.S3FileSystem(
    client_kwargs={
        "endpoint_url": endpoint_url_full,
        "verify": str(ca_cert_path),
    },
    key=access_key_id,
    secret=secret_access_key,
    use_ssl=True,
)

# Sanity check; replace with another bucket of interest
assert fs.exists("rekonas-dataset-nch-sleep-databank")

# Create a zarr store and group within a bucket
# import zarr
# store = s3fs.S3Map(root='test-bucket-fabricio-zarr', s3=fs)
# z = zarr.group(store=store, path="test_group")
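
Beyond the `exists` check, the filesystem object can be used like any fsspec filesystem. The sketch below relies only on `ls(..., detail=True)` and `open(...)`, which s3fs provides through the fsspec interface; the helper names `file_sizes` and `read_head` are hypothetical.

```python
def file_sizes(fs, path):
    """Map each file directly below `path` to its size in bytes."""
    entries = fs.ls(path, detail=True)
    return {
        entry["name"]: entry["size"]
        for entry in entries
        if entry["type"] == "file"
    }


def read_head(fs, path, n_bytes=1024):
    """Read the first `n_bytes` of an object without downloading all of it."""
    with fs.open(path, "rb") as handle:
        return handle.read(n_bytes)
```

For example, `file_sizes(fs, "rekonas-dataset-nch-sleep-databank")` should give a quick overview of the top level of that bucket.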

The third snippet uses `obstore` together with zarr. Dependencies:

dotenv = "^0.9.9"
obstore = "^0.6.0"
zarr = ">=3.0.8"

Code Snippet

import os

import obstore as obs
import zarr
from dotenv import load_dotenv
from obstore.store import S3Store
from zarr.storage import ObjectStore

# Load credentials from the .env file
load_dotenv()
access_key_id = os.environ["STORAGE_ACCESS_KEY"]
secret_access_key = os.environ["STORAGE_SECRET_KEY"]
endpoint_url_full = os.environ["ENDPOINT_FULL"]  # including the https:// scheme

# NOTE: the ca.crt.cer file is not used here. The "allow_invalid_certificates"
# option makes the client accept the storage certificate without verifying it.
ob_store = S3Store(
    "test-bucket-fabricio-zarr",
    endpoint=endpoint_url_full,
    access_key_id=access_key_id,
    secret_access_key=secret_access_key,
    virtual_hosted_style_request=False,  # use path-style requests
    region="ch-bsl-1",  # region is required by the store
    client_options={"allow_invalid_certificates": True},
)

# List the files that exist in the bucket
list_of_files = obs.list(ob_store).collect()

# Create a small array to test zarr
store = ObjectStore(store=ob_store)
zarr.create_array(store=store, shape=(2,), dtype="float64")

Below you can find the certificate as of May 06, 2025. Simply save it into a ca.crt.cer file to use as shown above.

-----BEGIN CERTIFICATE-----
MIIB+DCCAX+gAwIBAgIUNOXxe14mKQCbT9gKVouhzCD3TL0wCgYIKoZIzj0EAwQw
FTETMBEGA1UEAwwKUmVrb25hcyBDQTAeFw0yMzAyMjIxNDA0MzVaFw0zMzAyMTkx
NDA0MzVaMBUxEzARBgNVBAMMClJla29uYXMgQ0EwdjAQBgcqhkjOPQIBBgUrgQQA
IgNiAAR6Nija/wfPLwmX/KW2rsowfxbLIJ3JMTJmltFOqrul074ZQkVQWsyShp67
2GlehcDP+oLR7VJg8oCEIFDQYug00x2QWlnqDHMxkE0ZtN6vH5lq/RaUUf0hdYy3
eP6l+qijgY8wgYwwDAYDVR0TBAUwAwEB/zAdBgNVHQ4EFgQUWtnLStq5o/+O4B2Y
3Fsc11dadqwwUAYDVR0jBEkwR4AUWtnLStq5o/+O4B2Y3Fsc11dadqyhGaQXMBUx
EzARBgNVBAMMClJla29uYXMgQ0GCFDTl8XteJikAm0/YClaLocwg90y9MAsGA1Ud
DwQEAwIBBjAKBggqhkjOPQQDBANnADBkAjBVVoWkAHc2jQpkobopyGhS+bLDRjEm
3ZtGVo9Blvk0TNciDBSgeQ6onuAjorLP3/ICMD5G2CR4rmfCh6Ed+mag7wMlBQYf
1q5iT+kB7u9gG8lhIeB+1MT5JIeIK7ygmC6g/Q==
-----END CERTIFICATE-----
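
After saving the certificate, it can help to confirm that the file actually parses before debugging connection errors in one of the clients. This is a small sketch using only the standard `ssl` module; the helper name `ca_file_is_loadable` is made up, and the check only shows that Python can load the file as a CA bundle, not that the storage will accept the connection.

```python
import ssl
from pathlib import Path


def ca_file_is_loadable(path):
    """Return True if `path` can be loaded as a CA bundle by the ssl module."""
    try:
        ssl.create_default_context(cafile=str(Path(path)))
    except (OSError, ssl.SSLError):
        # Covers a missing file as well as content that is not valid PEM
        return False
    return True
```

A typical call would be `ca_file_is_loadable("ca.crt.cer")`; a `False` result often points to a copy-and-paste problem such as missing BEGIN/END lines.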

Issues with VPN: MTU Problems

If you have trouble accessing the MinIO store over the VPN, check the MTU settings of your VPN interface:

 ip a | grep mtu
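
The same check can be scripted, for instance to compare several machines. The sketch below parses the text printed by `ip a` into a mapping from interface name to MTU; the function names are hypothetical, and interface names such as `tun0` in the usage note are only examples, since the name of your VPN interface may differ.

```python
import re
import subprocess

# Matches interface header lines such as
# "5: tun0: <POINTOPOINT,UP,LOWER_UP> mtu 1360 qdisc fq_codel ..."
_MTU_PATTERN = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@[^:\s]+)?:.*?\bmtu\s+(?P<mtu>\d+)",
    re.MULTILINE,
)


def parse_mtus(ip_output):
    """Extract {interface name: MTU} from the output of `ip a`."""
    return {
        match["name"]: int(match["mtu"])
        for match in _MTU_PATTERN.finditer(ip_output)
    }


def read_mtus():
    """Run `ip a` and return the MTU of every interface (Linux only)."""
    result = subprocess.run(["ip", "a"], capture_output=True, text=True, check=True)
    return parse_mtus(result.stdout)
```

For example, `read_mtus().get("tun0")` would return the MTU of an interface called `tun0`, or `None` if no such interface exists.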

A working MTU for packets sent over the VPN via the docker bridge interface is 1360. If you see a different value, you can apply the limit manually with the command below. The command clamps the TCP MSS of forwarded connections to 1360 rather than changing the interface setting, so the value shown by `ip a` will not change.

Replace `<your_vpn_interface>` with the name of your VPN interface.

sudo iptables -I FORWARD -i docker0 -o <your_vpn_interface> -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360

Last modified: 2025/07/21 12:16 by rekonas