# model-validation-operator

**Repository Path**: Sigstore/model-validation-operator

## Basic Information

- **Project Name**: model-validation-operator
- **Description**: Kubernetes controller to validate AI models
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-05-09
- **Last Updated**: 2026-01-02

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Model Validation Controller

This project is a proof of concept based on the [sigstore/model-transparency-cli](https://github.com/sigstore/model-transparency). It offers a Kubernetes/OpenShift operator designed to validate AI models before they are picked up by the actual workload. The project provides a webhook that adds an init container to perform model validation. The operator uses a custom resource to define how the models should be validated, for example with [Sigstore](https://www.sigstore.dev/) or public keys.

### Features

- Model Validation: Ensures AI models are validated before they are used by workloads.
- Webhook Integration: A webhook automatically injects an init container into pods to perform the validation step.
- Custom Resource: Configurable `ModelValidation` custom resource to specify how models should be validated.
  - Supports methods like [Sigstore](https://www.sigstore.dev/), PKI, or public key validation.

### Prerequisites

- Kubernetes 1.29+ or OpenShift 4.16+
- Proper configuration for model validation (e.g., Sigstore, public keys)
- A signed model (e.g., check the `testdata` or `examples` folder)

### Installation

The operator can be installed via [kustomize](https://kustomize.io/) using different deployment overlays.
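Before applying an overlay, it can be useful to render it and review the resulting manifests. The following is a minimal sketch, assuming a local checkout of the repository and a kubectl version with built-in kustomize support; `render_overlay` is a hypothetical helper name, not part of the project:

```shell
# Hypothetical helper (not part of the project): render an overlay to stdout
# without applying anything to the cluster.
# Usage: render_overlay [production|testing|development|olm]
render_overlay() {
  overlay="${1:-production}"   # defaulting to "production" is an assumption
  kubectl kustomize "config/overlays/${overlay}"
}
```

For example, `render_overlay testing | less` prints the manifests of the testing overlay for inspection.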
#### Production Deployment

For production environments with cert-manager integration:

```bash
kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/production
# or local
kubectl apply -k config/overlays/production
```

#### Testing Deployment

For testing environments with manual certificate management:

```bash
kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/testing
# or local
kubectl apply -k config/overlays/testing
```

#### Development Deployment

For development environments, deploying the operator without the webhook integration:

```bash
kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/development
# or local
kubectl apply -k config/overlays/development
```

#### OLM Deployment

For OpenShift/OLM environments:

```bash
kubectl apply -k https://raw.githubusercontent.com/sigstore/model-validation-operator/main/config/overlays/olm
# or local
kubectl apply -k config/overlays/olm
```

#### Uninstall

To uninstall the operator, use the same overlay you used for installation:

```bash
kubectl delete -k config/overlays/production
```

### Configuration Structure

The operator uses a kustomize-based overlay configuration structure that aims to separate generated content from environment-specific content:

```
config/
├── crd/                      # Custom Resource Definitions
├── rbac/                     # RBAC permissions
├── webhook/                  # Webhook configuration
├── manager/                  # Controller manager deployment
├── manifests/                # OLM manifests
├── components/               # Reusable components
│   ├── webhook/              # Webhook service component
│   ├── certmanager/          # Certificate manager component
│   ├── manual-tls/           # Manual TLS configuration
│   ├── metrics-port/         # Metrics configuration
│   └── webhook-replacements/ # Webhook configuration replacements
└── overlays/                 # Environment-specific overlays
    ├── production/           # Production (cert-manager)
    ├── development/          # Development (operator only, no webhooks)
    ├── testing/              # Testing (manual, self-signed certs)
    └── olm/                  # OpenShift/OLM
```

#### Certificate Management

The operator supports different certificate management approaches:

1. **Production**: Uses cert-manager for automatic certificate management
   - **⚠️ Important**: The default cert-manager configuration uses self-signed certificates
   - For production environments, you should configure cert-manager with a proper CA issuer
2. **Development**: Does not use certificates; there are no webhook configurations in this overlay
3. **Testing**: Uses manual, self-signed certificate management for testing scenarios
4. **OLM**: Uses OLM's built-in certificate management for OpenShift deployments

#### Running the Webhook Server Locally

The webhook server requires TLS certificates. When you run the operator locally, certificates are generated automatically:

```bash
make run
```

This command starts the webhook server on https://localhost:9443, using the generated certificates.

### Known limitations

The project is at an early stage and therefore has some limitations.

- There is no validation or defaulting for the custom resource.
- The validation is namespace scoped and cannot be used across multiple namespaces.
- There are no status fields for the custom resource.
- The model and signature paths must be specified; there is no auto-discovery.
- TLS certificates used by the webhook are self-generated.
### Usage

First, a ModelValidation CR must be created as follows:

```yaml
apiVersion: ml.sigstore.dev/v1alpha1
kind: ModelValidation
metadata:
  name: demo
spec:
  config:
    sigstoreConfig:
      certificateIdentity: "https://github.com/sigstore/model-validation-operator/.github/workflows/sign-model.yaml@refs/tags/v0.0.2"
      certificateOidcIssuer: "https://token.actions.githubusercontent.com"
  model:
    path: /data/tensorflow_saved_model
    signaturePath: /data/tensorflow_saved_model/model.sig
```

Pods in the namespace that carry the label `validation.ml.sigstore.dev/ml`, with the name of a ModelValidation CR as its value (`demo` in this example), are validated using that CR. Note that this does not apply to pods that are labeled after they have been created.

```diff
apiVersion: v1
kind: Pod
metadata:
  name: whatever-workload
+  labels:
+    validation.ml.sigstore.dev/ml: "demo"
spec:
  restartPolicy: Never
  containers:
  - name: whatever-workload
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: model-storage
      mountPath: /data
  volumes:
  - name: model-storage
    persistentVolumeClaim:
      claimName: models
```

### Examples

The `examples` folder contains example files for testing the operator.

#### Prerequisites for Examples

Before running the examples, create a namespace for testing (separate from the operator namespace):

```bash
kubectl create namespace testing
```

**Important**: Do not deploy examples in the operator namespace (e.g., `model-validation-operator-system`). The operator namespace has the label `validation.ml.sigstore.dev/ignore: "true"`, which prevents the webhook from processing pods in that namespace.

#### Example Files

- **prepare.yaml**: Contains a persistent volume claim and a job that downloads a signed test model.
```bash
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/prepare.yaml -n testing
# or local
kubectl apply -f examples/prepare.yaml -n testing
```

- **verify.yaml**: Contains a ModelValidation manifest for this model and a demo pod that carries the appropriate label for validation.

```bash
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/verify.yaml -n testing
# or local
kubectl apply -f examples/verify.yaml -n testing
```

- **unsigned.yaml**: Contains an example of a pod that would fail validation (for testing purposes).

```bash
kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/unsigned.yaml -n testing
# or local
kubectl apply -f examples/unsigned.yaml -n testing
```

After the examples have been applied, the logs of the generated job should show a successful download:

```bash
$ kubectl logs -n testing job/download-extract-model
Connecting to github.com (140.82.121.3:443)
Connecting to objects.githubusercontent.com (185.199.108.133:443)
saving to '/data/tensorflow_saved_model.tar.gz'
tensorflow_saved_mod 44% |************** | 3983k 0:00:01 ETA
tensorflow_saved_mod 100% |********************************| 8952k 0:00:00 ETA
'/data/tensorflow_saved_model.tar.gz' saved
./
./model.sig
./variables/
./variables/variables.data-00000-of-00001
./variables/variables.index
./saved_model.pb
./fingerprint.pb
```

The operator logs should show that a pod has been modified:

```bash
$ kubectl logs -n model-validation-operator-system deploy/model-validation-controller-manager
time=2025-01-20T22:13:05.051Z level=INFO msg="Starting webhook server on :9443"
time=2025-01-20T22:13:47.556Z level=INFO msg="new request, path: /mutate-v1-pod"
time=2025-01-20T22:13:47.557Z level=INFO msg="Execute webhook"
time=2025-01-20T22:13:47.560Z level=INFO msg="Search associated Model Validation CR" pod=whatever-workload namespace=testing
time=2025-01-20T22:13:47.591Z level=INFO msg="construct args"
time=2025-01-20T22:13:47.591Z level=INFO msg="found sigstore config"
```

Finally, the test pod should be running, and the logs of the injected init container should show that the model was validated successfully:

```bash
$ kubectl logs -n testing whatever-workload model-validation
INFO:__main__:Creating verifier for sigstore
INFO:tuf.api._payload:No signature for keyid f5312f542c21273d9485a49394386c4575804770667f2ddb59b3bf0669fddd2f
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c
INFO:__main__:Verifying model signature from /data/model.sig
INFO:__main__:all checks passed
```

If the model has been modified, the validation fails and the workload is not executed:

```bash
ERROR:__main__:verification failed: the manifests do not match
```
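
Besides reading the logs, the result of the validation step can also be read from the pod status. The following is a minimal sketch, assuming the injected init container is named `model-validation` as in the log command above; `init_container_state` is a hypothetical helper, not part of the project:

```shell
# Hypothetical helper (not part of the project): print the state of the
# injected init container for a given pod.
# Usage: init_container_state <namespace> <pod>
init_container_state() {
  ns="$1"
  pod="$2"
  # The container name "model-validation" is taken from the log command in
  # this README; adjust it if the operator names the container differently.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.initContainerStatuses[?(@.name=="model-validation")].state}'
}
```

For example, `init_container_state testing whatever-workload` prints the state object of the init container. In Kubernetes, a `terminated` state with exit code 0 means the init container completed successfully, while a non-zero exit code means it failed.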