Automation Engineering

Zero Trust Automation

The architecture is signed off — now it has to survive production. We build the automation that keeps a Zero Trust deployment trustworthy after go-live: idempotent provisioning, continuous configuration assurance, just-in-time access, and forensic-grade incident tooling — built on the Zscaler OneAPI SDK and running inside your own CI and SIEM.

A Zero Trust architecture is a living system. Policies drift, connectors fail over, access is granted and must be revoked, and when an incident lands, someone has to reconstruct exactly what happened — and prove it.

Arduwyn engineers the operational layer that standard deployments leave manual. The toolkit below is drawn from real engagements: a coherent suite spanning provisioning, configuration assurance, access governance, and incident response. Every component is auditable, idempotent, and yours — there is no Arduwyn-hosted black box in the path.

The automation lifecycle

Four stages of Zero Trust operations — each one a place where manual process becomes risk.

Provision

Stand the ZPA estate up — and stand up its failover — from declared desired state, not console clicks.

Observe

Snapshot every tenant's configuration, detect drift, and stream a unified audit trail to your SIEM.

Govern

Replace standing access with time-boxed, approved grants that revoke themselves automatically.

Respond

Reconstruct an incident across every source — then preserve the evidence in a court-defensible form.

zpa-dr · bootstrap & failover rehearsal

$ python playbooks/run_all.py --config config/dr_topology.yaml
=== running deploy_connector_groups.py ===
=== running deploy_server_groups.py ===
=== running deploy_segment_groups.py ===
=== running deploy_app_segments.py ===
all playbooks converged. secondary DC is pre-staged and ready for failover.
 
$ python playbooks/failover.py --config config/dr_topology.yaml --dry-run
preflight_ok      connector_group  appconn-dc-west   healthy_connectors=3
snapshot          app_segment      *                 count=3
failover          app_segment      erp-web           dry_run=True
failover          app_segment      git-https         dry_run=True
failover          app_segment      workday           dry_run=True
failover complete (dry_run=True) snapshot=snapshots/3f9c1a.json cid=3f9c1a

The toolkit

Seven automations, grouped by lifecycle stage. Each runs in your infrastructure and is built on the Zscaler OneAPI SDK.

Provision

Bring the estate — and its disaster-recovery posture — up from desired state.

playbooks/ · config/

ZPA disaster-recovery topology

A two-datacenter ZPA estate declared in one YAML file, with a single-command failover between primary and secondary.

How it works

run_all.py bootstraps connector groups, server groups, segment groups, and app segments in dependency order — both datacenters staged from one config.
App segments are datacenter-agnostic; only their server-group binding flips. Failover swaps server_group_ids from the primary DC to the secondary.
failover.py first runs a preflight check that at least two secondary connectors are authenticated, snapshots the current bindings, then converges, writing every change to a JSONL audit log with a correlation ID, actor, and before/after state.
Rollback restores from the snapshot. A --dry-run pass is the mandatory rehearsal before any live cut-over.

zscaler-sdk-python · YAML desired-state · JSONL audit
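A minimal sketch of the failover step follows. It assumes a hypothetical in-memory `Estate` object in place of the Zscaler OneAPI SDK calls the real playbook makes, so no SDK methods are shown. It follows the order described above: preflight the secondary connector group, snapshot the bindings, then swap primary server-group IDs for secondary ones, writing one JSONL record per segment. The function names, record fields, and the `"authenticated"` connector state are illustrative assumptions.

```python
import json
import uuid
from dataclasses import dataclass

MIN_HEALTHY = 2  # preflight floor: at least two authenticated secondary connectors


@dataclass
class Estate:
    """Hypothetical in-memory stand-in for the ZPA objects a real playbook
    would read and write through the Zscaler OneAPI SDK."""
    connectors: dict  # connector group name -> list of connector states
    segments: dict    # app segment name -> list of bound server group IDs


def failover(estate, primary_sg, secondary_sg, secondary_cg, audit, actor, dry_run=True):
    """Rebind app segments from primary_sg to secondary_sg, logging JSONL to `audit`."""
    cid = uuid.uuid4().hex[:6]  # one correlation ID for every record in this run
    healthy = estate.connectors.get(secondary_cg, []).count("authenticated")
    if healthy < MIN_HEALTHY:
        raise RuntimeError(f"preflight failed: {secondary_cg} has {healthy} healthy connectors")
    # Snapshot current bindings first so a rollback can restore them verbatim.
    snapshot = {name: list(sgs) for name, sgs in estate.segments.items()}
    records = []
    for name, before in sorted(snapshot.items()):
        after = [secondary_sg if sg == primary_sg else sg for sg in before]
        op = "noop" if after == before else "update"
        if op == "update" and not dry_run:
            estate.segments[name] = after  # only live runs mutate state
        record = {"cid": cid, "actor": actor, "segment": name, "op": op,
                  "before": before, "after": after, "dry_run": dry_run}
        audit.write(json.dumps(record) + "\n")
        records.append(record)
    return snapshot, records


def rollback(estate, snapshot):
    """Restore every binding exactly as captured before the failover."""
    estate.segments = {name: list(sgs) for name, sgs in snapshot.items()}
```

Because a segment that has already failed over yields a `noop` record rather than a second change, re-running the live cut-over is harmless, and `rollback` needs nothing beyond the snapshot returned by the run that made the change.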

app_onboarding/

ServiceNow app onboarding

A self-service ServiceNow catalog item that provisions a ZPA application end to end — from request to live access policy — with no console clicks.

How it works

Manager and Zscaler-Admin approvals gate the request. On fulfilment, a ServiceNow business rule fires a GitHub repository_dispatch carrying the RITM payload.
GitHub Actions renders Terraform variables from the payload, runs fmt / validate / plan, and gates the plan through an OPA / conftest policy check before anything applies.
Terraform creates the segment group, application segment, and the access-policy rule binding the AD / SCIM group to the app — stamping the RITM number into every resource for traceability.
Resource IDs are written back into ServiceNow and the request closes itself. A GitHub Environment reviewer gate sits on top of the ServiceNow approvals.

ServiceNow · GitHub Actions · Terraform · OPA / conftest
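The render step in GitHub Actions might look like the following sketch, which validates a `repository_dispatch` `client_payload` and turns it into Terraform variables JSON. The catalog field names (`ritm`, `app_name`, `domains`, `tcp_ports`, `idp_group`) are hypothetical, and the RITM pattern assumes ServiceNow's default seven-digit numbering.

```python
import json
import re

REQUIRED = ("ritm", "app_name", "domains", "tcp_ports", "idp_group")  # hypothetical catalog fields
RITM_PATTERN = re.compile(r"RITM\d{7}")  # assumes the default RITM number format


def render_tfvars(client_payload: dict) -> str:
    """Turn a repository_dispatch client_payload into .tfvars.json text."""
    missing = [k for k in REQUIRED if not client_payload.get(k)]
    if missing:
        raise ValueError(f"payload missing fields: {', '.join(missing)}")
    ritm = client_payload["ritm"]
    if not RITM_PATTERN.fullmatch(ritm):
        raise ValueError(f"not a RITM number: {ritm!r}")
    # Normalize ports to unique, sorted integers and reject out-of-range values.
    ports = sorted({int(p) for p in client_payload["tcp_ports"]})
    if any(not 1 <= p <= 65535 for p in ports):
        raise ValueError("tcp_ports must be between 1 and 65535")
    tfvars = {
        "app_name": client_payload["app_name"],
        "domains": sorted(set(client_payload["domains"])),
        "tcp_ports": ports,
        "idp_group": client_payload["idp_group"],
        # Stamp the request number into every resource for traceability.
        "description": f"Provisioned via {ritm}",
        "ritm": ritm,
    }
    return json.dumps(tfvars, indent=2, sort_keys=True)
```

Writing the result to a file ending in `.auto.tfvars.json` in the root module lets Terraform load it automatically, so the plan and the conftest check both see the validated request rather than the raw payload.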

Observe

Know the configuration, prove what it was, and see every action across the platform.

config_backup/

Configuration backup & drift detection

A daily, version-controlled snapshot of every tenant's configuration — and an alert the moment something changes that shouldn't have.

How it works

Exporters cover ZIA, ZPA, ZDX, ZWA, and ZIdentity. Each nightly run commits a snapshot to git — the repository becomes a tamper-evident configuration history.
Drift detection diffs consecutive snapshots, classifies each change by actor — separating Terraform and automation from human edits — and renders a unified-diff report.
Unexpected drift alerts to Slack and ships as events to Datadog. compare.py lets an analyst diff any two snapshot dates on demand.

GitHub Actions · git-as-store · Slack · Datadog
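The diff-and-classify step could be sketched as below. It assumes each snapshot is a dict keyed by resource ID and that the last actor for each resource has already been looked up from the audit trail. `AUTOMATION_ACTORS` is a hypothetical list of service principals. Anything not on it, including unattributed changes, is treated as a human edit and is therefore eligible to alert.

```python
import difflib
import json

AUTOMATION_ACTORS = {"svc-terraform", "svc-automation"}  # hypothetical service principals


def diff_snapshots(old: dict, new: dict, last_actor: dict) -> list:
    """Compare two snapshots keyed by resource ID; classify each change by actor."""
    findings = []
    for rid in sorted(old.keys() | new.keys()):
        before, after = old.get(rid), new.get(rid)
        if before == after:
            continue  # unchanged resource: no drift
        actor = last_actor.get(rid, "unknown")
        # Stable, key-sorted JSON keeps the unified diff free of ordering noise.
        patch = "\n".join(difflib.unified_diff(
            json.dumps(before, indent=2, sort_keys=True).splitlines(),
            json.dumps(after, indent=2, sort_keys=True).splitlines(),
            fromfile=f"a/{rid}", tofile=f"b/{rid}", lineterm=""))
        findings.append({
            "resource": rid,
            "actor": actor,
            "kind": "automation" if actor in AUTOMATION_ACTORS else "human",
            "diff": patch,
        })
    return findings


def unexpected(findings: list) -> list:
    """Only human or unattributed edits should raise an alert."""
    return [f for f in findings if f["kind"] != "automation"]
```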

unified_audit/

Unified audit pipeline

Every Zscaler service's audit log, every tenant, normalized into one event shape and streamed to your SIEM.

How it works

A ten-minute cron collects audit entries from ZIA, ZPA, ZDX, ZWA, and ZIdentity through the Zscaler OneAPI.
Each entry is normalized to a single Datadog event schema — so a detection is written once, not five times in five dialects.
A per-tenant, per-product checkpoint records how far each feed has been shipped, giving exactly-once delivery. Runs are never cancelled mid-flight, so no audit event is dropped or double-counted.

OneAPI · GitHub Actions · Datadog · Checkpoint store
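A sketch of the normalize-and-checkpoint loop, under stated assumptions: `FIELD_MAP` uses hypothetical raw field names rather than the real per-product audit schemas, timestamps are epoch milliseconds, and `ship` stands in for the Datadog submission. The checkpoint only advances after `ship` returns, so a failed batch is retried rather than lost.

```python
# Hypothetical raw field names per product: (event id, epoch-ms timestamp, actor, action).
FIELD_MAP = {
    "zia": ("id", "ts", "admin", "action"),
    "zpa": ("uuid", "time", "user", "operation"),
}


def normalize(tenant, product, raw):
    """Map one product-specific audit entry onto a single event shape."""
    id_key, ts_key, actor_key, action_key = FIELD_MAP[product]
    return {
        "source": f"zscaler.{product}",
        "tenant": tenant,
        "event_id": str(raw[id_key]),
        "timestamp_ms": int(raw[ts_key]),
        "actor": raw.get(actor_key, "unknown"),
        "action": raw.get(action_key, "unknown"),
    }


def collect(tenant, product, raw_entries, checkpoints, ship):
    """Ship entries newer than the (tenant, product) checkpoint, then advance it."""
    key = (tenant, product)
    last = checkpoints.get(key, (0, ""))  # (timestamp_ms, event_id) of the last shipped event
    events = sorted(
        (normalize(tenant, product, r) for r in raw_entries),
        key=lambda e: (e["timestamp_ms"], e["event_id"]),
    )
    fresh = [e for e in events if (e["timestamp_ms"], e["event_id"]) > last]
    if fresh:
        ship(fresh)  # if this raises, the checkpoint is untouched and the batch is retried
        checkpoints[key] = (fresh[-1]["timestamp_ms"], fresh[-1]["event_id"])
    return len(fresh)
```

On its own this ordering gives at-least-once delivery: a crash between shipping and saving the checkpoint would resend the last batch. Carrying the stable `event_id` through to the SIEM lets such duplicates be discarded downstream, which is one way to reach the exactly-once behaviour claimed above.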

Govern

Grant access narrowly, with approval — and take it back automatically.

zpa_jit/

Just-in-time ZPA access

Standing access to sensitive ZPA-fronted applications, replaced by time-boxed grants requested in Slack and revoked on a timer.

How it works

The /zpa-jit slash command requests membership of a ZPA-synced IdP group for a stated duration and reason.
An approval card posts to a review channel; on approval, the user is added to the SCIM-synced group in Okta or Entra ID.
A revoke worker runs every five minutes and removes every expired grant. The grant database is the system of record — every state change emits a tagged Datadog event.

Flask · Slack · Okta / Entra ID · Datadog
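One pass of the five-minute revoke worker might look like this sketch. The `Grant` record, the `remove_member` callback (standing in for the Okta or Entra ID group-membership API), and the `emit` callback (standing in for the Datadog event) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Grant:
    grant_id: str
    user: str
    group: str
    expires_at: datetime
    state: str = "active"  # lifecycle: active -> revoked


def revoke_expired(grants, remove_member, emit, now=None):
    """One pass of the revoke worker: remove every expired, still-active grant."""
    now = now or datetime.now(timezone.utc)
    revoked = []
    for g in grants:
        if g.state != "active" or g.expires_at > now:
            continue  # already revoked, or still within its window
        remove_member(g.group, g.user)  # hypothetical IdP call; raising leaves the grant active
        g.state = "revoked"
        emit({"event": "jit.revoked", "grant_id": g.grant_id, "user": g.user, "group": g.group})
        revoked.append(g.grant_id)
    return revoked
```

If `remove_member` raises, the grant stays `active` and the next pass retries it, so an IdP outage delays revocation but never silently skips it.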

Guardrails

Self-approval is hard-blocked.
Duration ceilings are enforced server-side, ignoring whatever the user typed in Slack.
Only pre-listed IdP groups are eligible — anything else is rejected, regardless of who asks.
The state-change trail is the audit chain an IR analyst follows.
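The first three guardrails can be sketched as plain server-side checks. The group allow-list and its per-group ceilings are hypothetical values. The design choice shown is that an over-long request is clamped to the ceiling rather than trusted.

```python
from datetime import timedelta

# Hypothetical allow-list: eligible IdP group -> maximum grant duration.
ELIGIBLE_GROUPS = {
    "zpa-prod-db": timedelta(hours=4),
    "zpa-erp-admin": timedelta(hours=2),
}


class RejectedRequest(Exception):
    """Raised when a JIT request or approval breaks a guardrail."""


def validate_request(group, requested, reason):
    """Server-side checks on a JIT request; returns the duration actually granted."""
    if group not in ELIGIBLE_GROUPS:
        raise RejectedRequest(f"{group} is not eligible for JIT access")
    if not reason.strip():
        raise RejectedRequest("a reason is required")
    if requested <= timedelta(0):
        raise RejectedRequest("duration must be positive")
    # The ceiling is applied here, whatever the user typed in Slack.
    return min(requested, ELIGIBLE_GROUPS[group])


def validate_approval(requester, approver):
    """Hard-block self-approval."""
    if approver == requester:
        raise RejectedRequest("self-approval is not permitted")
```

Comparing requester and approver by a stable IdP identifier, rather than a Slack display name, keeps the self-approval check from being sidestepped by a renamed account.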

Respond

When something goes wrong, reconstruct it — and preserve it.

ir_timeline/

Incident-response timeline

A correlated, multi-source timeline of everything a subject did inside a time window — built in seconds, not an afternoon.

How it works

Pulls from the unified audit feed in Datadog, DLP incidents, ZDX, and ZPA access logs, then clusters related events and flags high-signal sources.
Renders to JSON, Markdown, and HTML in one run — analyst-readable and machine-ingestible at the same time.
Runs on demand from a Slack command, or from a ZWA playbook via an HTTP action: a DLP incident or an admin role change triggers a timeline that is attached to the incident and pages on-call when high-signal events are present.

Python · Datadog · ZDX / DLP / ZPA · ZWA HTTP action
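The clustering step could be approximated by splitting a merged, time-sorted event stream wherever the gap between consecutive events exceeds a threshold. This gap-based grouping is a simple stand-in chosen for illustration, not necessarily the tool's own method; the five-minute gap, the `Event` shape, and the `dlp` label in `HIGH_SIGNAL_SOURCES` are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

HIGH_SIGNAL_SOURCES = {"dlp"}  # hypothetical source labels that mark a cluster as high-signal


@dataclass
class Event:
    at: datetime
    source: str   # e.g. "audit", "dlp", "zdx", "zpa"
    summary: str


def build_timeline(events, gap=timedelta(minutes=5)):
    """Merge events from every source, sort them, and split into clusters at quiet gaps."""
    clusters = []
    for ev in sorted(events, key=lambda e: e.at):
        if clusters and ev.at - clusters[-1]["events"][-1].at <= gap:
            clusters[-1]["events"].append(ev)  # close enough: same burst of activity
        else:
            clusters.append({"events": [ev]})  # quiet gap exceeded: start a new cluster
    for c in clusters:
        c["start"] = c["events"][0].at
        c["end"] = c["events"][-1].at
        c["high_signal"] = any(e.source in HIGH_SIGNAL_SOURCES for e in c["events"])
    return clusters
```

A ZWA-triggered run would then page on-call when `any(c["high_signal"] for c in clusters)` holds.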

forensics_bundler/

Forensic evidence bundler

A court-defensible evidence package — signed, hash-verified, and reproducible byte for byte.

How it works

Collects a subject's Datadog audit logs, the configuration snapshots bracketing the incident window, and live identity, group, and accessible-app pulls into one archive.
Every file is SHA-256 hashed into a MANIFEST.json, signed with a detached Ed25519 signature, and journaled in an append-only chain-of-custody log.
The tarball is deterministic: fixed timestamps and ownership mean two bundles built from the same input are byte-identical. verify.py re-checks the signature, hashes, and custody chain end to end.

Python · Ed25519 · Deterministic tar · OneAPI
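The manifest and deterministic archive can be sketched with the standard library alone. Member order is sorted and mtime, ownership, and mode are fixed, so the same inputs yield identical bytes; the archive is left uncompressed because gzip writes its own timestamp into its header. The Ed25519 signature over `MANIFEST.json` is omitted here to stay stdlib-only (the `cryptography` package's `Ed25519PrivateKey.sign` is one option). `build_bundle` and `verify_bundle` are hypothetical names.

```python
import hashlib
import io
import json
import tarfile


def build_bundle(files: dict) -> bytes:
    """Pack {archive path: bytes} into a byte-reproducible tar with a SHA-256 manifest."""
    manifest = {"files": {p: hashlib.sha256(d).hexdigest() for p, d in sorted(files.items())}}
    members = dict(files)
    members["MANIFEST.json"] = json.dumps(manifest, indent=2, sort_keys=True).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(members):          # fixed member order
            info = tarfile.TarInfo(name=path)
            info.size = len(members[path])
            info.mtime = 0                    # fixed timestamp
            info.uid = info.gid = 0           # fixed ownership
            info.uname = info.gname = "root"
            info.mode = 0o644                 # fixed permissions
            tar.addfile(info, io.BytesIO(members[path]))
    return buf.getvalue()


def verify_bundle(blob: bytes) -> bool:
    """Re-hash every member and compare against MANIFEST.json."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        contents = {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}
    manifest = json.loads(contents.pop("MANIFEST.json"))
    actual = {p: hashlib.sha256(d).hexdigest() for p, d in contents.items()}
    return actual == manifest["files"]
```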

Engineering principles

The non-negotiables every automation in the suite is built to. They are why this tooling is safe to point at production.

Idempotent by default

Every playbook converges to desired state. Re-running is safe — an unchanged resource records a no-op, never a duplicate.
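Convergence of this kind reduces to a plan computed by diffing desired state against actual state. In this sketch, resources are hypothetical dicts keyed by name, and deletion of undeclared resources is deliberately left out; the point is that an unchanged resource yields an explicit `noop` step, so a second run plans nothing but no-ops.

```python
def plan(desired: dict, actual: dict) -> list:
    """Diff desired state against actual state; unchanged resources become explicit no-ops."""
    steps = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != desired[name]:
            steps.append(("update", name))
        else:
            steps.append(("noop", name))  # recorded, never duplicated
    return steps


def apply(desired: dict, actual: dict, steps: list) -> dict:
    """Execute a plan against the (hypothetical) actual-state store."""
    for op, name in steps:
        if op in ("create", "update"):
            actual[name] = desired[name]
    return actual
```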

Audited, always

Every state change is written with a correlation ID, the acting principal, and a full before/after — to a JSONL log, to Datadog, or both.

Dry-run before live

Mutating operations support a dry-run flag. Rehearsal against real configuration is mandatory practice before any production change.

Least privilege, no exceptions

Access is time-boxed and self-approval is impossible. Ceilings are enforced server-side — not in a UI a user can edit.

Forensic determinism

Evidence bundles are signed, hash-manifested, and byte-reproducible. A receiver verifies provenance without having to trust the sender.
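The append-only chain-of-custody log from the evidence bundler can be sketched as a hash chain: each entry commits to the hash of the one before it, so editing or reordering any entry breaks verification from that point on. Entry fields here are hypothetical and timestamps are omitted for brevity. A chain alone cannot reveal a truncated tail unless its latest hash is also recorded somewhere signed, such as the manifest.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, action: str, actor: str, detail: dict) -> dict:
    """Append a custody entry that commits to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "action": action, "actor": actor, "detail": detail, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash and link; any edit, insertion, or reorder fails."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["seq"] != i or body["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```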

Your infrastructure, not ours

Everything runs in your CI, your ServiceNow, your SIEM, on the Zscaler OneAPI SDK. There is no Arduwyn-hosted service in the path.

How we deliver

Automation goes into production the same way the architecture did — incrementally, with guardrails, and with a handover.

Scope & topology

We map your tenants, datacenters, IdP groups, and SIEM. The desired-state YAML and configuration files are written against your actual estate — not a reference template.

Pilot on one workflow

One automation goes in first — usually config backup or the audit pipeline, because they are read-only and prove the OneAPI integration with zero blast radius.

Roll out with guardrails

Mutating automations follow, each behind dry-run rehearsal, policy gates, and approval workflows. Your team reviews every plan before it applies.

Handover & runbooks

You own the repositories. We document every workflow, leave rollback procedures, and stay on call through the first failover drill and the first live incident.

Engage

Have a Zero Trust deployment that's still run by hand?

Tell us where the manual work — and the risk — lives. We respond within one business day with a scoped plan to automate it.

hello@arduwyn.com