May 28, 2026 | 10 min

Least Privilege Policy Drift and Runtime Risk

Christian Simko

Most teams treat least privilege as a drafting exercise, a one-time act of engineering discipline. Architects sit down, model a workload, scope a role, commit the Terraform, and declare the identity fit for production. That part is real work. It is also the part that ages the fastest.

The problem starts the moment the identity is live. Access accumulates. Ownership blurs. Credentials persist past the person or project that requested them. Behavior drifts. The role that was tight on Monday has inherited two group memberships by Friday, a break-glass exception by the end of the sprint, and a pipeline-granted privilege by the next quarter. Nothing looks broken. Nothing alerts. The policy is still there. It is just no longer accurate.

This piece is about that gap: the gap between what the policy was meant to allow and what the identity actually holds and uses today. Call that drift. The exposure created by permissions that are still granted but never used is a separate problem; call it runtime risk. Both are steady-state conditions in every environment larger than a toy. Neither is reliably caught by daily posture scans. And for non-human identities (NHIs), which outnumber human identities roughly forty-five to one, the blast radius of getting this wrong is not theoretical.

How Drift Actually Happens

Drift is rarely the result of a single bad decision. It is an accretion. A thousand small, defensible moves, none of which individually felt wrong, produce an identity that no one would design on a clean sheet of paper. Understanding the mechanics matters, because the remediation depends on knowing which lever caused which motion.

Temporary Exceptions That Calcify

An engineer needs to read a new bucket for a two-week migration. A ticket is filed. The exception is granted with a note that it should be revoked after the cutover. The cutover slips. A stale-ticket bot auto-closes the ticket. The exception stays. Multiply that by every migration, every incident, every customer escalation. NIST SP 800-53 AC-6 (Least Privilege) becomes aspirational the moment exception workflows lack a revocation contract.
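One way to give exceptions a revocation contract is to store the expiry on the grant itself rather than on the ticket, so closing the ticket cannot cancel it. The sketch below assumes a hypothetical `ExceptionGrant` record; the identities, AWS-style permission strings, and ticket IDs are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ExceptionGrant:
    # Hypothetical record of a temporary access exception and its revocation contract.
    identity: str
    permission: str
    ticket: str
    expires: Optional[date]  # None: no revocation date was ever recorded


def revocation_findings(grants, today):
    """Split exceptions into those past their expiry and those with no expiry at all."""
    overdue = [g for g in grants if g.expires is not None and g.expires < today]
    no_expiry = [g for g in grants if g.expires is None]
    return overdue, no_expiry


grants = [
    ExceptionGrant("migration-svc", "s3:GetObject", "OPS-101", date(2026, 3, 1)),
    ExceptionGrant("migration-svc", "s3:ListBucket", "OPS-102", None),
    ExceptionGrant("report-svc", "s3:GetObject", "OPS-103", date(2026, 12, 31)),
]
overdue, no_expiry = revocation_findings(grants, today=date(2026, 5, 28))
for g in overdue:
    print("revoke:", g.identity, g.permission, g.ticket)
for g in no_expiry:
    print("missing expiry:", g.identity, g.permission, g.ticket)
```

Because the sweep reads the grant's own expiry, a slipped cutover or an auto-closed ticket no longer leaves the exception in place silently; it surfaces as overdue.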

IaC Rollbacks That Leave Artifacts

Infrastructure-as-code is treated as the source of truth, but rollbacks and manual fixups during incidents routinely create state the IaC tool never re-converges on. A policy is attached out of band to get a deploy unstuck. The IaC run that follows does not own that attachment, so it leaves it alone. The artifact persists, invisible to anyone reading the repository.

Group Cascades and Role Inheritance

Group-based access is efficient for humans. It is a disaster for non-human identities that get assigned to groups for operational convenience. A service account added to a group for one API call inherits everything the group membership implies. Because Kubernetes RBAC and comparable cloud models grant permissions additively, every new group membership is a one-way ratchet unless something actively reverses it.
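A minimal sketch of the additive model, assuming flat (non-nested) groups and exact-string permissions. `effective_permissions` and the group names are hypothetical; the point is only that joining a group for one permission imports every other permission the group carries.

```python
def effective_permissions(direct, memberships, group_grants):
    """Union of an identity's direct grants and everything its groups imply.

    The model is purely additive: nothing is ever subtracted, so each new
    membership can only widen the result.
    """
    perms = set(direct)
    for group in memberships:
        perms |= group_grants.get(group, set())
    return perms


# Hypothetical groups and permission names, for illustration only.
group_grants = {
    "ops-readers": {"logs:Read", "metrics:Read"},
    "platform-admins": {"secrets:Read", "iam:PassRole", "cluster:Admin"},
}
before = effective_permissions({"queue:Publish"}, [], group_grants)
# The service account joins platform-admins because it needed secrets:Read once.
after = effective_permissions({"queue:Publish"}, ["platform-admins"], group_grants)
print(sorted(after - before))  # ['cluster:Admin', 'iam:PassRole', 'secrets:Read']
```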

Pipelines That Grant Themselves More

CI/CD systems authenticate as identities. When a pipeline needs to deploy to a new region, the usual fix is to add a permission to the pipeline's role. The previous permissions are never pruned. Over time, the pipeline identity accumulates the union of every environment it has ever touched. When an attacker lands on that identity, the accumulated access maps neatly onto the MITRE ATT&CK Privilege Escalation tactic.

Break-Glass Roles Used for Non-Emergencies

Break-glass roles exist for good reason. They also get repurposed the moment a normal path is broken and shipping matters more than process. Once a break-glass role is used for a routine task, it becomes a routine tool. The audit trail exists. Nobody reads the audit trail.

The Separate Problem of Runtime Risk

Drift describes how a least privilege access control policy diverges from its recorded intent. Runtime risk describes the consequence. Every permission that is granted but unused is latent attack surface. If the identity is compromised, stolen, or simply used beyond its scope by a misconfigured automation, those unused permissions become usable.

A role that has not been invoked in ninety days still exists, and its existence is still a problem. Valid Accounts (MITRE ATT&CK T1078) ranks among the most common initial access vectors in the Verizon DBIR year after year. The attacker does not care that the permission has been dormant. They care that it works.

Signals That Actually Tell You Something

You cannot remediate what you cannot see. Posture snapshots describe a state at a point in time. Drift is a trajectory. The useful signals are about change and utilization, not configuration.

Role Assignment Delta

Compare the role definition at creation to the current definition. Track every permission added, the requester, the reason, the expected expiry. Absence of an expiry is itself a finding. The delta over thirty days is usually more interesting than the total.
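The delta can be computed as a set difference plus a join against whatever change log records requester, reason, and expiry. A sketch, assuming a hypothetical `Change` record and a 30-day window; real inputs would come from IaC state history or a cloud audit trail, which this sketch does not read.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Change:
    # Hypothetical change-log entry for one permission added to a role.
    permission: str
    requester: str
    reason: str
    added: date
    expires: Optional[date]


def role_delta(baseline, current, changes, today, window_days=30):
    """Diff a role's creation-time definition against its current one."""
    added = set(current) - set(baseline)
    removed = set(baseline) - set(current)
    log = {c.permission: c for c in changes}
    cutoff = today - timedelta(days=window_days)
    return {
        "added": sorted(added),
        "removed": sorted(removed),
        # Added with no change record at all: nobody can say who or why.
        "unexplained": sorted(p for p in added if p not in log),
        # Added with a record but no expiry: the absence of an expiry is a finding.
        "no_expiry": sorted(p for p in added if p in log and log[p].expires is None),
        # Recorded additions inside the window: the recent delta.
        "recent": sorted(p for p in added if p in log and log[p].added >= cutoff),
    }


baseline = {"queue:Publish", "queue:Read"}
current = baseline | {"s3:GetObject", "kms:Decrypt", "iam:PassRole"}
changes = [
    Change("s3:GetObject", "alice", "migration", date(2026, 5, 10), date(2026, 6, 1)),
    Change("kms:Decrypt", "ci-bot", "new region", date(2026, 2, 1), None),
]
report = role_delta(baseline, current, changes, today=date(2026, 5, 28))
for key, perms in report.items():
    print(key, perms)
```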

Permission-to-Action Ratio

For each identity, count the distinct permissions granted. Count the distinct API actions actually invoked in the observation window. The ratio of granted to used is the blast radius multiplier. A value near one is earned. A value of fifty is a standing invitation.
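A minimal sketch of the ratio, assuming granted permissions and invoked actions are already normalized to the same exact strings. Wildcards such as `s3:*` would need to be expanded first, which this sketch does not do; `blast_radius_multiplier` is a hypothetical name.

```python
def blast_radius_multiplier(granted, invoked):
    """Distinct permissions granted divided by distinct granted actions actually invoked."""
    granted = set(granted)
    used = granted & set(invoked)  # ignore invoked actions that were never granted
    if not granted:
        return 0.0  # nothing granted, nothing exposed
    if not used:
        return float("inf")  # permissions granted, none used in the window
    return len(granted) / len(used)


tight = blast_radius_multiplier({"sqs:SendMessage", "sqs:ReceiveMessage"},
                                {"sqs:SendMessage", "sqs:ReceiveMessage"})
loose = blast_radius_multiplier({f"svc:Action{i}" for i in range(50)}, {"svc:Action0"})
print(tight, loose)  # 1.0 50.0
```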

Age of Last Use

For every permission on every identity, track the last time it was invoked. Permissions unused for thirty, sixty, and ninety days form the tiers of the candidate revocation queue. Continuous monitoring under NIST SP 800-53 CA-7 is not a document. It is a data pipeline.
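Bucketing by age of last use takes a few lines once last-invocation timestamps exist. A sketch, assuming a hypothetical map from permission to last-use time in which `None` means the permission never appeared in the available logs. Those are placed in the oldest bucket, which is only safe if log retention covers the largest threshold.

```python
from datetime import datetime, timedelta, timezone


def revocation_queue(last_used, now, thresholds=(30, 60, 90)):
    """Place each idle permission in the largest threshold (in days) it has reached.

    last_used maps permission -> datetime of its last invocation, or None if
    it was never seen in the available logs.
    """
    queue = {t: [] for t in thresholds}
    for perm, ts in sorted(last_used.items()):
        idle_days = None if ts is None else (now - ts).days
        for t in sorted(thresholds, reverse=True):
            if idle_days is None or idle_days >= t:
                queue[t].append(perm)
                break
        # Permissions idle for less than the smallest threshold are not queued.
    return queue


now = datetime(2026, 5, 28, tzinfo=timezone.utc)
last_used = {
    "s3:GetObject": now - timedelta(days=3),
    "s3:PutObject": now - timedelta(days=45),
    "kms:Decrypt": now - timedelta(days=75),
    "sqs:SendMessage": now - timedelta(days=120),
    "iam:PassRole": None,
}
queue = revocation_queue(last_used, now)
print(queue)
# {30: ['s3:PutObject'], 60: ['kms:Decrypt'], 90: ['iam:PassRole', 'sqs:SendMessage']}
```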

Unused Permissions by Risk Tier

Not all unused permissions are equal. An unused list action on a public bucket is noise. An unused delete on production data or an unused assume-role into a privileged account is urgent. Weight the queue by impact, not by count.
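One way to weight the queue by impact rather than count is to score each unused permission as an action weight times a resource sensitivity. The weights, resource names, and `prioritize` function below are hypothetical placeholders for your own risk model.

```python
# Hypothetical impact weights per action class; real values belong to your own risk model.
ACTION_WEIGHT = {"list": 1, "read": 3, "write": 10, "delete": 50, "assume_role": 100}


def prioritize(unused, sensitivity):
    """Rank unused permissions by impact: action weight x resource sensitivity.

    unused: iterable of (permission, action_class, resource) tuples.
    sensitivity: resource -> multiplier; unknown resources default to 1.
    """
    scored = [(ACTION_WEIGHT[kind] * sensitivity.get(resource, 1), perm, resource)
              for perm, kind, resource in unused]
    return sorted(scored, key=lambda s: (-s[0], s[1], s[2]))


unused = [
    ("s3:ListBucket", "list", "public-assets"),
    ("s3:DeleteObject", "delete", "prod-data"),
    ("sts:AssumeRole", "assume_role", "admin-account"),
    ("s3:GetObject", "read", "staging-logs"),
]
sensitivity = {"public-assets": 1, "prod-data": 10, "admin-account": 10, "staging-logs": 2}
ranked = prioritize(unused, sensitivity)
for score, perm, resource in ranked:
    print(score, perm, resource)
```

The unused list action on the public bucket ends up last, and the unused assume-role into the privileged account ends up first, which is the ordering the text calls for.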

Why Posture Scans Miss Drift

Posture scans find misconfiguration. They are good at that. They are not, by design, a mechanism for comparing current state against intent. A scan that runs every twenty-four hours sees the current state of the world. It does not know whether that state reflects what the architect wrote down when the identity was created.

Intent is the missing variable. Without a recorded, machine-readable notion of what the identity was supposed to be allowed to do, a scanner cannot tell you that your policy has rotted. It can only tell you that your policy is syntactically valid and conforms to benchmarks. Those are different questions. CIS Critical Security Controls and benchmark frameworks are baselines, not reconciliation engines.

This is the core failure mode of treating security as compliance. Compliance asks whether the policy passes a rule. Reconciliation asks whether the policy still reflects what the system is supposed to do. The second question is harder, and it is the only one that matters for drift.

Fixing It Means Continuous, Not Quarterly

Quarterly access reviews are a relic of the human-identity era, and they were mediocre even then. For non-human identities at the density modern environments produce, they are a fiction. The fix is structural, not administrative.

Continuous Review With Automation

Review cannot be a meeting. It has to be a pipeline. Every change to an identity produces a diff. Every diff is evaluated against the identity's declared intent. Anything unexpected generates a ticket or, for well-understood patterns, a direct remediation.
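The routing step can be sketched as a function over each diff: additions covered by declared intent pass, additions matching a pre-approved pattern are reverted directly, and everything else becomes a ticket. `Intent`, its fields, and the prefix-based pattern matching are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    # Hypothetical machine-readable intent recorded when the identity was created.
    allowed: set
    # Permission prefixes the team has agreed are safe to revert without a human.
    auto_revert_prefixes: set = field(default_factory=set)


def evaluate_diff(intent, added):
    """Route each newly added permission: expected, revert, or ticket."""
    decisions = {}
    for perm in sorted(added):
        if perm in intent.allowed:
            decisions[perm] = "expected"
        elif any(perm.startswith(p) for p in intent.auto_revert_prefixes):
            decisions[perm] = "revert"  # well-understood pattern: remediate directly
        else:
            decisions[perm] = "ticket"  # unexpected: a human decides
    return decisions


intent = Intent(allowed={"sqs:SendMessage", "sqs:ReceiveMessage"},
                auto_revert_prefixes={"iam:"})
decisions = evaluate_diff(intent, {"sqs:SendMessage", "iam:AttachRolePolicy", "s3:GetObject"})
print(decisions)
# {'iam:AttachRolePolicy': 'revert', 's3:GetObject': 'ticket', 'sqs:SendMessage': 'expected'}
```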

Automated Downscoping

Unused permissions should be downscoped automatically, on a schedule, with an owner-notified dry-run window. This is the operational core of cloud permission management at NHI scale. The first pass is permissions not used in ninety days. The second pass is permissions not used in thirty days on low-risk surfaces. The third pass is risk-weighted. Humans approve the policy; the system enforces the rhythm.
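The three passes can be expressed as an ordered table that produces proposals rather than revocations, which leaves room for the owner-notified dry run. This sketch reads the risk-weighted pass as a tighter window for high-risk permissions, in line with the FAQ on thresholds, and assumes a 14-day window for that pass and a 7-day dry run; neither number comes from the text, and `plan_downscope` is hypothetical.

```python
from datetime import date, timedelta

# The 90- and 30-day windows come from the text; the 14-day high-risk window is assumed.
PASSES = [
    ("pass1-any", None, 90),     # anything unused for 90 days
    ("pass2-low", "low", 30),    # low-risk surfaces unused for 30 days
    ("pass3-high", "high", 14),  # risk-weighted: tighter window for high-risk
]


def plan_downscope(perms, today, dry_run_days=7):
    """Return (permission, pass, enforce_on) proposals for the owner's dry-run review.

    perms: iterable of (permission, risk_tier, idle_days). Nothing is revoked
    here; a scheduler would notify owners and enforce only after enforce_on.
    """
    perms = list(perms)  # iterated once per pass
    enforce_on = today + timedelta(days=dry_run_days)
    proposals, seen = [], set()
    for name, tier, min_idle in PASSES:
        for perm, perm_tier, idle in perms:
            if perm not in seen and (tier is None or perm_tier == tier) and idle >= min_idle:
                seen.add(perm)
                proposals.append((perm, name, enforce_on))
    return proposals


perms = [
    ("s3:ListBucket", "low", 40),
    ("s3:DeleteObject", "high", 20),
    ("kms:Decrypt", "medium", 95),
    ("s3:GetObject", "medium", 10),
]
proposals = plan_downscope(perms, today=date(2026, 5, 28))
for proposal in proposals:
    print(proposal)
```

Humans own the `PASSES` table; the scheduler only applies it, which is the split the paragraph describes.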

Intent-to-Behavior Reconciliation

The mature control is to express intent in code at creation: which workload this identity serves, what data it needs to read or write, and which external systems it calls. Then compare that intent to observed behavior continuously. A workload declared to write to one queue that is suddenly writing to three is either a legitimate change with a missing update to intent, or an incident.
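A minimal sketch of the comparison, assuming intent is recorded as sets of read targets, write targets, and external calls, and that audit logs have already been normalized into `(verb, target)` pairs. `DeclaredIntent`, `reconcile`, and all resource names are hypothetical. The example reproduces the queue case: one declared write target, three observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredIntent:
    # Hypothetical intent record written when the identity is created.
    workload: str
    reads: frozenset
    writes: frozenset
    calls: frozenset  # external systems the workload is expected to call


def reconcile(intent, observed):
    """Return observed (verb, target) pairs that the declared intent does not cover."""
    declared = {"read": intent.reads, "write": intent.writes, "call": intent.calls}
    return sorted((verb, target) for verb, target in set(observed)
                  if target not in declared.get(verb, frozenset()))


intent = DeclaredIntent(
    workload="billing-worker",
    reads=frozenset({"orders-db"}),
    writes=frozenset({"invoices-queue"}),
    calls=frozenset({"payments-api"}),
)
# Observed behavior, assumed already normalized from audit logs.
observed = [
    ("read", "orders-db"),
    ("write", "invoices-queue"),
    ("write", "audit-queue"),
    ("write", "export-queue"),
    ("call", "payments-api"),
]
print(reconcile(intent, observed))  # [('write', 'audit-queue'), ('write', 'export-queue')]
```

The sketch only finds the deviations. Deciding whether each one is a stale intent record or an incident still belongs to the owner.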

Retirement Workflows

Identities outlive their purpose constantly. The service that used the credential was deprecated eighteen months ago. The automation was moved to a new role. Nobody deleted the old one. A retirement workflow evaluates last use, declared ownership, and dependency signals, and shuts down what is no longer used. In line with the Govern function of NIST CSF 2.0, lifecycle is a first-class control, not an afterthought.
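The retirement decision combines the three signals named above. A sketch, assuming a hypothetical `IdentitySignals` record and reusing the 90-day idle threshold from earlier. Routing still-referenced identities to investigation instead of disabling them is an assumed policy choice, not a prescription.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentitySignals:
    # Hypothetical inputs, one per signal named in the text.
    name: str
    days_since_last_use: Optional[int]  # None: never observed in available logs
    owner: Optional[str]                # None: no resolvable owner
    dependents: int                     # workloads or configs still referencing it


def retirement_action(s, idle_days=90):
    """Decide what a retirement workflow should do with one identity."""
    idle = s.days_since_last_use is None or s.days_since_last_use >= idle_days
    if not idle:
        return "keep"
    if s.dependents > 0:
        return "investigate"  # idle but still referenced: retiring it may break something
    if s.owner is None:
        return "disable"      # orphaned and unreferenced: shut it down
    return "notify-owner-then-disable"


for s in [
    IdentitySignals("deploy-bot", 1, "team-platform", 3),
    IdentitySignals("legacy-etl-key", 540, None, 0),
    IdentitySignals("old-reporting-sa", 200, "team-data", 0),
    IdentitySignals("shared-cache-key", 120, None, 2),
]:
    print(s.name, retirement_action(s))
```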

NHIs vs Humans: The Difference Actually Matters

Human access review assumes a person will show up for a meeting, look at a list, and attest that they still need something. None of that applies to an API key, a service account, a workload identity, or an AI agent. There is no person to ask. Ownership, when it exists, is a Slack handle or a team name. When the team reorganizes, the link breaks.

Non-human identities also behave more predictably than humans in the normal case, which is an advantage. A pipeline identity should do roughly the same thing every day. Deviations are easier to detect because the baseline is tighter. The disadvantage is volume. Roughly ninety-eight percent of identities in a typical enterprise are non-human. Anything that requires per-identity human attention does not scale.

The operational model has to invert. Humans define intent and approve exceptions. Machines measure behavior, compute drift, and propose and execute remediation. NIST SP 800-207 (Zero Trust Architecture) and the CISA Zero Trust Maturity Model both converge on this in the identity pillar, even if the documents are softer about who does what.

Where Token Security Fits

Token Security treats drift as the steady state and builds the reconciliation loop around it. The platform discovers non-human identities continuously across cloud, identity providers, secret vaults, CI/CD, Kubernetes, and AI platforms, and records intended purpose alongside observed behavior. Unused permissions, role assignment deltas, and ownership gaps are surfaced against the original intent, not against a generic benchmark. Automated remediation downscopes, retires, or routes for approval based on configured risk tiers. The practical effect is that least privilege stops being a drafting exercise and becomes an enforced property of the identity over its lifetime.

Final Thoughts

Drift is not an exception. It is the ordinary behavior of permission systems under normal operational pressure. Any program that treats it as a deviation to be cleaned up quarterly is already losing ground. The workable posture is continuous measurement against recorded intent, automated downscoping, and retirement by default. Tie it to the Risk Management Framework as a control story for auditors, but do not confuse the audit with the work. The work is the pipeline that runs every day, whether anyone is watching or not.

Frequently Asked Questions

How is drift different from misconfiguration?

Misconfiguration is a policy that does not conform to a rule at a point in time. Drift is a policy that has diverged from its original intent over time. A scanner catches the first. Only a reconciliation loop that compares current state against recorded intent catches the second.

Is ninety days the right threshold for unused permissions?

It is a reasonable starting point and aligns with common review cadences, but it is not a universal answer. High-risk permissions, like assume-role into privileged accounts or destructive actions on production data, warrant much tighter windows. Low-risk read operations on non-sensitive surfaces can tolerate longer windows.

Can IaC alone prevent drift?

No. IaC is necessary, not sufficient. It prevents one class of drift, the kind introduced by forgetting to commit a change. It does not prevent out-of-band fixes during incidents, group cascade effects, or pipeline-granted permissions that accumulate outside the repository. You still need runtime measurement.

How does this map to container and workload identities?

Container workloads inherit the problem directly, often worse because ephemerality masks accumulation. [NIST SP 800-190](https://csrc.nist.gov/pubs/sp/800/190/final) covers the container security surface, but the identity portion requires the same intent-to-behavior reconciliation applied to service accounts, pod identities, and any federated credentials the workloads use.
