"Six months ago, your GRC Engineering team successfully automated AWS evidence collection. Today, they're spending 80% of their time maintaining scripts that break every time a cloud provider updates an API, debugging access issues with infrastructure teams, and providing IT support for compliance screenshot requests.

Your policy-as-code sits in a Git repository that will never see a production CI/CD pipeline because engineering teams won't deploy vibe-coded Rego policies that lack proper testing, error handling, and company-specific context. Your "automation" creates $10K annually in AI token costs because debugging unfamiliar code requires constant AI assistance.

Meanwhile, infrastructure teams tell you: "If you only need that data once a year for audit purposes, we'll give you the screenshot annually. We're not exposing production APIs to compliance scripts that increase our attack surface."

Sound familiar?

This is the hidden pattern killing GRC Engineering initiatives: teams start with promising automation wins, then gradually become glorified IT support teams maintaining evidence collection scripts instead of strategic risk management partners. The vision of engineering-driven compliance transformation fades into maintenance work that creates more entropy than efficiency.

Today's analysis breaks down exactly why this pattern is predictable, how systems thinking explains the coordination cost explosion, and what enterprise GRC teams should focus on instead of chasing cloud-native automation unicorns.

IN PARTNERSHIP WITH

Want to sponsor the GRC revolution?

The top spot is filling up quickly, with limited inventory available for the rest of 2025. ~65% open rates and high CTR by industry standards aren't even the reasons why you should work with the GRC Engineer.

Helping propel the revolution in how companies think about GRC and build their programs is the real reason! If you want to showcase your offering to a highly-engaged audience of GRC leaders from the world’s most successful companies, you know what to do.

Why It Matters 🔍

The why: the core problem this solves and why you should care

You’re probably not single-cloud-native

The GRC automation conversation assumes a world that doesn't exist for most organisations. According to industry data from Cisco:

Enterprise Cloud Reality:
├── 82% use hybrid cloud setups (Cisco, 2025)
├── 92% use multi-cloud strategies (4.8 clouds average)
├── Only 8% stick to single public cloud provider
└── 71% struggle with cloud migration (not by choice)

Translation: Cloud-native automation captures maybe 10% of actual enterprise controls.

The real enterprise stack includes COBOL applications older than some compliance frameworks, IoT devices with no APIs, geographic variations across 36 countries, drift between the HQ tech stack and local subsidiaries, and Oracle databases from 2003. Organisations can't migrate these systems without business interruption: not because they don't want cloud-native, but because they literally can't.

Another issue: as companies adopt CSPM tools to manage their multi-cloud deployments, focusing on a single cloud limits your ability to pull the relevant evidence. And a provider like GCP segments access so thoroughly that you don't even know what you don't have access to, which creates scoping problems when deciding what to pull.

Automation also has a cost (I know, I know)

Each DIY GRC automation script creates exponential coordination overhead between teams. When expensive engineering time goes into maintaining evidence-gathering scripts for problems that purpose-built GRC tools already solve, you're looking at an opportunity cost in the tens of thousands of dollars.

The low-hanging fruit rots quickly, and as you scale to match your company's complexity, maintaining what you built in the past becomes a challenge.

The multiplication effect: Multiple CI/CD platforms × Different teams × Legacy deployment processes × Geographic complexity = Coordination nightmare that scales worse than the problems it's trying to solve.
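
To make the multiplication concrete, here's a minimal back-of-envelope sketch in Python. The counts and hours are placeholders you'd replace with your own environment's numbers, not benchmarks.

# Back-of-envelope coordination cost estimate (illustrative numbers only)
cicd_platforms = 3       # e.g. GitHub Actions, Jenkins, GitLab (assumption)
teams_involved = 6       # infra, platform and app teams touched by evidence scripts
legacy_processes = 4     # deployment paths with no API at all
regions = 5              # geographic / subsidiary variations

integration_points = cicd_platforms * teams_involved * legacy_processes * regions
hours_per_point_per_year = 2  # access requests, reviews, breakage triage (assumption)

print(f"Integration points to coordinate: {integration_points}")
print(f"Estimated coordination hours/year: {integration_points * hours_per_point_per_year}")
# 3 × 6 × 4 × 5 = 360 points → 720 hours/year before writing a single new control check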

Processes for maintaining automated processes which were broken processes

Organisations end up hiring "GRC Engineers" to maintain the automation tools, creating processes to manage the processes that manage the automation, building custom dashboards to monitor the health of evidence collection systems, and never measuring whether automated evidence collection actually improved security outcomes or just reduced manual effort.

Self-serving a screenshot in 30 seconds can beat spending two weeks building a pipeline that pulls the same data from AWS every 60 minutes when nobody can action it, you can't maintain the script, and it costs $1K in tokens because debugging vibe-code isn't easy.

# Enterprise GRC Automation Reality Simulator™
class GRCEngineeringLifecycle:
    def __init__(self):
        self.enthusiasm = 100
        self.strategic_impact = 0
        self.maintenance_debt = 0
        self.ai_token_costs = 0
        
    def month_1_success(self):
        """First AWS API call works! 🎉"""
        print("✅ Successfully pulled S3 bucket policies!")
        self.enthusiasm = 100
        return "This is the future of compliance!"
    
    def month_6_reality(self):
        """The inevitable entropy"""
        self.maintenance_debt += 40  # Hours per week
        self.ai_token_costs += 10000  # Annual debugging costs
        self.strategic_impact = max(0, self.strategic_impact - 50)

        infrastructure_response = "We're not exposing production APIs to your Python scripts"
        engineering_response = "This won't pass security review"

        if self.enthusiasm > 20:
            self.enthusiasm -= 60
            print(f"Why is nobody cooperating? {infrastructure_response}")
            print(f"Heard in the security review: {engineering_response}")

        return "Maybe we should just buy a platform..."
    
    def final_outcome(self):
        """Six months later"""
        if self.maintenance_debt > 30:
            return "Congratulations! Your GRC Engineers are now IT support specialists"
        
        return "Somehow you avoided becoming glorified screenshot collectors"

# Usage (unfortunately realistic)
grc_team = GRCEngineeringLifecycle()
print(grc_team.month_1_success())
# Output: "This is the future of compliance!"

print(grc_team.month_6_reality())
# Output (after some complaining): "Maybe we should just buy a platform..."

print(grc_team.final_outcome())
# Output: "Congratulations! Your GRC Engineers are now IT support specialists"

Strategic Framework 🧩

The what: The conceptual approach broken down into 3 main principles

Why quick-fixes won’t fix your issues

The Pattern: Whether organisations attempt comprehensive automation or targeted scripts, both approaches hit the same enterprise complexity wall.

Why it breaks at enterprise scale:

DIY Approach            | Enterprise Challenge                  | Coordination Cost                | Reality Check
------------------------|---------------------------------------|----------------------------------|---------------------------------------
Custom Evidence Scripts | 4.8 cloud providers × different APIs  | Exponential maintenance overhead | Purpose-built tools already solve this
Policy-as-Code          | Human judgment → deterministic logic  | CISSP analysts learning Rego     | "Reasonable" can't be encoded
Incremental Automation  | Each addition multiplies complexity   | Coordination costs compound      | Low-hanging fruit rots quickly

The policy-as-code revolution never happened because each stage introduced insurmountable coordination challenges:

Policy-as-Code Journey:
├── Stage 1: "Just code your policies!"
│   └── Skills gap: CISSP analysts learning Rego
├── Stage 2: "Make it executable!" 
│   └── Translation problem: Human judgment → Logic
├── Stage 3: "Monitor everything!"
│   └── Integration nightmare: 57 different APIs
├── Stage 4: "See all violations!"
│   └── Alert fatigue: 3,782 daily violations
└── Stage 5: "Auto-fix everything!"
    └── Trust gap: Non-technical policies → Production changes

The core issue: Attempting to encode subjective human judgment ("ensure appropriate security measures based on risk") into deterministic logic ignores enterprise complexity. Even AWS can't define "public" consistently across services, but organisations expect to encode "reasonable" in perfect logic.
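
To make that concrete, here's a minimal Python sketch of what "is this S3 bucket public?" actually involves. The field names are illustrative rather than real AWS SDK response shapes, but the signals are the real ones: ACLs, bucket policies, access points and Block Public Access settings all interact, and that's a single question on a single service of a single provider.

# Naive vs. realistic "public bucket" checks (field names are illustrative)
def bucket_is_public_naive(bucket: dict) -> bool:
    # What policy-as-code often starts as: one field, one rule
    return bucket.get("acl") == "public-read"

def bucket_is_public_realistic(bucket: dict, account: dict) -> bool:
    # "Public" is the interaction of several controls, any of which can grant or block exposure
    if account.get("block_public_access") or bucket.get("block_public_access"):
        return False  # account- or bucket-level Block Public Access overrides the rest
    acl_public = bucket.get("acl") in ("public-read", "public-read-write")
    policy_public = bucket.get("policy_allows_anonymous", False)
    access_point_public = any(
        ap.get("allows_anonymous", False) for ap in bucket.get("access_points", [])
    )
    return acl_public or policy_public or access_point_public

bucket = {"acl": "private", "policy_allows_anonymous": True, "access_points": []}
print(bucket_is_public_naive(bucket))                                       # False - misses the policy grant
print(bucket_is_public_realistic(bucket, {"block_public_access": False}))   # True
# And this still encodes "public", not "reasonable security measures based on risk" -
# the judgment call the original policy actually asks for.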

When you have 4.8 different cloud providers on average, "seamless integration" becomes a distributed systems problem.

The only constant is entropy, even in your GRC program

Each DIY automation integration point creates exponential complexity. The skills gap becomes a coordination bottleneck. GRC teams can't debug Python, engineering teams shouldn't write compliance policies, and both teams end up context-switching between their core responsibilities and automation maintenance.

The Integration Standoff:
├── GRC Team: "Our automation works in the lab"
├── Engineering: "Can't deploy without proper testing/monitoring"
├── DevOps: "We can't support code we can't debug"
└── Security: "This hasn't been reviewed and we don't understand dependencies"

The 90% Control Gap: Organisations that need GRC engineering most are precisely those with complex, heterogeneous environments that can't be solved with pure API automation. Focusing on cloud-native captures the easy 10% while missing the 90% that creates actual business risk.

How automation can increase technical debt (a lot)

Technical debt compounds over time, especially in unfamiliar technology stacks.

Custom scripts written by GRC staff create multiple failure points. There's a fundamental difference between understanding Python/Terraform/Rego and implementing them in production pipelines or integrating with production code.

GRC professionals end up building automation that's too risky for production systems because it doesn't meet basic engineering standards. Engineering teams reject compliance automation because the risk is too high, which is ironic for GRC practitioners whose job is risk management.

Organisations measure automation activity (API calls, screenshots, policy checks) instead of security outcomes (risk reduction, attack surface decrease, mean time to remediation). This creates green dashboards while critical vulnerabilities remain out of SLA and perfect evidence collection while control effectiveness actually decreases.
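
The difference is easy to see in code. A minimal sketch, assuming you track detection and remediation timestamps (the finding structure below is hypothetical): the activity metric takes one line to produce, while the outcome metric needs data most evidence pipelines never capture.

from datetime import datetime
from statistics import mean

# Hypothetical findings with detection and remediation timestamps
findings = [
    {"detected": datetime(2025, 3, 1), "remediated": datetime(2025, 3, 15)},
    {"detected": datetime(2025, 3, 5), "remediated": datetime(2025, 4, 20)},
    {"detected": datetime(2025, 3, 10), "remediated": None},  # still open
]

# Activity metric: trivially easy to produce, says nothing about risk
evidence_api_calls = 3782
print(f"Evidence API calls this month: {evidence_api_calls}")

# Outcome metrics: mean time to remediation and what's still open
closed = [f for f in findings if f["remediated"]]
mttr_days = mean((f["remediated"] - f["detected"]).days for f in closed)
still_open = sum(1 for f in findings if f["remediated"] is None)
print(f"Mean time to remediation: {mttr_days:.1f} days; findings still open: {still_open}")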

Want to sponsor the GRC revolution (again)?

The middle spot still has a lot of space, so if you're interested, now is the time! ~65% open rates and high CTR by industry standards aren't even the reasons why you should work with the GRC Engineer.

Helping propel the revolution in how companies think about GRC and build their programs is the real reason! If you want to showcase your offering to a highly-engaged audience of GRC leaders from the world’s most successful companies, you know what to do.

Execution Blueprint 🛠️

The how: 3 practical steps to put this strategy into action at your organisation

Put it on paper, then in Excel, then in PDF, then in Python

Address complexity and ambiguity first, not time-consuming tasks.

Before building any automation, audit your current control orchestration. Most organisations discover that automating evidence collection from inconsistently executed controls just systematises dysfunction.

Assessment Priority (Risk-First):
├── Control execution consistency ← Start here
├── Standard Operating Procedures ← Build this
├── Process maturity assessment ← Measure this  
└── Evidence automation ← End here (if at all)

Enterprise Environment Assessment Matrix:

Environment Type             | Control Maturity             | Evidence Reality                   | Strategic Recommendation
-----------------------------|------------------------------|------------------------------------|-------------------------------------
100% Cloud-native API-first  | High automation potential    | Most evidence API-accessible       | Consider targeted custom + platform
Mixed cloud + legacy         | Requires orchestration first | Evidence scattered across systems  | Platform-first + orchestration
Heavily legacy/COTS          | Manual processes dominate    | Evidence mostly manual/hybrid      | Orchestration over automation
Geographic distribution      | Inconsistent execution       | Local variations in evidence       | Standardise SOPs before tools

Critical questions:

  • Which controls actually execute consistently vs. exist only on paper?

  • What evidence is generated naturally by business processes vs. created for compliance?

  • Where do controls prevent actual incidents vs. just generate documentation?

  • Which processes can be systematised vs. require human judgment?

Evaluate the best options based on reality

Make platform vs. build decisions based on systematic assessment, not technology preferences.

Only after understanding your control reality should you evaluate automation approaches. This prevents the common trap of letting evidence collection automation drive your GRC strategy.

Technology Decision Timeline:
Week 1-3: Complete Control Assessment
├── Map actual control execution patterns
├── Document evidence generation vs. collection needs
└── Identify systematisation opportunities

Week 4-6: Platform Evaluation
├── Assess platforms against real requirements (not features)
├── Evaluate integration complexity for your environment
└── Calculate total coordination costs vs. platform benefits

Week 7-8: Strategic Implementation Planning
├── Design control orchestration improvements
├── Plan evidence automation for systematised processes only
└── Build human translation layers for complex coordination

Decision framework:

Technology Strategy Decision Tree:
├── Evidence Generated Naturally by Business Process?
│   ├── YES → Systematic collection (platform or simple scripts)
│   └── NO → Question if evidence is actually needed
├── Control Execution Consistent?
│   ├── YES → Consider automation after SOP documentation
│   └── NO → Fix orchestration first, automate later
├── Multiple Stakeholders Involved?
│   ├── YES → Platform + human translation layers
│   └── NO → Simple systematic approach
└── Enterprise Complexity (Multi-cloud/Geographic)?
    ├── YES → Platform essential for coordination
    └── NO → Custom approaches viable with proper SOPs
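
If it helps, here's the tree above as a small Python function; a sketch only, with four boolean inputs matching the branches and the same recommendations returned as strings.

def grc_technology_strategy(
    evidence_generated_naturally: bool,
    control_execution_consistent: bool,
    multiple_stakeholders: bool,
    enterprise_complexity: bool,  # multi-cloud and/or geographic spread
) -> list:
    """Walk the decision tree and collect the matching recommendations."""
    recommendations = []
    if evidence_generated_naturally:
        recommendations.append("Systematic collection (platform or simple scripts)")
    else:
        recommendations.append("Question if the evidence is actually needed")
    if control_execution_consistent:
        recommendations.append("Consider automation after SOP documentation")
    else:
        recommendations.append("Fix orchestration first, automate later")
    if multiple_stakeholders:
        recommendations.append("Platform + human translation layers")
    else:
        recommendations.append("Simple systematic approach")
    if enterprise_complexity:
        recommendations.append("Platform essential for coordination")
    else:
        recommendations.append("Custom approaches viable with proper SOPs")
    return recommendations

# Example: mixed-cloud organisation with inconsistent control execution
for recommendation in grc_technology_strategy(True, False, True, True):
    print(f"- {recommendation}")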

Success metrics that matter:

  • Engineering time freed for preventive controls (not just evidence collection)

  • Mean time to risk remediation (not compliance dashboard colours)

  • Attack surface reduction (not policy compliance scores)

Measure control effectiveness and its related costs

Determining the cost of each approach is challenging. You can use the table below as a quick heuristic for whether the effort is worth it, where to focus your limited resources, and where DIY makes sense versus a platform.

Factor                 | Weight | DIY Score (1-5) | Platform Score (1-5) | DIY Weighted         | Platform Weighted
-----------------------|--------|-----------------|----------------------|----------------------|----------------------
Total 3-Year Cost      | 25%    | _____           | _____                | _____ × 0.25 = _____ | _____ × 0.25 = _____
Time to Value          | 20%    | _____           | _____                | _____ × 0.20 = _____ | _____ × 0.20 = _____
Maintenance Burden     | 20%    | _____           | _____                | _____ × 0.20 = _____ | _____ × 0.20 = _____
Scalability            | 15%    | _____           | _____                | _____ × 0.15 = _____ | _____ × 0.15 = _____
Risk/Reliability       | 10%    | _____           | _____                | _____ × 0.10 = _____ | _____ × 0.10 = _____
Team Capability Match  | 10%    | _____           | _____                | _____ × 0.10 = _____ | _____ × 0.10 = _____
TOTAL WEIGHTED SCORE   | 100%   | _____           | _____                | _____                | _____
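
The same scoring takes a few lines of Python if you'd rather not maintain another spreadsheet; the example scores below are placeholders, not recommendations.

# Weighted build-vs-buy scoring (weights from the table above, 1-5 scores are illustrative)
weights = {
    "Total 3-Year Cost": 0.25,
    "Time to Value": 0.20,
    "Maintenance Burden": 0.20,
    "Scalability": 0.15,
    "Risk/Reliability": 0.10,
    "Team Capability Match": 0.10,
}

diy_scores = {"Total 3-Year Cost": 3, "Time to Value": 2, "Maintenance Burden": 1,
              "Scalability": 2, "Risk/Reliability": 2, "Team Capability Match": 3}
platform_scores = {"Total 3-Year Cost": 3, "Time to Value": 4, "Maintenance Burden": 4,
                   "Scalability": 4, "Risk/Reliability": 4, "Team Capability Match": 4}

def weighted_total(scores):
    return sum(weights[factor] * score for factor, score in scores.items())

print(f"DIY weighted score:      {weighted_total(diy_scores):.2f}")
print(f"Platform weighted score: {weighted_total(platform_scores):.2f}")
# Higher total wins; revisit the scores whenever the environment or the team changes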

In any case, you'll also need to review regularly how your approach is working in practice. The questions below help determine where you need to pivot versus where to double down.

Assessment Area        | Key Questions                             | Decision Threshold
-----------------------|-------------------------------------------|-------------------------------------------
Control Effectiveness  | Do controls prevent actual incidents?     | If no prevention → Redesign control
Process Consistency    | Same execution across teams/regions?      | If inconsistent → Strengthen SOPs
Evidence Value         | Does evidence inform decisions?           | If purely compliance → Question necessity
Coordination Overhead  | Time spent on GRC vs. security outcomes?  | If >20% overhead → Simplify processes
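
As a minimal sketch, the thresholds above can double as a quarterly review helper; the input fields are assumptions about what you already track.

def quarterly_review_flags(
    prevents_incidents: bool,
    consistent_across_regions: bool,
    evidence_informs_decisions: bool,
    grc_overhead_ratio: float,  # share of team time spent on GRC coordination (0-1)
) -> list:
    """Return the pivot signals from the table above."""
    flags = []
    if not prevents_incidents:
        flags.append("No prevention -> redesign the control")
    if not consistent_across_regions:
        flags.append("Inconsistent execution -> strengthen SOPs")
    if not evidence_informs_decisions:
        flags.append("Purely compliance evidence -> question its necessity")
    if grc_overhead_ratio > 0.20:
        flags.append("Coordination overhead above 20% -> simplify processes")
    return flags or ["No red flags this quarter - double down"]

print(quarterly_review_flags(True, False, True, 0.35))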

Quarterly strategic reviews based on control orchestration principles:

  • Control Execution: Which controls actually execute vs. exist only in policies?

  • Evidence Utilisation: What evidence drives security decisions vs. fills audit folders?

  • Process Maturity: Where can systematisation reduce coordination costs?

  • Technology Alignment: Do tools support control execution or just evidence collection?

The Strategic Reality

Small companies can leverage GRC automation platforms because they're simple systems with homogeneous tech stacks. Enterprise GRC engineering requires systems thinking about coordination costs, process accumulation, and the reality that 90% of controls exist outside the perfect API-driven world.

The enterprises that need GRC engineering most are precisely those with complex, heterogeneous environments that can't be solved with pure cloud-native automation. The question isn't whether to automate GRC; it's whether to build systems that work with your actual enterprise complexity or chase the cloud-native unicorn that exists in 10% of your environment.

Which evidence collection could you hand off to purpose-built tools this quarter instead of building maintenance debt disguised as engineering innovation?


Content Queue 📖

The learn: This week's resource to dive deeper on the topic

While this newsletter focused on the systems thinking behind why DIY automation fails, the underlying issue is often a fundamental data problem. Leen reached out to help formulate how I think about the data problem facing mature GRC programs.

Lots of amazing security professionals have contributed to this piece, focused on what practitioners actually need in the real world.

Have a look!

That’s all for this week’s issue, folks!


See you next week!
