• GRC Engineer
  • Posts
  • ⚙️ You Spent a Year Learning to Use AI for GRC. The Real Skill Is Assessing It.

⚙️ You Spent a Year Learning to Use AI for GRC. The Real Skill Is Assessing It.

The SOC 2 narratives, questionnaire answers and control docs crossing your desk are now written by AI, and built to pass, not to be true. Leaning on AI to check them erodes the judgment that would have caught them. Here is how to stay sharp!

Every AI-in-GRC piece I have written asks the same question: where do you plug AI in? The 3 Types of Automation in GRC Engineering. 3 Basic Things Killing Your AI Usage for GRC. All of it is about the AI you deploy to do your work.

That conversation is saturated. Mine included.

Here is the half we have skipped: the AI that other people deployed, whose output now lands in your evidence base. You did not prompt it. You did not see its context. And you are about to sign off on it.

That is assurance work. And it is squarely yours.

IN PARTNERSHIP WITH

Your security team’s guide to automated GRC

Manual evidence collection, scattered tools, and repetitive audits can take a real toll on security and GRC teams. This GRC playbook, created by Drata and Tines, offers a practical look at how teams are shifting to continuous, automated compliance.

Inside, you’ll find:

  • Detailed workflows for evidence collection, monitoring, audit prep, and vendor risk

  • Implementation guidance from credential setup to scaling

  • Best practices for building resilient, proactive GRC programs

Built to Pass, Not to Be True

The people who measure AI most rigorously keep finding the same thing. When a test is easy to game, and the model can tell what the grader wants, it games the test. It hunts for the expected answer. It clears the check without solving the problem.

You already know this failure. It has a name older than AI.

When a measure becomes the target, it stops measuring anything. It is why audit-driven programmes rot into theatre. The control comes to exist to pass the audit, not to reduce the risk. Goodhart's Law does not care whether the thing gaming the metric is a stressed team or a language model.

Now point that at an artefact you did not generate. A vendor's SOC 2 narrative is optimised to read like a clean SOC 2 narrative. An AI-drafted questionnaire response is optimised to clear your questionnaire. It was not built to be true. It was built to pass. The model that wrote it never felt a flicker of doubt.

So it arrives more fluent and complete than the human version ever was. That fluency is the trap. You catch AI's flaws in your own domain every day, so you assume it must be sharper everywhere else. Every expert makes that same bet, and every one of them watches it fumble the single thing they know well. They cannot all be right. Fluent, confident, and missing the context that makes it correct is the default everywhere. Yours included.

It's Already on Your Desk

Not a future problem. Walk your own pipeline:

Where you assess

What's now AI-generated

Why it bites

TPRM, SOC 2 review

Control narratives, exception write-ups

Fluent prose hides whether the control fits their environment or was auto-filled to match the framework

Trust, questionnaires

The vendor's answers, drafted by their AI

You're reading a model's guess at what passes, not the security team's view

Compliance, control docs

Policy and control descriptions

Reads complete, maps to the framework, and may describe a control that doesn't exist

Risk, assessment minutes

Meeting notes, risk summaries

The summary keeps what sounds important and drops the caveat someone actually said

Polish used to mean someone cared. It doesn't anymore.

The Loop You're Quietly Losing

Now the part nobody warns you about.

You didn't download your judgment. You built it through feedback loops. You signed off a control, it failed, someone made you feel it, and you assessed the next one differently. The friction taught you. That is the only way assessment skill has ever been built.

AI removes the friction.

When a model drafts your control documentation, you don't wrestle with whether the control holds. You review text that already reads done. The work happened, the landmine is still buried, but you never felt it go in, so you never learned from it.

Do that for a year and the instinct that would have caught the gamed artefact is the exact instinct you stopped exercising. The volume you must assess is rising. The capacity to assess it is falling. That is the trap closing, and AI offers every GRC reviewer that exact trade.

Your Platform Won't Save You

Ask what AI your GRC tool actually ships. Outside coding, a genuinely agentic experience is rare. What you have is a deterministic workflow with an inference step or two, built on the same models, with the same blindness and no feedback loop of its own. You cannot catch plausible-but-wrong output with a tool that produces it. (That's the real AI dilemma.)

How to Stay Sharp

Do the first pass yourself on what matters. Triage by consequence, but on the high-risk assessments, read the raw artefact before any AI summary touches it. The friction is the rep.

Demand provenance. Was this drafted by a model? You won't always get a straight answer, and the dodge is itself a signal. Make declaring it normal.

Run the context test. Does it describe their specific reality, or a generic answer that would pass for anyone? A control description that never names a real system, owner, or failure mode was written to pass.

Build the ground truth. Keep a small set of cases you have vetted by hand: the real control, the correct risk call, the answer you would stake your name on. Then measure AI output against them. A basic eval is about twenty lines:

// golden-set.ts: controls you vetted by hand, the answers you'd stake your name on
const golden = [
  {
    text: "Access reviews are performed quarterly for all production systems.",
    truth: "WEAK",   // names no system, owner, or enforcement: written to pass
  },
  {
    text: "Quarterly access reviews run via the IAM GitHub Action, filed to the audit repo and signed by each system owner.",
    truth: "STRONG", // names the mechanism, the artefact, and who is accountable
  },
  // ...add 10-20 real ones from your own programme
];

let hits = 0;
for (const c of golden) {
  const ai = await classifyControl(c.text);   // your model call: "STRONG" | "WEAK"
  if (ai === c.truth) hits++;
  else console.log(`MISS  you=${c.truth}  ai=${ai}  ::  ${c.text}`);
}
console.log(`Agreement with ground truth: ${hits}/${golden.length}`);

Run it whenever a new model or a vendor "AI feature" shows up. If it can't clear your golden set, it doesn't touch your real assessments. This is ground truth you own, not tooling you bought, and it tells you where the model is genuinely reliable in your domain instead of where it merely sounds it.

The One Way to Point It That Makes You Sharper

The reframe is simple. AI is not the enemy of judgment. What you point it at is. Aim it at producing and screening your work, and it severs the loop. Aim it at teaching you through your work, and it rebuilds it.

That second thing is exactly why I built The GRC Companion. Free, open, and deliberately learning-only. You finish a vendor questionnaire, you run it, and instead of doing the review for you, it asks the one sharp question, names the pattern, and turns the work you already did into a rep. It refuses to sign off, score, or decide. It sharpens the loop instead of replacing it.

Browse it at grc.engineering/companion. Same tools as everyone else, pointed the other way.

So: Do You Believe It Has to Be Good?

The answer to a flood of AI artefacts is not more AI on your side. That is an arms race you lose by joining, and it rests on a productivity story that doesn't survive contact. The bottleneck was never the document; it was judgment about what matters. AI speeds the typing and leaves the thinking where it was, while making you feel fast.

So the real question is not how much AI you can add. It is whether you still believe the work has to be good. The moment "it's all theatre anyway" wins, the programme rots. Laziness is an infection. Quality comes from doing the things you didn't strictly have to. That has never been the rational move. It has always been the difference.

Last year's skill was using AI. This year's is assessing what you didn't run, and protecting the judgment that lets you. Polish is cheap now. The instinct that sees through it is not.

Did you enjoy this week's entry?

Login or Subscribe to participate in polls.

That’s all for this week’s issue, folks!

If you enjoyed it, you might also enjoy:

See you next week!

Reply

or to participate.