AI Coding Agents Are Moving Into The Pull Request Workflow

AI coding tools are becoming more like junior contributors with repo access. That is the part all teams should be paying attention to. 

In June, GitHub introduced the GitHub Copilot app, a desktop experience built around agent-native development. GitHub describes a workflow where agents can work from GitHub context, use isolated worktrees, and bring changes back through pull requests for review.

A few days later, GitHub made security validation for third party coding agents generally available. When supported third party agents create code in a repository, GitHub runs checks such as CodeQL analysis, dependency review against the GitHub Advisory Database, and secret scanning for values like API keys or tokens. If a check finds an issue, the agent attempts to fix it before finalizing the pull request.

Those are useful guardrails. They also show where this is going. Agent written code is starting to enter the same workflow as human written code, which means teams need to decide how it gets reviewed and handled.

Coding agents can help, but only when the team treats them as part of the development process instead of a shortcut.

The Pull Request Is The Right Place For Agent Work

Overall, a pull request is a better place for AI generated code than a chat window. It gives the work a branch, a diff, test results, security checks, and review comments.

For many teams, this will be useful first on less important work. For example, a flaky test can get attention, a small bug can get a first pass, and documentation can be updated when the code changes instead of months later. Those aren't glamorous or amazing wins, but they are real ones.

The review is still important. A pull request can make agent written code look more finished than it actually is. A change can be technically correct and still be wrong for the product.

That is where teams will need discipline. Agent work belongs in the pull request workflow because that is where assumptions can be challenged before they become production behaviour.

Review The Assumption

The biggest review mistake is looking only at whether the code runs. With agent-created pull requests, you also need to know whether the agent understood the task properly. That means checking the issue, the files touched, the tests added, and any new dependency or configuration change. The problem is rarely that the agent writes obviously broken code. The more common problem is that it solves a slightly different problem than the one the team had.

This shows up in small ways. An agent may copy an older pattern because it appears several times in the repository, or it may simplify code that looks redundant but exists for a client-specific workflow.

That is the review shift. Do not only ask 'does this work?' Ask 'does this belong here?'

That question is especially important when the change touches anything that affects support, billing, or operations. Those areas often depend on context that is not obvious from the code alone.
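Part of that checklist can be surfaced automatically before a human starts reading. The following is a minimal sketch, not a real GitHub feature: it takes the list of file paths changed in a pull request and returns flags a reviewer may want to look at first. The file names, suffixes, and directory names are all assumptions and would need to match the actual repository.

```python
from pathlib import PurePosixPath

# Hypothetical values: adjust these to the repository's real layout.
DEPENDENCY_FILES = {"package.json", "requirements.txt", "go.mod", "Cargo.toml", "pom.xml"}
CONFIG_SUFFIXES = {".yml", ".yaml", ".toml", ".ini"}
SENSITIVE_DIRS = {"billing", "support", "ops"}


def review_flags(changed_files):
    """Return reviewer-facing flags for a list of changed file paths."""
    flags = []
    touched_tests = False
    for raw in changed_files:
        path = PurePosixPath(raw)
        parts = set(path.parts)
        if path.name in DEPENDENCY_FILES:
            flags.append(f"dependency change: {raw}")
        elif path.suffix in CONFIG_SUFFIXES:
            flags.append(f"configuration change: {raw}")
        if parts & SENSITIVE_DIRS:
            flags.append(f"sensitive area: {raw}")
        # Assumed convention: tests live in a tests/ directory or in test_*.py files.
        if "tests" in parts or path.name.startswith("test_"):
            touched_tests = True
    if not touched_tests:
        flags.append("no test files changed")
    return flags


for flag in review_flags(["src/billing/invoice.py", "package.json"]):
    print(flag)
```

A script like this only tells the reviewer where to look. It cannot answer whether the change belongs there, which is still the human part of the review.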

Security Validation Is Helpful... But It Will Not Catch Product Risk

GitHub's security validation update is a good step. CodeQL, dependency review, and secret scanning are exactly the kind of checks teams should want around agent generated code. They catch problems that should not rely on a rushed human reviewer.

They also help standardize the workflow.

Still, there is a limit to what automated validation can see. It may catch a vulnerable dependency, but it may not know that the dependency was unnecessary.

A green check is not a product review. It is only one piece of evidence. The team still needs someone who understands the application, the users, and the code.

Most Agent Problems Are Context Problems

Agent mistakes often point back to the same issue: the tool did not have enough useful context.

That can mean the issue was vague, the repository has stale documentation, or a business rule lives somewhere else entirely, such as in an old support ticket.

That is not only an AI problem. It is a project health problem.

If an agent keeps making the same wrong assumption, a new developer probably would too. That makes agent adoption a useful stress test for the quality of a project's internal context. Clean setup instructions, current README files, useful comments, and clear issue writing all help the agent. More importantly, they help the humans who have to maintain the project. And yes, this is something we have firsthand experience with.

For older custom applications, this can be one of the best side effects. The team may discover that the barrier to safer AI assisted development is the amount of project knowledge that was never written down.

Do Not Start With The Riskiest Work

The best first use cases are boring on purpose. Small bugs with clear reproduction steps, missing tests, documentation updates, and narrow refactors are easier to review. They give the team a chance to see how the agent behaves without putting the highest risk areas of the system under pressure.

Other areas, like authentication and billing, deserve far more caution. Agents can still help in those areas, but the task needs a tighter frame. The issue should explain the expected behaviour, the reviewer should understand the business rule, and the team should be careful about letting the agent wander through the codebase looking for its own solution.

This is where a lot of AI adoption goes wrong. Teams test the tool on a task that is too vague, too sensitive, or undocumented. Then they either overtrust the result or dismiss the tool completely. A better test is to choose work with clear edges and judge the output against a known expectation.

A Small Policy Is Better Than A Long One

Most teams don't need a long AI coding policy before they experiment. They do need a few rules that prevent agent work from slipping around normal review, though.

  • Every agent pull request needs a human owner. Someone has to be responsible for the result after merge.
  • Required reviews, CI, and security checks should stay in place. Agent work should not get a lighter path to production.
  • New dependencies should be reviewed carefully. Agents can reach for packages when existing project code would be better.
  • Start with bounded work and track the cleanup. The real test is whether the team saved time after review.
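The first three rules are concrete enough to check mechanically. Here is one possible sketch, assuming a team records a few facts about each agent pull request. The `AgentPullRequest` type and its field names are hypothetical and do not correspond to any GitHub API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentPullRequest:
    # Hypothetical fields a team might record for each agent pull request.
    human_owner: Optional[str]
    approvals: int
    ci_passed: bool
    security_checks_passed: bool
    new_dependencies: List[str] = field(default_factory=list)
    dependencies_reviewed: bool = False


def policy_violations(pr, required_approvals=1):
    """Return the list of policy rules this pull request does not yet satisfy."""
    problems = []
    if not pr.human_owner:
        problems.append("no human owner assigned")
    if pr.approvals < required_approvals:
        problems.append("missing required review")
    if not pr.ci_passed:
        problems.append("CI has not passed")
    if not pr.security_checks_passed:
        problems.append("security checks have not passed")
    if pr.new_dependencies and not pr.dependencies_reviewed:
        problems.append("new dependencies not reviewed: " + ", ".join(pr.new_dependencies))
    return problems


pr = AgentPullRequest(
    human_owner=None,
    approvals=1,
    ci_passed=True,
    security_checks_passed=True,
    new_dependencies=["left-pad"],
)
print(policy_violations(pr))
```

In practice most of these rules are better enforced through the repository's existing branch protection and required checks. The sketch is only meant to show how short the policy really is.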

That last point is important. A pull request that looks fast but needs heavy correction may not be a win. Teams should pay attention to review time and whether the final code is easier or harder to maintain.
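One rough way to track this is to compare an estimate of how long the task would have taken by hand against the time spent reviewing and correcting the agent's version. The sketch below assumes the team logs those three numbers in minutes for each task. The record type and field names are made up for illustration, and the estimate itself is a judgement call.

```python
from dataclasses import dataclass


@dataclass
class AgentTaskRecord:
    # Hypothetical fields; every value is minutes, recorded by the team.
    estimated_manual: float
    review: float
    correction: float

    @property
    def net_saved(self):
        # Positive means the agent saved time after review and cleanup.
        return self.estimated_manual - (self.review + self.correction)


def summarize(records):
    """Total minutes saved, plus how many tasks cost more than they saved."""
    total = sum(r.net_saved for r in records)
    losses = sum(1 for r in records if r.net_saved < 0)
    return {"net_minutes_saved": total, "net_negative_tasks": losses}


records = [
    AgentTaskRecord(estimated_manual=60, review=15, correction=10),
    AgentTaskRecord(estimated_manual=30, review=20, correction=25),
]
print(summarize(records))
```

Numbers like these will not capture whether the final code is easier or harder to maintain, so they work best alongside the reviewer's own notes.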

Why This Is Important For Custom Development

AI coding agents are going to be tempting. There is always maintenance work waiting, and there are always parts of a project that could use more test coverage or cleaner documentation.

Agents can help with that backlog. They can also expose weak spots in the way a project is managed. If requirements are unclear, tests are thin, documentation is stale, or review habits are inconsistent, agent output can add more motion without adding much progress.

This is why the topic fits closely with AI/ML automation. The value comes from fitting AI into a workflow that gives it the right context.

For dev teams, that means treating coding agents like a new contributor to the workflow. They need boundaries, review, and clear tasks. And someone still needs to own the code after it ships.

What We Would Watch Next

The next stage will be about how teams manage the work agents produce. GitHub's recent updates point in a sensible direction: agent work is becoming more visible, more reviewable, and more connected to existing repository controls. That is the right place for it. The danger, however, is that teams start treating those pull requests as cheaper, faster versions of human work without adjusting how they review them.

Agent written code is absolutely going to become more common over time. But the teams that benefit most will be the ones that keep enough human judgement in the workflow to make the speed useful.