Automating Drupal Code Refactoring and Reviews with LLMs

July 7, 2025 | Scott Weston

Drupal 10 and 11 bring new APIs, deprecations, and best practices. Keeping pace with these changes, especially in large codebases, can be time-consuming and error-prone. This is where Large Language Models (LLMs) such as ChatGPT, Claude, and Code Llama can serve as an “always-on” code reviewer capable of catching issues and suggesting refactors. Let's dive into how Drupal engineers can integrate LLMs into their workflow to speed up reviews, improve consistency, and reduce risk - all while retaining human oversight.

Why Use LLMs for Drupal Code Reviews?

Most Drupal teams already use static analysis tools like PHPCS, PHPStan, and drupal-check to enforce coding standards and catch obvious deprecations. LLMs complement these tools with better understanding of code context and the ability to provide deeper insights:

Spotting complex issues

An LLM can trace logic flow, flag missing access checks in form handlers, or notice edge-case bugs that linters miss.

Suggesting refactors

Instead of merely saying “use the service container,” an LLM can show how to convert procedural code to a service class with dependency injection. It can recommend replacing raw SQL with Drupal’s query builder or swapping a call to drupal_set_message() for the Messenger service.
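
As a rough illustration of the first point, here is the kind of transformation an LLM might suggest when asked to move procedural code into a service. The class name and method are invented for this sketch, and the class would still need a matching entry in mymodule.services.yml:

// Before (mymodule.module): procedural code pulling a service from the container.
function mymodule_notify(string $message): void {
  \Drupal::service('messenger')->addMessage($message);
}

// After (src/UserNotifier.php): a small service class using constructor injection.
namespace Drupal\mymodule;

use Drupal\Core\Messenger\MessengerInterface;

class UserNotifier {

  public function __construct(protected MessengerInterface $messenger) {}

  public function notify(string $message): void {
    $this->messenger->addMessage($message);
  }

}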

Enforcing consistency

AI-driven reviews apply your team’s style guide uniformly. If a variable named $profile is actually a company ID, an LLM might suggest using $company_id instead for better clarity.

Asking the right questions

A well-tuned LLM can act like a senior reviewer, raising questions like “what happens if $user->isAnonymous() returns TRUE here?” to push developers to think critically about easily overlooked concerns like permissions.

By offloading repetitive, surface-level evaluation to AI, your senior engineers can focus on architecture and business logic, enhancing code quality without slowing down sprints.

Integrating LLM Checks into CI/CD

Embedding AI reviews in your CI pipeline ensures that feedback arrives early, often within seconds of code changes. Some possible integration patterns include:

Pre-commit hooks:

A local Git hook can gather staged PHP files, send them to a self-hosted LLM (e.g. Code Llama in Docker), and produce a consolidated markdown report. Developers see AI feedback before even pushing code, for example:

# Git pre-commit script snippet for PHP files
# Only look at staged PHP files that still exist (skip deletions).
FILES=$(git diff --cached --name-only --diff-filter=ACM | grep '\.php$')
[ -z "$FILES" ] && exit 0

# Start with an empty review file.
> code_review.md

for FILE in $FILES; do
  content="$(cat "$FILE")"
  prompt="Review this Drupal PHP code for any bugs, deprecated API usage, or coding standard issues. Suggest improvements with code examples."
  # Query the local model (e.g. Code Llama via Docker).
  suggestions=$(docker exec my-llm-container llama "$(printf 'Code:\n%s\n%s' "$content" "$prompt")")
  echo "## Review for ${FILE}:" >> code_review.md
  echo "$suggestions" >> code_review.md
  printf '\n---\n' >> code_review.md
done

exit 0

Pull request bots

In GitHub Actions or GitLab CI, you can run a job on PR creation or update. The job could extract the diff, query a cloud LLM via API, and post inline comments or a summary on the PR. For example, a GitHub Action might use an OpenAI API key to annotate changed lines with suggested fixes.
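
A minimal sketch of such a CI step, assuming an OpenAI-compatible chat completions endpoint, a PR number exposed by the CI environment as $PR_NUMBER, and the gh and jq tools available on the runner (the model name and prompt wording are placeholders):

# Collect the PHP changes in this PR relative to the target branch.
DIFF=$(git diff origin/main...HEAD -- '*.php')
[ -z "$DIFF" ] && exit 0

# Ask the hosted model for a markdown review of the diff.
REVIEW=$(jq -n --arg diff "$DIFF" '{
  model: "gpt-4o",
  messages: [
    {role: "system", content: "You are a senior Drupal engineer reviewing a pull request."},
    {role: "user", content: ("Review this diff for deprecated Drupal APIs, security issues, and coding standard violations. Respond in markdown.\n\n" + $diff)}
  ]
}' | curl -s https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d @- | jq -r '.choices[0].message.content')

# Post the review as a comment on the pull request.
gh pr comment "$PR_NUMBER" --body "$REVIEW"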

Scheduled scans

A nightly or weekly CI job could run a full-repository AI review, checking for newly introduced deprecations or security concerns and uploading the report as an artifact. This is akin to running static analysis regularly - but with broader AI insight.

Tip: Avoid triggering AI on every tiny commit if your repo is large - or if API costs matter. Consider running it only on merge events, on PRs tagged with an “AI-review” label, or only on files below a certain size. Caching helps too - if a file hasn’t changed, skip the AI evaluation step.

Prompt Engineering: Getting High-Value Feedback

LLMs are only as good as the instructions you give them. Effective prompts typically include:

Role & context

“You are a senior Drupal engineer performing a code review on this module.” This cues the LLM to adopt a Drupal-specific mindset rather than providing generic programming advice.

Focus areas

Specify exactly what to check for, such as:

  1. Deprecated APIs (e.g. avoiding db_query() in favor of the Database service)
  2. Security (SQL injection, XSS, CSRF)
  3. Coding standards (PSR-12, Drupal’s Coder rules)
  4. Architecture (promoting services, decoupling logic)

For example:

“Review this code for Drupal 11 best practices: use dependency injection, avoid \Drupal::service() except in edge cases, ensure proper escaping in templates, and verify that API usage is not deprecated.”

Desired output & format

Ask for concise feedback, e.g.:

“Provide a brief markdown-formatted list of issues with line references and suggest code snippets that fix them.”

Examples or templates:

If you want more of a hand in guiding the approach to refactoring, include a small example. For instance, show the AI what the before/after for replacing drupal_set_message() with the Messenger service might look like. This guides the LLM to pattern its suggestions in a similar way.
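
For example, a before/after pair as small as this (the message text is arbitrary) is usually enough for the model to imitate:

// Before (deprecated since Drupal 8.5, removed in Drupal 9):
drupal_set_message(t('Settings saved.'));

// After:
\Drupal::messenger()->addMessage(t('Settings saved.'));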

Iterative workflows

Break prompts into smaller tasks:

  • Pass 1: “List potential issues.”
  • Pass 2: “For each issue, show how to refactor.”

Many CI scripts automate this multi-pass approach to keep responses focused.
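
A minimal sketch of that two-pass flow, assuming a hypothetical ask_llm helper that wraps whichever API or local model you use, and a $FILE variable pointing at the file under review:

# Pass 1: get a terse list of issues, one per line, with no fixes yet.
ISSUES=$(ask_llm "List potential issues in this Drupal code, one per line, without fixes: $(cat "$FILE")")

# Pass 2: request a focused refactor for each issue separately.
while IFS= read -r ISSUE; do
  ask_llm "Show how to refactor this Drupal code to fix the following issue: '$ISSUE'. Code: $(cat "$FILE")" >> refactor_suggestions.md
done <<< "$ISSUES"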

Example Prompt:

**System Role:** You are a Drupal AI assistant.

**Task:** Analyze this Drupal 10 code and refactor it for Drupal 11 readiness, ensuring security best practices and coding standard compliance.  

**Code:** 

<?php 
use Drupal\Core\Database\Database; 
function mymodule_update_node($nid) { 
    $conn = Database::getConnection(); 
    $result = $conn->query("SELECT title FROM {node} WHERE nid = $nid")->fetchField();
    drupal_set_message("Title: $result"); 
}

**Requirements:**

  1. Replace deprecated APIs with newer alternatives (use the Messenger service instead of drupal_set_message(), use the query builder for database queries).
  2. Ensure the code follows Drupal coding standards and best practices (e.g. proper escaping, no hardcoded queries).
  3. Provide the refactored code and briefly explain the changes.

In the above prompt, we clearly set the role, provided the code, and specified exactly what we wanted (updating deprecated APIs and following best practices). A well-trained model should respond with a corrected code snippet (e.g. using \Drupal::messenger()->addMessage() instead of drupal_set_message and using parameterized queries or the database API for the query), along with an explanation. Prompting with concrete requirements like this will get your team more focused results than a generic “refactor this code” request.
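
For reference, one plausible shape of that response - a sketch rather than the only valid refactor - would be:

<?php

/**
 * Displays a node's title using current, non-deprecated APIs.
 */
function mymodule_update_node($nid) {
  // Query builder with a placeholder condition instead of string interpolation;
  // note that node titles live in the node_field_data table in Drupal 9+.
  $title = \Drupal::database()->select('node_field_data', 'n')
    ->fields('n', ['title'])
    ->condition('n.nid', $nid)
    ->execute()
    ->fetchField();

  // The Messenger service replaces the deprecated drupal_set_message().
  \Drupal::messenger()->addMessage(t('Title: @title', ['@title' => $title]));
}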

Finally, remember that prompt engineering is often iterative and experimental. If you find the AI is giving incomplete or off-target advice, refine your prompt and try again. Over time you’ll learn which phrases or instructions yield the most useful output for Drupal-specific code.

Cloud vs. Self-Hosted LLMs

When adopting AI for Drupal code reviews, you can either consume a cloud API or run an open-source model on your own infrastructure. Let’s briefly compare those two options at a high level:

Cloud-Based LLMs

Pros

  • Ease of use: No infrastructure to manage
  • Cutting-edge models: API calls to the latest LLMs often outperform smaller, local models
  • Automatic updates: You get improvements as the provider upgrades the model

Cons

  • Privacy and compliance: Your code is sent to a third party. Even with opt-outs, some organizations prohibit this
  • Cost and latency: Your code is sent over the network; API calls incur per-request costs and add latency to CI jobs, which can become significant at scale

Self-Hosted LLMs

Pros

  • Data control: Code never leaves your network
  • Fixed costs: Once you have the hardware (GPUs, enough RAM), inference is “free” per query
  • Customization: Fine-tune on your Drupal codebase to improve relevance

Cons

  • Infrastructure overhead: Requires GPUs or high-memory servers and MLOps expertise
  • Potentially lower accuracy: Smaller models may lack domain knowledge, so fine-tuning can require extra work

Many teams begin with a cloud service when experimenting and shift to a self-hosted option once full confidentiality is required or they want to reduce per-call costs. Some community modules, like ai-refactor or gpt-code-reviewer, let you point to either cloud APIs or local LLMs, making that switch easier down the road.

Best Practices & Key Caveats

While AI-driven reviews can boost productivity, they must be deployed responsibly:

Protect sensitive code

Sanitize inputs: Remove comments containing secrets, anonymize variable names if needed, and strip out configuration files.

Use on-prem or encrypted services: If your code processes personal data or is under strict compliance (HIPAA, GDPR, PCI), you’ll want to consider self-hosting or an enterprise AI tier with strong data protections.

Maintain human oversight

Validate AI suggestions: Treat AI output like a junior dev’s pull request. Always review, test, and confirm that refactors make sense within your application’s context.

Guard against hallucinations: LLMs can invent function names or misuse APIs. A quick sanity check in your IDE or a run of drupal-check helps catch nonsensical suggestions or changes.
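
For example, a quick deprecation scan with drupal-check after applying AI-suggested changes (adjust the module path to your project layout):

composer require --dev mglaman/drupal-check
vendor/bin/drupal-check web/modules/custom/mymodule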

Iterate & tune prompts

Start narrow: Initially ask the LLM to only check deprecations or coding standards.

Refine over time: If the AI flags false positives or ignores particular patterns, update your prompts or add example “good code” snippets.

Gather feedback: Encourage team members to note recurring misses or off-target advice and refine subsequent prompt revisions to account for those oversights.

Manage CI costs & performance

Limit scope: Only run AI reviews on changed files under a certain size, or only on pull requests tagged specifically for LLM review.

Cache results: If a file hasn’t changed, skip any AI evaluation (see the sketch below).

Set timeouts & fallbacks: Ensure a slow AI response doesn’t block your entire build. A failed AI step could simply warn rather than completely fail the pipeline.
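
A minimal caching sketch, reusing the $FILE loop from the earlier pre-commit example (the cache location and naming scheme are arbitrary choices):

# Skip the AI review when this exact file content has already been reviewed.
CACHE_DIR=".ai-review-cache"
mkdir -p "$CACHE_DIR"

HASH=$(sha256sum "$FILE" | awk '{print $1}')
if [ -f "$CACHE_DIR/$HASH" ]; then
  echo "Skipping $FILE (unchanged since last review)"
else
  # ...run the LLM review for $FILE here, then record the hash...
  touch "$CACHE_DIR/$HASH"
fi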

Follow licensing & terms

Third-party code: Feeding third-party code into some public LLMs may have legal implications whether it is copyleft (e.g. GPL) or permissively licensed, so be sure to check your provider’s terms of service.

API key security: Store keys in your CI secrets vault and rotate them regularly. Restrict their permissions to only the AI API.

By combining LLM-based reviews with existing static analysis and human code review practices, you can achieve a powerful feedback loop. AI catches trivial, repetitive errors while humans focus on nuanced architecture and functionality. As Drupal 10 transitions to Drupal 11, AI can be a valuable tool for facilitating site migrations, helping engineers quickly modernize code, spot security gaps, and uphold coding standards without slowing down development velocity.

Harnessing LLMs for Enhanced Code Quality and Maintenance

LLMs are reshaping how Drupal engineers approach code quality and maintenance. By integrating AI into CI/CD, crafting precise prompts, and choosing the right deployment model (cloud or self-hosted), you gain a reliable “first pass” reviewer that catches deprecations, enforces standards, and suggests architectural improvements. Combined with human oversight, this yields faster iterations, fewer regressions, and a more consistent codebase as you move from Drupal 10 to Drupal 11 and beyond.

Have you tried integrating AI into your Drupal workflow? Let us know about your experiences and best practices - we’d love to hear more about how the community is harnessing LLMs to keep their Drupal codebases in shape.