Using LLMs to Understand and Modernize Legacy Code

Legacy systems in banking are not a technical debt problem in the abstract. They are operational reality. COBOL codebases running on mainframes still process a significant share of global transaction volume. The engineers tasked with modernizing these systems face a challenge that is as much epistemological as it is technical: understanding what the code actually does before touching it.

Large language models, applied through structured prompting workflows, are changing how development teams approach this problem. Not by replacing engineering judgment, but by compressing the time it takes to build it.

Why Legacy Understanding Is the Bottleneck

Modernization projects fail less often because of poor architecture choices and more often because the team did not fully understand what they were replacing. A COBOL program written in the 1980s may encode decades of business rule evolution — rate calculations, exception handling, regulatory adjustments — with no external documentation and original authors long gone.

The traditional approach involves painstaking manual review: reading code, reverse-engineering logic, interviewing whoever still remembers anything about the system. This process is slow, expensive, and produces knowledge that lives in individuals rather than artifacts.

LLMs provide a different starting point. Given a COBOL program or a legacy PL/I routine, a well-prompted model can produce a structured explanation of what the code does, what its inputs and outputs are, and where the non-obvious decision logic lives. This is not the same as fully understanding the system — but it is a materially better starting point than a blank page.

Prompting for Code Comprehension

The quality of what an LLM produces from legacy code depends heavily on how well you frame the request. General prompts produce general answers. Structured prompts produce usable ones.

A reliable pattern for initial code comprehension:

"You are analyzing a COBOL program from a core banking system. Explain what this program does in plain language. Identify: (1) the business function it performs, (2) the key decision points and conditions, (3) any external files, databases, or system calls it depends on, and (4) any logic that appears regulatory in nature or that handles exceptions. Structure your response as a technical summary followed by a dependency list."

This prompt works because it constrains the output shape and directs the model toward what engineers actually need: functional intent, control flow, and integration surface. Claude and GPT-4 both handle COBOL reasonably well at this level of prompting, though output should always be reviewed against the source — these models can misread variable scoping or copybook-level definitions if the code is highly fragmented.
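To reuse the same structure across many programs, the prompt above can be assembled in code. The Python sketch below is one possible way to do that. The function name `build_comprehension_prompt`, the `system_context` parameter, and the way the source is appended under a "COBOL source:" label are assumptions made for this example, not part of any particular tool. The item list is joined with commas, so the wording differs slightly from the quoted prompt.

```python
from typing import Sequence

# The four items the comprehension prompt asks the model to identify.
COMPREHENSION_ITEMS = (
    "the business function it performs",
    "the key decision points and conditions",
    "any external files, databases, or system calls it depends on",
    "any logic that appears regulatory in nature or that handles exceptions",
)


def build_comprehension_prompt(
    cobol_source: str,
    system_context: str = "a core banking system",  # hypothetical parameter
    items: Sequence[str] = COMPREHENSION_ITEMS,
) -> str:
    """Return the comprehension prompt followed by the COBOL source to analyze."""
    numbered = ", ".join(f"({n}) {item}" for n, item in enumerate(items, start=1))
    instructions = (
        f"You are analyzing a COBOL program from {system_context}. "
        "Explain what this program does in plain language. "
        f"Identify: {numbered}. "
        "Structure your response as a technical summary followed by a dependency list."
    )
    # Assumption: the source is appended after a blank line under a plain label.
    return f"{instructions}\n\nCOBOL source:\n{cobol_source}"
```

Keeping the numbered items in one place means a team can add or reword an item once and have every subsequent analysis use the same output shape.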

For longer programs, chunking is necessary. Pass individual sections with context about what preceded them:

"This is section 3 of a 12-section COBOL program. The previous sections handle account validation and interest calculation. Analyze this section and identify what new business logic it introduces, referencing how it builds on or modifies the logic described above."

Maintaining continuity across chunks prevents the model from treating each section as an isolated unit and losing the thread of the overall program.
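A minimal sketch of this chunking approach follows, in Python. It assumes the source has had any column 1-6 sequence numbers stripped and that splitting on DIVISION and SECTION headers gives usable chunks. Real programs may need paragraph-level splitting or copybook expansion first. The names `split_into_sections` and `build_chunk_prompt` are hypothetical, and the one-line summaries of earlier sections are assumed to come from previous model responses or from the reviewing engineer.

```python
import re
from typing import List, Sequence

# Matches headers such as "PROCEDURE DIVISION." or "2000-CALC SECTION."
# Assumption: sequence numbers in columns 1-6 have already been removed.
HEADER = re.compile(r"^\s*[A-Z0-9-]+\s+(DIVISION|SECTION)\b.*\.\s*$", re.IGNORECASE)


def split_into_sections(source: str) -> List[str]:
    """Split COBOL source into chunks, starting a new chunk at each header line."""
    chunks: List[str] = []
    current: List[str] = []
    for line in source.splitlines():
        if HEADER.match(line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def build_chunk_prompt(chunks: Sequence[str], index: int, prior_summaries: Sequence[str]) -> str:
    """Build the prompt for one chunk, carrying forward summaries of earlier chunks."""
    if prior_summaries:
        context = "The previous sections handle: " + "; ".join(prior_summaries) + "."
    else:
        context = "This is the first section, so there is no prior context."
    return (
        f"This is section {index + 1} of a {len(chunks)}-section COBOL program. "
        f"{context} "
        "Analyze this section and identify what new business logic it introduces, "
        "referencing how it builds on or modifies the logic described above.\n\n"
        f"{chunks[index]}"
    )
```

Passing short summaries rather than the full text of earlier chunks keeps each prompt small while still giving the model the thread of the program.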

Generating Documentation from Code

One of the highest-value applications of LLM prompting in legacy modernization is documentation generation: specifically, producing the documentation that should have existed but does not.

A prompt pattern for structured documentation:

"Based on the COBOL program provided, generate a technical specification document in the following format: (1) System Overview — what business process this code supports, (2) Input/Output Specification — all inputs consumed and outputs produced with data types where determinable, (3) Business Rules — all conditional logic expressed as numbered rules in plain language, (4) Known Dependencies — external systems, files, and data structures referenced, (5) Open Questions — logic that is ambiguous or requires human verification."

The "Open Questions" section is particularly important in banking contexts. LLMs will attempt to interpret ambiguous logic rather than flag it, unless explicitly prompted to surface uncertainty. Making uncertainty a required output section produces documentation that is honest about its own limits — which is far more useful than confident documentation that is wrong.
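One way to enforce the required sections is a small check that runs before a generated specification is accepted for review. The Python sketch below assumes the model emits each section title on its own line, optionally numbered, which is an assumption about output format rather than something a model guarantees. The function names are hypothetical.

```python
import re
from typing import Dict, List

# Section titles taken from the documentation prompt.
REQUIRED_SECTIONS = [
    "System Overview",
    "Input/Output Specification",
    "Business Rules",
    "Known Dependencies",
    "Open Questions",
]


def split_sections(document: str) -> Dict[str, str]:
    """Map each required heading found in the document to the text beneath it."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in document.splitlines():
        # Accept "Business Rules", "3. Business Rules" or "(3) Business Rules:".
        heading = re.sub(r"^\W*\d*\W*", "", line).strip().rstrip(":")
        if heading in REQUIRED_SECTIONS:
            current = heading
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def review_problems(document: str) -> List[str]:
    """List structural problems a reviewer should resolve before accepting the document."""
    sections = split_sections(document)
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in sections]
    if "Open Questions" in sections and not sections["Open Questions"]:
        problems.append("Open Questions is empty: confirm the model was asked to surface uncertainty")
    return problems
```

An empty "Open Questions" section is treated as a problem rather than a success, on the reasoning that a program with no ambiguity at all is less likely than a model that did not flag any.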

Generated documentation should go into version control alongside the code it describes, and into your audit trail. In environments regulated under the EU's Digital Operational Resilience Act (DORA), being able to demonstrate that you understood a system before modifying it is not just good practice; it may be a compliance requirement.

Mapping Legacy Logic to Modern Architectures

Once you have a working understanding of what the legacy code does, the next challenge is deciding what it becomes. This is where prompting shifts from comprehension to design assistance.

A useful prompt pattern for architecture mapping:

"Given the business logic described in this COBOL program summary, propose how this functionality would be implemented as a RESTful microservice. Identify: (1) the service boundary — what this microservice owns, (2) the API contract — endpoints, request/response shapes, and error codes, (3) data model — what entities this service manages and how they map from the legacy data structures, (4) integration points — where this service will need to call or be called by other services. Flag any business rules that are complex enough to warrant discussion before implementation."

This workflow — comprehension prompt, documentation prompt, architecture mapping prompt — creates a structured handoff between understanding the legacy system and designing its replacement. It also produces artifacts that can be reviewed, challenged, and refined by the team before any code is written.
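The three-stage handoff can be sketched as a small pipeline. The Python example below is illustrative only: `Complete` is a hypothetical stand-in for whatever function sends a prompt to your chosen model and returns its reply, the three prompt constants are abbreviated versions of the prompts quoted earlier, and feeding the specification (rather than the raw COBOL) into the architecture stage is a design assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that sends a prompt to a model and returns text.
Complete = Callable[[str], str]

# Abbreviated stand-ins for the three prompts quoted in full in the text.
COMPREHENSION = "Explain what this COBOL program does in plain language."
DOCUMENTATION = "Generate a technical specification, including an Open Questions section."
ARCHITECTURE = "Propose how this functionality would be implemented as a RESTful microservice."


@dataclass
class WorkflowArtifacts:
    summary: str
    specification: str
    service_proposal: str


def run_workflow(cobol_source: str, complete: Complete) -> WorkflowArtifacts:
    """Run the comprehension, documentation, and architecture prompts in order."""
    # Stage 1: comprehension works from the raw source.
    summary = complete(f"{COMPREHENSION}\n\n{cobol_source}")
    # Stage 2: documentation sees the stage 1 summary plus the source.
    specification = complete(
        f"{DOCUMENTATION}\n\nSummary:\n{summary}\n\nSource:\n{cobol_source}"
    )
    # Stage 3: architecture mapping works from the specification, not the raw COBOL.
    service_proposal = complete(f"{ARCHITECTURE}\n\nSpecification:\n{specification}")
    return WorkflowArtifacts(summary, specification, service_proposal)
```

The sketch runs the stages back to back for brevity. In practice each artifact would be reviewed and corrected by the team before it is passed to the next stage, so that errors in the summary do not propagate into the proposed design.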

GitHub Copilot is useful at the implementation end of this workflow, once the architecture is defined. Claude performs better at the comprehension and documentation phases, where longer context windows and more structured output matter. Using each tool where it is strongest is a practical division of labor.

What LLMs Cannot Do Here

LLMs do not understand your specific business context. They will explain what COBOL code does in general terms but will not know that a particular field carries a regulatory meaning specific to your institution, or that an exception path was added in 2003 to address a specific audit finding. Human domain knowledge remains the interpretive layer.

Treat LLM output as a first draft that requires expert review, not a finished artifact. The value is in the speed of that first draft — not in replacing the judgment that must follow.

For teams working under change management constraints, every LLM-assisted artifact entering your documentation or architecture review process should be clearly attributed and reviewed. Regulatory auditors are increasingly aware of AI-assisted development, and the credibility of your documentation depends on being transparent about how it was produced.
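Attribution is easier to audit when it is recorded in a consistent, machine-readable form next to each artifact. The following Python sketch shows one possible provenance record. The field names, the choice of SHA-256 hashes for the source and the prompt, and the JSON layout are all assumptions for illustration, not a regulatory format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Provenance:
    source_file: str       # legacy program the artifact describes
    source_sha256: str     # ties the artifact to an exact version of the source
    prompt_sha256: str     # ties the artifact to the exact prompt used
    model: str             # model identifier as reported by your provider
    generated_on: str      # ISO 8601 date string
    reviewed_by: Optional[str] = None  # stays empty until a human signs off


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_provenance(source_file: str, source_text: str, prompt_text: str,
                    model: str, generated_on: str) -> Provenance:
    """Create an unreviewed provenance record for a newly generated artifact."""
    return Provenance(
        source_file=source_file,
        source_sha256=sha256_text(source_text),
        prompt_sha256=sha256_text(prompt_text),
        model=model,
        generated_on=generated_on,
    )


def is_review_complete(record: Provenance) -> bool:
    return bool(record.reviewed_by)


def to_json(record: Provenance) -> str:
    """Serialize the record so it can be committed alongside the artifact."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Hashing the source means a later change to the COBOL program makes the mismatch with existing documentation detectable, which is a reasonable prompt to regenerate and re-review.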

The Right Way to Start

Pick one program. Not the most critical one, and not the simplest. Pick something representative — a program your team has struggled to understand, that carries real business logic, and where better documentation would have concrete value.

Run the comprehension prompt. Review the output with someone who knows the system. Note where the model was right, where it was wrong, and where it flagged uncertainty correctly. Use that calibration to refine your prompting approach before scaling it across the codebase.
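The calibration step can be as simple as a tally of the reviewer's notes. The Python sketch below assumes each model claim is recorded with one of three verdicts matching the categories above. The `ReviewNote` structure and the verdict labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

# Verdict labels mirroring the three categories the reviewer is asked to note.
VERDICTS = ("right", "wrong", "flagged_uncertainty")


@dataclass(frozen=True)
class ReviewNote:
    claim: str    # what the model said about the program
    verdict: str  # one of VERDICTS, assigned by the human reviewer


def tally(notes: Iterable[ReviewNote]) -> Dict[str, int]:
    """Count review notes per verdict, rejecting labels outside VERDICTS."""
    counts = Counter(note.verdict for note in notes)
    unknown = sorted(set(counts) - set(VERDICTS))
    if unknown:
        raise ValueError(f"unrecognized verdicts: {unknown}")
    return {verdict: counts[verdict] for verdict in VERDICTS}
```

Comparing these counts before and after a prompt change gives a rough, low-cost signal of whether the change helped on that program.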

Legacy modernization at scale requires institutional commitment. But the prompting workflow that makes it tractable can be validated in an afternoon.