IBM taps AI to translate COBOL code to Java

COBOL, or Common Business-Oriented Language, is one of the oldest programming languages in use, dating back to around 1959. It's had surprising staying power; according to a 2022 survey, there are more than 800 billion lines of COBOL in use on production systems, up from an estimated 220 billion in 2017.

But COBOL has a reputation for being a tough-to-navigate, inefficient language. Why not migrate to a newer one? For large organizations, migration tends to be a complex and costly proposition, given the small number of COBOL experts in the world. When the Commonwealth Bank of Australia replaced its core COBOL platform in 2012, it took five years and cost over $700 million.

Looking to present a new solution to the problem of modernizing COBOL apps, IBM today unveiled Code Assistant for IBM Z, which uses a code-generating AI model to translate COBOL code into Java. Set to become generally available in Q4 2023, Code Assistant for IBM Z will enter preview during IBM's TechXchange conference in Las Vegas early this September.

Code Assistant for IBM Z is designed to assist businesses in refactoring their mainframe apps, ideally while preserving performance and security, according to IBM Research chief scientist Ruchir Puri. Running locally in an on-premises configuration or in the cloud as a managed service, Code Assistant is powered by a code-generating model, CodeNet, that can understand not only COBOL and Java but also around 80 different programming languages.

"IBM built a new, state-of-the-art generative AI code model to transform legacy COBOL programs to enterprise Java with a high degree of naturalness in the generated code," Puri told TechCrunch in an email interview. "In addition to code transformation, Code Assistant supports the complete application modernization life cycle and helps developers understand, refactor, transform and validate the translated code in a modern architecture."

Puri says that CodeNet, which was trained with 1.5 trillion tokens and has 20 billion parameters, was engineered with a large context window -- 32,000 tokens -- to "capture the broader context" for "more efficient COBOL to Java transformation." Parameters are the parts of a model learned from historical training data and essentially define the skill of the model on a problem, such as generating text, while tokens are the chunks of raw text the model works with (e.g. "fan," "tas" and "tic" for the word "fantastic"). The context window, meanwhile, is the span of preceding text the model considers before generating additional text.
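To make the token and context-window ideas concrete, here is a minimal Python sketch. It is a toy illustration, not how CodeNet or any production model works: the three-piece vocabulary, the greedy splitting rule and the function names `tokenize` and `clip_to_context` are all assumptions made up for this example, whereas real models learn much larger subword vocabularies from data.

```python
# Toy illustration only: real code models use learned subword vocabularies
# (e.g. byte-pair encoding), not a hand-written list like this one.
TOY_VOCAB = {"fan", "tas", "tic"}  # hypothetical vocabulary


def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of a word into known pieces."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


def clip_to_context(tokens, window):
    """Keep only the most recent `window` tokens, roughly what a model
    with a fixed-size context window would be able to see."""
    return tokens[-window:] if window > 0 else []


print(tokenize("fantastic"))          # ['fan', 'tas', 'tic']
history = tokenize("fantastic") * 3   # nine tokens of earlier text
print(clip_to_context(history, 4))    # ['tic', 'fan', 'tas', 'tic']
```

With a window of four, the model in this sketch would "forget" the first five tokens entirely. A 32,000-token window simply pushes that cutoff much further back, which is why it can take in more of a program at once.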

There are a number of tools, apps and services that convert COBOL apps to Java syntax today, some of which are entirely automated. Puri acknowledges this but makes the case that Code Assistant takes steps to avoid sacrificing COBOL's capabilities while reducing costs and producing code that's easy to maintain -- unlike some of the rival offerings on the market.

"IBM built the Code Assistant for IBM Z to be able to mix and match COBOL and Java services," Puri said. "If the 'understand' and 'refactor' capabilities of the system recommend that a given sub-service of the application needs to stay in COBOL, it'll be kept that way, and the other sub-services will be transformed into Java."

That's not to suggest that Code Assistant is flawless. A recent Stanford study finds that software engineers who use similar code-generating AI systems are more likely to introduce vulnerabilities in the apps they develop. Indeed, Puri cautions against deploying code produced by Code Assistant before having it reviewed by human experts.

"Like any AI system, there might be unique usage patterns of an enterprise's COBOL application that Code Assistant for IBM Z may not have mastered yet," Puri said. "It's essential that the code is scanned with state-of-the-art vulnerability scanners to ensure code security."

Risks aside, IBM no doubt sees tools like Code Assistant as important to its future growth. Today, about 84% of IBM's mainframe customers run COBOL -- mostly customers in the financial and government sectors. And while IBM's mainframe division is still a large portion of its overall business, the company views the mainframe as a bridge to the expansive, lucrative hybrid computing environments that it also hosts and facilitates.

IBM also sees a future in broader code-generating AI tools, and is intent on competing with apps like GitHub Copilot and Amazon CodeWhisperer. In May, IBM launched fm.model.code within its Watsonx AI service, which powers Watson Code Assistant, allowing developers to generate code using plain English prompts across programs, including Red Hat’s Ansible Lightspeed.