Speakers



Aditya Kanade

Microsoft Research, India

Title: Making LLMs Usable in Large-scale Software Engineering

Abstract: The rise of LLMs has resulted in a flurry of activity to build innovative applications. LLMs have proven valuable in building helpful Copilots for IDE-based code completion. Can we replicate this success in large-scale software engineering that goes beyond completing code within a local context? In this talk, I will discuss some of the challenges in making this happen and our recent work that attempts to address them. In particular, I will discuss how to get LLMs to automate coding at the level of repositories, prevent hallucinations, and evaluate the ability of LLMs to meet non-functional requirements of software.

Disha Shrivastava (Virtual talk)

Google DeepMind, UK

Title: Effectively Utilizing Contextual Cues for LLMs of Code

Abstract: Source code provides an exciting application area for deep learning methods, encompassing tasks like program synthesis, repair, and analysis, as well as tasks at the intersection of code and natural language. Although LLMs of code have recently seen significant success, they can face challenges in generalizing to unseen code. This can lead to inaccuracies, especially when working with repositories that contain proprietary software or work-in-progress code. In this talk, I will discuss ways of effectively harnessing useful signals from the available context to improve the performance of LLMs on the given task.

Dinesh Garg

IBM Research, India

Title: Generative AI for Cobol-to-Java Translation

Abstract: A large number of mission-critical business applications were written decades ago using the relevant technology of that time, for instance COBOL, and remain in use in large companies across the world and across industry sectors. It is estimated that there are over 200 billion lines of COBOL code in the world today. There is significant interest in these companies in translating such code into a modern language such as Java, to leverage the more widely available Java skills in the market. Developing tools and techniques for automatic translation of COBOL applications into Java is a challenging problem, due to factors such as COBOL being a procedural language whereas Java is object-oriented. To further complicate the matter, both languages have their own packages/libraries. In this talk, I will share my experiences in building a generative AI-based solution called Watsonx Code Assistant for Z (WCA4Z), developed at IBM Labs. The input to WCA4Z is an enterprise-level COBOL application, and the output is an equivalent Java application. The hallmark of our end-to-end WCA4Z solution is that it generates Java code that is human-readable and maintainable, which is not the case with several alternative solutions. Thus, WCA4Z has the potential to save significant developer effort and reduce the cost of application modernization. The novelty of the solution lies in a fine blend of program analysis techniques with Large Language Model (LLM) ideas. I hope this talk will convince you that a clever use of generative AI can catalyze the task of legacy code modernization in an extremely cost-effective manner.

Hridesh Rajan

Iowa State University, USA

Talk cancelled due to unavoidable reasons.

Jyothi Vedurada

IIT Hyderabad, India

Title: Leveraging Large Language Models for Effective Software Development Practices

Abstract: In this talk, we will explore three effective techniques aimed at enhancing software security, bug detection, and documentation practices by leveraging the capabilities of large language models and graph neural networks. The first technique focuses on bug detection, specifically addressing buffer-overflow bugs in C/C++ code snippets using large language models. By incorporating bug patterns during fine-tuning, the models achieve improved bug detection performance. The second technique introduces Cell2Doc, a machine-learning pipeline designed to streamline manual documentation efforts within notebook code cells. Cell2Doc takes advantage of existing pre-trained language models and identifies logical contexts within a code cell to significantly improve documentation efficiency. Lastly, the third technique involves a Graph Neural Network (GNN)-based model to mitigate API misuse during software development, a prevalent issue that often leads to security vulnerabilities and system failures.

Vikrant Kaulgud

Accenture Labs, India

Title: Gen AI assisted SDLC – A peek into the past, present and future

Talk cancelled due to unavoidable reasons.