Reengineering Legacy C: A Case Study

Discussions of reengineering legacy systems usually assume that the systems in question are written in COBOL, run on a mainframe, and access data stored in VSAM files. However, a great many systems were written for other character-based user interfaces and for assorted flat-file or proprietary database engines, and these also require an overhaul to exploit today's application deployment infrastructure. This paper is a brief look at the reengineering of one such application.

The Impetus for Reengineering

The applications in question are used to control the flow of batch data from the individual stores in a large, national retail chain to a variety of mainframe systems. Files are batched under operator control for reformatting and transmission.

The legacy system ran on a small battery of proprietary hosts whose communication capability was limited to the 3780 bisync protocol. The dial-up connections were quite slow, and they grew increasingly costly as greater amounts of data were generated for upload. To reduce these costs and to support other distributed application development initiatives, the corporation's MIS department deployed a wide area network connecting all sites. It therefore became desirable to retarget the applications to run on platforms supporting the WAN.

Reengineering the System Architecture

The system, which was written entirely in C, could very nearly have been ported straight to the target UNIX platform. Three factors made a straight port both undesirable and far from trivial. First, the character-based user interface showed its age against the corporate standard Windows interface to which users had become accustomed. Second, the database format was based on binary files reflecting C data structures, and was not accessible from standard end user or development tools. Finally, the database engine was available only in object format for the proprietary host operating system. Reimplementing the engine from scratch as part of the port was itself nearly out of the question, and combined with the first two factors it was enough to kill this option.

Since the objections to a direct port related to the user interface and database access portions of the code, the logical path to follow was to salvage the application logic while rewriting the UI for Windows and the database access for Sybase. The C code had a modular structure very conducive to this, so it was in this direction that the project proceeded.

The initial plan was to use Visual Basic to implement the user interface and application flow, and package the legacy C code in a DLL, rewriting database access using Sybase DB-Library calls. During design, it became apparent that it would be easier to access Sybase directly from Visual Basic, removing database access from the C code entirely. This was more natural when, for example, filling a list box from the database.

For the most part, the remaining C code reformatted flat data files according to a set of rules. These files resided on UNIX servers, and the plan was for the DLL to access them via NFS. This proved impractical for two reasons. First, the intensive read/write activity generated large amounts of network I/O. Second, the naming standard chosen for uploaded files did not fit the 8.3 file name syntax of Windows.
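
To make the naming constraint concrete, the following is a minimal sketch of an 8.3 check. It assumes the classic DOS rule of a base name of one to eight characters and an optional extension of one to three characters. The function name fits_8_3 and the sample file names are hypothetical; the chain's actual naming standard is not reproduced here, and reserved characters and device names are ignored for brevity.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if name fits the 8.3 pattern: 1-8 base characters,
   optionally followed by one dot and 1-3 extension characters.
   Simplification: reserved characters and device names are not checked. */
int fits_8_3(const char *name)
{
    const char *dot;
    size_t base_len, ext_len;

    assert(name != NULL);
    dot = strchr(name, '.');
    base_len = dot ? (size_t)(dot - name) : strlen(name);

    if (base_len < 1 || base_len > 8)
        return 0;
    if (dot == NULL)
        return 1;                   /* no extension at all */
    if (strchr(dot + 1, '.') != NULL)
        return 0;                   /* more than one dot */
    ext_len = strlen(dot + 1);
    return ext_len >= 1 && ext_len <= 3;
}
```

Under these assumptions, fits_8_3("UPLOAD.DAT") returns 1, while a longer multi-part name such as "store0421.sales.19960717" (an invented example) returns 0.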

In an attempt to maintain the same general architecture, the naming limitation was addressed first, leaving the network traffic problem aside. With remote procedure call equivalents of fopen, fgets, fputs and fclose, file names were interpreted on the UNIX server rather than by Windows, and the C code worked with only slight modifications.
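
The shape of such stubs can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the names rfopen, rfgets, rfputs and rfclose, the request and reply structures, the integer handles and the size limits are all hypothetical, and the network transport is replaced by a direct call to the server routine so that the sketch is self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RF_MAX_OPEN 16            /* hypothetical limit on open files */
#define RF_MAX_DATA 512           /* hypothetical maximum line length */
enum rf_op { RF_OPEN, RF_GETS, RF_PUTS, RF_CLOSE };

struct rf_request {
    enum rf_op op;
    int handle;                   /* server handle; ignored by RF_OPEN */
    int size;                     /* RF_GETS: caller's buffer size */
    char path[256];               /* RF_OPEN: file name as the server sees it */
    char text[RF_MAX_DATA];       /* RF_OPEN: mode string; RF_PUTS: data */
};

struct rf_reply {
    int status;                   /* RF_OPEN: handle; others: 0; -1 = failure */
    char data[RF_MAX_DATA];       /* RF_GETS: the line that was read */
};
static FILE *rf_table[RF_MAX_OPEN];   /* files open on the server */

/* Server side: runs the real stdio call against the server's own disk. */
static void rf_serve(const struct rf_request *req, struct rf_reply *rep)
{
    FILE *fp;
    int i, n;
    rep->status = -1;
    rep->data[0] = '\0';
    if (req->op == RF_OPEN) {
        for (i = 0; i < RF_MAX_OPEN && rf_table[i] != NULL; i++)
            ;                     /* find a free slot */
        if (i < RF_MAX_OPEN && (rf_table[i] = fopen(req->path, req->text)) != NULL)
            rep->status = i;
        return;
    }
    if (req->handle < 0 || req->handle >= RF_MAX_OPEN ||
        (fp = rf_table[req->handle]) == NULL)
        return;
    if (req->op == RF_GETS) {
        n = req->size < RF_MAX_DATA ? req->size : RF_MAX_DATA;
        if (n > 0 && fgets(rep->data, n, fp) != NULL)
            rep->status = 0;
    } else if (req->op == RF_PUTS) {
        if (fputs(req->text, fp) != EOF)
            rep->status = 0;
    } else if (req->op == RF_CLOSE) {
        rf_table[req->handle] = NULL;
        if (fclose(fp) == 0)
            rep->status = 0;
    }
}

/* Client side: build a request; the direct call stands in for the network. */
static int rf_call(enum rf_op op, int handle, int size,
                   const char *path, const char *text, struct rf_reply *rep)
{
    struct rf_request req;
    assert(path != NULL && text != NULL);
    if (strlen(path) >= sizeof req.path || strlen(text) >= sizeof req.text)
        return -1;
    memset(&req, 0, sizeof req);
    req.op = op;
    req.handle = handle;
    req.size = size;
    strcpy(req.path, path);
    strcpy(req.text, text);
    rf_serve(&req, rep);
    return rep->status;
}

int rfopen(const char *path, const char *mode)
{
    struct rf_reply rep;
    return rf_call(RF_OPEN, -1, 0, path, mode, &rep);
}

char *rfgets(char *buf, int size, int handle)
{
    struct rf_reply rep;
    assert(buf != NULL);
    if (rf_call(RF_GETS, handle, size, "", "", &rep) != 0)
        return NULL;
    strcpy(buf, rep.data);        /* fits: server read at most size-1 chars */
    return buf;
}

int rfputs(const char *text, int handle)
{
    struct rf_reply rep;
    return rf_call(RF_PUTS, handle, 0, "", text, &rep) == 0 ? 0 : EOF;
}

int rfclose(int handle)
{
    struct rf_reply rep;
    return rf_call(RF_CLOSE, handle, 0, "", "", &rep) == 0 ? 0 : EOF;
}
```

With stubs of this kind, a call such as fgets(line, sizeof line, fp) in the legacy code becomes rfgets(line, (int)sizeof line, h), where h is the integer returned by rfopen. Note that every line read or written still costs one round trip, so a design like this addresses file naming but not network traffic.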

Flushed with the success of the RPC implementation, but wary of the scalability problems that the chosen architecture would bring, the team decided to remote all of the function calls in the DLL. In other words, the original C code, stripped of all UI and database access, was deployed in a UNIX daemon servicing remote procedure calls. This required altering the code for 32-bit rather than 16-bit integers and pointers, but that was a well-understood aspect of porting. With the new architecture, a single network packet could trigger a C function that performed hundreds of thousands to millions of disk I/O operations, all local to the UNIX server.
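
The difference from per-call file stubs can be sketched in a few lines. In this hedged illustration, one small request names an input and an output file, and the server-side handler performs every read and write locally. The structure reformat_request, the function handle_reformat and the upper-casing "rule" are all placeholders invented for the example; the actual reformatting rules and RPC mechanism are not described in this paper.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical request: the only data that would cross the network. */
struct reformat_request {
    char in_path[256];      /* input file, as named on the server */
    char out_path[256];     /* output file, as named on the server */
};

/* Server-side handler: all file I/O happens on the server's own disk.
   Returns the number of fgets reads processed (one per line when lines
   are shorter than the buffer), or -1 on error.
   The rule applied here (upper-casing) is only a placeholder. */
long handle_reformat(const struct reformat_request *req)
{
    FILE *in, *out;
    char line[1024];
    char *p;
    long count = 0;

    assert(req != NULL);
    in = fopen(req->in_path, "r");
    if (in == NULL)
        return -1;
    out = fopen(req->out_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while (fgets(line, sizeof line, in) != NULL) {
        for (p = line; *p != '\0'; p++)
            *p = (char)toupper((unsigned char)*p);
        if (fputs(line, out) == EOF) {
            count = -1;
            break;
        }
        count++;
    }
    fclose(in);
    if (fclose(out) != 0)
        count = -1;
    return count;
}
```

However many lines the input file holds, the client's cost in this arrangement is one request and one reply; the loop over the file never touches the network.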

The Bottom Line

Performance has been exemplary: the system runs many orders of magnitude faster than the original. The user interface is far more user-friendly and richer, with list boxes replacing text entry fields. Users of the system have authored additional reports, which was made possible by the use of an ODBC/SQL database engine. The Visual Basic front end, comprising about two dozen forms and another two dozen modules, has been easily modified and maintained, and MAPI and OLE Automation have been used to integrate e-mail and form letter capabilities. Finally, the C code has worked like a charm from the beginning, and has been ported to a second UNIX variant.
Copyright (c) 1996 Scott Nichol.
17-Jul-96