Using HTTP and CGI as Middleware

HTTP is well known as the protocol, implemented on top of TCP/IP, that provides the basis for connections between Web browsers and Web servers. Similarly, CGI is established as the standard interface between Web servers and server-side executable content. Unfortunately, because most of our experience with the Web and these technologies comes from navigating with browsers, we overlook the possibility of using them without a browser. This technical brief considers the potential of HTTP and CGI as middleware. It begins with an overview of the relevant (and sometimes overlooked) features of HTTP, then compares the HTTP/CGI combination to other middleware technologies. The discussion of HTTP is by no means exhaustive; it is intended only to highlight important features and provide general background. There are many excellent treatments of HTTP, as well as HTML and CGI, available on the Web and in bookstores and libraries.

The HTTP Conversation

Each HTTP conversation consists of exactly two messages: a request from the client and a response from the server. Although the HTTP 1.0 standard defines several request types, or verbs, by far the two most common are GET and POST. Because the server can return any data it wants for either, the main difference is that POST can send data as part of the command.

HTTP Message Contents

To provide simple cross-platform interoperability, message headers are MIME-formatted ASCII text. The first line of a request message is the request itself. This has the format

<METHOD> <URI> HTTP/1.0 <CR><LF>

Method is typically GET or POST and the URI is the document path. The first line of a response message is the response result code, which has the format

HTTP/1.0 <code> <message> <CR><LF>

Code is a numeric result code and message is an optional text string describing the result. The value of 200 is the status code for OK.
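
As an illustration, the status line of a response can be picked apart with a few lines of C. The following is a minimal sketch rather than a complete HTTP parser; the function name parse_status_line and its buffer arguments are hypothetical and chosen only for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse an HTTP/1.0 status line such as "HTTP/1.0 200 OK\r\n".
 * Stores the numeric code in *code and the optional message text in msg
 * (an empty string if the server sent none).
 * Returns 0 on success, -1 if the line is not a status line.
 * (Hypothetical helper for illustration only.)
 */
int parse_status_line(const char *line, int *code, char *msg, size_t msglen)
{
    int major, minor, consumed = 0;
    size_t n;

    if (msglen == 0)
        return -1;
    if (sscanf(line, "HTTP/%d.%d %d%n", &major, &minor, code, &consumed) != 3)
        return -1;

    /* Skip the blanks between the code and the message. */
    line += consumed;
    while (*line == ' ')
        line++;

    /* Copy the message up to, but not including, the CR or LF. */
    n = strcspn(line, "\r\n");
    if (n >= msglen)
        n = msglen - 1;
    memcpy(msg, line, n);
    msg[n] = '\0';
    return 0;
}
```

A caller would typically check that the function returned 0 and that the code is 200 before reading any further.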

MIME Headers

The request and response lines are followed by any number of MIME headers. The header section ends with a blank line. Headers have the format

<header>: <value> <CR><LF>

Header is the name of a header, and value is its value for the current request or document. Common headers for requests are Accept and User-Agent, which allow specification of MIME types to accept for the response and the name of the client software originating the request. Common headers for the reply include Server, MIME-version, Content-type, Last-modified and Content-length. Of these, only Content-type is absolutely essential, since the client software must be informed as to the format of the data. Content-length can help confirm that the contents were completely received, but the content is usually followed by a TCP/IP disconnect, so its length is implied.
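
A header line can be split into its name and value in much the same way. The sketch below assumes it is handed one complete line per call; parse_header_line and its return convention are illustrative choices, not part of any standard API.

```c
#include <string.h>

/*
 * Split one header line of the form "<header>: <value><CR><LF>" into its
 * name and value. Returns 1 for a header, 0 for the blank line that ends
 * the header section, and -1 for a malformed or oversized line.
 * (Hypothetical helper for illustration only.)
 */
int parse_header_line(const char *line,
                      char *name, size_t namelen,
                      char *value, size_t valuelen)
{
    size_t len = strcspn(line, "\r\n");   /* length without the terminator */
    const char *colon;
    const char *v;
    size_t n;

    /* An empty line marks the end of the headers. */
    if (len == 0)
        return 0;

    colon = memchr(line, ':', len);
    if (colon == NULL || colon == line)
        return -1;

    n = (size_t)(colon - line);
    if (n >= namelen)
        return -1;
    memcpy(name, line, n);
    name[n] = '\0';

    /* Skip blanks after the colon, then trim blanks before the CR/LF. */
    v = colon + 1;
    while (*v == ' ' || *v == '\t')
        v++;
    n = strcspn(v, "\r\n");
    while (n > 0 && (v[n - 1] == ' ' || v[n - 1] == '\t'))
        n--;
    if (n >= valuelen)
        return -1;
    memcpy(value, v, n);
    value[n] = '\0';
    return 1;
}
```

Calling this on each line in turn until it returns 0 gives the client every header, after which the document body begins.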

MIME Attachments and MIME Types

Although this brief does not explore the idea, MIME attachments and MIME types can be used as a rather coarse object implementation. The attachment provides the contents or data portion of the object, while the type implies the actions that are valid on that data. This does not, however, provide a means to ship actual code for manipulating the data.

Document Body

The document body follows the headers. It is completely free form, supporting any stream of binary data.
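
To show how the request line, headers, blank line and body fit together, the following sketch assembles a complete POST request in a caller-supplied buffer. It is only one possible arrangement: the function name build_post_request is hypothetical, and the URI and content type shown in the usage note are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Assemble a complete POST request: request line, headers, blank line, body.
 * The body is copied with memcpy, so it may contain arbitrary octets and
 * the result is not NUL-terminated.
 * Returns the total message length, or -1 if buf is too small.
 * (Hypothetical helper for illustration only.)
 */
long build_post_request(char *buf, size_t buflen, const char *uri,
                        const char *type, const void *body, size_t bodylen)
{
    int n = snprintf(buf, buflen,
                     "POST %s HTTP/1.0\r\n"
                     "Content-Type: %s\r\n"
                     "Content-length: %lu\r\n"
                     "\r\n",
                     uri, type, (unsigned long)bodylen);

    /* Reject the request if the headers or the body would not fit. */
    if (n < 0 || (size_t)n >= buflen || bodylen > buflen - (size_t)n)
        return -1;

    memcpy(buf + n, body, bodylen);
    return (long)n + (long)bodylen;
}
```

For example, build_post_request(buf, sizeof buf, "/cgi-bin/getqoh", "application/x-www-form-urlencoded", "sku=6673456", 11) would produce a request whose body carries the same parameter that a GET request places in the URI.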

Where Does This Fit as Middleware?

Some of the characteristics by which middleware technologies can be categorized or evaluated are

  • Asynchronous v. synchronous communication
  • Guaranteed v. unreliable delivery
  • Support for transaction semantics
  • Location transparency
  • Data conversion transparency
  • Level of API abstraction
  • Accessibility from various programming languages
  • Support on multiple hardware and operating system platforms

Applied to HTTP/CGI, we find

  • Synchronous communication: the client sends a command and awaits the response. Although multithreading techniques allow programs to treat these separately, they are not truly asynchronous.
  • Unreliable delivery: the response is required to determine whether the command is successful, and the application must handle any failure itself.
  • No transaction semantics: multiple commands cannot be grouped together as an atomic unit of work.
  • No location transparency: there is no built-in abstraction above Internet addressing.
  • No data conversion: all data is transmitted as uninterpreted octets, although there is the ability to label the type of each block of data (MIME attachment).
  • Low API abstraction: there are various API implementations in the form of Perl packages, Microsoft's WinInet, CERN's libwww library and the CGI "API" itself, none of which abstract much above the level of sockets.
  • Many languages can be used to write HTTP clients, servers and CGI applications.
  • Virtually all hardware and operating system platforms support HTTP/CGI.

Evaluated this way, the HTTP/CGI combination is very similar to remote procedure calls (RPC), although it lacks transparent data conversion. Also, RPCs usually have an optional component to make location more transparent, e.g. the X.500-based DCE directory, ONC's NIS+ or even ONC's portmapper.

An Example Using HTTP/CGI

Suppose that a chain of networked retail stores needs to query any store for the number of units of a particular item it has in stock. The requesting store sends an identifier for the item, often known as a stock keeping unit (SKU), and expects to get back the quantity on hand (QOH). This functionality can be provided with a simple HTTP client and CGI application. The client establishes a connection to a Web server, sends a GET request, then accepts and parses the reply. The Web server invokes the CGI application in response to the request, and the CGI application returns the result to the client via the server. The client can be coded very simply using something like Microsoft's WinInet. The CGI application has all the business logic, e.g. it knows how to query the database for the QOH or how to calculate the QOH from some other values.

Under the covers, the HTTP request might look like

GET /cgi-bin/getqoh?sku=6673456 HTTP/1.0
Accept: text/plain
User-Agent: queryqoh

while the response might be

HTTP/1.0 200 OK
Server: NCSA/1.4
MIME-version: 1.0
Content-Type: text/plain
Last-modified: Wednesday, 29-May-96 18:22:20 GMT
Content-length: 1

3
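
For readers who want to see the client side without a library such as WinInet, the two steps the client performs around the socket I/O can be sketched in plain C. This is a sketch under stated assumptions: build_qoh_request and parse_qoh_reply are hypothetical names, the connection and read loop are omitted, and the reply is assumed to have been read in full into a NUL-terminated buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format the GET request for one SKU.
 * Returns the request length, or -1 if buf is too small.
 * (Hypothetical helper for illustration only.)
 */
int build_qoh_request(char *buf, size_t buflen, long sku)
{
    int n = snprintf(buf, buflen,
                     "GET /cgi-bin/getqoh?sku=%ld HTTP/1.0\r\n"
                     "Accept: text/plain\r\n"
                     "User-Agent: queryqoh\r\n"
                     "\r\n", sku);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/*
 * Extract the quantity on hand from a complete reply.
 * Returns 0 and sets *qoh on success; returns -1 if the status is not 200,
 * the header section never ends, or the body is not a number.
 * (Hypothetical helper for illustration only.)
 */
int parse_qoh_reply(const char *reply, long *qoh)
{
    int code;
    const char *body;
    char *end;
    long v;

    if (sscanf(reply, "HTTP/%*d.%*d %d", &code) != 1 || code != 200)
        return -1;

    /* The body starts after the blank line; tolerate bare LF endings. */
    body = strstr(reply, "\r\n\r\n");
    if (body != NULL) {
        body += 4;
    } else {
        body = strstr(reply, "\n\n");
        if (body == NULL)
            return -1;
        body += 2;
    }

    v = strtol(body, &end, 10);
    if (end == body)
        return -1;
    *qoh = v;
    return 0;
}
```

A caller would connect to the store's Web server, send the bytes produced by build_qoh_request, read until the server disconnects, and then hand the accumulated reply to parse_qoh_reply.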

The CGI application could be written in C as follows

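#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the quantity on hand for the given SKU. */
long getqoh(long sku)
{
  /*
   * The real business is done here, e.g. a database query.
   * The constant below is only a placeholder so the example compiles.
   */
  (void)sku;
  return 3;
}

int main(void)
{
  long sku;
  char *qs = getenv("QUERY_STRING");

  /* CGI response headers, ended by a blank line */
  printf("Status: 200 OK\n");
  printf("Content-Type: text/plain\n\n");

  /* Negative sentinel values report errors to the client */
  if (qs) {
    qs = strchr(qs, '=');
    if (qs) {
      if (sscanf(qs+1, "%ld", &sku) == 1)
        printf("%ld", getqoh(sku));
      else
        printf("-999999991");   /* value after '=' is not a number */
    } else
      printf("-999999992");     /* no '=' in the query string */
  } else
    printf("-999999993");       /* QUERY_STRING is not set */

  return 0;
}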

The Future of HTTP/CGI as Middleware

As the Web browser/server infrastructure integrates HTTP/CGI middleware alternatives, such as Java-to-Java networking or CORBA IIOP, HTTP/CGI will be relegated to static/dynamic document serving. This is because the alternatives will provide capabilities, such as guaranteed delivery and transparent data conversion, that are essential in heterogeneous, distributed computing environments, as well as improved utilization of network resources. Custom client/server applications will rely on the alternative methods to provide network connectivity.


Copyright (c) 1996 Scott Nichol.
2-Jun-96