CS 596 Client-Server Programming
CGI

San Diego State University -- This page last updated February 22, 1996

CGI

CGI = Common Gateway Interface

Standard for interfacing external applications with information servers.

Most common application: HTTP servers

Why CGI?

Static pages served by HTTP servers are boring...

CGI allows for dynamic generation of web documents.

Good CGI reference material at

http://hoohoo.ncsa.uiuc.edu/cgi/

Uses of CGI

Some examples of dynamic documents at SDSU:

The SDSU home page (http://www.sdsu.edu/)
www.sdsu.edu homepage statistics (http://www.sdsu.edu/cgi-bin/genstats.pl)
Class schedule (http://www.sdsu.edu/cgi-bin/schedule/)

Why a dynamic SDSU homepage?

Remote users get a blurb about SDSU, internal users don't
Browsers that support tables get a version with tables, others don't
Accesses to the homepage are counted (and http://www.sdsu.edu/cgi-bin/genstats.pl uses this data to generate the statistics)

CGI: the protocol

Basic steps in the life of a CGI program:

Web server gets request
Web server figures out that the request is for a document that it knows is a CGI program
Web server builds an environment with several special purpose variables
Web server starts the CGI program in this environment
CGI program interprets the environment variables
CGI program sends document type information to STDOUT
CGI program sends generated document to STDOUT
CGI program quits.

Trivial CGI program

Here is possibly the most trivial CGI program that can be written under Unix:

#!/bin/sh
echo "Content-type: text/plain"
echo ""
echo "Hello, World"

If this program were to be referenced by a web browser, the resulting page would show just

Hello, World

at the top left corner.

Things to note:

This CGI program didn't look at any of the information passed to it from the web server
There needs to be a blank line between the header and the actual document.
Additional lines can be added to the header. See the HTTP specification for all possible header lines.
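
The header/blank-line/body layout noted above can also be sketched in Java. This is only an illustration: the class name CgiResponse and the method build() are made up for this example and are not part of any library used in this course.

```java
public class CgiResponse {
    // Build a complete CGI response: the header line, one blank line, then the body.
    // (CgiResponse and build are hypothetical names chosen for this sketch.)
    static String build(String contentType, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append("Content-type: ").append(contentType).append('\n');
        sb.append('\n');   // the blank line that separates the header from the document
        sb.append(body);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same output as the shell script above.
        System.out.print(build("text/plain", "Hello, World\n"));
    }
}
```

Leaving out the blank line is a common mistake: the web server then treats the first line of the document as another header line.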

MIME types and CGI

When a CGI program creates a dynamic document, it has to tell the client what the document type is.

Some MIME types:

text/plain
text/html
image/gif
image/jpeg
video/mpeg

Most of the time, text/html is used.

HTML Forms

Besides generating documents dynamically, the main use of CGI is to process HTML forms.

Sample HTML form:

<form action="/cgi-bin/doSomething">
<input type="text" name="someText">
<input type="submit" value="Enter">
</form>

This will produce a text field and a button labeled "Enter".

The action attribute specifies the CGI program that gets run when the button is clicked.

Environment variables

The CGI program does NOT get any information from its command line.

Environment variables are used:

SERVER_SOFTWARE
SERVER_NAME
GATEWAY_INTERFACE
SERVER_PROTOCOL
SERVER_PORT
REQUEST_METHOD
PATH_INFO
PATH_TRANSLATED
SCRIPT_NAME
QUERY_STRING
REMOTE_HOST
REMOTE_ADDR
AUTH_TYPE
REMOTE_USER
REMOTE_IDENT
CONTENT_TYPE
CONTENT_LENGTH

Notable CGI variables

REQUEST_METHOD
For form submissions, this is either "GET" or "POST".
The difference is in how the form values are delivered to the CGI program.

QUERY_STRING
If the REQUEST_METHOD is "GET", this is a list of name/value pairs separated by `&'.
Each name and its value are separated by `='.

CONTENT_LENGTH
If the REQUEST_METHOD is "POST", this contains the number of bytes of data available on STDIN. This input is encoded like QUERY_STRING above and needs to be interpreted the same way.

PATH_INFO
Is used to pass extra information that was encoded in the URL that started the CGI program.
This is normally in the form of values separated by `/'.
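
A minimal sketch of how a program might pick the source of its form data from REQUEST_METHOD. The class and method names are hypothetical, and the variable values are passed in as plain parameters so the sketch does not depend on how they reach the program.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FormDataReader {
    // Returns the raw (still URL encoded) form data for one request.
    // FormDataReader and read are hypothetical names used only in this sketch.
    static String read(String requestMethod, String queryString,
                       String contentLength, InputStream stdin) {
        if ("POST".equals(requestMethod)) {
            // CONTENT_LENGTH says how many bytes are waiting on STDIN.
            int length = Integer.parseInt(contentLength);
            byte[] buffer = new byte[length];
            int offset = 0;
            try {
                // read() may deliver fewer bytes than requested, so loop.
                while (offset < length) {
                    int n = stdin.read(buffer, offset, length - offset);
                    if (n < 0) {
                        break; // input ended early
                    }
                    offset += n;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return new String(buffer, 0, offset, StandardCharsets.US_ASCII);
        }
        // GET: the data arrives in QUERY_STRING.
        return queryString == null ? "" : queryString;
    }

    public static void main(String[] args) {
        byte[] body = "someText=hello+there".getBytes(StandardCharsets.US_ASCII);
        InputStream fakeStdin = new ByteArrayInputStream(body);
        System.out.println(read("POST", null, String.valueOf(body.length), fakeStdin));
        System.out.println(read("GET", "someText=hello+there", null, null));
    }
}
```

Note that a POST program should read exactly CONTENT_LENGTH bytes rather than wait for end of file.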

Example QUERY_STRING

If our CGI program was called in response to the following HTML form:

<form action="/cgi-bin/doSomething">
<input type="text" name="someText">
<input type="text" name="someMoreText">
<input type="submit" value="Enter">
</form>

and the user had entered "hello there" in the first text field and "Goodbye" in the second, the QUERY_STRING would be:

someText=hello+there&someMoreText=Goodbye

Note that the QUERY_STRING data is URL encoded: characters that would confuse the syntax (such as `&' and `=' inside a value) are encoded as a `%' followed by two hex digits.
In addition, all spaces in values are replaced with `+'.
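
To see the encoding rules at work, here is a small sketch that builds a query string the way a browser would. It uses the standard java.net.URLEncoder class; the class name QueryStringBuilder and its methods are made up for this example, and UTF-8 is an assumed character encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryStringBuilder {
    // URL encode one name or value (UTF-8 is an assumption of this sketch).
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Arguments alternate: name1, value1, name2, value2, ...
    static String build(String... namesAndValues) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            if (sb.length() > 0) {
                sb.append('&'); // pairs are separated by '&'
            }
            sb.append(encode(namesAndValues[i]))
              .append('=')      // name and value are separated by '='
              .append(encode(namesAndValues[i + 1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints someText=hello+there&someMoreText=Goodbye
        System.out.println(build("someText", "hello there", "someMoreText", "Goodbye"));
        // prints q=a%26b%3Dc
        System.out.println(build("q", "a&b=c"));
    }
}
```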

Java vs. CGI

Problems with using a Java program as a CGI program:

No arguments are passed to the CGI program, but a Java program needs to be started through the java interpreter with the class name as an argument.
Java programs do not have access to their environment.

Solution:

jcgi

jcgi is a little C program that takes care of these problems:

It uses its argv[0] to find the directory and name of the class to start.
It takes the environment variables and passes them to the java program as properties.

jcgi

To use the jcgi program, you need to create a symbolic link from the jcgi executable to a file that is the name of the class that contains main().

Example:

import java.io.PrintStream;

public class TestThis
{
  public static void main(String a[])
  {
    PrintStream out = System.out;
    out.println("Content-type: text/plain");
    out.println("");
    out.println("Hello, World");
  }
}

% ls
TestThis.java    TestThis.class
% ln -s /opt/local/lib/java/jcgi TestThis.cgi
%

The .cgi extension is used so that the program will be seen as a CGI program by the web server on moria.

Java properties

Once we have a Java CGI program, we want to access the information passed to it from a form.

The relevant environment variables are available as properties in the Java program.

Use System.getProperty() to get these.

For example:

String QueryString = System.getProperty("QUERY_STRING");

Once the query string is available, we can use the StringTokenizer class to split it up into a set of name-value pairs.

Each name-value pair can then be split up into the name and value and placed into a Hashtable for easy access.
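
The splitting described above might look roughly like the following. This is a sketch only, not the sdsu.CGI source: the class name QueryStringParser is made up, UTF-8 decoding is an assumption, and the default query string in main() is there just so the program can be tried without a web server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Hashtable;
import java.util.StringTokenizer;

public class QueryStringParser {
    // Undo the URL encoding of one name or value ('+' and %XX sequences).
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Split "a=1&b=2" into pairs at '&', then each pair at the first '='.
    static Hashtable<String, String> parse(String queryString) {
        Hashtable<String, String> values = new Hashtable<String, String>();
        if (queryString == null) {
            return values;
        }
        StringTokenizer pairs = new StringTokenizer(queryString, "&");
        while (pairs.hasMoreTokens()) {
            String pair = pairs.nextToken();
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? "" : pair.substring(eq + 1);
            // Decode after splitting, so an encoded '&' or '=' stays in the value.
            values.put(decode(name), decode(value));
        }
        return values;
    }

    public static void main(String[] args) {
        String queryString = System.getProperty(
                "QUERY_STRING", "someText=hello+there&someMoreText=Goodbye");
        Hashtable<String, String> form = parse(queryString);
        System.out.println("someText = " + form.get("someText"));
    }
}
```

The order matters: decoding before splitting would turn an encoded `&' inside a value into a pair separator.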

sdsu.CGI

The sdsu.CGI class will take care of all these things.

The notable method is

String get(String), which returns the value for a given name.

To use sdsu.CGI in a program you can do the following:

import sdsu.CGI;
import java.io.PrintStream;
public class CGITest
{
  public static void main(String a[])
  {
    PrintStream out = System.out;
    CGI cgiVariables = new CGI();
    out.println("Content-type: text/plain");
    out.println("");
    out.println("The text was: " +
               cgiVariables.get("someText"));
  }
}

Web applications

All the CGI stuff is pretty neat, but how can it really be used?

The WWW browser together with a CGI program can be seen as the GUI to an application.

Notable issues here:

Each CGI request creates an HTML page with one or more HTML forms that refer back to the CGI program itself.
The CGI program has to deal with the stateless nature of the web. All state information has to be passed on to the next invocation.

This state information needs to be mostly hidden from the user:

Use the hidden input fields in forms
Use the PATH_INFO to pass hidden information
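
As an illustration of the first option, the sketch below writes the current state into hidden input fields so that it comes back with the next form submission. The class name HiddenState and the state names used in main() are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiddenState {
    // Escape characters that have a special meaning inside HTML attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One hidden <input> per state entry, in insertion order.
    static String hiddenFields(Map<String, String> state) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : state.entrySet()) {
            sb.append("<input type=\"hidden\" name=\"")
              .append(escape(e.getKey()))
              .append("\" value=\"")
              .append(escape(e.getValue()))
              .append("\">\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical state carried from one request to the next.
        Map<String, String> state = new LinkedHashMap<String, String>();
        state.put("semester", "summer96");
        state.put("dept", "BIOL");

        System.out.println("<form action=\"/cgi-bin/doSomething\">");
        System.out.print(hiddenFields(state));
        System.out.println("<input type=\"submit\" value=\"Enter\">");
        System.out.println("</form>");
    }
}
```

Hidden fields are hidden only from casual view. The user can still read and change them, so the program must validate them like any other input.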

Class schedule web application

The class schedule is a CGI program located at http://www.sdsu.edu/cgi-bin/schedule

To get to the Summer 96 schedule, the following URL is used:

http://www.sdsu.edu/cgi-bin/schedule/semester=summer96

The CGI program will be run with PATH_INFO set to

/semester=summer96

This information is then used by the CGI program to select the database to use.

All links from that point on will use that URL as the base, so that the operations are performed on that semester.

For example if we browse the courses offered in the Biology department we get the following URL:

http://www.sdsu.edu/cgi-bin/schedule/browse=dept/command=search/dept=BIOL/semester=summer96
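
A PATH_INFO value of this shape can be taken apart with a few lines of Java. The following is a sketch under the assumption that every segment has the form name=value. It is not the code of the actual schedule program, and the class name PathInfoParser is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PathInfoParser {
    // Turn "/name1=value1/name2=value2" into a map, keeping the segment order.
    static Map<String, String> parse(String pathInfo) {
        Map<String, String> values = new LinkedHashMap<String, String>();
        if (pathInfo == null) {
            return values;
        }
        for (String segment : pathInfo.split("/")) {
            if (segment.isEmpty()) {
                continue; // the piece before the leading '/' is empty
            }
            int eq = segment.indexOf('=');
            if (eq < 0) {
                values.put(segment, ""); // a segment without '=' has no value
            } else {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> info = parse("/command=search/dept=BIOL/semester=summer96");
        System.out.println("semester = " + info.get("semester"));
        System.out.println("command  = " + info.get("command"));
    }
}
```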

Command dispatching

Since a web application will probably have several different things it can do, there needs to be some sort of command in the URL that calls the CGI program.

Either PATH_INFO or hidden form elements can be used for this.

The main program will then need to dispatch to different code depending on the command.

In C or Perl this can easily be done by using a table that maps commands to functions.
In Java this can be done by mapping commands to objects.
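
One possible way to map commands to objects in Java is sketched below. The Command interface, the dispatcher class and the two example commands are all hypothetical names invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandDispatcher {
    // Every command is an object that knows how to produce its part of the page.
    interface Command {
        String run(Map<String, String> params);
    }

    static class SearchCommand implements Command {
        public String run(Map<String, String> params) {
            return "Searching department " + params.get("dept");
        }
    }

    static class BrowseCommand implements Command {
        public String run(Map<String, String> params) {
            return "Browsing by " + params.get("browse");
        }
    }

    // The table that maps command names to command objects.
    private final Map<String, Command> commands = new HashMap<String, Command>();

    void register(String name, Command command) {
        commands.put(name, command);
    }

    String dispatch(String name, Map<String, String> params) {
        Command command = commands.get(name);
        if (command == null) {
            return "Unknown command"; // never run anything not in the table
        }
        return command.run(params);
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        dispatcher.register("search", new SearchCommand());
        dispatcher.register("browse", new BrowseCommand());

        Map<String, String> params = new HashMap<String, String>();
        params.put("dept", "BIOL");
        System.out.println(dispatcher.dispatch("search", params));
    }
}
```

Since the command name comes from the URL or a form, the user controls it. Looking it up in a fixed table means an unexpected name can only produce the "Unknown command" answer.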

Issues with web applications

There are several things to be aware of when designing web applications:

Concurrent access to data
Load on the web server
Privileges
Network sniffing of passwords
User modification of URLs

CGI programs for non-HTML

As we have seen, the first thing a CGI program needs to do is to advertise what type of document it is going to send.

The access counter that Mark Boyns wrote uses this feature to create a dynamic GIF image when it is run.

The CGI program keeps track of the number of times it was called for that particular page and then it creates the image.

Look at http://www.sdsu.edu/~boyns/counter.html for more information on this program.
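
For a non-text document the header is still text, but the body is raw bytes, so the program should write bytes rather than print strings. The sketch below shows the idea only. It is not the access counter's code: the class name BinaryCgiResponse is made up, and main() simply sends an existing GIF file named on the command line.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BinaryCgiResponse {
    // Header as ASCII text, one blank line, then the unmodified document bytes.
    static byte[] build(String mimeType, byte[] data) {
        byte[] header = ("Content-type: " + mimeType + "\n\n")
                .getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header, 0, header.length);
        out.write(data, 0, data.length);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BinaryCgiResponse <file.gif>");
            return;
        }
        // Hypothetical input: a GIF file that already exists on disk.
        byte[] image = Files.readAllBytes(Paths.get(args[0]));
        byte[] response = build("image/gif", image);
        // Write raw bytes; println() would add characters that corrupt the image.
        System.out.write(response, 0, response.length);
        System.out.flush();
    }
}
```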
