SDSU CS 596: Client-Server Programming
Spring Semester, 1997
Doc 13, HTML and HTTP

San Diego State University -- This page last updated Mar 4, 1997
----------

Contents of Doc 13, HTML and HTTP

HTML and HTTP
Augmented BNF
URL
HTTP URL
HTML
What is HTML?
HTML 101
Learning HTML
HTTP
HTTP Message Format
Full-Request
Server Response
Status-Line
Header Fields
Request Methods
Request Methods (continued)
Request Methods Not Always Supported
General Message Header Fields
Request-Header
Response-Header
Entity Header Fields

 


Doc 13, HTML and HTTP Slide # 2

Augmented BNF

"literal"
Quotation marks surround literal text. Unless stated otherwise, the text is case-insensitive
rule1 | rule2
Elements separated by a bar ("|") are alternatives
( rule1 | rule2 )
Parentheses group elements so they function as one
*rule
"*" preceding an element indicates repetition
<n>*<m>element
At least <n> and at most <m> occurrences of element
[ rule ]
Square brackets enclose optional elements
<n>element
Exactly <n> occurrences of element
<n>#<m>element
At least <n> and at most <m> elements, each separated by one or more commas (",") and optional linear white space (LWS)
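
The list rule can be illustrated with a short Python sketch. The function name split_list and its minimum/maximum parameters are hypothetical, chosen here only to mirror <n> and <m>; this is one possible reading of the rule, not a reference parser.

```python
import re

def split_list(value, minimum=0, maximum=None):
    """Split a field value written with the <n>#<m>element rule.

    Elements are separated by one or more commas, with optional
    white space around each comma. Empty elements are ignored.
    """
    items = [item for item in re.split(r"\s*,\s*", value.strip()) if item]
    if len(items) < minimum or (maximum is not None and len(items) > maximum):
        raise ValueError("wrong number of list elements")
    return items

print(split_list("gzip ,, compress,identity", minimum=1))
```

A rule such as 1#element would correspond to minimum=1 with no maximum.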

Doc 13, HTML and HTTP Slide # 3

URL

URL == Uniform Resource Locator

 

Augmented BNF for a URL is defined as:

 

URL        = scheme ":" *(uchar|reserved)["#" fragment]
uchar      = unreserved | escape
unreserved = ALPHA | DIGIT | safe | extra | national
escape     = "%" HEX HEX
extra      = "!" | "*" | "'" | "(" | ")" | ","
safe       = "$" | "-" | "_" | "."
unsafe     = CTL | SP | <"> | "#" | "%" | "<" | ">"
national   = <any OCTET excluding ALPHA, DIGIT, reserved,
             extra, safe, and unsafe>
reserved   = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
fragment   = *( uchar | reserved )
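
The escape rule ("%" HEX HEX) can be sketched in Python as follows. The helper name escape is hypothetical, and the sketch assumes UTF-8 for characters outside ASCII and leaves out the national characters, so it is a simplification of the grammar above. It escapes reserved characters too, which suits a single URL component rather than a whole URL.

```python
import string
from urllib.parse import unquote

SAFE = "$-_."
EXTRA = "!*'(),"
# unreserved = ALPHA | DIGIT | safe | extra (national characters omitted)
UNRESERVED = set(string.ascii_letters + string.digits + SAFE + EXTRA)

def escape(text):
    """Replace every character that is not unreserved with %XX."""
    pieces = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        pieces.append(char if char in UNRESERVED else "%{:02X}".format(byte))
    return "".join(pieces)

escaped = escape("my file#1.html")
print(escaped)            # my%20file%231.html
print(unquote(escaped))   # my file#1.html
```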

Doc 13, HTML and HTTP Slide # 4

HTTP URL

 

The "http" scheme is used to locate network resources via HTTP

 

http_URL = "http:" "//" host [ ":" port ][ abs_path ]
host     = <A legal Internet host domain name or IP
           address (in dotted-decimal form), as
           defined by RFC 1123>
port     = *DIGIT
abs_path = "/" rel_path
rel_path = [ path ] [ ";" params ] [ "?" query ]
path     = fsegment *( "/" segment )
fsegment = 1*pchar
segment  = *pchar
params   = param *( ";" param )
param    = *( pchar | "/" )
pchar    = uchar | ":" | "@" | "&" | "=" | "+"
query    = *( uchar | reserved )

Syntax and semantics of URLs can be found in RFC 1738
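
As a rough illustration, Python's standard urllib.parse module splits an http URL into the parts named in the grammar above. The port, path, params, query, and fragment in this URL are made up for the example; only the host name comes from these notes.

```python
from urllib.parse import urlparse

url = "http://www.eli.sdsu.edu:8080/courses/index.html;type=a?term=spring#top"
parts = urlparse(url)

print(parts.scheme)      # scheme
print(parts.hostname)    # host
print(parts.port or 80)  # port, falling back to the assigned port 80
print(parts.path)        # path portion of abs_path
print(parts.params)      # params (after ";")
print(parts.query)       # query (after "?")
print(parts.fragment)    # fragment (after "#")
```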


Doc 13, HTML and HTTP Slide # 5

HTML

 

Some Buzz Words

 
WWW
World Wide Web (or Web, for short)
SGML
Standard Generalized Markup Language
This is the standard for describing markup languages
DTD
Document Type Definition
This is a specific markup language, written using SGML
HTML
HyperText Markup Language
HTML is an SGML DTD
HTML uses markup tags to tell the WWW browser how to display the text

Doc 13, HTML and HTTP Slide # 6

What is HTML?

 

HTML is a language for describing structured documents

 

HTML does not describe page layout. (Why not?)

 

This language is used by web browsers to render the document and display it to a user.

 

Several versions of HTML:

 

Other developments:


Doc 13, HTML and HTTP Slide # 7

HTML 101

HTML files are all ASCII (except for kanji and other alphabets)

Tags are used to mark up a document

A tag starts with "<" and ends with ">"

Some tags come in pairs: a start tag and an end tag. End tags have a "/" following the "<".

"<" or ">" are reserved for tags only. Tags themselves may not include these characters, either.

In general, all white space (including newlines) is reduced to the equivalent of one space.
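
The white space rule can be approximated in one line of Python. The function name collapse_whitespace is hypothetical, and the sketch also drops leading and trailing white space, which a browser may treat differently.

```python
def collapse_whitespace(text):
    """Reduce every run of spaces, tabs, and newlines to one space."""
    return " ".join(text.split())

print(collapse_whitespace("This   is\n\n a\tdocument"))   # This is a document
```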

 

Basic HTML document:

<HTML>
<HEAD>
<TITLE>Sample HTML Document</TITLE>
</HEAD>
<BODY>
This is a document
</BODY>
</HTML>

 


Doc 13, HTML and HTTP Slide # 8

Learning HTML

 

Learn by example. "View Source" in any browser.

Many books on HTML

Online tutorials (start at http://www.yahoo.com/)

 

Hyperlinks

<A HREF="http://www.sdsu.edu/">SDSU homepage</A>

will render as

SDSU homepage
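
To see how a program might pick the link out of such markup, here is a minimal sketch using Python's standard html.parser module. The class name LinkCollector is hypothetical. Note that the parser reports tag and attribute names in lower case.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from A tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None    # href of the A tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None

collector = LinkCollector()
collector.feed('<A HREF="http://www.sdsu.edu/">SDSU homepage</A>')
print(collector.links)
```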

 

Images

<IMG SRC="http://www.sdsu.edu/graphics/ComputerSci.gif">

will render as the image itself (not reproduced here)


Doc 13, HTML and HTTP Slide # 9

HTTP

HyperText Transfer Protocol

Stateless, object-oriented protocol

The typing and negotiation of data representation allows systems to be built independently of the data being transferred.

Assigned port 80

 

Basic Client-Server Interaction

 

Client: Open connection

Server: Accept/Reject connection

Client: Send request

Server: Send response to request

Connection closed
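
The exchange above can be sketched in Python without a network. This example assumes a connected socket pair as a stand-in for a TCP connection to port 80, and the response body is a made-up placeholder; a real client would connect to the server's host and port instead.

```python
import socket

# A connected pair of sockets stands in for an accepted TCP connection.
client, server = socket.socketpair()

# Client: send request
client.sendall(b"GET /index.html\r\n")

# Server: read the request, send the response, close the connection
request = server.recv(1024)
server.sendall(b"<HTML><BODY>This is a document</BODY></HTML>")
server.close()

# Client: read until the server closes the connection
chunks = []
while True:
    data = client.recv(1024)
    if not data:
        break
    chunks.append(data)
client.close()
response = b"".join(chunks)

print(request)
print(response)
```

The client learns that the response is complete only because the server closes the connection, which is why the last step matters.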


Doc 13, HTML and HTTP Slide # 10

HTTP Message Format

 

HTTP-message  = Simple-Request
              | Simple-Response
              | Full-Request
              | Full-Response
Full-Request  = Request-Line
                *( General-Header
                   | Request-Header
                   | Entity-Header )
                CRLF
                [ Entity-Body ]
Full-Response = Status-Line
                *( General-Header
                   | Response-Header
                   | Entity-Header )
                CRLF
                [ Entity-Body ]
HTTP-header   = field-name ":" [ field-value ] CRLF
Entity-Body   = *OCTET

Doc 13, HTML and HTTP Slide # 11

Client Request

 

Request        = Simple-Request | Full-Request
Simple-Request = "GET" SP Request-URI CRLF

 

Simple-Request Example

telnet: www.eli.sdsu.edu 80
Server: accepts connection
telnet: GET /index.html<CRLF>
Server:
<!DOCTYPE HTML SYSTEM "html.dtd">
<HTML><HEAD><TITLE>
Roger Whitney
</TITLE></HEAD>
<BODY>
<CENTER><H2>
Roger Whitney<br>
Computer Science (etc...)

 


Doc 13, HTML and HTTP Slide # 12

Full-Request

Full-Request = Request-Line
               *( General-Header | Request-Header | Entity-Header )
               CRLF
               [ Entity-Body ]
Request-Line = Method SP URI SP HTTP-Version CRLF

Example

telnet: www.eli.sdsu.edu 80
Server: accepts connection
telnet: GET /index.html HTTP/1.0<CRLF>
telnet: <CRLF>
Server:
HTTP/1.0 200 Ok
Server: Netscape-Commerce/1.12
Date: Tuesday, 04-Mar-97 07:58:45 GMT
Last-modified: Thursday, 27-Feb-97 00:19:07 GMT
Content-length: 3949
Content-type: text/html
 
<!DOCTYPE HTML SYSTEM "html.dtd">
<HTML><HEAD><TITLE>
Roger Whitney
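
A Full-Request like the one typed into telnet above could be assembled in Python roughly as follows. The function name build_full_request and the User-Agent value are hypothetical; the sketch assumes HTTP/1.0 and ASCII header fields.

```python
def build_full_request(method, uri, headers=None, body=b""):
    """Return the bytes of a Full-Request: Request-Line, header
    fields, an empty line, then the optional Entity-Body."""
    lines = ["{} {} HTTP/1.0".format(method, uri)]
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

print(build_full_request("GET", "/index.html"))
print(build_full_request("GET", "/index.html", {"User-Agent": "demo/0.1"}))
```

The empty line (a bare CRLF) after the header fields is what the second <CRLF> in the telnet session supplies.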

Doc 13, HTML and HTTP Slide # 13

Server Response

Response        = Simple-Response | Full-Response
Simple-Response = [ Entity-Body ]
Full-Response   = Status-Line
                  *( General-Header | Response-Header | Entity-Header )
                  CRLF
                  [ Entity-Body ]

 

A Simple-Response is sent only in response to a Simple-Request

Sample Full-Response:

HTTP/1.0 200 Ok
Server: Netscape-Commerce/1.12
Date: Tuesday, 04-Mar-97 07:58:45 GMT
Last-modified: Thursday, 27-Feb-97 00:19:07 GMT
Content-length: 3949
Content-type: text/html
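
One way a client might take a Full-Response apart is to split at the first empty line. The function name split_full_response is hypothetical, and the Entity-Body below is a placeholder, not the real page.

```python
def split_full_response(raw):
    """Split raw response bytes into the Status-Line, the list of
    header lines, and the Entity-Body."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("missing empty line after the header fields")
    lines = head.decode("latin-1").split("\r\n")
    return lines[0], lines[1:], body

raw = (b"HTTP/1.0 200 Ok\r\n"
       b"Server: Netscape-Commerce/1.12\r\n"
       b"Content-type: text/html\r\n"
       b"\r\n"
       b"<HTML>placeholder body</HTML>")
status_line, header_lines, body = split_full_response(raw)
print(status_line)
print(header_lines)
print(body)
```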

 


Doc 13, HTML and HTTP Slide # 14

Status-Line

Status-Line   = HTTP-Version SP Status-Code
                SP Reason-Phrase CRLF
Status-Code   = 3DIGIT
Reason-Phrase = token *( SP token )

 

Example

HTTP/1.0 200 Ok

 

Status codes

1xx: Not used, but reserved for future use

2xx: Success. The requested action was successfully received and understood

3xx: Redirection. Further action must be taken in order to complete the request

4xx: Client Error. The request contains bad syntax or is inherently impossible to fulfill

5xx: Server Error. The server could not fulfill the request
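
The Status-Line grammar and the status code classes above can be combined in a small Python sketch. The names parse_status_line and STATUS_CLASSES are hypothetical.

```python
STATUS_CLASSES = {
    "1": "Reserved",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}

def parse_status_line(line):
    """Return (version, code, reason, class) for one Status-Line."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError("Status-Code must be exactly three digits")
    return version, int(code), reason, STATUS_CLASSES.get(code[0], "Unknown")

print(parse_status_line("HTTP/1.0 200 Ok\r\n"))
# ('HTTP/1.0', 200, 'Ok', 'Success')
```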


Doc 13, HTML and HTTP Slide # 15

Header Fields

 

HTTP-header   = Field-name ":" [Field-value ] CRLF
Field-name    = 1*<any CHAR, excluding CTLs, SP,
                and ":">
Field-value   = *( Field-content | comment | LWS )
Field-content = <the OCTETs making up the field-value
                and consisting of either *text or
                combinations of token, tspecials, and
                quoted-string>
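
A single header line can be checked against the HTTP-header rule with a regular expression. This is a simplified sketch: the name parse_header is hypothetical, and the pattern does not interpret comments or quoted strings inside the field value.

```python
import re

# Field-name: one or more characters other than controls, space, and ":".
# White space around the field value is trimmed.
HEADER_RE = re.compile(r"^([^\x00-\x20\x7f:]+):[ \t]*(.*?)[ \t]*$")

def parse_header(line):
    """Return (field name, field value) for one header line."""
    match = HEADER_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not a valid HTTP-header: {!r}".format(line))
    return match.group(1), match.group(2)

print(parse_header("Date: Tuesday, 04-Mar-97 07:58:45 GMT\r\n"))
print(parse_header("Content-length: 3949\r\n"))
```

Only the first ":" ends the field name, so the colons in the Date value stay in the field value.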

 

Sample Full-Response:

HTTP/1.0 200 Ok
Server: Netscape-Commerce/1.12
Date: Tuesday, 04-Mar-97 07:58:45 GMT
Last-modified: Thursday, 27-Feb-97 00:19:07 GMT
Content-length: 3949
Content-type: text/html

 

 


Doc 13, HTML and HTTP Slide # 16

Request Methods

 

Method = "GET" | "HEAD" | "PUT" | "POST" 
         | "DELETE" | "LINK" | "UNLINK" 
         | extension-method

 

GET and HEAD must be supported by all HTTP/1.0 servers

Servers should return Status-Code "501 Not Implemented" if the method is unknown.

 

GET

Retrieves whatever item is identified by the URI

The URI can refer to a data-producing process, or a script

The produced data shall be returned as the Entity-Body

 

HEAD

Identical to GET except that the server must not return any Entity-Body in the response.
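
The rules for GET, HEAD, and unknown methods might look like this in a toy server. Everything here is an assumption made for illustration: the DOCUMENTS table, the handle_request name, and the use of "404 Not Found" for a missing item are not from these notes.

```python
# Hypothetical document store standing in for the server's files.
DOCUMENTS = {"/index.html": b"<HTML><BODY>This is a document</BODY></HTML>"}

def handle_request(method, uri):
    """Return the raw bytes of a Full-Response for one request."""
    if method not in ("GET", "HEAD"):
        # Unknown or unsupported method
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    body = DOCUMENTS.get(uri)
    if body is None:
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    head = ("HTTP/1.0 200 OK\r\n"
            "Content-Type: text/html\r\n"
            "Content-Length: {}\r\n"
            "\r\n").format(len(body)).encode("ascii")
    # HEAD returns the same header fields as GET but no Entity-Body.
    return head if method == "HEAD" else head + body

print(handle_request("HEAD", "/index.html"))
print(handle_request("UNLINK", "/index.html"))
```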


Doc 13, HTML and HTTP Slide # 17

Request Methods (continued)

 

POST

Request that the origin server accept the item enclosed in the request as a new subordinate of the resource identified by the URI.

Allows a uniform function to:

 


Doc 13, HTML and HTTP Slide # 18

Request Methods Not Always Supported

Why?

 

PUT

The enclosed item in the request is to be stored under the supplied URI

 

DELETE

Request that the server delete the resource identified by the given URI.

 

LINK

Establishes one or more Link relationships between the existing resource identified by the URI and other existing resources

 

UNLINK

Removes one or more Link relationships from the existing resource identified by the URI


Doc 13, HTML and HTTP Slide # 19

General Message Header Fields

 

General-Header = Connection
               | Date
               | Forwarded
               | Mandatory
               | Message-ID
               | MIME-Version
Connection     = "Connection" ":" 1#connect-option
connect-option = token [ "=" word ]

 


Doc 13, HTML and HTTP Slide # 20

Request-Header

 

Request        = Simple-Request | Full-Request
Full-Request   = Request-Line
                 *( General-Header | Request-Header | Entity-Header)
                 CRLF
                 [ Entity-Body ]
Request-Header = User-Agent
               | If-Modified-Since
               | Pragma
               | Authorization
               | Proxy-Authorization
               | Referer
               | From
               | Accept
               | Accept-Encoding
               | Accept-Language

 


Doc 13, HTML and HTTP Slide # 21

Response-Header

 

Full-Response   = Status-Line
                  *( General-Header | Response-Header | Entity-Header )
                  CRLF
                  [ Entity-Body ]

Response-Header = Server
                | WWW-Authenticate
                | Proxy-Authenticate
                | Retry-After

 


Doc 13, HTML and HTTP Slide # 22

Entity Header Fields

 

Unknown header fields should be considered Entity-Header fields.

 

Entity-Header    = Allow
                 | Content-Length
                 | Content-Type
                 | Content-Encoding
                 | Content-Transfer-Encoding
                 | Content-Language
                 | Expires
                 | Last-Modified
                 | URI-header
                 | Location
                 | Version
                 | Derived-From
                 | Title
                 | Link
                 | extension-header
extension-header = HTTP-header
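
The rule that unknown header fields count as Entity-Header fields can be sketched as a lookup with a default. The name header_category is hypothetical; the sets copy the field names from the header slides (reading the second General-Header entry as Date), compared without regard to case.

```python
GENERAL = {"connection", "date", "forwarded", "mandatory",
           "message-id", "mime-version"}
REQUEST = {"user-agent", "if-modified-since", "pragma", "authorization",
           "proxy-authorization", "referer", "from", "accept",
           "accept-encoding", "accept-language"}
RESPONSE = {"server", "www-authenticate", "proxy-authenticate", "retry-after"}

def header_category(field_name):
    """Classify a header field name; anything unknown is an Entity-Header."""
    name = field_name.lower()
    if name in GENERAL:
        return "General-Header"
    if name in REQUEST:
        return "Request-Header"
    if name in RESPONSE:
        return "Response-Header"
    return "Entity-Header"

for field in ("Server", "User-Agent", "Content-Length", "X-Custom"):
    print(field, "->", header_category(field))
```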

 

----------