Web Servers
1h 58m

In this course, we will learn the concepts of Java EE 7 with a focus on Java Basics.

Learning Objectives

  • What Java classes, executable Java, and access modifiers are, and how they work

Intended Audience

  • Anyone looking to get Oracle Java Certification
  • Those who want to improve their Java EE 7 knowledge
  • Java developers

Prerequisites

  • Have at least 2 years of Java development experience 

In this lesson, we will talk about web servers. So, let's start. Web servers dish out billions of web pages a day. They tell you the weather, load up your online shopping carts, and let you find long-lost high school buddies. Web servers are the workhorses of the World Wide Web. A web server processes HTTP requests and serves responses. The term web server can refer either to web server software or to the particular device or computer dedicated to serving web pages. Web servers come in all flavors, shapes, and sizes. There are trivial 10-line Perl script web servers, 50 MB secure commerce engines, and tiny servers-on-a-card. But whatever the functional differences, all web servers receive HTTP requests for resources and serve content back to the clients.

Web servers implement HTTP and the related TCP connection handling. They also manage the resources served by the web server and provide administrative features to configure, control, and enhance the web server. The web server logic implements the HTTP protocol, manages web resources, and provides web server administrative capabilities. The web server logic shares responsibilities for managing TCP connections with the operating system. The underlying operating system manages the hardware details of the underlying computer system and provides TCP/IP network support, file systems to hold web resources, and process management to control current computing activities.

Web servers are available in many forms. You can install and run general-purpose software web servers on standard computer systems. If you don't want the hassle of installing software, you can purchase a web server appliance, in which the software comes pre-installed and pre-configured on a computer, often in a snazzy-looking chassis. Given the miracles of microprocessors, some companies even offer embedded web servers implemented in a small number of computer chips, making them perfect administration consoles for consumer devices.
Let's look at each of these types of implementations. General-purpose software web servers run on standard, network-enabled computer systems. You can choose open source software, such as Apache or W3C's Jigsaw, or commercial software, such as Microsoft's and iPlanet's web servers. Web server software is available for just about every computer and operating system. While there are tens of thousands of different kinds of web server programs, including custom-crafted, special-purpose web servers, most web server software comes from a small number of organizations. In February 2002, the Netcraft survey showed three vendors dominating the public Internet web server market. The free Apache software powered nearly 60% of all Internet web servers. Microsoft's web server made up another 30%. Sun iPlanet's servers comprised another 3%. Take these numbers with a few grains of salt, however, as the Netcraft survey is commonly believed to exaggerate the dominance of Apache software. First, the survey counts servers independent of server popularity. Proxy server access studies from large ISPs suggest that the share of pages served from Apache servers is much less than 60%, but still exceeds that of Microsoft and Sun iPlanet. Additionally, it is anecdotally believed that Microsoft and iPlanet servers are more popular than Apache inside corporate enterprises.

Web server appliances are prepackaged software and hardware solutions. The vendor pre-installs a software server onto a vendor-chosen computer platform and pre-configures the software. Some examples of web server appliances include Sun/Cobalt RaQ web appliances, the Toshiba Magnia SG10, and the IBM Whistle web server appliance. Appliance solutions remove the need to install and configure software and often greatly simplify administration. However, the web server is often less flexible and feature-rich, and the server hardware is not easily repurposable or upgradable.
Embedded servers are tiny web servers intended to be embedded into consumer products, for example, printers or home appliances. Embedded web servers allow users to administer their consumer devices using a convenient web browser interface. Some embedded web servers can even be implemented in less than one square inch, but they usually offer a minimal feature set. Two examples of very small embedded web servers are the IPic match-head-sized web server and the Netmedia SitePlayer SP1 Ethernet web server.

If you want to build a full-featured HTTP server, you have some work to do. The core of the Apache web server has over 50,000 lines of code, and optional processing modules make that number much bigger. All this software is needed to support HTTP/1.1 features, rich resource support, virtual hosting, access control, logging, configuration, monitoring, and performance features. That said, you can create a minimally functional HTTP server in under 30 lines of Perl. Let's take a look. This example shows a tiny Perl program called Type-O-Serve. This program is a useful diagnostic tool for testing interactions with clients and proxies. Like any web server, Type-O-Serve waits for an HTTP connection. As soon as Type-O-Serve gets the request message, it prints the message on the screen. Then it waits for you to type or paste in a response message, which is sent back to the client. This way, Type-O-Serve pretends to be a web server, records the exact HTTP request messages, and allows you to send back any HTTP response message. This simple Type-O-Serve utility doesn't implement most HTTP functionality, but it's a useful tool to generate server response messages the same way you can use telnet to generate client request messages. This figure shows how the administrator of Joe's Hardware might use Type-O-Serve to test HTTP communication. First, the administrator starts the Type-O-Serve diagnostic server listening on a particular port.
Because Joe's Hardware store already has a production web server listening on port 80, the administrator starts the Type-O-Serve server on port 8080. You can pick any unused port with this command line. Once Type-O-Serve is running, you can point a browser to this web server. The Type-O-Serve program receives the HTTP request message from the browser and prints the contents of the HTTP request message on screen. The Type-O-Serve diagnostic tool then waits for the user to type in a simple response message, followed by a period on a blank line. Type-O-Serve sends the HTTP response message back to the browser, and the browser displays the body of the response message.

The Perl server we showed in this example is a trivial example web server. State-of-the-art commercial web servers are much more complicated, but they do perform several common tasks, as shown in this figure:

  • Set up connection. Accept a client connection, or close it if the client is unwanted.
  • Receive request. Read an HTTP request message from the network.
  • Process request. Interpret the request message and take action.
  • Access resource. Access the resource specified in the message.
  • Construct response. Create the HTTP response message with the right headers.
  • Send response. Send the response back to the client.
  • Log transaction. Place notes about the completed transaction in a log file.

If a client already has a persistent connection open to the server, it can use that connection to send its request. Otherwise, the client needs to open a new connection to the server. When a client requests a TCP connection to the web server, the web server establishes the connection and determines which client is on the other side of the connection, extracting the IP address from the TCP connection. Once a new connection is established and accepted, the server adds the new connection to its list of existing web server connections and prepares to watch for data on the connection.
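The seven common tasks listed above can be sketched as a deliberately tiny Java server that handles one connection at a time, much as trivial as Type-O-Serve but answering on its own. This is a minimal illustration under stated assumptions, not how a production server is built: the class name TinyWebServer, port 8080, and the stand-in "resource" (a greeting that echoes the requested path) are all hypothetical, only GET is supported, and request headers are read but ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

// Hypothetical sketch: one connection at a time, GET only, hard-coded "resource".
final class TinyWebServer {

    // Steps 2-5: receive the request, process it, access the (stand-in) resource,
    // and construct the response. Returns null if the client sent nothing.
    static String handle(BufferedReader in) throws IOException {
        String requestLine = in.readLine();                       // 2. Receive request
        if (requestLine == null) return null;
        String header;
        while ((header = in.readLine()) != null && !header.isEmpty()) {
            // headers are consumed up to the blank line, then ignored in this sketch
        }
        String[] parts = requestLine.split(" ");                  // 3. Process request
        if (parts.length != 3) return response("400 Bad Request", "Bad request\n");
        if (!parts[0].equals("GET")) return response("501 Not Implemented", "Only GET is supported\n");
        String body = "Hello from " + parts[1] + "\n";            // 4. Access resource (stand-in)
        return response("200 OK", body);                          // 5. Construct response
    }

    // Convenience overload for in-memory requests (handy for testing).
    static String handle(String rawRequest) {
        try {
            return handle(new BufferedReader(new StringReader(rawRequest)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String response(String status, String body) {
        return "HTTP/1.1 " + status + "\r\n"
                + "Content-Type: text/plain; charset=utf-8\r\n"
                + "Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(8080)) {    // assumed port
            while (true) {
                try (Socket client = listener.accept()) {         // 1. Set up connection
                    BufferedReader in = new BufferedReader(new InputStreamReader(
                            client.getInputStream(), StandardCharsets.ISO_8859_1));
                    String reply = handle(in);
                    if (reply == null) continue;
                    OutputStream out = client.getOutputStream();
                    out.write(reply.getBytes(StandardCharsets.UTF_8)); // 6. Send response
                    out.flush();
                    System.out.println(Instant.now() + " "             // 7. Log transaction
                            + client.getInetAddress().getHostAddress() + " "
                            + reply.substring(0, reply.indexOf("\r\n")));
                } catch (IOException e) {
                    System.err.println("connection error: " + e.getMessage());
                }
            }
        }
    }
}
```

Keeping steps 2 through 5 in a method that reads from any BufferedReader means the request handling can be exercised without opening a socket; only main touches the network.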
The web server is free to reject and immediately close any connection. Some web servers close connections because the client IP address or hostname is unauthorized or is a known malicious client. Other identification techniques can also be used. Most web servers can be configured to convert client IP addresses into client hostnames using reverse DNS. Web servers can use the client hostname for detailed access control and logging. Be warned that hostname lookups can take a very long time, slowing down web transactions. Many high-capacity web servers either disable hostname resolution or enable it only for particular content. You can enable hostname lookups in Apache with the HostnameLookups configuration directive. For example, the Apache configuration directives in this example turn on hostname resolution for only HTML and CGI resources.

Some web servers also support the IETF ident protocol. The ident protocol lets servers find out what username initiated an HTTP connection. This information is particularly useful for web server logging: the second field of the popular Common Log Format contains the ident username of each HTTP request. If a client supports the ident protocol, the client listens on TCP port 113 for ident requests. This figure shows how the ident protocol works. In the figure, the client opens an HTTP connection. The server then opens its own connection back to the client's ident server port, 113, sends a simple request asking for the username corresponding to the new connection, and retrieves from the client the response containing the username. Ident can work inside organizations, but it does not work well across the public Internet for many reasons, including:

  • Many client PCs don't run the identd Identification Protocol daemon software.
  • The ident protocol significantly delays HTTP transactions.
  • Many firewalls won't permit incoming ident traffic.
  • The ident protocol is insecure and easy to fabricate.
  • The ident protocol doesn't support virtual IP addresses well.
  • There are privacy concerns about exposing client usernames.

You can tell Apache web servers to use ident lookups with Apache's IdentityCheck on directive. If no ident information is available, Apache will fill ident log fields with hyphens. Common Log Format log files typically contain hyphens in the second field because no ident information is available.

As data arrives on connections, the web server reads the data from the network connection and parses out the pieces of the request message. When parsing the request message, the web server:

  • Parses the request line, looking for the request method, the specified resource identifier (URI), and the version number, each separated by a single space and ending with a carriage-return line-feed (CRLF) sequence.
  • Reads the message headers, each ending in CRLF.
  • Detects the end-of-headers blank line, ending in CRLF, if present.
  • Reads the request body, if any, with its length specified by the Content-Length header.

When parsing request messages, web servers receive input data erratically from the network. The network connection can stall at any point. The web server needs to read data from the network and temporarily store the partial message data in memory until it receives enough data to parse it and make sense of it.
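The parsing steps above, including the need to hold on to partial data, can be sketched in Java as a function that is handed whatever bytes have arrived so far and either returns a parsed request or returns null to mean "keep reading". The names RequestParser and Request are hypothetical, and the sketch assumes only Content-Length-delimited bodies: chunked transfer encoding, header continuation lines, and repeated headers are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: parse a request from whatever bytes have arrived so far.
final class RequestParser {

    // Parsed request; headers live in a case-insensitive lookup table.
    record Request(String method, String uri, String version,
                   Map<String, String> headers, byte[] body) {}

    private static final byte[] HEADER_END = {'\r', '\n', '\r', '\n'};

    // Returns the request, or null if buf[0..len) does not yet hold a complete message.
    // Throws IllegalArgumentException if the request line or a header is malformed.
    static Request tryParse(byte[] buf, int len) {
        int headerEnd = indexOf(buf, len, HEADER_END);
        if (headerEnd < 0) return null;                            // headers still arriving

        String head = new String(buf, 0, headerEnd, StandardCharsets.ISO_8859_1);
        String[] lines = head.split("\r\n");
        String[] requestLine = lines[0].split(" ", -1);            // method SP URI SP version
        if (requestLine.length != 3) {
            throw new IllegalArgumentException("bad request line: " + lines[0]);
        }

        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) throw new IllegalArgumentException("bad header: " + lines[i]);
            headers.put(lines[i].substring(0, colon).trim(), lines[i].substring(colon + 1).trim());
        }

        int bodyStart = headerEnd + HEADER_END.length;
        int bodyLength = Integer.parseInt(headers.getOrDefault("Content-Length", "0"));
        if (bodyLength < 0) throw new IllegalArgumentException("negative Content-Length");
        if (len - bodyStart < bodyLength) return null;             // body still arriving

        byte[] body = Arrays.copyOfRange(buf, bodyStart, bodyStart + bodyLength);
        return new Request(requestLine[0], requestLine[1], requestLine[2], headers, body);
    }

    // Naive byte search for needle within haystack[0..len).
    private static int indexOf(byte[] haystack, int len, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= len; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

A caller would append each chunk read from the network to its buffer and call tryParse again, so a stalled connection simply keeps returning null until the rest of the message shows up. The TreeMap ordered by String.CASE_INSENSITIVE_ORDER is one way to get the fast, case-insensitive header lookup that HTTP header names call for.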

Some web servers also store the request messages in internal data structures that make the message easy to manipulate. For example, the data structure might contain pointers and lengths of each piece of the request message, and the headers might be stored in a fast lookup table so that the values of specific headers can be accessed quickly.

High-performance web servers support thousands of simultaneous connections. These connections let the web server communicate with clients around the world, each with one or more connections open to the server. Some of these connections may be sending requests rapidly to the web server, while other connections trickle requests slowly or infrequently, and still others are idle, waiting quietly for some future activity. Web servers constantly watch for new web requests, because requests can arrive at any time. Different web server architectures service requests in different ways, as this figure illustrates. Single-threaded web servers process one request at a time until completion. When the transaction is complete, the next connection is processed. This architecture is simple to implement, but during processing all the other connections are ignored. This creates serious performance problems and is appropriate only for low-load servers and diagnostic tools like Type-O-Serve. Multiprocess and multithreaded web servers dedicate multiple processes or higher-efficiency threads to process requests simultaneously.
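A multithreaded server can be sketched in Java with the JDK's Executors.newFixedThreadPool: one thread only accepts connections, and each accepted connection is handed to a pool of worker threads that process requests simultaneously. The class name PooledServer, the cap of 64 workers, port 8080, and the echo-style reply are illustrative assumptions. Because the pool is fixed, it also bounds the number of threads; connections beyond the cap wait in the pool's queue rather than getting a thread of their own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one accepting thread plus a bounded pool of worker threads.
final class PooledServer {
    static final int MAX_WORKERS = 64;   // assumed cap on worker threads

    // Work for a single connection; stream-based so it can run without a socket.
    static void serve(InputStream rawIn, OutputStream out) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(rawIn, StandardCharsets.ISO_8859_1));
        String requestLine = in.readLine();
        if (requestLine == null) return;                     // client closed without sending
        String header;
        while ((header = in.readLine()) != null && !header.isEmpty()) {
            // headers ignored in this sketch
        }
        byte[] body = ("handled: " + requestLine + "\n").getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 200 OK\r\nContent-Type: text/plain; charset=utf-8\r\n"
                + "Content-Length: " + body.length + "\r\nConnection: close\r\n\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ExecutorService workers = Executors.newFixedThreadPool(MAX_WORKERS);
        try (ServerSocket listener = new ServerSocket(8080)) {   // assumed port
            while (true) {
                Socket client = listener.accept();           // the accepting thread only accepts
                workers.submit(() -> {                        // a worker runs the whole transaction
                    try (client) {
                        serve(client.getInputStream(), client.getOutputStream());
                    } catch (IOException e) {
                        System.err.println("connection error: " + e.getMessage());
                    }
                });
            }
        }
    }
}
```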

The threads and processes may be created on demand or in advance. Some servers dedicate a thread or process to every connection. But when a server processes hundreds, thousands, or even tens of thousands of simultaneous connections, the resulting number of processes or threads may consume too much memory or system resources. Thus, many multithreaded web servers put a limit on the maximum number of threads or processes. To support large numbers of connections, many web servers adopt multiplexed architectures. In a multiplexed architecture, all the connections are simultaneously watched for activity. When a connection changes state, a small amount of processing is performed on the connection. When that processing is complete, the connection is returned to the open connection list for the next change in state. Work is done on a connection only when there is something to be done; threads and processes are not tied up waiting on idle connections. Some systems combine multithreading and multiplexing to take advantage of multiple CPUs in the computer platform: multiple threads each watch the open connections (or a subset of them) and perform a small amount of work on each connection.

Once the web server has received a request, it can process the request using the method, resource, headers, and optional body. Some methods (e.g., POST) require entity body data in the request message. Other methods (e.g., OPTIONS) allow a request body but don't require one. A few methods (e.g., GET) forbid entity body data in request messages.
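The multiplexed architecture described above can be sketched in Java with java.nio's Selector: a single thread waits until some connection changes state, does a small amount of work on just that connection, and goes back to waiting. This is a simplified sketch under stated assumptions: the class name MultiplexedServer, port 8080, the 8 KB per-connection buffer, and the fixed "ok" reply are hypothetical, and a production server would register for write readiness instead of assuming that a single non-blocking write sends the whole reply.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.Channel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical sketch: one thread multiplexes every connection with a Selector.
final class MultiplexedServer {

    // True once the bytes written into buf so far contain the CRLF CRLF that ends the headers.
    static boolean headersComplete(ByteBuffer buf) {
        for (int i = 3; i < buf.position(); i++) {
            if (buf.get(i - 3) == '\r' && buf.get(i - 2) == '\n'
                    && buf.get(i - 1) == '\r' && buf.get(i) == '\n') {
                return true;
            }
        }
        return false;
    }

    private static void closeQuietly(Channel ch) {
        try { ch.close(); } catch (IOException ignored) { }
    }

    public static void main(String[] args) throws IOException {
        byte[] reply = "HTTP/1.1 200 OK\r\nContent-Length: 3\r\nConnection: close\r\n\r\nok\n"
                .getBytes(StandardCharsets.US_ASCII);
        Selector selector = Selector.open();
        ServerSocketChannel listener = ServerSocketChannel.open();
        listener.bind(new InetSocketAddress(8080));           // assumed port
        listener.configureBlocking(false);
        listener.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                                 // wait until some connection changes state
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                try {
                    if (key.isAcceptable()) {                  // new connection: start watching it
                        SocketChannel client = listener.accept();
                        if (client == null) continue;
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(8192));
                    } else if (key.isReadable()) {             // data arrived: a little work, then return
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = (ByteBuffer) key.attachment();
                        if (client.read(buf) < 0) {
                            client.close();                    // peer went away
                        } else if (headersComplete(buf)) {
                            client.write(ByteBuffer.wrap(reply)); // simplification: one write
                            client.close();
                        } else if (!buf.hasRemaining()) {
                            client.close();                    // headers too large for this sketch
                        }                                      // otherwise: wait for more data
                    }
                } catch (IOException e) {
                    if (key.channel() != listener) closeQuietly(key.channel()); // drop only this client
                }
            }
        }
    }
}
```

No thread ever blocks on an idle connection here: an idle client simply never shows up in the selected-key set. Running several such loops, one per CPU, each with its own Selector, is one way to combine multiplexing with multithreading.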

Web servers are resource servers. They deliver pre-created content, such as HTML pages or JPEG images, as well as dynamic content from resource-generating applications running on the servers. Before the web server can deliver content to the client, it needs to identify the source of the content by mapping the URI from the request message to the proper content or content generator on the web server. Web servers support different kinds of resource mapping, but the simplest form of resource mapping uses the request URI to name a file in the web server's file system. Typically, a special folder in the web server file system is reserved for web content. This folder is called the document root, or docroot. The web server takes the URI from the request message and appends it to the document root. In this figure, a request arrives for /specials/saw-blade.gif. The web server in this example has a configured document root, so it returns the file found by appending /specials/saw-blade.gif to that document root. To set the document root for an Apache web server, add a DocumentRoot line to the httpd.conf configuration file. Servers are careful not to let relative URLs back up out of a docroot and expose other parts of the file system. For example, most mature web servers will not permit this URI to see files above the Joe's Hardware document root. Virtually hosted web servers host multiple websites on the same web server, giving each site its own distinct document root on the server.
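Docroot mapping, including the check that keeps a URI from backing up out of the docroot, can be sketched in Java with java.nio.file.Path. The method percent-decodes the path before normalizing it, so an encoded /%2e%2e/ is rejected just like a literal /../. DocRoot.resolve is a hypothetical helper, and it checks paths lexically only; a server that follows symbolic links would also need to compare real paths (for example via Path.toRealPath).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch: map a request path onto a docroot without escaping it.
final class DocRoot {

    // Returns the file path under docroot, or empty if the request is malformed
    // or would resolve to a location outside the docroot.
    static Optional<Path> resolve(Path docroot, String requestPath) {
        if (!requestPath.startsWith("/")) return Optional.empty();
        int q = requestPath.indexOf('?');
        String rawPath = q >= 0 ? requestPath.substring(0, q) : requestPath;   // drop query
        try {
            // Percent-decode; '+' is a literal plus in a path, so protect it from URLDecoder.
            String decoded = URLDecoder.decode(rawPath.replace("+", "%2B"), StandardCharsets.UTF_8);
            Path root = docroot.toAbsolutePath().normalize();
            Path candidate = root.resolve(decoded.substring(1)).normalize();   // "append to docroot"
            // Anything that normalizes to a location outside the docroot is refused.
            return candidate.startsWith(root) ? Optional.of(candidate) : Optional.empty();
        } catch (IllegalArgumentException e) {       // bad %-escape or invalid path characters
            return Optional.empty();
        }
    }
}
```

Path.startsWith compares whole name elements, so a docroot of /srv/www does not accidentally accept /srv/wwwx; a plain string prefix check would.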

A virtually hosted web server identifies the correct document root to use from the IP address or hostname in the URI or the Host header. This way, two websites hosted on the same web server can have completely distinct content, even if the request URIs are identical. In this figure, the server hosts two sites, www.joes-hardware.com and www.marys-antiques.com. The server can distinguish the websites using the HTTP Host header or from distinct IP addresses. When requests A and B arrive, the server fetches each file from the document root of the site that the request names. Configuring virtually hosted docroots is simple for most web servers. For the popular Apache web server, you need to configure a VirtualHost block for each virtual website and include the DocumentRoot for each virtual server.

Another common use of docroots gives people private websites on a web server. A typical convention maps URIs whose paths begin with a slash and tilde (/~) followed by a username to a private document root for that user. The private docroot is often the folder called public_html inside that user's home directory, but it can be configured differently.

A web server can receive requests for directory URLs, where the path resolves to a directory, not a file. Most web servers can be configured to take a few different actions when a client requests a directory URL:

  • Return an error.
  • Return a special default index file instead of the directory.
  • Scan the directory and return an HTML page containing the contents.

Most web servers look for a file named index.html or index.htm inside a directory to represent that directory. If a user requests the URL for a directory and the directory contains a file named index.html or index.htm, the server will return the contents of that file. In the Apache web server, you can configure the set of filenames that will be interpreted as default directory files using the DirectoryIndex configuration directive.
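Choosing a docroot from the Host header, mapping /~user paths to a public_html folder, and picking a default index file can be sketched together in Java. Everything here is an illustrative assumption: the VirtualHosts class, the parent directory of user homes, and the host-to-docroot map would come from server configuration in practice, and the file-existence check is passed in as a function so the logic can run without a real file system. Traversal checks and IPv6 literal hosts are deliberately left out.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: per-host docroots, /~user private docroots, index-file lookup.
final class VirtualHosts {
    private final Map<String, Path> docrootsByHost;   // lower-case host name -> docroot
    private final Path homeDirs;                       // assumed parent of user home directories
    private final List<String> indexFiles;             // tried in order, like a DirectoryIndex list

    VirtualHosts(Map<String, Path> docrootsByHost, Path homeDirs, List<String> indexFiles) {
        this.docrootsByHost = Map.copyOf(docrootsByHost);
        this.homeDirs = homeDirs;
        this.indexFiles = List.copyOf(indexFiles);
    }

    // Picks the docroot named by the Host header, ignoring letter case and any ":port".
    // Simplification: IPv6 literal hosts such as "[::1]:8080" are not handled.
    Optional<Path> docrootFor(String hostHeader) {
        if (hostHeader == null) return Optional.empty();
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        int colon = host.indexOf(':');
        if (colon >= 0) host = host.substring(0, colon);
        return Optional.ofNullable(docrootsByHost.get(host));
    }

    // "/~user/rest" -> homeDirs/user/public_html/rest; anything else -> site docroot + path.
    // No ".." traversal checks are done here.
    Optional<Path> map(String hostHeader, String path) {
        if (!path.startsWith("/")) return Optional.empty();
        if (path.startsWith("/~")) {
            int slash = path.indexOf('/', 2);
            String user = slash < 0 ? path.substring(2) : path.substring(2, slash);
            String rest = slash < 0 ? "" : path.substring(slash + 1);
            if (user.isEmpty()) return Optional.empty();
            return Optional.of(homeDirs.resolve(user).resolve("public_html").resolve(rest));
        }
        return docrootFor(hostHeader).map(root -> root.resolve(path.substring(1)));
    }

    // For a directory, returns the first configured index file that exists.
    Optional<Path> indexFor(Path dir, Predicate<Path> exists) {
        return indexFiles.stream().map(dir::resolve).filter(exists).findFirst();
    }
}
```

For example, with www.marys-antiques.com mapped to its own docroot, a request for /index.html carrying the header Host: WWW.Marys-Antiques.com:80 still resolves under that docroot, because the sketch lower-cases the host and strips the port before the lookup.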
The DirectoryIndex directive lists all the filenames that serve as directory index files, in preferred order. This configuration line causes Apache to search a directory for any of the listed files in response to a directory URL request. If no default index file is present when a user requests the directory URI, and if directory indexes are not disabled, many web servers automatically return an HTML file listing the files in that directory, along with the size and modification date of each file, including URI links to each file. This file listing can be convenient, but it also allows nosy people to find files on a web server that they might not normally find.

Web servers can also map URIs to dynamic resources, that is, to programs that generate content on demand. In fact, a whole class of web servers called application servers connect web servers to sophisticated backend applications. The web server needs to be able to tell when a resource is a dynamic resource, where the dynamic content generator program is located, and how to run the program. Most web servers provide basic mechanisms to identify and map dynamic resources. Apache lets you map URI pathname components into executable program directories. When a server receives a request for a URI with an executable path component, it attempts to execute a program in a corresponding server directory. Apache also lets you mark executable files with a special file extension. This way, executable scripts can be placed in any directory. CGI is an early, simple, and popular interface for executing server-side applications. Modern application servers have more powerful and efficient server-side dynamic content support, including Microsoft's Active Server Pages and Java servlets. Many web servers also provide support for server-side includes. If a resource is flagged as containing server-side includes, the server processes the resource contents before sending them to the client.
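Deciding whether a URI names a dynamic resource, and handing request details to a CGI program, can be sketched in Java as follows. The /cgi-bin/ prefix and the .cgi extension mirror the two Apache mechanisms described above (an executable path component and a special file extension) but are assumptions here, as is the class name DynamicMapping. The environment holds only a small subset of the CGI/1.1 meta-variables; a real CGI gateway sets more of them, splits out PATH_INFO, and pipes any request body to the program's standard input.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: classify a URI as static or CGI and prepare a CGI environment.
final class DynamicMapping {
    enum Kind { STATIC, CGI }

    static final String SCRIPT_PREFIX = "/cgi-bin/";   // assumed executable path component
    static final String SCRIPT_EXTENSION = ".cgi";     // assumed executable file extension

    private static String pathOnly(String requestTarget) {
        int q = requestTarget.indexOf('?');
        return q < 0 ? requestTarget : requestTarget.substring(0, q);
    }

    static Kind classify(String requestTarget) {
        String path = pathOnly(requestTarget);
        if (path.startsWith(SCRIPT_PREFIX) || path.endsWith(SCRIPT_EXTENSION)) return Kind.CGI;
        return Kind.STATIC;
    }

    // A small subset of CGI/1.1 meta-variables; PATH_INFO splitting is omitted.
    static Map<String, String> cgiEnvironment(String method, String requestTarget, String protocol) {
        int q = requestTarget.indexOf('?');
        Map<String, String> env = new LinkedHashMap<>();
        env.put("GATEWAY_INTERFACE", "CGI/1.1");
        env.put("REQUEST_METHOD", method);
        env.put("SCRIPT_NAME", pathOnly(requestTarget));
        env.put("QUERY_STRING", q < 0 ? "" : requestTarget.substring(q + 1));
        env.put("SERVER_PROTOCOL", protocol);
        return env;
    }

    // Starts the program; its standard output carries the CGI response back to the server.
    static Process launch(Path script, Map<String, String> env) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(script.toString());
        pb.environment().putAll(env);
        return pb.start();
    }
}
```

Spawning a new process per request, as launch does, is exactly the overhead that servlet containers and other application servers avoid by keeping the content generator loaded inside a long-running process.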
For a resource flagged as containing server-side includes, the contents are scanned for certain special patterns, which can be variable names or embedded scripts. The special patterns are replaced with the values of variables or the output of executable scripts. This is an easy way to create dynamic content. Web servers also can assign access controls to particular resources. When a request arrives for an access-controlled resource, the web server can control access based on the IP address of the client, or it can issue a password challenge to get access to the resource.

Once the web server has identified the resource, it performs the action described in the request method and returns the response message. The response message contains a response status code, response headers, and a response body if one was generated. If the transaction generated a response body, the content is sent back with the response message. If there was a body, the response message usually contains:

  • A Content-Type header, describing the MIME type of the response body.
  • A Content-Length header, describing the size of the response body.
  • The actual message body content.

The web server is responsible for determining the MIME type of the response body. There are many ways to configure servers to associate MIME types with resources. The web server can use the extension of the filename to indicate the MIME type: it scans a file containing MIME types for each extension to compute the MIME type for each resource. This extension-based type association is the most common. The Apache web server can also scan the contents of each resource and pattern-match the content against a table of known patterns to determine the MIME type for each file. This can be slow, but it is convenient, especially if the files are named without standard extensions. Web servers can be configured to force particular files or directory contents to have a MIME type, regardless of the file extension or contents. Some web servers can be configured to store a resource in multiple document formats.
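Extension-based MIME typing, plus the Content-Type and Content-Length headers it feeds, can be sketched in Java as below. The small extension table is a hypothetical stand-in for a real MIME types file, the class name MimeTyping is illustrative, and falling back to application/octet-stream is a common convention rather than something this lesson specifies.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: extension-based MIME typing and the headers that depend on it.
final class MimeTyping {
    // Tiny stand-in for a MIME types table; a real server loads this from configuration.
    private static final Map<String, String> TYPES_BY_EXTENSION = Map.of(
            "html", "text/html",
            "htm", "text/html",
            "gif", "image/gif",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "txt", "text/plain");

    static String typeFor(String filename) {
        int dot = filename.lastIndexOf('.');
        int slash = filename.lastIndexOf('/');
        if (dot <= slash + 1) return "application/octet-stream";   // no extension, or a dotfile
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES_BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }

    // Status line plus Content-Type and Content-Length; length is the body's size in bytes.
    static String responseHead(String status, String filename, byte[] body) {
        return "HTTP/1.1 " + status + "\r\n"
                + "Content-Type: " + typeFor(filename) + "\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
    }
}
```

Taking the body as a byte array keeps Content-Length honest: it counts encoded bytes, so a five-character string containing a non-ASCII letter can still be six bytes long. The JDK's Files.probeContentType is an alternative for the type lookup, but its results depend on the platform's installed detectors.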
When a resource is stored in multiple document formats, the web server can be configured to determine the best format to use through a negotiation process with the user. Web servers also can be configured to associate particular files with MIME types.

Web servers sometimes return redirection responses instead of success messages. A web server can redirect the browser to go elsewhere to perform the request. A redirection response is indicated by a 3XX return code. The Location response header contains a URI for the new or preferred location of the content. Redirects are useful for:

  • Permanently moved resources. A resource might have been moved to a new location or otherwise renamed, giving it a new URL. The web server can tell the client that the resource has been renamed, and the client can update any bookmarks, etc., before fetching the resource from its new location. The status code 301 Moved Permanently is used for this kind of redirect.
  • Temporarily moved resources. If a resource is temporarily moved or renamed, the server may want to redirect the client to the new location. But because the renaming is temporary, the server wants the client to come back with the old URL in the future and not to update any bookmarks. The status codes 303 See Other and 307 Temporary Redirect are used for this kind of redirect.
  • URL augmentation. Servers often use redirects to rewrite URLs, often to embed context. When the request arrives, the server generates a new URL containing embedded state information and redirects the user to this new URL. The client follows the redirect, reissuing the request but now including the full, state-augmented URL. This is a useful way of maintaining state across transactions. The status codes 303 See Other and 307 Temporary Redirect are used for this kind of redirect.
  • Load balancing. If an overloaded server gets a request, the server can redirect the client to a less heavily loaded server. The status codes 303 See Other and 307 Temporary Redirect are used for this kind of redirect.
  • Server affinity. Web servers may have local information for certain users. A server can redirect the client to a server that contains information about the client. The status codes 303 See Other and 307 Temporary Redirect are used for this kind of redirect.
  • Canonicalizing directory names. When a client requests the URI for a directory name without a trailing slash, most web servers redirect the client to a URI with a slash added, so that relative links work correctly.

Web servers face similar issues sending data across connections as they do receiving. The server may have connections to many clients, some idle, some sending data to the server, and some carrying response data back to the clients. The server needs to keep track of the connection state and handle persistent connections with special care. For non-persistent connections, the server is expected to close its side of the connection when the entire message is sent. For persistent connections, the connection may stay open, in which case the server needs to be extra cautious to compute the Content-Length header correctly, or the client will have no way of knowing when a response ends. Finally, when a transaction is complete, the web server notes an entry in a log file describing the transaction performed. Most web servers provide several configurable forms of logging. So, that's it. Hope to see you in our next lesson. Have a nice day.


About the Author

OAK Academy is made up of tech experts who have been in the sector for years and years and are deeply rooted in the tech world. They specialize in critical areas like cybersecurity, coding, IT, game development, app monetization, and mobile development.
