Although a Web developer does not strictly need to know how Web pages get to users' browsers, understanding the process helps to explain why some pages are slow or don't work.
What happens
The concept is simple - the browser asks the server for a Web page and the server provides that page. In practice it is a lot more complicated:
- The user types in an address to the browser or clicks on a link
- The browser sends a request to a Domain Name System (DNS) server to find the IP address of the named Web site's server
- The DNS server looks in its list of hosts on the Internet for the domain and server the user typed in and sends the matching IP address to the browser (if it doesn't know the address it asks another DNS server higher up the hierarchy, and so on)
- The browser sends a request using that IP address for the specific page or other item
- The server sends the contents of the file (normally HTML) to the browser
- The browser starts reading the page, interpreting (parsing) it and converting it to a displayed page (known as rendering)
- If the browser finds any sort of link to another item (e.g. an image SRC attribute or a stylesheet) it sends another request to the server for that item, repeating the steps above from the point where the browser sends its request
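The last two steps can be illustrated with a short Python sketch that lists the extra items a browser would have to request after reading a page. It uses only the standard library. The class name ResourceFinder, the example.com address, the sample HTML and the small set of tags checked are all assumptions made for illustration; a real browser looks at many more tags and attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collects the URLs of items a browser would fetch automatically."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Images and external scripts name their file in the src attribute
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.page_url, attrs["src"]))
        # External stylesheets use <link rel="stylesheet" href="...">
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.page_url, attrs["href"]))


def find_resources(page_url, html):
    """Return the absolute URLs of the items linked from the page."""
    finder = ResourceFinder(page_url)
    finder.feed(html)
    return finder.resources


# Hypothetical page used only for this example
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="style.css">
<script src="/js/menu.js"></script>
</head><body>
<img src="images/logo.png">
<a href="other.html">An ordinary hyperlink (not fetched automatically)</a>
</body></html>
"""

resources = find_resources("http://www.example.com/pages/index.html", SAMPLE_HTML)
for url in resources:
    print(url)
```

Each URL printed stands for one more request from the browser. The ordinary hyperlink is ignored because it is only followed when the user clicks on it.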
Each request sent to the server is made up of a request line followed by a set of Request Headers. A typical request looks like this (remember this goes from the browser to the server):
GET http://www.yourwebskills.com/internetnaming.html HTTP/1.1
Host: www.yourwebskills.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Proxy-Connection: keep-alive
Cookie: __utma=186523937.615597064.1265363981.1276585149.1276588863.271; __utmz=186523937.1272545611.151.6.utmcsr=virtual.fred.ac.uk|utmccn=(referral)|utmcmd=referral|utmcct=/course/view.php; phpbb3_ahuzp_u=1; phpbb3_ahuzp_k=; phpbb3_ahuzp_sid=1aa6019cae133f8f5bcf9e8288c0e1b3; style_cookie=null; __utmb=186523937.2.10.1276588863; __utmc=186523937
If-Modified-Since: Mon, 01 Mar 2010 17:26:26 GMT
If-None-Match: "6a38246-1879-4b8bf8c2"
Cache-Control: max-age=0, max-age=0, max-age=0
Proxy-Authorization: NTLM ------
The first line is the important one. It tells the server which page is wanted using HTTP (the protocol used for Web pages).
The second line (Host) gives the server name separately from the rest of the URL. This lets a single server host several Web sites and still know which one is wanted.
The third line tells the server what browser is sending the message (the browser can lie if you tell it to).
The next few lines set what sort of file will be accepted, the desired languages, proxy settings, the date of the copy of the page already held in this browser's cache and so on. The server may choose to ignore these.
The Cookie line tells the server the contents of any cookie for the chosen server. A cookie is a tiny text file used by the browser to remember things which the server asked it to remember (maybe a user name or user preferences but also often tracking information).
All of this information is available to PHP scripts using the $_SERVER superglobal.
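The structure of a request can also be seen without PHP. The following is a minimal Python sketch, not how any particular server does it, which splits a shortened copy of the request above into its request line and its headers. The function name parse_request is made up for this example.

```python
def parse_request(raw):
    """Split a raw HTTP request into its request line and a dict of headers."""
    lines = raw.strip().splitlines()
    # The first line holds the method, the item wanted and the protocol version
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line marks the end of the headers
        # Split at the first colon only, as values such as dates contain colons
        name, _, value = line.partition(":")
        # Header names are not case sensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# A shortened copy of the request shown earlier
RAW_REQUEST = """GET http://www.yourwebskills.com/internetnaming.html HTTP/1.1
Host: www.yourwebskills.com
Accept-Encoding: gzip,deflate
If-Modified-Since: Mon, 01 Mar 2010 17:26:26 GMT
If-None-Match: "6a38246-1879-4b8bf8c2"
"""

method, target, version, headers = parse_request(RAW_REQUEST)
print(method, target, version)
print(headers["host"])
print(headers["if-modified-since"])
```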
How this affects downloading a page
A page with lots of links (not hyperlinks) to other files (images, JavaScript or CSS) will be much slower to download. Each item mentioned in the Web page will trigger a new request, with its own set of request headers, being sent to the server. Then the browser will need to wait for the file to be sent. Images, external stylesheets and external JavaScript can mean tens or even hundreds of sets of request headers being sent.
Limiting the number of requests sent is one way to speed up your pages.
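A rough calculation shows why the number of requests matters. In the Python sketch below every figure is an assumption chosen for illustration (1000 bytes of headers per request, 100 ms for a request to reach the server and the reply to start coming back, and six requests in progress at once), and the time taken to transfer the files themselves is ignored. The function name is also made up.

```python
import math


def estimate_request_cost(num_items, header_bytes=1000,
                          round_trip_ms=100, parallel_connections=6):
    """Very rough cost of the requests alone for a page with num_items linked items.

    All default values are illustrative assumptions, not measurements.
    """
    total_requests = num_items + 1  # the page itself plus each linked item
    upload_bytes = total_requests * header_bytes
    # The page must arrive first; linked items are then requested in batches
    batches = 1 + math.ceil(num_items / parallel_connections)
    waiting_ms = batches * round_trip_ms
    return total_requests, upload_bytes, waiting_ms


for items in (3, 60):
    requests, sent, waiting = estimate_request_cost(items)
    print(f"{items} linked items: {requests} requests, "
          f"{sent} bytes of headers, about {waiting} ms waiting")
```

Under these assumptions a page with 60 linked items spends several times longer just waiting for replies than a page with three, before any file content is counted.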
Responses
When a server receives a request it sends back a response. This is made up of a status line and a set of response headers, normally followed by the requested file. A typical response looks like this:
HTTP/1.1 304 Not Modified
Via: 1.1 PROXY3
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Date: Tue, 15 Jun 2010 08:38:18 GMT
Etag: "6a38246-1879-4b8bf8c2"
Server: Apache
The headers confirm details such as the date and the connection settings, and also let the browser know the Web server software being used. The important bit is the first line which tells the browser that the file is there but has not changed since the date of the copy kept in cache by the browser. Therefore the file is not being sent with this response.
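The decision the server makes here can be sketched in a few lines of Python. This is a simplified illustration which compares only the Etag value from the If-None-Match request header; real servers also look at If-Modified-Since and other details. The function name respond and the sample page content are assumptions for the example.

```python
def respond(request_headers, current_etag, body):
    """Return (status_line, body_to_send) for a simple conditional request."""
    cached_etag = request_headers.get("if-none-match")
    if cached_etag is not None and cached_etag == current_etag:
        # The browser's cached copy is still current, so send no file
        return "HTTP/1.1 304 Not Modified", b""
    # No cached copy, or the file has changed: send the whole file
    return "HTTP/1.1 200 OK", body


PAGE = b"<html>page content</html>"      # hypothetical file contents
ETAG = '"6a38246-1879-4b8bf8c2"'         # the Etag from the response above

first_visit = respond({}, ETAG, PAGE)
repeat_visit = respond({"if-none-match": ETAG}, ETAG, PAGE)
after_edit = respond({"if-none-match": ETAG}, '"a-new-etag"', PAGE)

for result in (first_visit, repeat_visit, after_edit):
    print(result[0], "-", len(result[1]), "bytes of file sent")
```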
Other common codes include 200 which means that the requested item exists and is being sent with the response.
More commonly seen by users are the response codes in the 400 range. For example, 404 means the requested item was not found. Maybe the item was spelled wrongly in the request or has been deleted from the server.
Two other useful codes are 301 and 302. These are used when the requested item has moved or been renamed. 301 tells the browser (or search engine) that the item has "moved" permanently. 302 means a temporary change. Either might be a renaming of the file or a move to a new server. They allow you to move a page or site without losing users who only know the old location.
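As a summary of the codes mentioned so far, here is a small Python sketch using the standard library's http.HTTPStatus to look up the official wording for each code. The function name browser_action and the descriptions of what the browser does are this example's own simplifications.

```python
from http import HTTPStatus


def browser_action(code):
    """Describe, in simplified terms, what a browser does with a status code."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    if code == 200:
        action = "display the item sent with the response"
    elif code in (301, 302):
        action = "request the item again from the new location"
    elif code == 304:
        action = "use the copy already in the cache"
    elif 400 <= code < 500:
        action = "show an error as the request could not be satisfied"
    else:
        action = "not covered in this sketch"
    return f"{code} {status.phrase}: {action}"


for code in (200, 301, 302, 304, 404):
    print(browser_action(code))
```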
Firebug and similar tools
There are a number of tools which allow you to see what headers were sent and received for any Web page. One is Firebug for Firefox which will display the complete headers or a summary:

Opera has a similar tool built in (Dragonfly).
Even without looking at the actual headers, the summary shows that the page requested had not been changed on the server since a copy was cached by the browser. That saves bandwidth as the browser uses the cached copy rather than downloading the page again. All of the related resources are also cached so there are no downloads, just headers going to and fro. The summary lists the HTML page, a CSS stylesheet, one image and one external JavaScript file. It also shows that Google's analytics server is a lot faster than the one hosting yourwebskills.com!
SVG presentation showing data flow to and from browser and server (modern browsers only).




