C.1 Creating the web
Evolution of the web, different protocols and web technologies. Difference between static and dynamic web pages. External data sources. Role of the browser.
C.1.1 Distinguish between the Internet and World Wide Web
Internet
- Interconnected set of networks and computers
- Permits transfer of data (e.g. WWW, email, P2P, VoIP, FTP)
- Permits delivery of services
- Data transfer governed by protocols (TCP/IP)
World Wide Web
- A set of hypertext-linked resources
- Resources are identified by URIs (Uniform Resource Identifiers)
- Transferred between client and server via the Internet
- Resources can be read by use of a browser
C.1.2 Describe how the web is constantly evolving
Figure 1: The Web Expansion (TrendOne, 2008)
The beginnings of the web (Web 1.0, “Web of Content”)
The World Wide Web started around 1990/91 as a system of servers connected over the Internet that deliver static documents. These documents are formatted as Hypertext Markup Language (HTML) files, which support links to other documents as well as multimedia such as graphics, video or audio. In the early web, documents consisted mainly of static information and text; multimedia was added later. Some experts describe this as a “read-only web”, because users mostly searched for and read information, with little user interaction or content contribution.
Web 2.0 – “Web of the Users”
Web 3.0 – “Semantic Web”
The Internet, and with it the World Wide Web, is constantly developing and evolving in new directions. While the changes described for Web 2.0 are clear to us today, there is no definitive definition of Web 3.0 yet. Continuing the read to read-write description from earlier, it might be argued that Web 3.0 would be the “read-write-execute” web. One interpretation of this is that the web enables software agents to work with documents by using semantic markup. This allows for smarter searches and the presentation of relevant data that fits the context. This is why Web 3.0 is sometimes called the semantic executive web.
But what does this mean?
It is about user input becoming more meaningful, more semantic: users add tags or other kinds of data to their documents, which allows software agents to work with the input, e.g. to make it more searchable. The idea is to better connect information that is semantically related.
However, it might also be argued that Web 3.0 is what some people call the Internet of Things, which basically means connecting everyday devices to the Internet to make them smarter. In some ways this also fits the read-write-execute model, as it allows the user to trigger a real-life action on a device over the Internet. Either way, the web keeps evolving, and the following image provides a good overview and an idea of where the web is heading.
C.1.3-4 Identify the characteristics of the following:
HTTP – Hypertext Transfer Protocol
- Application layer protocol from the Internet Protocol suite to transfer and exchange hypermedia
- request-response protocol based on client-server model
- user agent (e.g. web browser) requests a resource from a server through a URL, and the web server returns a response
- different HTTP request methods, e.g. for retrieving or submitting data (GET and POST)
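The request-response exchange listed above can be sketched with plain text handling. This is only an illustration and not a real client: the host `example.com`, the path `/index.html`, the canned response and the helper names `build_get_request` and `parse_response` are all assumptions made up for this example.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource, protocol version
        f"Host: {host}",         # the Host header is mandatory in HTTP/1.1
        "Connection: close",
        "",                      # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# Hypothetical response a server might send back (invented for illustration).
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<h1>Hello</h1>"
)

request_text = build_get_request("example.com", "/index.html")
status, headers, body = parse_response(SAMPLE_RESPONSE)
print(request_text)
print(status, headers, body)
```

A POST request would follow the same pattern, with the submitted data placed in the message body after the empty line.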
HTTPS – Hypertext Transfer Protocol Secure
- Based on HTTP
- Adds an additional security layer of SSL or TLS
- ensures authentication of website by using digital certificates
- ensures integrity and confidentiality through encryption of communication
- the IP address and port number of the web server remain visible to the network (which is why HTTPS websites can also be blocked, e.g. in China)
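As a small illustration of the certificate-based authentication mentioned above, Python's standard `ssl` module can create a client-side TLS configuration. This sketch only inspects the default settings and opens no connection; the variable names are chosen for this example.

```python
import ssl

# The default client context loads the system's trusted certificate authorities.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted authority ...
certificate_required = context.verify_mode == ssl.CERT_REQUIRED
# ... and the certificate must match the host name the client asked for.
hostname_checked = context.check_hostname

print(certificate_required, hostname_checked)
```

With these defaults, a connection to a server that cannot prove its identity is refused before any page content is exchanged.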
HTML – Hypertext Mark-up Language
- semantic markup language
- standard language for web documents
- uses elements enclosed by tags to mark up a document
URL – Uniform Resource Locator
- unique string that identifies a web resource
- reference to a web resource
- primarily used for HTTP, but also for other protocols like FTP or email (mailto)
- includes name AND access method (e.g. ‘http://’)
- serves as a mechanism to retrieve a resource
- follows a specific syntax
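The syntax mentioned above can be made visible by splitting a URL into its parts. The sketch below uses `urllib.parse.urlsplit` from the Python standard library; both URLs are hypothetical and were chosen only so that every part is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only to show every part of the syntax.
url = "https://www.example.com:8080/notes/web.html?topic=url#syntax"
parts = urlsplit(url)

print(parts.scheme)    # access method
print(parts.hostname)  # name of the server
print(parts.port)      # port number on the server
print(parts.path)      # location of the resource on the server
print(parts.query)     # extra parameters passed to the server
print(parts.fragment)  # position inside the resource

# URLs are not limited to HTTP: a mailto URL names a mailbox instead.
mail = urlsplit("mailto:someone@example.com")
print(mail.scheme, mail.path)
```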
XML – Extensible Mark-up Language
- markup language with a set of rules defining how to encode a document
- similar to HTML in using tags
- used for representation of arbitrary data structures
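To show how XML can represent an arbitrary data structure, the following sketch parses a small document with Python's standard `xml.etree.ElementTree` module. The element names (`library`, `book`, `title`, `year`) and the data are invented for this example; unlike HTML, XML lets the author choose the tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are defined by the author, not by XML.
XML_TEXT = """
<library>
  <book id="1"><title>First Title</title><year>1999</year></book>
  <book id="2"><title>Second Title</title><year>2008</year></book>
</library>
"""

root = ET.fromstring(XML_TEXT)

# Turn every <book> element into a plain dictionary.
books = [
    {
        "id": book.get("id"),
        "title": book.findtext("title"),
        "year": int(book.findtext("year")),
    }
    for book in root.findall("book")
]

print(root.tag)
print(books)
```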
XSLT – Extensible Stylesheet Language Transformations
- styling language for XML used for data presentation and transformation
- data presentation means displaying data in some format or medium; it is about style
- data transformation means parsing a source tree of nodes and transforming it into something different
- XSLT can be used to transform XML files into other XML files, HTML, plain text and, combined with further tools, formats such as PDF or PNG
JavaScript
- interpreted programming language
- core technology of most websites, together with HTML and CSS
- high-level and dynamically typed; therefore relatively easy for beginners
- allows the content of HTML documents to be manipulated dynamically
- makes websites dynamic
CSS – Cascading Style Sheets
- style sheet language to describe the presentation of a mark-up document, usually HTML
- used to create better designed websites
- intended to separate content (HTML) from presentation (CSS)
- it uses selectors to pick out particular elements of a document and gives these elements properties that define things ranging from font color to page position
URI – Uniform Resource Identifier
- more general definition than URL
- a string serving as an identifier for some resource (document, image, mailbox, video, files, etc.)
Figure 2: The Difference Between URLs and URIs (Daniel Miessler)
C.1.5 Describe the purpose of a URL
A URL defines a pathway to a specific resource. It allows different resources to be linked, which creates the basis for navigating the WWW (it is the links that make it a web).
C.1.6 Describe how a domain name server functions
A DNS server makes sure that a website can be accessed under a human-readable name like www.cs-ib.net. In order to access a website, such a name first needs to be translated into a machine-readable IP address like 18.104.22.168. The DNS server does exactly this. It stores the IP address corresponding to each domain name in its database and returns it on request.
Further Reading: How Stuff Works - DNS Server
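The lookup described in this section can be imitated with a dictionary. This is a toy model and not how a real DNS server is implemented: the table `RECORDS`, the function `resolve` and the domain names are hypothetical, and the addresses come from a range reserved for documentation.

```python
# Hypothetical records; a real DNS server holds far more data than one table.
RECORDS = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
}


def resolve(domain_name, records=RECORDS):
    """Return the IP address stored for a domain name, or None if unknown."""
    # Domain names are case-insensitive and may end with a trailing dot.
    key = domain_name.lower().rstrip(".")
    return records.get(key)


print(resolve("www.example.com"))
print(resolve("unknown.example"))
```

In a real program the operating system's resolver does this work, for example through `socket.gethostbyname` in Python, which needs a network connection.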
C.1.7 Identify the characteristics of:
IP - (Internet Protocol):
- Part of the Internet layer of the TCP/IP protocol suite
- Responsible for delivering packets to the correct address/computer
- Defines the format of a packet
- Adds a header to each TCP segment, which includes the addressing information needed for routing
- IPv4 uses a 32-bit address
- IPv6 uses a 128-bit address
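The two address sizes can be checked with Python's standard `ipaddress` module. The addresses below are taken from ranges reserved for documentation and do not belong to real hosts.

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")    # IPv4 documentation address
v6 = ipaddress.ip_address("2001:db8::1")  # IPv6 documentation address

# max_prefixlen is the total number of bits in an address of that version.
print(v4.version, v4.max_prefixlen)
print(v6.version, v6.max_prefixlen)

# Number of distinct addresses each version can express.
ipv4_space = 2 ** v4.max_prefixlen
ipv6_space = 2 ** v6.max_prefixlen
print(ipv4_space, ipv6_space)
```

The much larger IPv6 address space is the main reason for the move away from IPv4.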
TCP – (Transmission Control Protocol):
- Part of the transport layer of the TCP/IP protocol suite
- Receives data from an application and divides it up (into segments) ready for IP
- Establishes initial connection
- Provides error-checking which can lead to the re-sending of packets
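The idea of dividing data into segments and putting it back together can be sketched as follows. This is a strong simplification under stated assumptions: real TCP also performs a handshake, acknowledges segments and uses checksums, none of which is modelled here, and the function names are hypothetical.

```python
import random


def split_into_segments(data, size):
    """Cut data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [
        (offset, data[offset:offset + size])
        for offset in range(0, len(data), size)
    ]


def reassemble(segments):
    """Put segments back in order using their sequence numbers."""
    return b"".join(chunk for _, chunk in sorted(segments))


message = b"Data handed to TCP by an application"
segments = split_into_segments(message, size=8)

# Simulate segments arriving out of order on the receiving side.
shuffled = segments[:]
random.Random(0).shuffle(shuffled)

restored = reassemble(shuffled)
print(restored)
```

The sequence number (here simply the byte offset) is what lets the receiver restore the original order, whatever route the individual packets took.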
FTP - (File Transfer Protocol):
- Protocol for transferring files (between server and client) over a TCP based network
- Files can also be manipulated/modified remotely
- Control information (e.g. log-ins) is sent separately from the main file data (out-of-band control), which distinguishes FTP from HTTP
C.1.8 Outline the different components of a web page
A web page can contain a variety of components. The basic structure of an HTML document consists of:
head
This is not visible on the page itself, but contains important information about it in the form of metadata.
title
The title goes inside the head and is usually displayed at the top of the browser window.
meta tags
There are various types of meta tags, which can give search engines information about the page, but are also used for other purposes, such as specifying the character set used.
body
The main part of the document. This is where all the (visible) content goes.
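The basic structure described above (head, title, meta tags, body) can be seen in a small example page. In the sketch below the page is a hypothetical one written for this illustration, and Python's standard `html.parser` module is used to list its components; the class name `ComponentLister` is an assumption of this example.

```python
from html.parser import HTMLParser

# Hypothetical minimal page showing head, title, meta tags and body.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="description" content="Hypothetical example page">
    <title>Example page</title>
  </head>
  <body>
    <h1>Visible heading</h1>
    <p>Visible content goes in the body.</p>
  </body>
</html>"""


class ComponentLister(HTMLParser):
    """Record every opening tag and the text of the title element."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


lister = ComponentLister()
lister.feed(PAGE)
print(lister.tags)
print(lister.title)
```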
Some other typical components:
Navigation bar
Usually a collection of links that helps to navigate the website.
Hyperlinks
A hyperlink is a reference to another web page.
Table Of Contents
Might be contained in a sidebar and is used for navigation and orientation within the website.
Area at the top of a web page linking to other big topic areas.
Usually used for a table of contents or navigation bar.
Most of the time used for logo with a link to the main page.
Area of the web page that prevents the sidebar from extending to the bottom of the page.
C.1.9 Explain the importance of protocols and standards on the web
Protocols
- A set of rules for successfully carrying out some process
- Investigate TCP/IP/FTP
Standards
- Set of technical specifications that should be adhered to in order to allow for functionality
- Semantic web standards involve the use of ontologies
What is ISO?
ISO is the International Organization for Standardization, an independent, non-governmental organization that develops and publishes international standards. It was founded in 1946 and has since published over 21,000 international standards covering aspects of technology and manufacturing.
Who constitutes ISO?
ISO has members from 163 countries and 3,368 technical bodies that help standards to be developed. In addition, the organization has over 135 people working full-time at the Central Secretariat in Geneva. Experts from the same field work together to develop standards, which are agreed on through a consensus process.
Why is there a need for ISO?
These standards ensure safety, reliability and quality for products and services, while also providing a common denominator for different processes to communicate, e.g. for technologies.
C.1.10 Describe the different types of web page
Personal pages
- Site about individuals/small groups
- Includes information the creator wants to share
- Mainly displays content, therefore usually static
- Usually does not require any back-end server-side programming (static)
- May use some front-end technologies
- Often not self-hosted and not coded, but created using a WYSIWYG website creator like Wix or Dreamweaver
Blog
- Site about certain topics, made by an individual
- The “blogger” publishes short posts, usually displayed with the most recently updated first
- Is a sort of online diary
For a static blog:
- Optionally, a static site generator, e.g. Jekyll
For a dynamic blog:
- Database, e.g. MySQL, MongoDB
- Backend programming language, e.g. PHP, to handle incoming posts, etc.
- Content Management System (CMS), e.g. WordPress
Search Engine Pages
- Indexes content from the internet or an intranet
- Serves related links based on a user's query
- Uses a web crawler to keep its index up to date; the crawler looks at, among other things, the meta tags of websites
Forum
- A meeting place or medium where ideas and views on a particular issue can be exchanged
- Database for storing posts
- Some back-end language or framework, e.g. PHP, Node.js, Ruby on Rails, Django
- CMS (content management system), e.g. vBulletin
Wiki
- A website or database developed by many different users
- Any user can add and edit content
- Dynamic (accessing a database)
- Content management system (CMS), e.g. MediaWiki
News site
- A website that offers current news in the form of articles or videos
- Includes an updated news feed
- Usually dynamic
- Content management system (CMS)
- Usually some database
- Back-end programming language
- Site that allows free photo sharing, funded by online advertising
- Examples: Flickr.com, Picasa
- Software that enables users to create digital photo slide shows and then upload these to the web
Learning Management System
- Software application for the delivery, administration, tracking, documentation and reporting of e-learning courses
- Back-end programming + database to store students' progress, course contents, etc.
C.1.11 Explain the differences between a static web page and a dynamic web page
Static web pages
- Lower cost to implement
- Low scalability
- Hard to update
- Higher cost in the long term if content is to be updated
Dynamic web pages
Sites that include server-side programming as well, usually to retrieve content dynamically from a database. This allows for data processing on the server and for much more complex applications.
- information can be retrieved in an organized way from databases
- allows for content management systems
- low ongoing cost, unless design changes or extra features are implemented
- sites are usually based on templates into which information is fed
- sites are less individual
- higher initial cost
- usually larger codebase
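The point about templates being fed with information can be sketched with Python's standard `string.Template`. The template text, the `ARTICLES` list (standing in for rows fetched from a database) and the `render` function are assumptions made for this illustration.

```python
import html
from string import Template

# One template is shared by all pages; only the data differs.
PAGE_TEMPLATE = Template("<h1>$title</h1>\n<p>$body</p>")

# Hypothetical records, standing in for rows fetched from a database.
ARTICLES = [
    {"title": "First post", "body": "Hello web."},
    {"title": "Second post", "body": "More content."},
]


def render(article):
    """Fill the template with one article, escaping HTML special characters."""
    safe = {key: html.escape(value) for key, value in article.items()}
    return PAGE_TEMPLATE.substitute(safe)


pages = [render(article) for article in ARTICLES]
for page in pages:
    print(page)
```

Adding a third article only requires a new record, whereas a static site would need a new hand-written page. The shared template is also why dynamic sites tend to look less individual.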
C.1.12 Explain the functions of a browser
- Plug-ins allow multimedia to be displayed
- Retrieves information from the Internet via hyperlinks
C.1.13 Evaluate the use of client-side scripting and server-side scripting in web pages
Server-side scripting
Website logic that runs on the server. Common tasks include the processing of search queries, data retrieval from a database and various data manipulation tasks. A good example is an online shop, where items are displayed based on a search query. Once the user decides to buy an item, server-side scripts check the user's credentials and make sure that the shop receives the order.
- Direct execution (e.g. ASP, PHP)
Client-side scripting
This is scripting that happens in the client's browser. It is used for animations, form validation and also to retrieve new data without reloading the page, e.g. in a live chat.
C.1.14 Describe how web pages can be connected to underlying data sources
Cookies: Cookies are small files stored on a user’s computer. They hold data specific to a website or client and can be accessed by either the web server or the client computer. Cookies contain data values such as a first name and last name. Once the server or the client computer has read the cookie through its own code, the data in the cookie can be retrieved and used on a web page. Cookies are usually created when a new web page is loaded. Disabling cookies in the browser prevents them from being written; however, some sites require cookies in order to function. Cookies are used to carry information from one session on a website to another. Because this data is kept on the client, the server does not need to store it for every visitor.
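How a cookie is written by the server and read back on a later request can be sketched with Python's standard `http.cookies` module. The cookie names `first_name` and `theme` and their values are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: create a cookie and produce the header sent to the browser.
outgoing = SimpleCookie()
outgoing["first_name"] = "Ada"
outgoing["first_name"]["path"] = "/"
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Later request: the browser sends its cookies back and the server parses them.
incoming = SimpleCookie()
incoming.load("first_name=Ada; theme=dark")
first_name = incoming["first_name"].value
theme = incoming["theme"].value
print(first_name, theme)
```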
Databases: A database is an organized collection of data from which specific data can easily be retrieved using queries. Data are usually organized in a way that allows the application to find them easily. There are different logical models for organizing data in a database, e.g. relational models, object models, navigational models and more. A database is accessed (to retrieve or update data, or for administration) through a database management system (DBMS), such as MySQL, PostgreSQL or MongoDB. These systems usually differ in the database model that they use.
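Retrieving specific data with a query can be illustrated with SQLite, which ships with Python as the `sqlite3` module. SQLite is used here only because it needs no separate server; it stands in for the DBMS products named above, and the `posts` table and its contents are hypothetical.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)"
)
connection.executemany(
    "INSERT INTO posts (author, title) VALUES (?, ?)",
    [("alice", "Hello web"), ("bob", "Static vs dynamic"), ("alice", "About URLs")],
)


def titles_by(author):
    """Return the titles of all posts written by the given author."""
    rows = connection.execute(
        "SELECT title FROM posts WHERE author = ? ORDER BY id", (author,)
    )
    return [title for (title,) in rows]


alice_titles = titles_by("alice")
print(alice_titles)
```

A dynamic web page would run a query like this on the server and insert the result into the HTML it sends to the browser.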
XML: XML is a flexible way to structure data and can therefore be used to store data in files or to transport data. It allows data to be easily manipulated, exported or imported. This way, websites can also be designed independently of the data content. One example of the use of XML is RSS feeds, where it is used to store data about a feed.
CGI(Common Gateway Interface): See C.1.15
C.1.15 Describe the function of the common gateway interface (CGI)
CGI is a standard protocol that lets web servers execute console programs (applications that run from the command line) in order to generate dynamic web pages. It defines an interface through which the web server (as in the software) passes user information, e.g. a query, to the application, which can then process it. This passing of information between the web server and the console application is what CGI specifies. Thanks to CGI, a variety of programming languages such as Perl, Java, C or C++ can be used for server-side scripting.
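The passing of information described above can be sketched as follows. Under CGI the web server places the query in the environment variable `QUERY_STRING`, and the program answers with headers, an empty line and then the document. The function `handle_request`, the parameter `name` and the greeting are assumptions made for this example.

```python
import html
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response from the variables set by the web server."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # parse_qs returns a list of values per key; take the first one.
    name = query.get("name", ["visitor"])[0]
    body = f"<h1>Hello, {html.escape(name)}!</h1>"
    # A CGI program writes headers, an empty line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


# Simulated call, as if the server had received a request for ?name=Ada
response = handle_request({"QUERY_STRING": "name=Ada"})
print(response)
```

In an actual CGI program the function would be called with `os.environ` and its result written to standard output, from where the web server forwards it to the browser.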
C.1.16 Evaluate the structure of different types of web pages
See the evaluation in the previous sections, especially C.1.8, C.1.10 and C.1.11.