Understanding Computer Networks
This article explains the basic building blocks of a local area network and how it connects to the Internet.
Let's start with a single computer. It is currently a lonely computer and, as such, it doesn't have access to the wealth of resources that are available elsewhere in the company and/or on the Internet.
If this computer needs to access other computers, a very simple network can be established by connecting a network cable between two computers.
In the above example, the wire is the physical medium that we use to transmit electrical signals between the two networked computers. There are other types of physical media. For instance, light can be used to transmit data over fiber optic cables, which has the advantage of a very long transmission range. Radio waves can also be used to transmit data wirelessly, which gives freedom of movement.
In the above example, the cable is connected to each machine's Network Interface Card (NIC). The NIC is the device that translates what each computer wants to say into the physical world, i.e. currents, waves, photons, etc.
Interoperable, scalable and robust networks require several conceptual layers. As you may have guessed, the first layer is the "Physical Layer". We will build on top of this layer as we grow our network. Next up on the stack is the "Link Layer".
Although these two computers are physically connected to each other, they need to agree on a language so that they can understand each other. In networking jargon, this language is called a Protocol. Because they use an agreed upon protocol, they already understand the crucial details of how to conduct a conversation, such as:
- The specifications of the physical world, e.g. voltage and current, or ranges of radio frequencies
- Addressing; so that computers can find and address each other on the network
- The details of media access, e.g. who talks first, how to avoid interrupting each other, how long someone can talk uninterrupted, etc.
- Basic error detection; so that errors in transmission can be detected (some protocols allow correction too)
- Packet format; this is basically an envelope for the data that is being transmitted
Let's grow our network to illustrate some of these important concepts. Here we are adding other network resources, such as a printer and wireless devices.
Switch / Hub
Now that we have multiple computers, printers and possibly other resources, we need mechanisms to connect them to each other. One-on-one wiring will not scale; for example, 10 network hosts would require 45 point-to-point network cables and 9 network interface cards on each host. 100 hosts would require 4,950 point-to-point connections! A Network Switch is a device that has multiple Ports, to which network resources are connected via network cables. It receives packets on its ports and forwards each one only to the port to which the destination computer is connected. An older, mostly obsolete device is called a Hub. Hubs also have ports like switches, but instead of sending packets only to the port where the destination resource is connected, they send data to all ports. Although hubs are cheaper, they flood all the ports with unnecessary traffic, which results in poorly performing networks.
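The wiring arithmetic above follows from the fact that every pair of hosts needs its own cable. A minimal sketch of that calculation (the function names are made up for illustration):

```python
def full_mesh_links(hosts: int) -> int:
    """Cables needed to wire every host directly to every other host."""
    # Each of the `hosts` machines pairs with (hosts - 1) others,
    # and every cable is shared by two machines, hence the division by 2.
    return hosts * (hosts - 1) // 2


def nics_per_host(hosts: int) -> int:
    """Each host needs one interface per directly attached neighbour."""
    return hosts - 1


for n in (2, 10, 100):
    print(n, "hosts:", full_mesh_links(n), "cables,", nics_per_host(n), "NICs per host")
```

With a switch, by contrast, each host needs a single cable and a single NIC, so the cost grows linearly with the number of hosts.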
Addressing at the Link Layer
In order to find each other, network resources have an address. At the link layer, this address is called a Media Access Control (MAC) Address. This is how a switch knows which port to send the packets to. A MAC address is usually 6 bytes long. It is statically assigned by the manufacturer of the hardware (but it is possible to change it later, although this is seldom needed and, typically, is only for advanced purposes). Each manufacturer-assigned MAC address is intended to be globally unique, so two devices on the same network should never share a MAC address, regardless of the make, model, type or age of the hardware.
A MAC address allows computers to specifically address another resource on the network. Sometimes, it is desirable to broadcast to the whole network, such as when a computer first boots up and announces important identification information to its neighbors.
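To make the switch behaviour concrete, here is a toy model of how a switch might pick an output port. It assumes the switch learns which MAC address lives behind which port by watching the source address of incoming frames, which is the common approach but is not spelled out in this article. The class name and the sample addresses are invented for illustration.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"  # link-layer address meaning "everyone"


class LearningSwitch:
    """Toy switch: remembers which MAC address was last seen on which port."""

    def __init__(self, port_count: int) -> None:
        self.ports = list(range(port_count))
        self.mac_table = {}  # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Return the ports the frame is sent out of."""
        # Learn (or refresh) where the sender is attached.
        self.mac_table[src_mac] = in_port
        # Known unicast destination: forward to exactly one port.
        if dst_mac != BROADCAST and dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # Broadcast or unknown destination: send to every other port,
        # which is what a hub would do for every single frame.
        return [p for p in self.ports if p != in_port]


switch = LearningSwitch(port_count=4)
print(switch.receive(0, "aa:00:00:00:00:01", "aa:00:00:00:00:02"))  # unknown destination
print(switch.receive(1, "aa:00:00:00:00:02", "aa:00:00:00:00:01"))  # learned destination
```

The second call goes to a single port because the switch saw the first host's address on port 0 a moment earlier.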
Examples of Link Layer Protocols
A very popular link-layer protocol is "Ethernet", which typically sends data over Cat-5 or Cat-6 twisted pair wires. Another popular link-layer protocol is Wi-Fi, which sends data using radio waves over the air. Some other examples are Token Ring and ATM. Communication over a serial port (e.g. RS-232) is also considered a link layer protocol.
Let's keep growing our network by adding more resources, such as servers, network attached storage devices, databases etc.
It is possible to grow a network by using only network switches. However, in any realistic organization, networks usually become very large. If the network Topology is flat with no hierarchy or conceptual sub-networks, it becomes very difficult to administer (e.g. setting different policies for different organizational units). Therefore, it makes more sense to divide networks logically into sub-networks, called Subnets. This is achieved by using Routers instead of switches. Routers work at the next layer up the stack, the "Network Layer". Below is a diagram where we use a router instead of a switch.
The most common network layer protocol is Internet Protocol (IP).
A classical IPv4 address has four octets (8-bit numbers, each ranging between 0 and 255), separated by full stops. An example IP address is "192.168.15.7".
IP allows routing across networks by defining a hierarchical address scheme. The hierarchy of the network is defined with subnet masks. For instance, a /24 mask tells us that the first 24 bits of the address (192.168.15 in the above example) identify our network, and the last 8 bits (7 in the above example) identify the host.
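The network/host split described above can be checked with Python's standard `ipaddress` module. This is only a sketch; the helper name `split_address` is made up for illustration.

```python
import ipaddress


def split_address(cidr: str):
    """Split an address written as 'a.b.c.d/prefix' into (network, host id)."""
    iface = ipaddress.ip_interface(cidr)
    network = iface.network
    # The host mask has ones exactly where the subnet mask has zeros,
    # so AND-ing with it keeps only the host-identifier bits.
    host_id = int(iface.ip) & int(network.hostmask)
    return network, host_id


network, host_id = split_address("192.168.15.7/24")
print(network)          # the network part, in CIDR notation
print(network.netmask)  # the same /24 mask in dotted form
print(host_id)          # the host identifier within that network
```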
Certain IP addresses have special meaning.
- 127.0.0.1 - This is the localhost, i.e. the host that the program runs on. It is used to address other applications on the same machine.
- Trailing zero bits - for example, 192.168.0.0 denotes the network itself if the subnet mask is 16 bits (i.e. 192.168.0.0/16).
- Trailing one bits - for example, 192.168.1.255 denotes the broadcast address for the 192.168.1.0/24 network. Another example is 10.1.2.
- All one bits, i.e. 255.255.255.255 - This is the limited broadcast address. Although it nominally addresses every machine, routers will not forward such packets in order to avoid noise and spam, so in practice it reaches only the hosts on the local network.
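The special addresses in the list above can be derived rather than memorised. A small sketch using the standard `ipaddress` module (the helper name `describe` is an invention for this example):

```python
import ipaddress


def describe(cidr: str) -> dict:
    """Return the special addresses of a network written in CIDR form."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net.network_address),      # all host bits set to zero
        "broadcast": str(net.broadcast_address),  # all host bits set to one
    }


print(describe("192.168.0.0/16"))
print(describe("192.168.1.0/24"))
print(ipaddress.ip_address("127.0.0.1").is_loopback)
```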
Certain IP address ranges are reserved for local network connections. Packets addressed to these destinations will NOT leave your network, because no router on the Internet will forward them. Below are some examples of IPv4 private address ranges:
- 10.0.0.0 - 10.255.255.255 (16,777,216 addresses) - Most commonly used in big corporations
- 172.16.0.0 - 172.31.255.255 (1,048,576 addresses)
- 192.168.0.0 - 192.168.255.255 (65,536 addresses) - Most commonly used in home and small business networks
It is, of course, possible to further divide these addresses into smaller subnets to divide the network logically. For instance, we could reserve 192.168.1.1 - 192.168.1.255 for use by the finance department, and we could reserve 192.168.2.1 - 192.168.2.255 for the sales department.
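As a rough illustration of the private ranges and the departmental split described above, the following sketch expresses the three ranges in CIDR form and carves /24 blocks out of 192.168.0.0/16. The helper names are hypothetical.

```python
import ipaddress

# The three private ranges listed above, written in CIDR notation.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private(address: str) -> bool:
    """True if the address falls inside one of the ranges above."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_RANGES)


def department_subnet(index: int) -> ipaddress.IPv4Network:
    """Return the block 192.168.<index>.0/24 carved out of 192.168.0.0/16."""
    return list(PRIVATE_RANGES[2].subnets(new_prefix=24))[index]


for net in PRIVATE_RANGES:
    print(net, net.num_addresses)

print("finance:", department_subnet(1))
print("sales:  ", department_subnet(2))
```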
The IPv4 address space is 32 bits, which means there are about 4 billion possible addresses. Due to an increasing number of devices, and because of inefficiencies in address assignment, there is a need for a larger address space. IPv6 has a 128-bit address space, which allows roughly 3.4 x 10^38 addresses, far more than could ever be assigned. IPv6 is still growing in popularity. As of 2017, it is fair to say that the Internet still runs on IPv4.
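The sizes of the two address spaces follow directly from the number of bits. A quick check with the standard `ipaddress` module:

```python
import ipaddress

# The /0 network of each family covers its entire address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2 ** 32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2 ** 128

print(ipv4_total)
print(ipv6_total)
# How many complete IPv4 address spaces fit inside IPv6: 2 ** 96.
print(ipv6_total // ipv4_total)
```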
Protocols Supporting IP
There are a number of protocols at this layer that are designed to support IP.
Address Resolution Protocol (ARP)
When we know the IP of a network resource, we know in which direction it is. However, in order to actually deliver the packet, we need the help of lower layers. The layer below IP, as we've mentioned before, is the Link Layer. By using Address Resolution Protocol (ARP), network resources (computers, printers, etc.) and network devices (routers, etc.) can announce their IP addresses when they first boot up ("Hi all, my IP address is 10.15.70.12 and my MAC address is 0x2354A64389CC"). Additionally, they can listen, and respond, to ARP queries ("What is the MAC address for 10.15.70.44?").
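The query-and-response exchange can be sketched as a toy simulation. This is not how ARP is implemented on the wire; it only models who answers a query and what the asker remembers. The `Host` class is hypothetical, and the MAC address given to 10.15.70.44 is a made-up placeholder.

```python
class Host:
    """Toy network resource with one IP address and one MAC address."""

    def __init__(self, ip: str, mac: str) -> None:
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}  # IP address -> MAC address learned so far

    def handle_query(self, target_ip: str):
        """Answer only if the question is about our own IP address."""
        return self.mac if target_ip == self.ip else None


def arp_resolve(asker: Host, target_ip: str, segment: list):
    """Ask every host on the local segment "who has target_ip?"."""
    if target_ip in asker.arp_cache:
        return asker.arp_cache[target_ip]
    for host in segment:  # stands in for a link-layer broadcast
        if host is asker:
            continue
        mac = host.handle_query(target_ip)
        if mac is not None:
            asker.arp_cache[target_ip] = mac
            return mac
    return None  # nobody on this segment owns the address


a = Host("10.15.70.12", "23:54:a6:43:89:cc")
b = Host("10.15.70.44", "02:00:00:00:00:44")  # placeholder MAC address
print(arp_resolve(a, "10.15.70.44", [a, b]))
```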
Domain Name System (DNS)
An IP address is a (very long) number. It is hard to remember and type in; therefore, humans much prefer hostnames. In order to access network resources by their name, we need to map computer names to IP addresses. Domain Name System (DNS) allows resolving easy-for-human names to easy-for-computer IP addresses. Typically, there will be a DNS server in each network that will respond to queries for names. However, no single DNS server knows the answers to all the possible requests for names. When a server doesn't know the answer, it redirects the question to a server with more authority for that domain name.
Let's illustrate this with an example. Let's say that we want to know the IP address of downloads.microsoft.com. If this is the first time we are querying this domain name, chances are our DNS server doesn't know the IP address of this domain. Typically, our DNS server would send a request to a top level DNS server (in this case .com) to find out who is the authoritative server for microsoft.com. Once it gets that information, then it would ask that DNS server for the IP address for downloads.microsoft.com.
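The two-step lookup in this example can be mimicked with plain dictionaries. Everything in the sketch below is a stand-in: the server name `ns.microsoft.example` is invented, and the IP address is a placeholder from a range reserved for documentation, not the real address of the site.

```python
# Hypothetical, in-memory stand-ins for real DNS servers.
TLD_SERVERS = {
    "com": {"microsoft.com": "ns.microsoft.example"},
}
AUTHORITATIVE_SERVERS = {
    "ns.microsoft.example": {"downloads.microsoft.com": "203.0.113.10"},
}


def resolve(name: str, cache: dict):
    """Return (ip_address, number_of_upstream_queries) for a host name."""
    if name in cache:
        return cache[name], 0  # answered locally, nothing forwarded
    labels = name.split(".")
    tld = labels[-1]                 # "com"
    domain = ".".join(labels[-2:])   # "microsoft.com"
    # Query 1: ask the top-level server who is authoritative for the domain.
    name_server = TLD_SERVERS[tld][domain]
    # Query 2: ask that authoritative server for the actual address.
    address = AUTHORITATIVE_SERVERS[name_server][name]
    cache[name] = address
    return address, 2


cache = {}
print(resolve("downloads.microsoft.com", cache))  # first lookup goes upstream
print(resolve("downloads.microsoft.com", cache))  # second lookup hits the cache
```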
ARP vs DNS
Although both ARP and DNS are tasked with address resolution, they are in fact very different protocols.
- ARP operates between the link layer and the network layer, mapping IP addresses to MAC addresses. DNS is an application-layer protocol that sits above the transport layer (DNS typically works over UDP, please see next section).
- ARP is single hop only - no further resolution attempts are made if no response is received. DNS servers, on the other hand, forward name queries to other DNS servers if they don't know the answer themselves.
- Since DNS servers may forward name queries, other DNS servers outside your network can also receive your name resolution queries. Therefore, they may get information about what names you are interested in. For instance, if a name resolution request is made for "www.bankofamerica.com", it can be presumed that you are a Bank of America customer.
Internet Control Message Protocol (ICMP)
ICMP is used by network devices, such as routers, to send error messages (e.g. host or router could not be reached) and operational information (e.g. redirection of data packets to an alternative route). Another common use of ICMP is the ping utility. When network administrators need to verify access to other network resources, they use the ping utility to send ICMP Echo Requests, which will be echoed back if the destination resource is up and running. Additionally, depending on the response time, administrators can identify slow connections. Please note that network administrators sometimes deliberately disable ICMP Echo Request responses to make their servers and other valuable assets less visible to network scanning attacks.
Internet Group Management Protocol (IGMP)
So far we have discussed point-to-point (Unicast) addressing where one host addresses another one, as well as broadcast addressing, where one host addresses all the other hosts on the network. There is a third sort of addressing mechanism, called multicast addressing. A packet with multicast addressing will be addressed to multiple hosts. Common scenarios for this are video streaming in educational set-ups or game state distribution. Instead of sending the same information to multiple, individual computers, thus duplicating network traffic; with multicast addressing it is possible to send the information once only for most of the route, only splitting to individual hosts at the fork in the route. Internet Group Management Protocol (IGMP) is used to set up multicast groups.
Growing the Network Further
The network layer is all we need to grow our network. Let's add other subnets.
We can even grow to the size of the Internet.
Before connecting to the Internet, we will typically want a firewall to protect our network from unwanted packets coming from outside. As we mentioned, with IP, computers are addressable across the globe. Although that's normally a desirable thing, it can also make our network reachable to hackers. Firewalls filter incoming packets (sometimes outgoing packets, too) and drop packets that are not expected or wanted.
From there on, we can connect to the Internet. But what is the "Internet"? Basically, the Internet is a collection of other networks, often quite similar to our own. The necessary infrastructure is provided by Internet Service Providers (ISPs). We usually connect our network through another router. Unlike our own network, though, the router of an ISP is usually physically distant. In this case, an Ethernet connection wouldn't work because of the distance limitations. Fiber optic cables, coaxial cables (like in cable TV), radio waves (microwave antennas or satellite dishes for really remote locations) and even phone lines are common mechanisms.
To connect to the world, all we need is IP at the network layer. However, we need more than that when we need to transfer large amounts of data, reliably, between different applications on devices. This is where the Transport layer comes to the rescue.
This layer is responsible for sending and receiving data between different applications on the hosts. Let's start with the most popular one.
Transmission Control Protocol (TCP)
Transmission Control Protocol (TCP) is by far the most popular transport layer protocol. It is used by the vast majority of websites and most server applications.
It owes its popularity to being a reliable protocol. A protocol is said to be reliable if it can:
- Detect errors within the packets.
TCP verifies the integrity of received packets using a mathematical formula called a checksum. If the received packet is verified as intact, TCP will signal the sender with an Acknowledgement (ACK) packet. If a packet is found to be corrupt, or is lost, TCP will not send an ACK for it (or it will re-send an ACK for the last correctly received packet, signaling that the last transmission was not received correctly). The sender will then re-transmit the packet until it is received properly. How much data may be sent before an ACK must arrive is governed by the window size.
- Detect out-of-order packets.
Due to network load balancing, or other unforeseen events, packets can be sent through different routes, causing them to arrive out-of-order. It is also possible that multiple copies of the same packets are delivered. TCP solves this problem by introducing the notion of connections; thus making TCP a Connection-Oriented protocol. While a connection is being first established, TCP endpoints send and receive a number of Handshake packets before any useful data exchange happens. Among other things, handshake packets contain initial sequence numbers. Each packet sent thereafter updates the current sequence number. Packets can be sorted into the correct order at the receiving end, using these sequence numbers.
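The reordering step can be sketched in a few lines. The sketch assumes, as TCP does, that sequence numbers count bytes, so the segment following one that starts at N and carries k bytes starts at N + k. The function name and the sample data are invented, and real TCP does considerably more (acknowledgements, retransmission timers, sequence number wrap-around).

```python
def reassemble(segments, initial_seq: int):
    """Rebuild the byte stream from (sequence_number, payload) pairs.

    `segments` is given in arrival order, which may be shuffled and may
    contain duplicates. Returns (data, next_expected_sequence_number).
    """
    pending = {}
    for seq, payload in segments:
        if payload:  # ignore empty segments
            pending.setdefault(seq, payload)  # keep first copy, drop duplicates
    expected = initial_seq
    data = bytearray()
    # Deliver bytes strictly in order; stop at the first gap.
    while expected in pending:
        chunk = pending[expected]
        data += chunk
        expected += len(chunk)
    return bytes(data), expected


arrived = [(105, b" wor"), (100, b"hello"), (105, b" wor"), (109, b"ld")]
print(reassemble(arrived, initial_seq=100))
```

If a segment is missing, the function stops at the gap and reports the sequence number it is still waiting for, which is the information a receiver would use when acknowledging.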
Besides being reliable, TCP allows the addressing of different applications on hosts via ports (not to be confused with physical ports on a switch). Thanks to different ports, a host can act as a web server and a mail server at the same time. TCP has a 16-bit port number, allowing for 65536 ports. Ports numbered below 1024 are reserved for well-known applications (e.g. port number 80 for HTTP, or port number 21 for FTP).
TCP is a fairly complex and relatively high-overhead protocol. For simpler tasks, there is User Datagram Protocol (UDP).
User Datagram Protocol (UDP)
User Datagram Protocol (UDP) is the simplest transport layer protocol. Despite its simplicity, however, it provides two functions that the network layer (and below) don't consistently provide.
- Ports: Similar to TCP ports, UDP ports allow the addressing of different applications on hosts (again, not to be confused with physical ports on a switch). Thanks to different ports, a host can act as a DNS server and a video server at the same time. Again, similar to TCP, UDP has a 16-bit port number, allowing for 65536 ports.
- Error detection: Similar to TCP, UDP can detect errors by applying a mathematical formula called checksum on its payload.
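To give a feel for what such a checksum looks like, here is a sketch of the 16-bit ones' complement sum that TCP and UDP use. It is simplified: the real protocols also include header fields in the calculation, whereas this version covers only the bytes it is given. The function names are made up for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carry out of the low 16 bits back into the sum.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def is_intact(data: bytes, checksum: int) -> bool:
    """Receiver side: recompute and compare with the transmitted value."""
    return internet_checksum(data) == checksum


payload = b"hello"
checksum = internet_checksum(payload)
print(is_intact(payload, checksum))   # unchanged payload
print(is_intact(b"hellp", checksum))  # one corrupted byte
```

A checksum like this detects many corruptions but cannot say which byte is wrong, which is why it supports detection only, not correction.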
Because of its simplicity, UDP has some shortcomings:
- Although UDP can detect errors, it cannot correct them. A damaged or lost packet is lost forever, unless a higher layer application implements a proprietary solution at that layer.
- Due to network load balancing, or other events, packets can be sent through different routes, causing them to arrive out-of-order. UDP cannot detect that packets are out-of-order. Applications must be ready to handle packets in unexpected orders.
- UDP doesn't have any notion of connections. Therefore, it is relatively easy to inject packets into an existing conversation.
UDP has several interesting traits:
- UDP is as simple as transport layer protocols get. Therefore, it has a very small size-overhead.
- UDP is connectionless. If a host needs to send a UDP packet to a certain port, it just sends it without any prior handshake. Therefore, it has a very small time-overhead.
- As mentioned before, UDP is unreliable. Although this is a deal breaker for most applications, it is actually advantageous for applications where small errors don't matter. A good example is voice or video transmission. A slightly damaged or out-of-order video frame is a far less noticeable error than interrupting the video stream to request a retransmission of the lost frame.
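The connectionless nature of UDP shows up clearly in code: there is no handshake, the sender simply addresses a datagram to an IP address and port. The sketch below sends one datagram to a receiver on the same machine using Python's standard `socket` module. The function name is hypothetical, and the port is chosen by the operating system. It assumes the loopback interface is available.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # do not wait forever for a lost packet
        # No connection set-up: just address the datagram and send it.
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(2048)
        return data


print(udp_roundtrip(b"ping"))
```

On a real network the datagram could be lost, and the receiver would simply time out; it is up to the application to decide whether that matters.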
Because of its unreliable nature, UDP is not suitable for use when data integrity is important. Transmission Control Protocol (TCP) is preferable for such applications.