From Ethernet to HTTPS and Everything in Between
Presented by: Brian Bradley
Originally aired on August 21, 2022 @ 12:30 AM - 1:30 AM EDT
A presentation of the layers of technologies that the Internet is made up of, and how and why the world wide web culminated into a request and response oriented information exchange.
Transcript (Beta)
Hello, my name is Brian Bradley. I'm a systems engineer at Cloudflare. I'll be using this video to survey the layers of technologies that the Internet is composed of.
We'll be covering various protocols including Ethernet, Internet, user datagram, transmission control, secure sockets layer, hypertext transfer, the domain name system, World Wide Web, and conclude with caching.
So, Ethernet. Let's begin our journey through the construction of the Internet with Ethernet.
Ethernet was developed in the 70s and standardized in the 80s to make communication between two or more directly connected computers easy.
At the time it was developed, it competed with other technologies and networking strategies that fulfilled a similar role or supported a similar use case.
Those included fiber distributed data interface, attached resource computer network called ArcNet, and token ring.
The Ethernet protocol itself is just a set of rules that govern how two machines should interpret signals that are transmitted electrically over a medium.
The original medium was a coaxial cable that served as a shared medium between multiple computers.
So, more than two, could be three, four, five, ten. They would be connected to the same electrical network by Ethernet cable so that a signal produced anywhere would be received everywhere.
The signal itself is divided into small pieces called frames.
And maybe you've heard the terminology Ethernet frame.
The frames contain the payload and some metadata: the source and destination identifiers, and an error-detecting code, so that erroneous or faulty frames can be discarded.
These frames are kept relatively small in order to reduce the chance of a transmission error.
All of the computers connected to the same electrical medium together comprise what's called an Ethernet segment.
So a machine connected to a segment inspects all the frames for the destination addresses and ignores frame data that is not destined to reach it.
So it's like everyone's sort of hearing each other's conversations.
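As a rough sketch of the frame idea described above, here is a simplified frame in Python. It is not the real Ethernet wire format: the preamble, the type field, and broadcast addresses are left out, `zlib.crc32` stands in for the error-detecting code, and the function names and MAC addresses are made up for illustration.

```python
import struct
import zlib


def mac_to_bytes(mac: str) -> bytes:
    """Turn 'aa:bb:cc:dd:ee:01' into its 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst: str, src: str, payload: bytes) -> bytes:
    """Destination, source, payload, then a 4-byte checksum over all of it."""
    body = mac_to_bytes(dst) + mac_to_bytes(src) + payload
    return body + struct.pack("<I", zlib.crc32(body))


def receive(my_mac: str, frame: bytes):
    """Return the payload if the frame is intact and addressed to us, else None."""
    body, checksum = frame[:-4], frame[-4:]
    if struct.pack("<I", zlib.crc32(body)) != checksum:
        return None  # damaged frame: discard it
    if body[:6] != mac_to_bytes(my_mac):
        return None  # somebody else's conversation: ignore it
    return body[12:]


frame = build_frame("aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02", b"hello")
print(receive("aa:bb:cc:dd:ee:01", frame))  # the intended receiver gets the payload
print(receive("aa:bb:cc:dd:ee:03", frame))  # every other machine ignores the frame
```

Every machine on the segment would run the same `receive` check on every frame, which is the "everyone hears everything" behavior of the shared medium.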
That's how it worked. The unique identifiers that identify source and destination within a segment were predetermined at manufacturing time.
And those are called media access control addresses or MAC addresses.
The Institute of Electrical and Electronics Engineers is the authority responsible for allocating MAC addresses.
They allocate them for manufacturers. The identifiers are globally unique.
One of the reasons Ethernet was so successful was because of its simplicity.
But over time, developments were made to improve upon Ethernet's shortcomings.
Cable breakage could render an entire segment unusable. And so devices called repeaters were invented to increase the maximum practical size of the segment and to isolate cable breakage.
With repeaters, the number of machines connected to a segment could be increased, but the bandwidth was constrained because their independent attempts to communicate could result in a collision.
The larger the number of communicating machines in the segment, the higher the chance for collision and the lower the effective bandwidth.
So to alleviate the collision problem, devices called bridges were invented to forward only well-formed Ethernet frames from one segment to another.
That way, if you did have collisions, they wouldn't have to propagate through the entire network and create even more collisions.
They could sort of destroy each other when they cross an Ethernet bridge.
Additionally, a bridge maintains a table that allows it to remember which segments contain which addresses so that it may forward frames to only the segment that contains the machine destined to receive them.
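The table a bridge keeps can be sketched roughly like this. The class name, the segment labels, and the rule of flooding to every other segment when the destination is unknown are assumptions for illustration; the talk only says that the bridge remembers which segment holds which address.

```python
class LearningBridge:
    """Toy bridge that remembers which segment each address was last seen on."""

    def __init__(self):
        self.table = {}  # MAC address -> segment name

    def handle(self, src, dst, arrived_on, segments):
        """Return the list of segments the frame should be forwarded to."""
        self.table[src] = arrived_on  # learn where the sender lives
        known = self.table.get(dst)
        if known is None:
            # Destination not seen yet: send to every segment except the origin.
            return [s for s in segments if s != arrived_on]
        if known == arrived_on:
            return []  # both machines share a segment, nothing to forward
        return [known]


bridge = LearningBridge()
segments = ["A", "B", "C"]
print(bridge.handle("m1", "m2", "A", segments))  # m2 unknown, so flood
print(bridge.handle("m2", "m1", "B", segments))  # m1 was learned on A
```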
So multiple transmissions occurring simultaneously over the wire from different machines was a problem because the frames could not usefully carry information when they collided.
And bridging only partially alleviated the problem.
So new devices called switches were invented to electrically join multiple Ethernet cables, but inspect the destination metadata in each frame and only send the frame along the cable that is directly connected to the destination.
So that's sort of like the best that you can do.
And by connecting multiple machines directly to a switch, they can each communicate with each other without broadcasting their communication to every other machine connected to a switch.
And that greatly reduces the chance of frame collision and increases the bandwidth of the segment.
Collisions can still happen, even with that, because communication bound for your machine still has to travel over one wire, the wire leading from the switch directly to your computer.
But the chance of getting a collision there is much lower than the chance of a collision occurring somewhere off on a network without switches.
Ethernet is an intrinsically insecure technology.
It's trivial to spoof an Ethernet frame and to fool a bridge into sending the data into a segment that it wasn't originally intended for.
By joining segments that should not be joined, it's possible to form a link-level denial of service as well.
You can create a very noisy Ethernet network just by knowing how to connect the nodes that shouldn't be connected.
Although Ethernet made it simple to allow machines to communicate directly with each other, possibly with the assistance of intervening specialized devices like repeaters, switches, and bridges, it's not well suited to express the intent that a piece of data should be communicated across networks in order to reach a destination.
And for that, the Internet protocol was invented. So, Internet protocol.
Internet protocol makes it possible to route a piece of data called a datagram from a source to a destination based on identifiers called IP addresses to support delivery over different networks with different transmission characteristics.
The Internet protocol specifies a means to fragment and reassemble pieces of a datagram so that a wide variety of networks can support Internet communication. Those Internet protocol datagrams are also called packets.
A key invention, probably the most important invention of the Internet, is this device called a router. Routers facilitate communication between two separate networks, two totally different networks.
They do that by inspecting the IP address inside of the packet and looking in a table that they store inside themselves, and they determine based on that how to route the datagram.
They're specialized to join specific kinds of networks together.
You could have a router that joins two or more ethernet networks together.
Or one that sort of joins cellular and ethernet so it makes it possible for devices that are cellular to talk to devices that are on an ethernet network.
And that's really where the magic of like the Internet protocol happens.
It lets you combine these very different networks together and really it lets you connect devices that wouldn't otherwise have the ability to be connected and send information across those networks.
Which is important.
So a router might join many, many ethernet networks together.
So it might have multiple ethernet host interfaces.
That's what you would call that is a host interface.
And, you know, nowadays there's a gajillion different kinds of routers that support a gajillion different kinds of host interfaces.
But to identify which host interface a datagram is destined for, the range of possible addresses is divided into sub ranges called sub networks and designation of network prefixes is defined by the Internet protocol itself.
And certain ranges are allocated to Internet service providers and then those Internet service providers allocate ranges of addresses to their constituents and their customers.
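To make the table lookup concrete, here is a minimal sketch using Python's standard `ipaddress` module. The routes and interface names are invented, and picking the most specific matching sub-range (longest-prefix match) is a common rule that the talk does not spell out, so treat it as an assumption.

```python
import ipaddress

# Hypothetical routing table: (sub-range of addresses, host interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "uplink"),  # default: everything else
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth1"),
]


def pick_interface(destination: str) -> str:
    """Choose the interface whose sub-range matches the address most specifically."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in ROUTES if addr in net]
    net, iface = max(matches, key=lambda match: match[0].prefixlen)
    return iface


print(pick_interface("10.1.2.3"))  # falls in 10.1.0.0/16, so eth1
print(pick_interface("8.8.8.8"))   # only the default route matches, so uplink
```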
Another pretty, I guess, salient thing to think about with the routers is how do the routers get their routing information, right?
How do they collect those tables up?
So routers communicate with each other via very specialized routing protocols called gateway protocols.
Internet service providers must cooperate with each other in order to share routing information using those gateway protocols.
The external gateway protocol is like the common language of all the routers between different Internet service providers and individual Internet service providers may or may not use a form of gateway protocol called an internal gateway protocol to manage the sharing and distribution of routing information among their own fleet of routers.
The entire routing network that an Internet service provider manages is called an autonomous system, to indicate that the way the provider manages the dissemination of routing information doesn't necessarily follow a specific standard or convention.
It's fully autonomous. And as long as the individual Internet service providers can communicate routing information with each other coherently, then it doesn't really matter how they configure their own routers.
So, those autonomous systems connect with each other through like a physical infrastructure called Internet exchange points.
And those are kind of like a crossroads of the Internet. It's like a place where many networks can route into many other networks, serviced by potentially different providers completely.
And being a crossroads means that the branching factor at the Internet exchange point, like how many different networks it branches out into and how many destinations you can go out to from that Internet exchange point, is really high, extremely high.
So, it's like an ideal place to put a server, because it would be able to communicate over many different networks directly without being bottlenecked by going through an intervening network.
And when multiple Internet service providers cooperate with each other and carry each other's traffic, that's called peering.
So, content delivery networks like Cloudflare are usually just big gigantic fleets of servers distributed among data centers that are within or very close to those Internet exchange points.
And that lets them effectively serve content while routing through few or sometimes no intervening networks.
For instance, if you live in a big city or maybe like a city like San Francisco or something, there might be an Internet exchange point within your city.
And large content delivery networks like Cloudflare will probably have a data center within or near to that exchange point.
So, the data that can be served to your machine probably will be served within like a few milliseconds, very fast.
Because it doesn't have to travel over the larger Internet.
So, the external border gateway protocol, that's like the common language, it's intrinsically insecure.
It's just based on trust between Internet service providers. There have historically been incidents that broke the entire Internet for hours.
For instance, in 2004, a Turkish Internet service provider called TTNET accidentally announced itself as the destination of basically the entire Internet.
And so, all Internet service providers that it was peered with received bad routing information and spread bad routing information to their own peers.
And it just kept spreading and made things really bad.
It got fixed, but it's interesting that even though it's a fundamental building block of the Internet, the efficacy of the external border gateway protocol is just built on trust.
So, the Internet protocol itself adheres to the end-to-end principle, which means that intelligence is expected to exist in, like, the communicators at the ends.
Like, not in the network, but on the ends of the network where the machines are.
Or maybe some intelligence in the protocols that you could layer on top of the Internet protocol.
For that reason, it doesn't guarantee delivery of datagrams.
There's no way to know that a datagram was received.
Datagrams might arrive, or not arrive, or arrive out of order, or arrive multiple times.
The wisdom in providing fewer guarantees is that it supports a wider range of use cases that would otherwise be impractical.
For instance, video streaming. It can just buffer up data as fast as it can.
And if you miss some packets, it doesn't really matter.
You get a blip in your video stream, but it's not the end of the world, and it doesn't really completely ruin the experience of the video stream.
Transmission of email might require a different, more stringent set of guarantees.
And it might need to perform things like checks, and acknowledgments, and all kinds of stuff.
But you can just build that on top of something unreliable, like the Internet protocol.
You can implement your own acknowledgment. That's basically the spirit of Internet protocols.
Don't do everything for everybody.
Just provide a universal service. And, you know, like Ethernet, it's fundamentally insecure.
Like most of the Internet is fundamentally insecure.
Datagrams can be spoofed or read by any intervening party with just Internet packets.
So, let's see.
What's next? Let's talk about user datagram protocol, UDP. So, the user datagram protocol is an instance of a protocol that is unreliable, like the Internet protocol.
It's well suited for things like video streams. It includes some small amount of integrity verification.
It has like a checksum in it.
Just to check that the datagram arrived undamaged.
And it adds a little extra metadata on top of the Internet protocol.
Kind of adds one special little sauce. See, the Internet protocol by itself only allows two computers to communicate.
But it doesn't specify a way for computers to have different conversations at the same time.
So, the user datagram protocol was kind of invented to let computers distinguish between separate conversations using a number called the port.
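Here is a small sketch of how the port number separates conversations. The 8-byte header layout (source port, destination port, length, checksum, each 16 bits) follows the UDP specification; the `deliver` function and the `conversations` dictionary are invented for illustration, and the checksum is simply left at zero, which UDP over IPv4 treats as "not computed".

```python
import struct


def build_udp(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack a UDP header followed by the payload."""
    length = 8 + len(payload)  # header plus data, in bytes
    checksum = 0               # left uncomputed in this sketch
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def deliver(datagram: bytes, conversations: dict) -> None:
    """File the payload under the destination port it was addressed to."""
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", datagram[:8])
    conversations.setdefault(dst_port, []).append(datagram[8:length])


conversations = {}
deliver(build_udp(5000, 53, b"lookup"), conversations)
deliver(build_udp(5001, 8080, b"page"), conversations)
print(conversations)  # two separate conversations on one machine
```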
The Internet Assigned Numbers Authority has divided permissible ports into different ranges.
One range for well-known, like well-established ports that everybody knows about.
And then there's a range for ports that have been reserved.
I would say probably not everyone knows about them, but they've been reserved.
And you can go and look at them.
And then there's like a range for just wild, wild west where anybody can do anything.
It's sort of dynamic. So, software engineers just use those conventions to facilitate communication between applications that offer different kinds of services.
Like a web service would be given the port 80 or maybe 8080.
Or maybe 443, depending on certain attributes the service might have.
And you might have like a database.
Database might have a specific kind of port.
Email has a specific kind of port. Services that actually are used by all the browsers like DNS has a specific port that's associated with it.
It really lets you get more than one utility out of a single IP address.
That's one way to think about it.
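The three port ranges can be written down in a few lines. The boundaries below are the ones the Internet Assigned Numbers Authority publishes (0 to 1023, 1024 to 49151, and 49152 to 65535); the function name and the labels are just for illustration.

```python
def port_range(port: int) -> str:
    """Name the range that a port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError("a port is a 16-bit number")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


for port in (53, 80, 443, 8080, 50000):
    print(port, port_range(port))
```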
And then some of those like the email, like I said earlier, you can't really service that with UDP or with IP, something unreliable.
You want something more reliable, more guarantees.
But the port idea isn't really specific to UDP. It's just sort of the one thing that makes UDP different than IP.
The port conventions are just going to be honored regardless of the protocol, as long as the protocol includes some sort of ability to distinguish different communication channels via designation of a port.
And some ports are associated with, you know, I guess you could think of as some ports are associated with communication or services that can have relaxed communication constraints.
They can be sort of sloppy.
And others sort of really need a more robust communication channel.
Because of a spectacular growth in utilization, there weren't enough addresses originally reserved to facilitate the current addressing requirements of the Internet.
The original specification mandated 32 bits of information, enough to uniquely track a little bit over 4 billion values.
And the entire value space wasn't by any means used optimally. Like it was pretty bad at first.
And they made some changes to like help make more and more use out of it.
But even then they weren't doing a great job. So, generally an entire network like a campus or certainly a home, things like a campus, a big office building or something, might only just get one IP address.
So in order to expose individual devices on that network to the rest of the Internet, you know, I want to be able to expose my printer to the Internet at large so that I could print something at home from the office.
I need to be able to talk to my printer and say, you know, send this command to my printer from the Internet.
I got to have some way to like locate that printer, right?
And if my entire home network is just one IP address, how do I even do that?
And the answer is kind of hacky, but that's the way it works.
It's to basically abduct the role of the port.
You abduct that role of the port, and instead of it being just a service designator, well, it kind of still is a service designator.
It just happens that you've sort of overridden that and made it so that it kind of also designates the machine on which the service is running, right?
So you would pick a specific port and you'd say, okay, when I say this IP address plus this port, I'm actually talking about my printer.
Even though in my home network, it's going to be, you know, within my home network, all of my different devices will have different IP addresses.
But from the perspective of the larger Internet, my entire network only has a single IP address.
So you might think, okay, how does that happen?
How does the like translation go on? How do you translate between, you know, me saying my entire home network's IP address plus like some port?
How do I translate that into the IP address that's on my local network, my local home network that uniquely identifies my printer?
And that's actually just called network address translation.
It's running on the router and it connects the network to the rest of the Internet.
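A minimal sketch of that translation step, assuming a fixed forwarding table on the router. The public address, the private addresses, and the port choices are all hypothetical, and a real router also rewrites outgoing traffic, which is not shown here.

```python
PUBLIC_IP = "203.0.113.7"  # the one address the rest of the Internet sees

# Public port -> (private address, private port). Entries are made up.
PORT_FORWARDS = {
    9100: ("192.168.1.50", 9100),  # the printer
    8080: ("192.168.1.20", 80),    # a web server on another machine
}


def translate_inbound(dst_ip: str, dst_port: int):
    """Map a public address and port to the device inside the home network."""
    if dst_ip != PUBLIC_IP:
        return None  # not addressed to this network at all
    return PORT_FORWARDS.get(dst_port)  # None when no device is exposed on that port


print(translate_inbound("203.0.113.7", 9100))  # reaches the printer
print(translate_inbound("203.0.113.7", 22))    # nothing is exposed on this port
```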
From the perspective of the Internet, just want to underscore, from the perspective of the Internet, your network is just like a single machine, basically.
It sees it like a single machine and the ports, they identify services.
That's the illusion it wants to have.
Could be in reality that those are all running on different machines within the network, but it doesn't care.
That's not the illusion that you get.
So there's no way to really be sure.
If someone gives you an IP address and a port, and then gives you the same IP address with a different port, there's no way to be sure that that's running on the same machine or a different machine.
It could be running on a different machine. You don't know.
If you're talking about your own or a local network that doesn't have network address translation within the network, doing something funky, then you can be sure.
But with network address translation, you can't be. And it's a problem because there's a relatively small number of ports available for use.
So there are limits on the number of externally addressable machines that you can have.
Just an artificial limit. Nothing physical. Just like we basically hacked ourselves into a bad place.
Did our best to make use of the available space that we could, but eventually the Internet just kept growing and growing and growing.
It just runs out of room, even with the ports, even abducting the port to mean something else.
It's still not enough. So it was necessary to develop a new version of the Internet protocol, and that would be IPv6.
So the old abused version is called IPv4, and the new version is called IPv6.
Many of the limitations and constraints that existed in IPv4 are no longer really serious concerns in the new version.
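To put numbers on the difference in address space, the standard `ipaddress` module can count the addresses in each version. IPv4 addresses are 32 bits and IPv6 addresses are 128 bits.

```python
import ipaddress

ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)               # 4294967296, a little over 4 billion
print(ipv6_total == 2 ** 128)   # True
print(ipv6_total // ipv4_total == 2 ** 96)  # True: 2**96 IPv6 addresses per IPv4 address
```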
And just like the Internet protocol, by the way, the user datagram protocol, it's insecure.
Just like IP.
For one thing, you can spoof the user datagram protocol packet.
If it says that this is the source and destination, you can just change any of that.
If you can intercept it, you can just change it. So one technology that's built on UDP, the user datagram protocol, is the domain name system.
So the domain name system is a service that's almost like a database, a giant key-value store, kind of. It's just a big table.
It's a table that maps names to IP addresses. The Internet protocol basically just talks about things in terms of numbers that it understands, right?
But, I mean, I don't want to have to memorize a bunch of numbers so that I can access machines or use machines.
My home network isn't publicly exposed over the Internet.
But if it was, and I wanted to make use of my printer, I don't want to have to memorize my home network's IP address just so I can say, okay, you know, 55, 9, 38, 6, 5, 7, double colon, like, or whatever, right?
I don't want to have to spout a big string of stuff. I want something a little more...
Like, more human.
And that's the reason you can go into your browser and you can type in something like Cloudflare.com or something, like Google.com.
You can type that in and it means something to the computer.
And, you know, how does that work?
So, the domain name system is actually... It's not just a system. It's a protocol as well.
So, there's this process of resolving a domain name.
You give it some text, and the domain itself consists of these segments, pieces.
It can be, say, for instance, api.Cloudflare.com.
The .com is what's called a top-level domain.
Org is another top-level domain. And then there's assigned domains, like Cloudflare, that you'd have to go to a registrar to get an assignment.
And then there's subdomains like api, which after you've gotten your assigned domain, you can make as many subdomains as you'd like.
So, the domain name api.Cloudflare.com, it contains three different kinds of domains.
Top-level domain, an assigned domain, and even a subdomain.
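The three kinds of pieces can be pulled apart with a simple split. This sketch assumes the top-level domain is a single label like com or org, so it would mislabel names under two-part suffixes; the function name and the dictionary keys are made up.

```python
def describe(name: str) -> dict:
    """Split a domain name into top-level, assigned, and subdomain labels."""
    labels = name.lower().rstrip(".").split(".")
    return {
        "top_level": labels[-1],
        "assigned": labels[-2],
        "subdomains": labels[:-2],
    }


print(describe("api.Cloudflare.com"))
# {'top_level': 'com', 'assigned': 'cloudflare', 'subdomains': ['api']}
```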
In order to resolve a domain name into a number, the operating system on the machine that you're using is preconfigured with the IP address of a machine that can service domain name system requests, right?
Because it's based on user datagram protocol, there's no way to know whether or not the machine that you're sending your DNS request to actually received it.
So, you don't know whether you'll get something back promptly or whatever.
So, you'll maybe just have to wait for a little bit, and then if you didn't get any answer, send another request.
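The wait-and-ask-again loop looks roughly like this. `send_query` and `wait_for_answer` are hypothetical callables standing in for the real network code, and the attempt count and timeout are arbitrary choices for the sketch.

```python
def query_with_retry(send_query, wait_for_answer, attempts=3, timeout_s=1.0):
    """Send a request, wait a bit, and send it again if nothing came back."""
    for _ in range(attempts):
        send_query()
        answer = wait_for_answer(timeout_s)  # None means the wait timed out
        if answer is not None:
            return answer
    return None  # gave up: no way to tell if the request or the reply was lost


# Simulated network where the first two requests go unanswered.
replies = iter([None, None, "192.0.2.1"])
sent = []
result = query_with_retry(lambda: sent.append("query"), lambda timeout: next(replies))
print(result, "after", len(sent), "requests")
```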
That's the best you can do. The domain name system can store other kinds of records besides IP addresses, and various metadata and information about the address.
It also has certain use cases that I guess aren't super obvious.
It's been used as a general purpose database for storing a real-time black hole list.
Black holes, it's sort of like what happens when you throw something into a black hole, it gets stuck down into nothing, right?
Internet service providers like to take traffic that is malicious and route it to a place where routers just abandon it, and it basically goes to nothing.
And so, you have a list of these malicious things or identifiers for the traffic, and you need a way of storing a lot of it.
And the domain name system is actually pretty good at what it does. So, it serves as a pretty great distributed database for that kind of thing.
And it's used to combat unsolicited email, more or less.
So, I guess that would be not really the domain name system proper.
That's not the intended use of it.
Let's see what else.
Domain name system, it's insecure because it's just a single user datagram.
So, it's possible to query domain name system on behalf of another machine.
And I would do that by making a datagram and saying that the sender is not me, it's you.
And so, I would send that off, and, you know, a few moments later, you're going to get an answer from the domain name system.
And your answer might be bigger than the question I asked by quite a lot, if I ask a very wordy question.
And you can attack a machine that way, basically using the domain name system as a kind of hammer or weapon.
So, it's not secure at all.
You know, surprise, surprise. Most of the Internet's not secure, neither is UDP, neither is IP.
And that brings us to transmission control, I suppose, since we've already covered UDP.
Might as well cover transmission control protocol.
So, unreliable communication channels have their use, right? But for many applications, a robust, reliable communication channel is a requirement.
And for that, transmission control protocol was invented.
The key, I guess the key trait of transmission control protocol is probably the concept of a connection.
So, a connection is just a state that both the receiver and the sender enter into, like an agreement.
Say, I'll have this state, and you have this state, and we'll call this state a connection.
And it signifies that further communication can sort of just proceed until the connection is broken.
Doesn't sound particularly useful. But it's the metadata and capabilities that are enabled by that cooperatively tracked state that make transmission control protocol so useful.
It's bidirectional, so that both parties are senders and receivers.
And each of those directions can be independently terminated.
The data that's sent over the connection is divided into these little pieces called segments.
And the procedures associated with the connection automatically manage the, like, reordering of the segments, resubmission of segments, if there was no acknowledgement sent from the receiver that a segment was received, deduplication of segments, congestion control to keep from overwhelming a slow receiver or network.
They have integrity checks to, like, validate the segments, just like UDP.
And they automatically break up a stream of data into segments, or on the other hand, reconstitute it from segments into a stream.
So, from the person or machine using it, it just looks like I'm pushing a stream of bytes out in order, and it gets out and arrives on the other end in order.
That's what it looks like. There's a lot of little magic happening.
It's just built on IP, IP protocol. And remember, IP protocol does almost nothing.
So, it's doing quite a lot of stuff. That sort of is, like, the cleverness of IP protocol, is that it did so little that it let everyone sort of use it for anything.
UDP uses it, and TCP uses it, even though they're dramatically different.
Because of the additional intelligence and stronger guarantees, the time required to submit information is greater than for something unreliable, like user datagram protocol.
To be specific, on the receiving side, this is because the segments must be presented in order, right?
And so, even if you have, like, received 5, 6, 7, 8, 9, I'm just ordering the segments here.
You can't flush that out to the receiver until you receive 1, 2, 3, right?
1, 2, 3, 4.
You need that fourth one. So, you can receive 1, 2, 3 pretty quickly, and then receive 5 through 9 fairly quickly, but take a while to get 4, and you're not going to get anything for a long time.
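The waiting-for-segment-4 situation can be sketched with a small buffer. This numbers whole segments starting from 1, as in the example above, whereas real TCP numbers bytes; the class name and its method are made up for illustration.

```python
class Reassembler:
    """Holds out-of-order segments until the gap before them is filled."""

    def __init__(self):
        self.next_seq = 1  # the segment number we are waiting for
        self.pending = {}  # segments that arrived early

    def receive(self, seq: int, data: str) -> list:
        """Accept one segment; return whatever can now be handed over in order."""
        if seq >= self.next_seq:
            self.pending.setdefault(seq, data)  # a duplicate changes nothing
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


r = Reassembler()
for seq, data in [(1, "a"), (2, "b"), (3, "c"), (5, "e"), (6, "f")]:
    print(seq, r.receive(seq, data))  # 5 and 6 are held back
print(4, r.receive(4, "d"))           # the gap closes and 4, 5, 6 come out together
```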
Because it has to be in order. And on the writing side, the sending side, it also has some pretty big latency overhead, because you need an acknowledgement from the receiver.
You need acknowledgement from the receiver so the sender knows that it doesn't have to keep sending the same thing over and over, right?
Because it'll keep sending the receiver something until it gets acknowledgement that it doesn't have to do that anymore.
That's its way of guaranteeing that the actual datagrams arrive.
That's its way of getting that robustness out of something as unreliable as IP.
So, the latency penalty is actually, like, a game changer in a bad way for certain applications.
Like, you wouldn't want it for a video stream. But it's not a substantial drawback for a lot of applications.
And some applications just really need that reliability and robustness and, like, ordering guarantee.
You wouldn't have a lot of applications without TCP.
Like, text-oriented communication, email and all that.
Although those aren't really based directly on TCP, the concept is the same.
If you want to, like, issue commands to a machine remotely, like a remote shell, it's going to be based on TCP, typically.
Things like file transfer, where you want every byte to arrive, that's pretty important.
It may not necessarily be totally important that they arrive in order, unless you're writing to a disk.
Because disks are sequential devices. SSD, faster sequentially, but not necessarily, like, strictly have to be sequential.
You can have some fudge room there.
So, I don't know anything about, like, a hyper-optimized protocol to make file transfer as fast as possible.
But I'm sure such a thing exists.
But TCP is a very good and very heavily used protocol for a variety of things, because it's such a robust thing.
It hasn't changed all that much since it was invented.
But, and, you know, you probably knew this was coming, it's intrinsically insecure.
It's very insecure.
Any intervening machine can hijack a connection by spoofing the metadata that tracks the sequencing of the segments, right?
So, you can insert any data you want.
If you're in the middle, you know, people call that monster-in-the-middle attacks.
That's what's happening, right? Someone's in the middle, and they're making your friend think that you're saying something that you're not.
And you can do that noisily, or you can do that sort of sneakily.
Sneakily would be, like, just change one of the segments.
Noisily would be, like, insert a bunch of segments, right?
They're going to figure it out if you insert a bunch. But if you just change one of them, it's so hard to detect.
It's, like, almost impossible to know that that happened.
You can also do, like, a kind of denial-of-service attack, I guess, if you, let's see, if you repeatedly send SYN and ACK packets with the intent of, like, forcing the server to manage resources just to have those connections open.
Because remember, those connections aren't free.
And you're going to be using up some very valuable port space.
By doing that. So, okay.
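Here is a toy picture of why half-open connections are costly. It assumes the server keeps a fixed-size table of connections that have been requested but not yet completed; the class, the table size, and the client labels are invented, and real servers have more defenses than this.

```python
class Listener:
    """Toy server with a limited table of half-open connections."""

    def __init__(self, backlog: int):
        self.backlog = backlog
        self.half_open = set()

    def on_syn(self, client) -> bool:
        """A connection request arrives; accept it only if there is room."""
        if len(self.half_open) >= self.backlog:
            return False  # table is full, the request is dropped
        self.half_open.add(client)
        return True

    def on_ack(self, client) -> None:
        """The handshake completes, which frees the half-open slot."""
        self.half_open.discard(client)


server = Listener(backlog=2)
print(server.on_syn(("attacker", 1)))  # True
print(server.on_syn(("attacker", 2)))  # True, and never completed
print(server.on_syn(("friend", 3)))    # False: the legitimate client is turned away
```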
So, let's go on to transport layer security. Because we talked about all this stuff that's been so insecure.
And we need, like, a hero to come and save the day.
Kick down the door and, like, say, okay, no more of this. We're going to be safe, right?
So, transport layer security is the successor to something called secure sockets layer.
Which is old, ancient dinosaur stuff. It's, like, protocols designed to provide security over a network so that the communication can be, there's, like, three things.
Private, authenticated, and reliable. And transport layer security is, it's session oriented.
Which is interesting. It's different than connection oriented.
It provides its guarantees for connections started or resumed within a session.
So, session is this, like, bigger umbrella than a connection.
You can have multiple connections within a session, if you'd like.
So, let's think.
Authentication is achieved via public key cryptography. So, in order to make it possible to trust that a machine has the authority that it claims that it has, an already trusted authority has to exist.
That you can query and say, you know, help me to know this for sure, that I can trust this person.
And I know that this machine I'm talking to is who they claim to be, right?
So, there's a bit of mathematical magic going on there.
And the way it works is the certificate from the certificate authority that you already trust includes a public key.
That public key can be used to encrypt data that only the corresponding private key can decrypt, or to decrypt data that was encrypted with that private key.
And those words, public and private, are important. Public, because everyone has access to it.
Anybody could ask for a copy of the public key and use it, as a machine, to encrypt or decrypt data.
But only one entity is supposed to have access to the private key. And here's the other important bit: if I have the public key, I can only go one way with it.
I can't encrypt something with the public key and then decrypt the data that I just encrypted with that same key.
It doesn't work like that. The public key can encrypt data into a form that can then only be decrypted by the private key.
And the private key can encrypt data into a form that can then only be decrypted by the public key.
So, it's this asymmetric thing. That's where it gets its name.
So, only the machine with the authority claimed by the certificate should have the private key.
So, if that machine can prove to you that it understands a message that was encrypted with a public key, then the machine is basically considered authenticated.
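Here's a small sketch of that challenge idea in Python, using textbook RSA with tiny primes. To be clear about the assumptions: the primes, the exponent, and the function names are chosen just for illustration, there's no padding and no certificate involved, and numbers this small are trivially breakable. It only demonstrates the asymmetry, that what one key does only the other key can undo.

```python
import secrets

# Toy RSA key pair built from tiny primes. Insecure; for illustration only.
P, Q = 61, 53
N = P * Q                    # modulus, shared by the public and private key
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent (part of the public key)
D = pow(E, -1, PHI)          # private exponent (modular inverse, Python 3.8+)


def apply_key(value, exponent, modulus=N):
    """Textbook RSA: encrypting and decrypting are both modular exponentiation."""
    return pow(value, exponent, modulus)


def server_answer_challenge(ciphertext):
    """Only the holder of the private exponent D can recover the nonce."""
    return apply_key(ciphertext, D)


def client_authenticates_server():
    """Encrypt a random nonce with the public key and check the server's answer."""
    nonce = secrets.randbelow(N - 2) + 2      # random value in [2, N - 1]
    challenge = apply_key(nonce, E)           # anyone can do this step
    answer = server_answer_challenge(challenge)
    return answer == nonce
```

Applying the public key twice does not get the original value back, which is the "one way" property; applying the private key after the public key (or the other way around) does.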
Let's see.
Privacy is achieved via symmetric cryptography. So, as part of authentication, a cipher and keys, the symmetric keys, are agreed on just as part of the authentication.
So, you establish this handshake and establish this communication with someone and you've established the person that you're talking to is who they claim to be.
Great.
So, now both ends are responsible for collaborating with each other to come up with these extra pair of keys.
And the purpose of that is to ensure that nobody else can understand the conversation that you're having.
You're speaking your own private language that only two machines in existence understand.
And it's called symmetric because the same key encrypts and decrypts, in both directions.
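As a rough illustration of "the same key works both ways", here's a toy symmetric cipher in Python. This is an assumption-heavy sketch: the keystream construction (SHA-256 over the key plus a counter) and the names `keystream` and `xor_cipher` are invented for this example, and it is not one of the ciphers TLS actually negotiates. Don't use it to protect anything real.

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from the shared key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key, data):
    """XOR the data with the keystream. The same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
```

Both machines call `xor_cipher` with the same shared key: one call turns the message into ciphertext, and the identical call on the other end turns it back. A machine with a different key just gets noise.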
Reliability is the last one, right?
We went over authentication and privacy. So, reliability.
So, reliability is achieved because each message just includes an authentication code and then you've encrypted it.
So, it's not possible to go in and muck with that without invalidating the whole thing.
So, it's easy to prove that it hasn't been tampered with and that undetected loss hasn't occurred.
And if it has, the validation will fail.
And so, it's not really that complicated.
The reliability is actually pretty easy to achieve once you have the other things.
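The authentication code idea can be sketched with Python's standard `hmac` module. This is a simplified illustration, not the TLS record format: the `seal` and `open_sealed` names and the choice to append a 32-byte HMAC-SHA-256 tag to the message are assumptions for the example, and encryption is left out so the tamper check stands on its own.

```python
import hashlib
import hmac

TAG_LENGTH = 32  # size of an HMAC-SHA-256 tag in bytes


def seal(mac_key, message):
    """Append an authentication code computed over the message."""
    tag = hmac.new(mac_key, message, hashlib.sha256).digest()
    return message + tag


def open_sealed(mac_key, sealed):
    """Return the message if its tag validates, otherwise raise ValueError."""
    message, tag = sealed[:-TAG_LENGTH], sealed[-TAG_LENGTH:]
    expected = hmac.new(mac_key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message authentication failed")
    return message
```

Changing even one byte of the sealed message, or cutting some of it off, makes `open_sealed` raise instead of returning data, which is the "validation will fail" behavior described above.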
Okay, so let's not go into hypertext transfer yet.
Let's talk a little bit more about transport layer security.
So, it's something people don't really think about too much, but TLS, transport layer security, can technically be used as long as the state for a session can be agreed upon and synchronized between two parties.
And usually that means it's combined with a connection oriented protocol like TCP, right?
But there's nothing that strictly forbids it from being used with a connectionless protocol after the necessary state has been synchronized.
Okay, so you could maybe use TCP or something else or your own hacked up version of UDP to initiate a session, finally get a session going, and then the rest of communication can happen however you want.
You could just do it with IP if you wanted to.
You'd lose a lot of stuff because IP is lossy and unreliable, but it'd be encrypted.
And a lot of unreliable protocols can be secure. It's like maybe something we don't think about too much, but they can be.
Just because they don't have connections doesn't mean they can't have sessions.
Well, in some sense, at least.
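One way to picture a session without a connection is a lookup table keyed by a session identifier instead of by a socket. The Python sketch below is purely hypothetical: `SessionTable`, `make_datagram`, and the datagram layout are invented for illustration, and it only authenticates payloads (no encryption, no replay protection). The point is just that each datagram carries enough to find the shared state, so it doesn't matter how or in what order it arrives.

```python
import hashlib
import hmac
import secrets


class SessionTable:
    """Receiver-side session state, looked up by session ID, not by connection."""

    def __init__(self):
        self._keys = {}

    def establish(self):
        """Stand-in for the handshake: create a session and its shared key."""
        session_id = secrets.token_hex(8)
        key = secrets.token_bytes(32)
        self._keys[session_id] = key
        return session_id, key

    def accept_datagram(self, session_id, payload, tag):
        """Return the payload if it belongs to a known session and validates."""
        key = self._keys.get(session_id)
        if key is None:
            return None  # unknown session
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        return payload if hmac.compare_digest(tag, expected) else None


def make_datagram(session_id, key, payload):
    """Sender side: attach the session ID and an authentication tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return session_id, payload, tag
```

Datagrams can be dropped or reordered and each surviving one still validates by itself, which matches the "lossy but still secure" idea.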
So let's move on to, okay, now let's move on to hypertext transfer, which is like the last part of this.
So hypertext transfer protocol, also called HTTP, is built to solve another problem, right?
All these layerings on top of each other have been built to solve a problem that existed in one of the lesser layers.
And one of the issues with transmission control protocol is that it can suffer from state drift and load balancing issues, which sound like big words, but essentially using transmission control protocol just means you have to track state on both ends of the connection in order to keep, like, a context.
It's like two people talking with each other and communication can go both ways.
So, you know, it's like I have to keep track of some state and he has to keep track of some state and we're talking with each other and the state has to like march forward on both sides.
And that's great and all, up to the point where your partner, your other machine, sort of goes kaput, right?
What if it turns off or something?
Or there's a network problem. It's not easy to, like, it's not easy to resume that state, right?
How am I going to get the state back? I had this conversation that was half completed with this other machine.
How do I get it back? And generally, it's not possible.
So that's a problem.
It sort of is like a kind of intrinsic unreliability that exists in the intended use case of TCP.
Now, TCP is perfectly great and perfectly reliable for fairly brief communication.
But what if I wanted to hold open a connection for a long, long, long, long time?
Not really good for that, is it? So, so it presents an issue.
Well, for one thing, holding connection open for a very, very long time isn't ever a good idea anyway, because the machine you're talking to could go away, right?
If it catches on fire, there's nothing you can do.
And they do. Machines catch on fire sometimes.
So it all presents an issue, a big issue.
If connections can be interrupted by machines being turned off or crashing, networks having problems, or a myriad of other issues, then tracking the state and making sure it can be resumed, even if the machine catches on fire, is not something that we're going to be able to do.
So you have to adopt some sort of solution. And the solution that the industry has come up with is, okay, we just won't have state.
We'll adopt a stateless approach.
So it's an approach where responses are intentionally not dependent on anything except just the request.
So there's a request, you can think of it in terms of TCP, I'd make a request over the channel, and I get a response back.
And the response I get back is not dependent on previous requests that I made.
It's dependent only upon this instigating request, nothing else. Right? Now, you can fudge a little.
Maybe there's some state that's on a machine, and so it's like a key value store or something, and you're fetching stuff like that.
Like I said, you can fudge a little, but the intention is that it's stateless.
In other words, that previous requests don't modify your response.
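Here's a small Python contrast between the two styles. Everything in it is made up for illustration (the `CONTENT` table, the paths, and the handler names); it isn't an HTTP server, just the shape of the idea. The stateless handler's answer depends only on the request it's given, while the stateful one depends on what was asked before.

```python
# Hypothetical content table standing in for whatever the server serves.
CONTENT = {
    "/index.html": "<h1>home</h1>",
    "/about.html": "<h1>about</h1>",
}


def stateless_handler(path):
    """The response is a function of the request and nothing else."""
    body = CONTENT.get(path)
    if body is None:
        return 404, "not found"
    return 200, body


class StatefulHandler:
    """Counter-example: the answer depends on how many requests came before."""

    def __init__(self):
        self.pages = ["page one", "page two"]
        self.position = 0

    def next_page(self):
        body = self.pages[self.position % len(self.pages)]
        self.position += 1  # this hidden state is what would be lost in a crash
        return 200, body
```

Asking the stateless handler the same question twice gives the same answer both times. Asking the stateful one twice gives two different answers, and a replacement machine would have no idea where the conversation left off.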
And it's typically the case that you have a server that's responsible for answering queries, and then you have all these different clients that are responsible for asking the queries, right?
So there's only one response giver typically.
And if you think about it, because of the statelessness, the machines actually become fungible.
So any machine can be taken offline in an instant, and the request can be rerouted to a different server that would serve the same response without any loss of service.
And that's an important quality for implementing like reliability in software applications at a big enough scale.
It's actually pretty challenging to have reliability without something like that.
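The fungibility point can be sketched like this in Python. The `Server` class and the `route` function are hypothetical stand-ins (real load balancers do health checks, retries with limits, and a lot more); the sketch only shows that when responses depend on nothing but the request, any surviving server can answer in place of one that went away.

```python
class Server:
    """Hypothetical stateless server: same request, same response, on any machine."""

    def __init__(self, name):
        self.name = name
        self.online = True

    def handle(self, path):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return f"content of {path}"


def route(servers, path):
    """Try each server in turn and return the first response that succeeds."""
    for server in servers:
        try:
            return server.handle(path)
        except ConnectionError:
            continue  # this machine is gone, so move on to the next one
    raise ConnectionError("no servers available")
```

Taking the first server offline doesn't change what the client gets back from `route`; only losing every server does.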
And the fungibility is underscored when the response either depends exclusively on the request or when eventual consistency is fine.
When you have state on different machines and they can be sort of out of alignment with each other, but they eventually are expected to become consistent, that's fine.
For instance, take a server which shares responsibility with other servers for serving the bytes of an image over hypertext transfer protocol.
If all those servers can briefly disagree with their peers about what constitutes the most current version of the image, that might be tolerable.
Actually, it's not like a game changer or a killer thing, because it's probably just that an artist came in and made one little tweak or something.
Not always, but you know, as long as you have some amount of control over that, it's actually pretty good.
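A rough Python sketch of eventual consistency for the image example follows. The `Replica` class, the integer version numbers, and the `converge` helper are assumptions made up for this illustration; real systems propagate updates in many different ways. It shows replicas disagreeing for a while and then agreeing once the newest version has spread.

```python
class Replica:
    """Hypothetical server holding one image and the version it last saw."""

    def __init__(self):
        self.version = 0
        self.image = b""

    def write(self, version, image):
        """Accept an update only if it is newer than what this replica holds."""
        if version > self.version:
            self.version, self.image = version, image

    def sync_from(self, other):
        """Pull whatever the other replica has, keeping the newer version."""
        self.write(other.version, other.image)


def converge(replicas):
    """Spread the newest known version to every replica."""
    newest = max(replicas, key=lambda r: r.version)
    for replica in replicas:
        replica.sync_from(newest)
```

Between the artist's update landing on one replica and `converge` running, different clients may see different bytes for the same image. Afterwards they all see the same thing.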
And here's like the kicker of the whole talk.
So one of the critical flaws of the Internet is the vulnerability of networks that populate the Internet to distributed denial of service attacks.
Even if your server equipment can handle the attack, there's no guarantee that the various network bottlenecks within the Internet service providers along the path of the traffic to your network would be robust enough to handle it.
The load could just cause it all to come crashing down. And that's why caching matters.
And that's what makes Cloudflare's utilization of powerful data centers next to the various Internet exchange points such a compelling thing.
By organizing the information on your site or application into a stateless sort of request response model and tolerating some amount of eventual consistency where permissible, it becomes possible to leverage a content delivery network like Cloudflare to defeat, just totally defeat and shamefully defeat distributed denial of service attacks.
It's like the ability to have a voice that can be heard by any audience that you'd like, and nobody else, privately, in such a way that no third party can impersonate, censor, deny, or intercept you.
And that's kind of like an unprecedented thing.
You know, that's what makes the Internet an equalizing force.
And that's why caching is important. It really makes your voice louder.
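To see why caching takes load off the origin, here's a minimal time-to-live cache sketch in Python. It's an illustration under stated assumptions, not how Cloudflare or any particular content delivery network is implemented: the `Origin` and `EdgeCache` classes, the `ttl_seconds` parameter, and the injectable `clock` are all invented for this example.

```python
import time


class Origin:
    """Hypothetical origin server that counts how often it is actually asked."""

    def __init__(self):
        self.hits = 0

    def fetch(self, path):
        self.hits += 1
        return f"content of {path}"


class EdgeCache:
    """Serve stored responses until they expire; only then go back to the origin."""

    def __init__(self, origin, ttl_seconds, clock=time.monotonic):
        self.origin = origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # path -> (expiry time, body)

    def get(self, path):
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: the origin never sees this request
        body = self.origin.fetch(path)
        self.store[path] = (now + self.ttl, body)
        return body
```

However many requests hit the cache within the time-to-live, the origin answers once per path. The trade-off is the eventual consistency from earlier: clients may see a slightly stale response until the entry expires.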
So that's pretty much the end of my whole spiel about Internet technologies.
I hope you enjoyed.
Thank you for watching.