I’d like to pick your brain for just a little bit…
The Internet has been measured and analyzed since the first connection was made between networks. Measurement activities are shaped by everything from the intentions of the people taking the measurements to the vantage points of network and service operators, so many different approaches are in use today; together they make up the landscape of “Internet measurement activities”. NOMA is tackling one corner of the landscape, but understanding its value depends on an awareness of the bigger picture.
To that end, we’re drafting a paper that aims to provide an overview of existing Internet measurement activities, approaches and challenges, to build out a map of that measurement landscape. It is aimed at the general reader with an interest in the topic, including policy makers, measurement experts wishing to position their work in the landscape of such activities, and network operators seeking to understand available tools, services and practices with regard to measuring the Internet from their network’s perspective.
With this post, and subsequent ones over the coming weeks, I’m putting out draft text and asking if you would kindly share your thoughts on any errors or omissions, or even just general insights.
Comments to measuring<at>techark.org would be most appreciated.
First up for discussion — a review of the challenges that are faced when people start talking about “let’s measure <some aspect of> the Internet”.
1 Issues – Internet data measurement
There are some particular challenges that need to be addressed when reviewing data collected from the Internet, or before forming any kind of measurement of it.
1.1 What are the “endpoints”?
On the user’s end, does the measurement start from the user’s desktop computer, or the CPE? While the smarts for the measurement may be running on the user’s desktop, the reality is that the home network (between the desktop and the CPE) may factor negatively into any measurements. For example, a service provider might provide the network to deliver 75Mbps of data to the CPE, but the user’s desktop may be connected to their home network by an old Ethernet cable with a top speed of 10Mbps. If the user runs a speed test from their desktop, they can’t see anything faster than what the 10Mbps wire delivers.
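The cable example comes down to a simple rule: the throughput an endpoint can observe is capped by the slowest link on the path. A minimal sketch in Python, using the figures from the example; the function name and the link labels are hypothetical, chosen only for illustration:

```python
def observable_throughput_mbps(link_capacities_mbps):
    """Upper bound on end-to-end throughput: the slowest link on the path."""
    if not link_capacities_mbps:
        raise ValueError("path must contain at least one link")
    return min(link_capacities_mbps.values())


# Illustrative path from the example above (labels are hypothetical).
path = {
    "access network to CPE": 75,
    "CPE to desktop (old Ethernet cable)": 10,
}

# A speed test run on the desktop cannot report more than this.
print(observable_throughput_mbps(path))
```

This is only an upper bound: congestion, Wi-Fi interference, or a busy desktop can push the measured figure lower still.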
Similarly, providers are often interested in ensuring that “their network” is well-connected to popular sites, such as Facebook and YouTube. From a routing perspective, “their network” means routers and other network boxes that might be spread far and wide geographically, and have little to do with the “last mile” connection to the customer’s premises. An ISP may have great connections to popular services, but if the customer is connected to the ISP by over-subscribed shared links, old copper, or other low grade links, the endpoint is not going to see advantage from that connectivity.
The same is true when talking about IPv4 and IPv6 connectivity – an ISP may support IPv6 in its core, but it takes a lot of work to update the hardware closest to the customers to ensure that each customer has IPv6 connectivity to their CPE. Then, what happens within the home network determines whether or not the desktop can actually connect to anything over IPv6.
On the server end, analogously, does the measurement reach a particular box on the network, or just one of several real or virtual servers that may be supporting a given service? For example, there is no single computer that “runs the Google website”. Sometimes service instances can be distinguished by differing IP addresses, but even a single IP may support a large server farm behind the edge of the service network.
On the one hand, the user only cares about what they experience – which is everything from their desktop to the server providing the responses to their Internet activities. On the other hand, being able to break down performance by some logical “neighbourhoods” helps: separating out the home network performance from the performance within the access network, and subsequent hops to the network service.
1.2 What is “near”?
From Buenos Aires, Argentina to Cape Town, South Africa is 4,276 mi (6,881 km) across the globe. However, that’s not how Internet traffic flows from Buenos Aires to Cape Town. Virtually (and, quite possibly, literally) all routes out of Buenos Aires to Cape Town go through Miami, US. To be quite clear, the distance from Buenos Aires to Miami is 4,405 mi (7,089 km) – already longer than the distance between the two endpoint cities – and then the distance from Miami to Cape Town is an additional 7,650 mi (12,312 km).
That makes Seattle, US (2,732 mi (4,397 km) from Miami) closer to Buenos Aires in the network than Cape Town is, although that is not at all obvious from looking at a geographical map.
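To make the comparison concrete, the distances quoted above can be added up per route and turned into a rough lower bound on one-way delay. This is a sketch under stated assumptions: the distances are the great-circle figures from the text, not actual cable lengths, and the propagation speed of roughly 200,000 km/s in optical fibre (about two thirds of the speed of light) is an assumption introduced here, not a figure from the paper.

```python
# Great-circle distances in km, as quoted in the text.
BUENOS_AIRES_TO_CAPE_TOWN = 6881
BUENOS_AIRES_TO_MIAMI = 7089
MIAMI_TO_CAPE_TOWN = 12312
MIAMI_TO_SEATTLE = 4397

# Assumption: light in fibre covers roughly 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def one_way_delay_ms(distance_km):
    """Lower bound on propagation delay; ignores routers and cable detours."""
    return distance_km / FIBRE_KM_PER_MS


routes_km = {
    "Buenos Aires -> Cape Town (straight line)": BUENOS_AIRES_TO_CAPE_TOWN,
    "Buenos Aires -> Miami -> Cape Town": BUENOS_AIRES_TO_MIAMI + MIAMI_TO_CAPE_TOWN,
    "Buenos Aires -> Miami -> Seattle": BUENOS_AIRES_TO_MIAMI + MIAMI_TO_SEATTLE,
}

for name, km in routes_km.items():
    print(f"{name}: {km} km, at least {one_way_delay_ms(km):.1f} ms one way")
```

Under these assumptions the route to Cape Town via Miami is 19,401 km, nearly three times the straight-line distance, while the route to Seattle via Miami is 11,486 km. Real paths are longer than great-circle distances, so actual delays will be higher.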
1.3 What is a “fixed point” on the Internet?
At a logical level, “the Google server” is a fixed point in the Internet. However, given the discussion of endpoints, above, it should be clear that there is no single Google server, or one single “Google fixed point”. The same is true of other major global services. For some end users, Google and Amazon services may be “close” to each other, and for other end users that may not be true. The difference stems from the fact that each of Google and Amazon necessarily lays out its service CDN/duplication servers in ways that make sense to its own business, and not based on any global Internet service grid.
A “polestar” endpoint is one that is well known and fixed in the network – at a single IP address that is not anycast from multiple vantage points. This describes few major services today (anything popular is hosted by a CDN). Some NTP servers, as general Internet infrastructure, fall into that category. Of course, services that are built out for the purpose of looking through the network towards fixed points can establish their own polestars.
1.4 Span and scope of measurements
With the variations outlined above, another challenge in setting up Internet measurements is ensuring appropriate span or scope of the measurements. For networks under your administrative control, you can manage and account for different factors, and you can install active or passive gatherers at any and all points as necessary. That gives you confidence in the measurements within your own network, but it doesn’t help address the variability of any measurements that reach outside it (e.g., toward a “polestar” server). It also doesn’t necessarily give information that can be readily compared with measurements taken outside your own network.
To get global span, it is necessary to have some kind of reach into and/or through other networks, and diversity is important. The approaches discussed below outline how that has been addressed in projects to date.
 CPE is “customer premises equipment”; the box that connects to your ISP’s access network.
 To make matters worse, most routes actually go from Miami to some other network node, in places such as Colorado, US or Paris, France, before connecting to Cape Town.
 Network Time Protocol – see https://tools.ietf.org/html/rfc5905
Okay — we’re listening! Comments to measuring<at>techark.org would be most appreciated.