Summary of
"End-to-End Routing Behavior in the Internet"
A Study by Vern Paxson
The purpose of this paper is to summarize the article "End-to-End Routing Behavior in the Internet" by Vern Paxson, Ph.D., of the Lawrence Berkeley National Laboratory (LBNL), University of California, Berkeley. In the article, Paxson presents his analysis of Internet routing behavior, which he derived from repeated "traceroute" measurements that he conducted among 37 Internet sites around the world. Paxson uses the behaviors that he observed in 40,000 individual "end-to-end" Internet routing measurements to characterize routing behavior in three main areas of interest. The first of these is routing pathologies. Paxson presents measured data on Internet routing loops, errors, oscillations, and outages to support his finding that the probability of encountering a major routing pathology more than doubled from the end of 1994 to the end of 1995. The second area of interest is routing stability. Paxson uses repeated measurements between pairs of Internet sites to show how frequently routes changed over time. The third area of interest is routing symmetry. Paxson presents data to illustrate how the path of a routed message between end A and end B depends on the routing direction (A=>B or B=>A). Paxson relies on statistical analysis of his measurement data to support his findings and discusses what they mean.
The report begins with a discussion of related research that was performed and documented before Paxson's own study. Paxson states that most previous routing research focused on the design and function of routing algorithms and protocols. He points out that few of the previously performed studies focused on routing performance, and that those which did lacked the kind of quantitative analysis that he performed. One previous study that Paxson discusses as relevant to his own work was conducted by B. Chinoy in 1992, on the performance analysis of routing information on the NSFNET. Chinoy's study focused on how information was routed inside the network. Chinoy found that changes in network routing activity were concentrated at the edges of the network rather than along its backbone, and suggested that further study should be conducted to analyze the "end-to-end" dynamics of routing information across networks. Paxson very likely drew inspiration for his study from Chinoy's. At several points in his article, Paxson points to the findings of Chinoy and others to support his own findings and conclusions.
Problem
The Internet is a system of interconnected computer networks, or autonomous systems (ASes), which use the Border Gateway Protocol (BGP) to exchange routing information with one another. Each AS is a network or group of networks that is controlled by a single administrative entity (a business, a university, etc.). Instabilities within the ASes and in their exchange of routing information can sometimes have a drastic effect on the behavior of the "end-to-end" message routing process. The purpose of Paxson's study was to identify patterns in this routing behavior so that its nature can be understood. Understanding the behavior of a system is an important step toward improving it.
Summary of Key Terms
There are several key terms, which Paxson introduces and uses throughout his report to convey the methodology and measures that are used in his study. These terms are defined below.
Methodology
The routing measurement study was conducted by first running the network probe daemon (NPD) program at a number of participating Internet sites. Each of these NPDs was contacted at varying intervals by a controlling program running at LBNL. The controlling program directed the NPD to route a message to the NPD of another host site, using the traceroute program to record the desired measurements. The measures of interest include the time and success of connectivity as well as the virtual path of the routed message.
Two distinct sets of routing measurements were taken in this study to measure Internet routing performance. The first set, referred to in the report as D1, involved recording data about traceroutes between 27 Internet sites. A total of 6991 traceroutes were attempted between these sites in November and December of 1994. For D1, the virtual path between each pair of Internet sites was measured with a mean interval of 1 to 2 days.
The second set of measurements, referred to as D2, involved recording data about traceroutes between 33 Internet sites. A total of 37097 traceroutes were attempted between these sites in November and December of 1995. For D2, the virtual path between each pair of Internet sites was measured with two different intermeasurement intervals: 60% with a mean interval of 2 hours, and 40% with a mean interval of 2.75 days. This allowed routing stability to be assessed over different time scales. Also, most of the D2 measurements were conducted in pairs (A=>B and B=>A), allowing for the study of routing symmetry: how the virtual path is affected by the direction of the route.
The routing behavior study included recorded events between just 37 Internet hosts. Since this is such a small percentage of the millions of hosts worldwide, the recorded data cannot be used to draw conclusions about the effects of endpoint host conditions on Internet routing. However, the focus of this study was not the behavior of the endpoints but the route between them. Since the resulting routes made use of a significant number of the ASes that together make up the Internet, and since the different routes within each AS are expected to have similar characteristics, the measurements do provide a reasonable representation of routing events.
One of the shortcomings of the methodology is that, because the experiment was executed centrally, it tends to underestimate the problems that exist in the "end-to-end" message routes being observed. As mentioned above, a controlling program running at LBNL initiated all of the routing measurement events. If the connection from the controlling program to the remote Internet host failed, nothing is known about the routing that might have happened. Certainly, some of these connection failures resulted from pathologies that were of interest to the study. But because the controlling program attempted its connection at the very moment the problem occurred, the measurement was never made and this data is lost. To eliminate this shortcoming in further related research, the controlling program could contact each site before the start of the experiment and provide instructions on the timing and destinations for a batch of routing events.
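The idea of probing "with a mean interval of 2 hours" can be illustrated with a short sketch. The summary only states the mean intervals, so the choice of exponentially distributed gaps below is an assumption made for illustration, and the function name `measurement_times` is hypothetical.

```python
import random


def measurement_times(mean_gap_hours, count, seed=0):
    """Return `count` measurement times (in hours) with random gaps.

    Assumption: gaps are drawn from an exponential distribution whose
    mean is `mean_gap_hours`. The summary does not state the distribution.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    times = []
    now = 0.0
    for _ in range(count):
        # expovariate takes the rate, which is 1 / mean
        now += rng.expovariate(1.0 / mean_gap_hours)
        times.append(now)
    return times


# Example: five probe times for a pair measured every 2 hours on average.
times = measurement_times(mean_gap_hours=2.0, count=5)
print([round(t, 2) for t in times])
```

Random rather than fixed gaps keep the probes from lining up with any periodic behavior in the network, which is one reason a schedule like this might be preferred.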
Routing Pathologies
One of the routing pathologies studied in this experiment is the traceroute loop, in which the same router is reported at least 3 times in a single virtual path. Records of the traceroutes showed that 0.13% of those from D1 and 0.16% of those from D2 experienced persistent routing loops. One reason a loop can form is that a particular routing change (possibly made to avoid some temporary outage) has not yet been propagated to all routers. The loop persists until the participating routers have updated their routing tables. Some of these loops were observed to last for many hours.
The pathology of fluttering was also studied. Fluttering refers to rapid fluctuations in the path of a particular route. Routers that exhibited significant fluttering were observed to route messages to the same destination, at different times, over completely different routes. Figure 3 of Paxson's report illustrates one of the most dramatic instances of fluttering that was observed. Fluttering can cause unstable network paths. It can adversely affect routing symmetry and introduce error into calculations of round-trip message times.
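The loop criterion described above (the same router reported at least 3 times in one path) can be sketched in a few lines. This is only an illustration of that criterion, not Paxson's analysis code; the function name `has_loop` and the router names are hypothetical.

```python
from collections import Counter


def has_loop(path, min_repeats=3):
    """Return True if any router appears at least `min_repeats` times.

    `path` is the list of routers reported by one traceroute, in order.
    Hops that did not answer are represented as None and are ignored.
    """
    counts = Counter(hop for hop in path if hop is not None)
    return any(n >= min_repeats for n in counts.values())


# Hypothetical router names, for illustration only.
looping = ["r1", "r2", "r3", "r2", "r3", "r2"]
clean = ["r1", "r2", "r3", "r4"]
print(has_loop(looping))  # True
print(has_loop(clean))    # False
```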
Infrastructure failures were recorded when a router identified a host as "host unreachable". Such occurrences were responsible for a small reduction in the overall availability rating of the Internet. For D1 the availability rating was 99.8%. For D2 the availability rating was 99.5%.
The path length, in number of hops, was recorded for each route. The results indicated an increase in the operational diameter of the Internet from 1994 to 1995. The mean path length for D1 was 15.6 hops. The mean path length for D2 was 16.2 hops. In D2 there were six instances where the path length exceeded the maximum allowed by the traceroute program. This also indicates an increase in the Internet's operational diameter.
Erroneous routing was observed to occur in only one surprising instance. A message that was intended for a site in London ended up in Israel.
End-to-End Routing Stability
Two different aspects of routing stability were explored: prevalence and persistence. Prevalence is a measure of the probability of observing a particular path for a route. Persistence is a measure of how long a route is likely to remain unchanged. Route stability was assessed using the data from D2 because it allowed comparisons on two time scales. Before route stability was assessed, the data was reduced to eliminate routing events that involved pathologies and those that experienced minor route fluctuations between tightly coupled hosts in a localized network.
The analysis of prevalence focused on the existence of a dominant path for a particular route. For this analysis, a steady-state probability value was calculated for every virtual path of each route. This value is equal to the number of occurrences of the path divided by the total number of path measurements for that route. The data was compared at three granularities: host, city, and AS. The data for host granularity showed that, for half of the virtual paths measured, the same route was observed 82% or more of the time. For city and AS granularities the percentages were 97% and 100% respectively. This can be seen in Figure 6 of Paxson's report, which shows a plot of the cumulative distribution of the prevalence of the dominant route. Most of the variation that occurs in routing is at the host level.
To characterize routing persistence, the data was analyzed to determine how frequently the routes changed. The analysis ultimately showed that 82% of the end-to-end route pairs experienced, on average, one routing change every 1.5 days. This is considered a low rate of oscillation. Two-thirds of the routes studied were considered to be "quite stable".
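The prevalence calculation described above (occurrences of a path divided by total measurements for the route) can be sketched as follows. The helper names `dominant_prevalence` and `coarsen`, the hop labels, and the host-to-city mapping are all hypothetical and serve only to illustrate the arithmetic.

```python
from collections import Counter


def dominant_prevalence(observed_paths):
    """Return (dominant_path, prevalence) for one route.

    `observed_paths` is a list of paths (tuples of hops), one per
    measurement. Prevalence = occurrences of the path / total measurements.
    """
    if not observed_paths:
        raise ValueError("need at least one measurement")
    counts = Counter(observed_paths)
    path, occurrences = counts.most_common(1)[0]
    return path, occurrences / len(observed_paths)


def coarsen(path, group_of):
    """Map each hop to a coarser group (e.g. its city) and merge
    consecutive hops that fall into the same group."""
    result = []
    for hop in path:
        group = group_of[hop]
        if not result or result[-1] != group:
            result.append(group)
    return tuple(result)


# Five measurements of one route at host granularity.
observations = [("a", "b", "c")] * 4 + [("a", "d", "c")]
print(dominant_prevalence(observations)[1])  # 0.8

# Hosts "b" and "d" sit in the same (hypothetical) city "Y".
city_of = {"a": "X", "b": "Y", "d": "Y", "c": "Z"}
by_city = [coarsen(p, city_of) for p in observations]
print(dominant_prevalence(by_city)[1])  # 1.0
```

The second result shows why prevalence rises at coarser granularity: paths that differ only in which host they use inside one city collapse into the same city-level path.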
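A simplified way to estimate how often a route changes is to count the differences between consecutive measurements. The sketch below assumes a time-ordered list of (time, path) pairs and uses hypothetical function names. It ignores changes that happen and revert between two measurements, so it is a rough illustration rather than the analysis performed in the paper.

```python
def count_route_changes(measurements):
    """Count how often the path differs between consecutive measurements.

    `measurements` is a list of (time_in_hours, path) sorted by time.
    """
    changes = 0
    for (_, previous), (_, current) in zip(measurements, measurements[1:]):
        if previous != current:
            changes += 1
    return changes


def mean_hours_between_changes(measurements):
    """Observed time span divided by the number of observed changes.

    Returns None when no change was observed.
    """
    changes = count_route_changes(measurements)
    if changes == 0:
        return None
    span = measurements[-1][0] - measurements[0][0]
    return span / changes


# Hypothetical route sampled every 2 hours; paths are labeled A and B.
samples = [(0, "A"), (2, "A"), (4, "B"), (6, "B"), (8, "A")]
print(count_route_changes(samples))         # 2
print(mean_hours_between_changes(samples))  # 4.0
```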
Routing Symmetry
Routing symmetry refers to a comparison of the two one-way virtual paths between two hosts. Ideally, a routed message on the Internet would follow the same path on the outgoing and incoming trips between two hosts. Routing symmetry is important because some timing-dependent communications assume that the route is symmetric. Asymmetry can cause errors in timing and synchronization calculations.
Only D2 data was used in the symmetry analysis. Most of the routes in D2 were measured in pairs (A to B and B to A), allowing a comparison of the paths taken in the two directions. Of the 11339 successful pairs of measurements, 49% showed an asymmetry in which at least one different city was visited. The majority of these asymmetries were confined to a single alternate hop in the virtual path.
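The city-level comparison described above can be sketched by checking whether either direction visits a city that the other does not. The function names and city lists below are hypothetical, and the test is a simplified reading of "at least one other city was visited", not the paper's exact procedure.

```python
def differing_cities(forward, reverse):
    """Cities visited in exactly one of the two directions."""
    return set(forward) ^ set(reverse)


def asymmetric_fraction(pairs):
    """Fraction of (forward, reverse) pairs that differ in some city."""
    if not pairs:
        raise ValueError("need at least one measurement pair")
    asymmetric = sum(1 for f, r in pairs if differing_cities(f, r))
    return asymmetric / len(pairs)


# Hypothetical paired measurements at city granularity.
pairs = [
    (("Berkeley", "Chicago", "Boston"), ("Boston", "Chicago", "Berkeley")),
    (("Berkeley", "Chicago", "Boston"), ("Boston", "Denver", "Berkeley")),
]
print(differing_cities(*pairs[1]) == {"Chicago", "Denver"})  # True
print(asymmetric_fraction(pairs))  # 0.5
```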
Summary of Results
The three major findings from this study on Internet routing behavior are summarized below.
The variation in the "end-to-end" route measurement data reported in this study emphasizes that there is no "typical" Internet path or Internet site. The Internet is a vast and diverse environment that has no doubt changed significantly in the years since 1995. Current Internet routing behavior should be studied. A comparison of current data with Paxson's data from 1995 would be instructive and perhaps useful in making predictions about the Internet routing issues that will be faced in the future.