A3-CalvinArdiSimonTan

From CS294-10 Visualization Fa08


October 1, 2008


Data Domain

Idea

Map of the Internet: IPv4 Allocation

We'd like to turn xkcd's Map of the Internet, a map of the IPv4 allocation space, into an interactive visualization similar to SmartMoney's Map of the Market.

The data domain is a set of IPv4 addresses (32-bit numbers written as 4 octets, such as xxx.xxx.xxx.xxx where each xxx ranges from 0 to 255) mapped to a Regional Internet Registry (RIR) as allocated by the Internet Assigned Numbers Authority (IANA). RIRs delegate, according to their regional policy, subsets of their resources to Internet Service Providers (ISPs) or clients. Thus, any allocated IPv4 address can be tied to a regional registry, company (ISP or otherwise), or individual.
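To make the "32-bit number in 4 octets" description concrete, here is a minimal Python sketch of the conversion between dotted-quad notation and the underlying integer. The function names `ip_to_int` and `int_to_ip` are illustrative and not part of our application.

```python
def ip_to_int(address: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected 4 octets, got {len(octets)}: {address!r}")
    value = 0
    for text in octets:
        octet = int(text)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range 0-255: {octet}")
        # Shift the running value left by one octet and append the new one.
        value = (value << 8) | octet
    return value


def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to dotted-quad notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit value: {value}")
    # Extract the octets from most significant to least significant.
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

For example, `ip_to_int("18.0.0.1")` gives 301989889, and `int_to_ip` maps that value back to "18.0.0.1".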

We'd like to visualize this mapping so that we can see the corresponding relation of IPv4 space (i.e. a specific block) and who/what that space was allocated to (i.e. organization X).

Data Sets/Resources

In the end, we took the IANA records of the /8 subnets and created a custom data file, available here: Media:slash8.txt

Frustrations

We were unable to find publicly available ARIN bulk data on the web; apparently a form needs to be filled out and sent by postal mail (!). In addition to this rather arduous task, it doesn't seem like they like giving the database out, as evidenced by this student.

All RIRs do provide a web query interface, but it would have been ideal to have the entire database for processing.

Application Description

blog.icann.org - ip allocations

Related Work

As stated above, this idea was heavily inspired by xkcd as well as some spin-off work done by Kim Davies of the Internet Corporation for Assigned Names and Numbers (ICANN). xkcd's work presents information in a clean and simple format and gives the general idea of which IP space is allocated and which isn't; however, at some points, there is a lack of needed detail. For example, we can easily tell that MIT owns the 18.x.x.x prefix, yet there is a huge area marked "Various Registrars" on the map with no further information. One might wish to drill down and discover who really owns that region of IPv4 space.

Similarly, Davies' rework presents a more accurate picture of IP address allocation, but lacks the detail (as presented in the blog) and labeling of xkcd's work. Colors are used to indicate nominal variables in this case: geographical regions that blocks are allocated to.

The Internet Mapping Project is a now-defunct visualization project on the topology of the Internet.

The IPv4 Address Report has a good amount of data, but is more focused on the growing scarcity of IPv4 allocation blocks and attempts to estimate the date when no IPv4 addresses will remain available.

Application

Given that we are talking about the Internet, it only seems right to produce an application that is easily usable and accessible on the Internet. That said, it seems most likely that prefuse flare will be the visualization tool of our choice. There will be a lot of interaction with this visualization that's particularly suited for the mouse (and potentially user input from the keyboard).

Our visualization will aid in the mapping between IPv4 blocks and their 'owners' by providing a zoomable interface that will allow drilling down through 5 levels of subnets to see any of 5 levels of granularity of IPv4 space. The interaction model will be simple: Double-click (or roll a mouse scroll wheel forward) to zoom in on a particular subnet, revealing all of the subnets beneath it. Click a zoom out button (or roll a mouse scroll wheel backward) to zoom out to a wider perspective. Hover over any block to get a tooltip with more information (varies) about that particular subnet.

At each level, we will encode the blocks' 'owners' with color. The 'owners' will be highly generic at the lower zoom levels (e.g. "Europe", "Japan", other regions, or top-level organizations) but become increasingly specific as the user zooms in (e.g. BBC, AT&T, MIT). In this way, one can observe patterns in IP address allocation at multiple levels and with different perceptions of "ownership". This is discussed further below.

Visual Hierarchy and Color Encodings

At each level, blocks will be encoded with colors to visually group them by nominal values; these values will change between zoom levels. For example, the IP address representing BBC.co.uk may be one color signifying "Europe" at the lowest zoom level, a different color signifying the "RIPE" Regional Internet Registry at a higher zoom level, and eventually its own color to signify the domain name itself at the highest zoom level.

Each level would be defined by a CIDR prefix length. Color encodings on each level could be:

  • /0: Geographical region, organizations (xkcd view)
  • /8: Regional Internet Registries
  • /16: Domain registrars or companies
  • /24: Domains, server types, or companies
  • /32: One domain URL, if one exists

We are using different sets of encodings for each zoom level simply because none of these encodings could result in a useful visualization at all zoom levels and not all of them make sense for all IP addresses. This may cause a problem if we cannot find data for each of the sets of encodings for all IP addresses; hence, we expect to have many blocks with "no data available".
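As an illustration of how one address belongs to a different block at each zoom level, the following sketch uses Python's standard `ipaddress` module to list the enclosing subnet at each of the five prefix lengths above. The name `enclosing_subnets` and the constant `ZOOM_PREFIXES` are hypothetical, and the sample address is arbitrary.

```python
import ipaddress

# The five zoom levels listed above, expressed as CIDR prefix lengths.
ZOOM_PREFIXES = (0, 8, 16, 24, 32)


def enclosing_subnets(address: str) -> list:
    """Return the subnet containing `address` at each zoom level."""
    ip = ipaddress.IPv4Address(address)
    subnets = []
    for prefix in ZOOM_PREFIXES:
        # strict=False masks off the host bits, so any address
        # yields the network that contains it.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        subnets.append(str(network))
    return subnets
```

Calling `enclosing_subnets("18.7.22.69")` yields 0.0.0.0/0, 18.0.0.0/8, 18.7.0.0/16, 18.7.22.0/24 and 18.7.22.69/32, which are the blocks a user would pass through while zooming in on that address.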

Interactive Visualization Techniques

Using visualization technique: Treemap

Strengths:

  • Shows a lot of data in a small space
  • Shows hierarchical data well
  • Allows for fractal-like zooming (one element could hold an entire other treemap)
  • It's included as one of the Flare demos

Since this application will be based on SmartMoney's Map of the Market, we will similarly make use of the treemap visualization technique. xkcd's original comic (Map of the Internet) also lends itself to use of this technique, as it demonstrates that the distribution of IP addresses is highly space-constrained in a visual sense. In fact, the sheer number of possible IPs is so vast that xkcd could only make the /8 subnets clear within the size of one comic. What xkcd does not show is how IP addresses are divided within each /8 subnet, which is something we aim to do with our application.

This introduces the point that IP address allocation is also hierarchical by the specification of the Internet Protocol. The treemap technique is naturally conducive to use with this hierarchical data set; it will allow comparison between IP address blocks at any subnet level, /8 all the way to /32 (individual IP addresses) with a zoomable interface.

There is a question of what to encode with the treemap's block sizes. In the Map of the Market, each block's size encodes the market capitalization of the stock it represents. While it would be interesting to have the size of each block in our visualization represent the internet traffic, market capitalization, or some other interesting statistic of each subnet, we simply don't have these data for every subnet in existence (and for some, it's illogical). We have decided to have the size of each treemap block simply represent the number of IP addresses in that subnet, as xkcd does.

Hence, one downside of using the treemap for our domain is that all the visible blocks will always be the same size. (All IP subnets with the same prefix length contain the same number of IP addresses.) Size as a visual variable will not really encode anything interesting; rather, it will simply serve as a reminder that all blocks represent the same quantity of IP addresses. We feel this is all right, since we can use coloring of contiguous blocks to encode other data (e.g. how xkcd uses color to indicate allocation status or registrar ownership).
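The reason every visible block has the same size follows from the arithmetic: a subnet with prefix length p contains 2^(32 - p) addresses, regardless of which subnet it is. A small Python sketch (the function name is illustrative):

```python
def addresses_in_subnet(prefix_length: int) -> int:
    """Number of IPv4 addresses in a subnet with the given CIDR prefix length."""
    if not 0 <= prefix_length <= 32:
        raise ValueError(f"prefix length must be 0-32, got {prefix_length}")
    # The prefix fixes `prefix_length` bits; the remaining bits are free.
    return 2 ** (32 - prefix_length)


# Block sizes at each zoom level of the visualization.
SIZES = {prefix: addresses_in_subnet(prefix) for prefix in (8, 16, 24, 32)}
```

Here `SIZES` maps /8 to 16777216, /16 to 65536, /24 to 256 and /32 to 1, so each zoom step divides the block size by 256.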

Storyboard

The visualization begins with an overview of the IP address space divided into the /8 subnets, nearly identical to xkcd's famous comic. By hovering over each subnet (treemap element), a tooltip with some information will be shown.

Visualization overview and tooltip

Users can double-click (or roll their mouse's scroll wheel forward) while mousing over a subnet block to zoom into that subnet, seeing all the /16 subnets underneath it.
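The set of blocks revealed by a zoom step can be computed directly from the parent subnet. The sketch below, using Python's standard `ipaddress` module, lists the 256 children one level down; `child_subnets` is a hypothetical helper, not code from our application.

```python
import ipaddress


def child_subnets(parent: str, step: int = 8) -> list:
    """List the subnets one zoom level below `parent` (8 more prefix bits)."""
    network = ipaddress.ip_network(parent)
    new_prefix = network.prefixlen + step
    if new_prefix > 32:
        raise ValueError(f"cannot zoom below /32 (from /{network.prefixlen})")
    # subnets() yields the children in ascending address order.
    return [str(child) for child in network.subnets(new_prefix=new_prefix)]
```

Zooming into 18.0.0.0/8 this way produces 256 blocks, from 18.0.0.0/16 through 18.255.0.0/16.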

At the /24 subnets

Notice that a "Zoom Out" button appears on the screen in order to provide a way to zoom back out again. The user can also zoom out by rolling their mouse's scroll wheel backward.

The user can continue to zoom in until they reach a particular /32 'subnet' (a.k.a. a single IP address).

A single IP address

Implementation

Final Application

As inspired by xkcd, a Map of the Internet was developed using Flare to provide an interactive and "birds-eye" view of how Internet Protocol (IPv4) addresses are allocated. The visualization can currently toggle between two sets of color codings: one describing each IP's allocation status (allocated, unallocated, etc.), and another depicting the company or registry in charge of delegating each IP (there are 5 regional registries and many individual organizations). Clicking on an individual IP block performs a whois lookup for more information about that particular subnet.

The source code can be downloaded here: Media:A3-CAST.tar.gz

Run MapOfTheInternet.swf or open the HTML file in the root of the archive to launch the application.

Building

The source code is included inside the tarball. Building the application requires Flare and Flex, along with the appropriate builder and its libraries. When using "Run Application" in Adobe Flex, the application to run is MapOfTheInternet.as.

Usage

Usage is simple; a grid is displayed with a legend indicating whether each /8 subnet is allocated, unallocated, or reserved for other purposes. Hovering over an individual block will give you a tooltip detailing the particular /8 subnet, which Regional Internet Registry (or legacy company) is in charge of delegation, and a text label indicating its status. Clicking on an IP block will perform a whois lookup and provide more information about the subnet; note that we are not mapping IPs to domain names (something that reverse DNS does) but IP subnets to their owners/delegators.
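For readers unfamiliar with whois, the protocol itself is very small: open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The Python sketch below shows that exchange. It is not how our Flash application performs its lookups; the function names are illustrative, and the default server `whois.iana.org` is simply one public whois server chosen for the example.

```python
import socket


def build_whois_query(subject: str) -> bytes:
    """Encode a whois query: the query text terminated by CRLF."""
    return subject.encode("ascii") + b"\r\n"


def whois_lookup(subject: str, server: str = "whois.iana.org",
                 port: int = 43, timeout: float = 10.0) -> str:
    """Send one whois query and return the server's full text response."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_whois_query(subject))
        chunks = []
        while True:
            # The server signals the end of the response by closing the socket.
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as `whois_lookup("18.0.0.1")` requires network access, and the response text depends entirely on the server queried.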

The starting Legend displays information about each IP's allocation status. Clicking on the Legend will provide an alternative legend and change the color encodings on the visualization, indicating which Regional Internet Registry (RIR) or company owns the delegation rights to each particular subnet, as well as any reserved or multicast addresses. The functionality of clicking on a block to perform a whois lookup is the same, regardless of which Legend is being used.

Double-clicking or scrolling the mouse wheel forward on a block will trigger a zoom in. You can zoom in all the way to a single IP, but the color encodings are not guaranteed to persist beyond the first level. You can click on the Zoom Out button in the upper left to zoom out a level, or scroll the mouse wheel backward. Zooming is a feature that could have more potential if we had more detailed data about lower-level subnets. (See below.)

Changes in Plan

We found that obtaining data more detailed than the /8 prefix requires the non-trivial task of applying to each RIR for access to "whois bulk data" (if each of them has such an application process; ARIN, the North American registry, does). Despite the open nature of the Internet, this data is not freely available or distributable. For a /32 prefix, individual reverse Domain Name System (DNS) queries could be made, but in general these yield little useful information compared to whois.
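For reference, a reverse DNS query for a single IPv4 address looks up a name formed by reversing the octets and appending `in-addr.arpa`. The Python sketch below builds that name and, as an optional step that needs network access, asks the resolver for it. Both function names are illustrative and were not part of our application.

```python
import socket


def reverse_dns_name(address: str) -> str:
    """Build the in-addr.arpa name used for a reverse DNS (PTR) query."""
    octets = address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    # Octets are reversed so the most specific label comes first.
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def reverse_lookup(address: str):
    """Return the host name for `address`, or None if no record is found."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
        return hostname
    except (socket.herror, socket.gaierror):
        return None
```

`reverse_dns_name("18.7.22.69")` gives "69.22.7.18.in-addr.arpa". Whether `reverse_lookup` returns anything useful depends on whether the address owner has published a PTR record, which illustrates why this route gives spotty coverage.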

With that setback, our perhaps overly ambitious plan of levels for each major prefix was limited from the beginning. Even doing a reverse DNS lookup for an IP address, for example, would not necessarily lead to a working website (or even all the domains; a domain to IP mapping has the possibility of a many-to-many relationship). However, despite these setbacks, our main goal was visualizing xkcd's static graphic and making it interactive in some fashion.

Division of Labor

Documentation of assignment 3 was split evenly, with both students contributing and modifying parts to the initial and final documentation.

Simon spearheaded the initial development of the Map and worked heavily on the initial design and implementation of the storyboards. He focused on much of the interaction and colorization coordinated with multiple legends, and also worked on fixing bugs and adding other features (modification of data formatting and parsing, labeling). He also put considerable effort into making the project extensible, so that if we were to attempt to expand it with more legends or more levels of data, it would be relatively easy.

Calvin worked on acquiring and formatting the data, setting up version control, modifying the original TreeMapLayout.as to make it work to our liking (the original used a particular algorithm that was not necessarily suited for what we wanted), and fixing bugs and adding features to the code (labeling, data parsing, whois lookups).

Commentary

The initial development (brainstorming an idea, storyboarding a visualization) done in part one took approximately 4-5 hours.

Development of the application took much longer than anticipated - about 40 man-hours. While we found that flare provides immensely useful library classes and methods, they were oftentimes not sufficient for something we needed to do. We would often find ourselves implementing custom classes or editing library code directly in order to get what we wanted.

For example, in our visualization, we wanted the "TreeMapLayout" look, but needed some coherent sorting among the leaves so it would be easy to navigate the logical ordering of IP addresses. However, due to the way TreeMapLayout is implemented, we couldn't have what we wanted. We needed to implement a CustomLayout class based on TreeMapLayout to get our sorted squares.
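To illustrate what "sorted squares" means, here is a minimal Python sketch of one possible ordering: equal-sized cells placed in a 16-column grid in ascending numeric order. This is not the code of our CustomLayout class (which is ActionScript); the row-major order, the 16-column default, and the name `grid_cell` are assumptions made for the example.

```python
def grid_cell(index: int, columns: int = 16, cell_size: float = 1.0):
    """Return (x, y, width, height) for the block at `index` in a sorted grid.

    Blocks are assumed to fill each row left to right, then move down a row.
    """
    if index < 0:
        raise ValueError(f"index must be non-negative, got {index}")
    row, col = divmod(index, columns)
    return (col * cell_size, row * cell_size, cell_size, cell_size)


# Positions for the 256 /8 blocks, keyed by their first octet.
LAYOUT = {first_octet: grid_cell(first_octet) for first_octet in range(256)}
```

Under these assumptions the block for 18.x.x.x lands in the second row, third column, so numerically adjacent subnets stay next to each other on screen.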

Similarly, we wanted to be able to include the ability to use a mouse scroll wheel for zooming. There was no provision in the flare libraries for control with a scroll wheel (and knowing which direction it scrolled in), so we had to write our own.
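The logic behind wheel-driven zooming is simple once the scroll direction is known. The Python sketch below shows it as a pure function; it is not our ActionScript handler. The sign convention (positive delta means scrolling forward) and the names `ZOOM_PREFIXES` and `next_zoom_index` are assumptions for illustration.

```python
# Zoom levels of the visualization, from the /8 overview to a single address.
ZOOM_PREFIXES = (8, 16, 24, 32)


def next_zoom_index(current: int, wheel_delta: int) -> int:
    """Return the new zoom level index after one scroll-wheel event.

    Assumes a positive delta means the wheel rolled forward (zoom in)
    and a negative delta means it rolled backward (zoom out).
    """
    if wheel_delta > 0:
        step = 1
    elif wheel_delta < 0:
        step = -1
    else:
        step = 0
    # Clamp so the user cannot zoom past the overview or past a single IP.
    return max(0, min(len(ZOOM_PREFIXES) - 1, current + step))
```

Starting at index 0 (the /8 overview), a forward scroll moves to index 1 (the /16 view), while a backward scroll at the overview leaves the view unchanged.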

Modularization of our code was difficult, because it seemed that all the UI had to be tied to events which trigger functions that affect other UI. In order to separate some of the logic (e.g. file loading), we had to use EventListeners extensively.

It was these diversions away from the main code base that took the most time. If we had a more typical data set and wanted to visualize it in a way flare/prefuse was designed for, we are certain it would have taken considerably less effort.


