A2-KesavaMallela
From CS294-10 Visualization Fa07
Domain
The domain I picked is movies, specifically all-time box office performance with grosses adjusted for ticket-price inflation. I want to ask these questions in particular:
- Q1. Is there a correlation between a movie's IMDB rating and its box office performance?
- Q2. Is the movie industry increasingly targeting teens?
- Q3. Does a certain genre of movies dominate the box office? If so, which one?
Data Set
I use two data sets for this analysis. First, I picked a list of the top 100 all-time box office hits with figures adjusted for inflation, and I also recorded the unadjusted gross of these movies. Then I scraped each movie's IMDB rating, MPAA rating, number of votes, and genre from IMDB's website using a Ruby script with Hpricot. Finally, I merged the two data sets.
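A minimal sketch of the merge step, assuming both sources have already been loaded as lists of records keyed by movie title. The original script was written in Ruby; this sketch uses Python and covers only the merge, not the scraping. The field names (`title`, `adjusted_gross`, `imdb_rating`, and so on), the helper names, and the sample rows are all hypothetical placeholders, not the actual data.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so titles from both sources can match."""
    return " ".join(title.lower().split())


def merge_by_title(box_office_rows, imdb_rows):
    """Inner-join two lists of dicts on a normalized 'title' field."""
    imdb_by_title = {normalize_title(r["title"]): r for r in imdb_rows}
    merged = []
    for row in box_office_rows:
        match = imdb_by_title.get(normalize_title(row["title"]))
        if match is None:
            continue  # no IMDB record was found for this title
        combined = dict(match)
        combined.update(row)  # box office fields (and its title spelling) win
        merged.append(combined)
    return merged


# Placeholder rows for illustration only; these are not real figures.
box_office = [
    {"title": "Movie A", "adjusted_gross": 300.0, "unadjusted_gross": 100.0},
    {"title": "Movie B", "adjusted_gross": 250.0, "unadjusted_gross": 200.0},
]
imdb = [
    {"title": "movie  a", "imdb_rating": 7.8, "votes": 1000,
     "mpaa": "PG", "genre": "Action"},
]

merged = merge_by_title(box_office, imdb)
```

Matching on a normalized title is the simplest possible join key; in practice, remakes and re-releases that share a title would probably need the release year as part of the key.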
Visualizations
IMDB rating and box office performance
This section explores whether there is a correlation between a movie's IMDB rating and its box office performance. There is a choice of two data sets here: (a) the adjusted gross of the top 250 films on IMDB, or (b) the IMDB ratings of the top 100 all-time box office films adjusted for inflation. Neither data set is readily available. However, I could generate (b) by merging two different data sources.
The following bar graph plots IMDB rating against the sum of adjusted gross of all movies at that rating. Because each bar is a sum, its height reflects both how many top-100 movies share a rating and how much each of them grossed. There is clearly no linear correlation between rating and performance. However, some patterns are noticeable. Total adjusted gross peaks at an IMDB rating of 7.8, and the next highest peaks are at 7.6 and 8.0. In essence, there seems to be a bell curve with its crest at 7.8. It is interesting to note that the cutoff for the IMDB top 250 is 7.9, which means some of the most successful movies do not make the list. Although the IMDB rating is user generated, it is probably biased towards people with Internet access, who may have different tastes from the rest of the public. This may partially account for the absence of a linear correlation.
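The bar graph's aggregation, and a numeric check for linear correlation, can be sketched as follows. This is an illustration under assumptions, not the method used to produce the graph: the Pearson coefficient is swapped in here as one common measure of linear correlation, the field names are hypothetical, and the sample rows are placeholders chosen so that gross peaks at a middle rating.

```python
import math
from collections import defaultdict


def gross_by_rating(movies):
    """Sum adjusted gross over all movies that share an IMDB rating."""
    totals = defaultdict(float)
    for m in movies:
        totals[m["imdb_rating"]] += m["adjusted_gross"]
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder rows for illustration only; these are not real figures.
movies = [
    {"imdb_rating": 7.6, "adjusted_gross": 100.0},
    {"imdb_rating": 7.8, "adjusted_gross": 200.0},
    {"imdb_rating": 7.8, "adjusted_gross": 150.0},
    {"imdb_rating": 8.0, "adjusted_gross": 100.0},
]

totals = gross_by_rating(movies)
r = pearson([m["imdb_rating"] for m in movies],
            [m["adjusted_gross"] for m in movies])
```

With these placeholder rows the summed gross is highest at 7.8 while `r` comes out at essentially zero, which shows how a peaked, bell-like shape can coexist with no linear correlation.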
The following graph plots movies with an IMDB rating greater than 7.9 against unadjusted gross. The size of each mark is proportional to the number of votes the movie received on IMDB. The graph shows that newer movies receive more votes, and so more attention, than their older counterparts.
More PG-13?
The next question I want to explore is whether the movie industry is increasingly targeting younger audiences for better box office performance. I use a box plot of MPAA rating against year of release. The graph lists the median year of each rating below its bucket. While PG is spread more or less evenly across the years, G and PG-13 each have specific periods in which they are more popular. The median year for G is 1963, but it has four outliers, all of which are animations from Pixar made after 1990. The median year for PG-13 is 2003, with 24 of the 100 movies in the bucket. Also, if you follow the reference lines across the Y-axis, you can see that the 1990s and 2000s are dominated by PG-13. The movie industry clearly seems to be targeting teens, or at least making its movies teen watchable.
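The per-rating numbers that a box plot summarizes (bucket size and median release year) can be computed directly. The sketch below assumes each record carries hypothetical `mpaa` and `year` fields; the sample rows are placeholders, not the actual top-100 list.

```python
from collections import defaultdict
from statistics import median


def year_summary_by_mpaa(movies):
    """Group release years by MPAA rating and summarize each bucket."""
    years = defaultdict(list)
    for m in movies:
        years[m["mpaa"]].append(m["year"])
    return {
        rating: {
            "count": len(ys),
            "median_year": median(ys),
            "earliest": min(ys),
            "latest": max(ys),
        }
        for rating, ys in years.items()
    }


# Placeholder rows for illustration only; these are not real movies.
movies = [
    {"mpaa": "G", "year": 1950},
    {"mpaa": "G", "year": 1960},
    {"mpaa": "G", "year": 1995},
    {"mpaa": "PG-13", "year": 1999},
    {"mpaa": "PG-13", "year": 2003},
]

summary = year_summary_by_mpaa(movies)
```

Here the G bucket has a median of 1960 even though one of its movies is from 1995, which mirrors how a few late outliers leave the median of a bucket largely unchanged.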
It is important to keep the history of the rating system in mind before drawing any concrete conclusions. PG-13 was introduced in 1984; before that, such movies were rated either PG or R. Just as gross is adjusted for inflation, ratings should be adjusted for cultural inflation to make comparisons more meaningful.
Considering only the top 100 movies further deters us from concluding definitively that the movie industry is targeting teens. Instead, we can conclude that PG-13 movies are increasingly successful at the box office, to the point of displacing all other MPAA ratings except G- and PG-rated animations.
This trellis plot makes many of the same points as the graph above. Movies rated PG are distributed over the years, but movies rated PG-13 fall after 1990. One additional piece of information it provides is a sense of box office performance. G-rated movies made before 1970 seem to have performed extremely well, while PG-rated movies did well during the 1970s. There is no clear dominant rating post-1990s, though PG-13 shows some signs of becoming one.
Genre performance at Box Office
The following set of scatter plots groups movies by genre and plots year of release against adjusted gross. The point of this exercise is to see whether one particular genre dominates the box office. Action clearly rules the roost, with a consistent presence in all decades. Drama and comedy also have a consistent presence. Genres like musical, mystery, and western have no presence after the 1960s. Renewed interest can be seen in adventure (the Harry Potter series) and animation (Pixar movies).
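One way to check a genre's "presence" numerically is to list the decades in which it appears at all. This is a rough sketch under assumptions: the `genre` and `year` field names are hypothetical, each movie is assumed to have a single genre, and the sample rows are placeholders rather than the real list.

```python
from collections import Counter


def decade(year):
    """Map a year to the start of its decade, e.g. 1977 -> 1970."""
    return year // 10 * 10


def genre_decade_counts(movies):
    """Count movies per (genre, decade) pair."""
    return Counter((m["genre"], decade(m["year"])) for m in movies)


def decades_present(movies):
    """For each genre, list the decades in which it has at least one movie."""
    present = {}
    for m in movies:
        present.setdefault(m["genre"], set()).add(decade(m["year"]))
    return {genre: sorted(ds) for genre, ds in present.items()}


# Placeholder rows for illustration only; these are not real movies.
movies = [
    {"genre": "Action", "year": 1964},
    {"genre": "Action", "year": 1977},
    {"genre": "Action", "year": 2002},
    {"genre": "Western", "year": 1956},
    {"genre": "Western", "year": 1969},
]

counts = genre_decade_counts(movies)
presence = decades_present(movies)
```

A genre whose list stops at an early decade, like the placeholder western rows here, corresponds to the "no presence after the 1960s" pattern described above.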





