In our recent paper (Cutting, DeLong, & Nothelfer, “Attention and the evolution of Hollywood film”, Psychological Science, 2010, 21, 440-447; henceforth AEHF), we were interested in the shot structure in a reasonable sample of films across 70 years and in how that structure has changed. Our main conclusion was that the pattern of shots in more contemporary movies has come to match better the large-scale structure of human attention. In particular, a cut between shots forces the reallocation of attentional resources to the new scene or to the new vantage point, and the pattern of the length of those shots reflects natural tendencies of how attention ebbs and flows over as much as an hour and more. The structure of both attention and, more and more often, contemporary film appears to follow a 1/f pattern, sometimes called pink noise.
The 1/f pattern can be thought of as a simultaneous combination of many transverse, nonsynchronous waves whose up and down-ness, or amplitude (power is proportional to the square of the amplitude), is fixed by their wavelength (or the reciprocal of frequency, hence 1/f). Thus, the big (up and down) waves are also long waves (in time); smaller waves are shorter waves and proportionately so. For example, a relatively big wave might be exemplified by 60 consecutive shots that are relatively long followed by 60 shots that are relatively short, and with that pattern repeating; shorter and longer waves would also occur overlapping and within this original wave.
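As a rough illustration of this description (a sketch only, not the procedure used in AEHF), one can build a 1/f-like sequence by summing sine waves whose power is proportional to their wavelength. The function name, the random phases, and the sequence length of 256 are assumptions made for the example.

```python
import math
import random


def pink_like_sequence(n, seed=0):
    """Sum sine waves whose power is proportional to wavelength (that is, to 1/f)."""
    rng = random.Random(seed)
    values = [0.0] * n
    for k in range(1, n // 2 + 1):               # k = number of cycles across the sequence
        amplitude = 1.0 / math.sqrt(k)           # power = amplitude**2 = 1/k
        phase = rng.uniform(0.0, 2.0 * math.pi)  # random phase keeps the waves nonsynchronous
        for t in range(n):
            values[t] += amplitude * math.sin(2.0 * math.pi * k * t / n + phase)
    return values


# Hypothetical example: a 256-"shot" sequence with a 1/f-like profile.
sequence = pink_like_sequence(256)
```

The longest wave (k = 1) has the largest amplitude, and each shorter wave is proportionately smaller, which is the relation between wave size and wave length described above.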
Blogged science reporting vs. science
Popular representations of science often skew things, and occasionally they simply get things wrong. Indeed, there were a number of howlers that appeared in the blogosphere about our research. As evidence, all one need do is to check out the title of the first popular piece on our work (http://www.newscientist.com/article/mg20527483.900-solved-the-mathematics-of-the-hollywood-blockbuster.html). Fortunately, the article’s content was more reasonable and more accurately reflected what we found.
Now Barry Salt has joined the mix, and we are quite happy with his comment on AEHF. Among all possible responses, one can ask for nothing better than to be “usefully provocative.” Against the misrepresentation by others, Salt (http://www.cinemetrics.lv/salt_on_cutting.php) rightly notes that we did not try to account for the box office success of movies. We didn’t even focus on the highest grossing movies, or attempt to use low grossing movies as a control group. Nor did we try to discern what makes a good film. In fact, we specifically reported that our results did not correlate with IMDb ratings of films in our sample. Salt also noted that our cut-finding process was more time consuming than perhaps necessary. Wary of errors often made in typical cinemetric-style cut finding (see Smith & Henderson, 2008, “Edit blindness,” Journal of Eye Movement Research, 2, 1-17), we deliberately sacrificed efficiency for accuracy, with multiple checks (digital and ocular) throughout the process. But in trying to account for our results Salt also raises two other issues to which we thought it important to respond. He calls them basic (about scene structure) and historical (concerning changes in ASL and shot length distributions). Let me consider them in reverse order.
Historical near parallels: Changes in ASL and in power spectra
Clearly, shot lengths have become shorter over the course of the 20th century and into the 21st. Salt updates his past data with an elegant graph in his commentary. In AEHF, we found that, since about 1950 or so, films have increasingly adopted a shot structure that approaches a 1/f pattern (pink noise). One might think these two are related – and indeed they are correlated. But there is no causal link between them.
Salt (Moving into Pictures, 2006) was the first to note that the shot-length distributions of most films tend to be lognormal. Generalizing this, he produced two new graphs in his commentary, one for Catherine the Great (1934) and one for Derailed (2002). What concerns Salt in showing these graphs is what psychologists call a floor effect. That is, the average duration of shots may have decreased to a point where it can decrease no further without failing to give the viewer sufficient time to perceive shot content. When shot duration data are plotted and analyzed linearly, as Salt and many others have done, this seems like a genuine possibility. However, plotted logarithmically, no floor exists.
What lognormality means is that if one took the logarithm of every value in the distribution and then replotted the data, the distribution would look normal – like a bell-shaped curve, more or less. Shown below are the log-scaled distributions for four films from our sample, two from 1935 and two from 2005:
Despite 70 years, fairly dramatic differences in ASL, and the great differences in number of shots per film, all four distributions look generally the same. The important point is that log-scaled shot-length distributions for films look normal and that normal distributions have no real floor (the logarithm of zero is undefined). Likely, as shot lengths in films get shorter, shots measured in fractions of seconds (not just seconds) will continue to be lognormal.
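The effect of log scaling can be sketched with made-up data. The shot lengths below are simulated from a lognormal distribution with arbitrarily chosen parameters; they are not taken from any film in our sample, and skewness is used here only as a convenient stand-in for "looks bell-shaped."

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: near 0 for a symmetric distribution, > 0 for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(42)
# Hypothetical shot lengths in seconds (simulated, not measured).
shot_lengths = [rng.lognormvariate(1.5, 0.8) for _ in range(2000)]
log_lengths = [math.log(s) for s in shot_lengths]

raw_skew = skewness(shot_lengths)  # strongly positive: many short shots, a few very long ones
log_skew = skewness(log_lengths)   # close to zero: roughly symmetric after log scaling
```

On the linear scale the distribution is bunched against zero with a long right tail; after taking logarithms it is roughly symmetric, and shorter shots simply move further left on the log axis rather than piling up against a floor.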
Our analysis in AEHF started by normalizing the shot distribution of each film. That is, the mean is set to zero (subtracting the ASL from each shot length) and the standard deviation is set to one (dividing the value above by the standard deviation of the shots in the whole film). This creates what is called a unit normal distribution, and it is a standard statistical procedure when comparing the shapes of distributions. This procedure alone would likely nullify the differences shown by Salt for Catherine the Great and Derailed, and it was on such unit-normal data that we first ran our Fourier and power analyses. But just to be certain, in AEHF we also performed that same normalizing analysis after log scaling shot lengths for each film. Results were the same in either case.
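A minimal sketch of this normalization follows. The function name and the toy shot lengths are hypothetical, and the use of the population standard deviation is an assumption; the choice between population and sample formulas does not affect the shape of the result.

```python
import math
import statistics


def unit_normalize(shot_lengths, log_scale=False):
    """Rescale shot lengths to mean 0 and standard deviation 1."""
    values = [math.log(s) for s in shot_lengths] if log_scale else list(shot_lengths)
    mean = statistics.fmean(values)  # this is the ASL when log_scale is False
    sd = statistics.pstdev(values)   # population standard deviation (an assumption)
    return [(v - mean) / sd for v in values]


# Two hypothetical films with the same shot pattern but a fourfold difference in ASL.
slow_film = [4.0, 8.0, 12.0, 16.0]
fast_film = [1.0, 2.0, 3.0, 4.0]

slow_z = unit_normalize(slow_film)
fast_z = unit_normalize(fast_film)
```

The two normalized sequences are the same to within rounding error, which is the sense in which ASL is removed before any Fourier or power analysis is run.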
Thus, diminished ASLs cannot cause our power spectra results; ASL is factored out before the analysis is done. Also, we found no evidence of changes in the shape of the shot-length distributions in our film sample as ASL diminishes, and in any case we renormalized the distributions before our analysis. Moreover, as counterexamples, consider The 39 Steps (1935) with an ASL of 10.8 s and a slope of .93 (1.0 is the slope for a 1/f pattern) and GoldenEye (1995) with an ASL of 3.6 s and a slope of .96; or consider Annie Get Your Gun (1950) with an ASL of 14.9 s and a slope of 1.18 and Revenge of the Sith (2005) with an ASL of 3.57 s and a slope of 1.14.
Nonetheless, Salt rightly notes our power spectra results are still correlated with ASLs for our sample of films. It is just that neither has caused the other. One should then ask: What has caused the change in power spectra over the last 50 years or so? Our guess in AEHF was that there has been a kind of cultural transmission among filmmakers about what seems to work in shot composition and that this was at the root of the process. In other words, the increasingly 1/f-like pattern emerged from what would be collectively regarded by filmmakers as good film construction, but without anyone needing to know what 1/f is or really means. Another possible cause, one we hadn’t considered, emerged in my correspondence with Hollywood film editors after AEHF appeared. Editors now have much more film footage to draw upon than they did 50 and more years ago. Thus, they have many more choices they can make in composing shots in and across scenes. It seems possible, then, that the ability to make better choices has also contributed to the development of a 1/f structure in contemporary film.
In discussing the differences between Catherine the Great and Derailed, Salt also reported the Lag-1 autocorrelations for the two films (about .12 and .20, respectively) and suggested these would make a difference, perhaps contributing to what we found in our power spectra. These lag correlations map the length relations between Shots 1 & 2, 2 & 3, 3 & 4, and so forth along the length of the film. This is a good beginning, but Lag-1 data alone can be misleading. The Lag-1 correlations for Goodfellas (1990) and Those Magnificent Men and Their Flying Machines (1965) are .33 and .30, respectively; but their modified autoregression indices (mARs) as we calculated them (autoregression looks at part of the partial autocorrelation function, which we smoothed), using data from Lags 1 through 20 (Lag 20 relates Shots 1 & 21, 2 & 22, 3 & 23, etc., out to the end of the film), are 2.13 and 4.0. This means that the statistically reliable, local correlational structure across shots in Goodfellas is only about half that of Flying Machines, although their Lag-1 data were about the same. More simply, significant shot-length relations extend to only about 2 shots across the former film (aggregated across Shots 1 to 3, 2 to 4, etc.), compared to 4 shots in the latter (aggregated across Shots 1 to 5, 2 to 6, etc.). The complete autocorrelation function (Lag 1 to Lag n, where n is as much as half the value of the number of shots in a film) gives the record of shot relations across a whole film. The power spectrum, which we calculated for all films to derive our 1/f approximations, is the Fourier twin of the complete autocorrelation function.
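For readers who want to compute lag correlations themselves, here is a minimal sketch of the ordinary autocorrelation at a given lag. It is not the mAR index, which involves a smoothed partial autocorrelation function and is not reproduced here; the function name and the toy sequence are hypothetical.

```python
import statistics


def lag_autocorrelation(shot_lengths, lag=1):
    """Correlation between each shot's length and that of the shot `lag` positions later."""
    n = len(shot_lengths)
    mean = statistics.fmean(shot_lengths)
    denom = sum((v - mean) ** 2 for v in shot_lengths)
    numer = sum((shot_lengths[i] - mean) * (shot_lengths[i + lag] - mean)
                for i in range(n - lag))
    return numer / denom


# Hypothetical film that strictly alternates short and long shots (in seconds).
alternating = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
lag1 = lag_autocorrelation(alternating, lag=1)  # negative: neighbours differ
lag2 = lag_autocorrelation(alternating, lag=2)  # positive: every other shot matches
```

Even in this toy case the Lag-1 value alone would miss the regularity that appears at Lag 2, which is why the whole function across many lags is more informative than its first value.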
Basic film units: Shots, scenes, and beyond
In AEHF we looked at shot structure across films without regard to scene structure. In his essay “Speeding up and slowing down” (http://www.cinemetrics.lv/salt_speeding_up_down.php), Salt performed a moving average analysis of ASLs in several films, particularly Ride Lonesome (1959). He found, not surprisingly, that different scenes in a given film have different ASLs. In a moving average window this creates a wavy pattern on a larger scale than that for shots. Salt also describes this in his comment on AEHF as a “‘tension-release’ pattern” often found elsewhere, as in music. We wholeheartedly endorse this idea. More importantly, however, Salt’s moving average result exactly reflects part of what we found in the power spectra analyses.
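A moving average of the kind described here can be sketched in a few lines. The function name, the window size, and the toy shot lengths are hypothetical and are not taken from Salt's analysis.

```python
def moving_average(shot_lengths, window):
    """Mean shot length within a window that slides one shot at a time."""
    if not 1 <= window <= len(shot_lengths):
        raise ValueError("window must be between 1 and the number of shots")
    running = sum(shot_lengths[:window])
    averages = [running / window]
    for i in range(window, len(shot_lengths)):
        # Slide the window: add the incoming shot, drop the outgoing one.
        running += shot_lengths[i] - shot_lengths[i - window]
        averages.append(running / window)
    return averages


# Hypothetical sequence: a slow-cut scene followed by a fast-cut scene (seconds).
shots = [9.0, 8.0, 10.0, 9.0, 2.0, 3.0, 2.0, 3.0]
smoothed = moving_average(shots, window=4)
```

The smoothed values fall steadily as the window passes from the first scene into the second, which is the scene-scale wave that a moving average makes visible.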
That is, the Fourier and power analysis that we performed (doggedly and mechanically) looked for waves in the shot-length sequences of each film, where those waves could be 2, 4, 8, 16, 32, 64, 128, 256, 512 shots long and sometimes longer. Notice that these numbers form a progression in powers of 2. They do so in order that the measured waves be completely independent. These waves are assessed within windows that move along the complete length of the shot vector (the list of consecutive shot lengths in a film). Thus, the size-2 wave is fit serially to Shots 1-2, 2-3, 3-4, etc., out to the end of the film; the size-8 wave is fit serially to Shots 1 through 8, 2-9, 3-10, etc.; and the size-512 wave is fit serially to Shots 1 through 512, 2-513, 3-514, etc. This can begin to look like a moving average analysis, which Salt endorses, but it is different. It looks at different independent window sizes and it does not average, but finds the best fitting sine wave within each moving window. Salt’s scene analysis of Ride Lonesome shows between about 5 and as much as 100 shots per scene, and with a moving average window he generates loopy waves that correspond to them. By our analysis, any wave with lengths in this range will contribute to the measured power in the size-4 through size-128 shot-length waves of the Fourier and power analysis. In particular, a 100-shot scene that contrasts with prior and subsequent scenes in ASL will contribute to both the size-64 and size-128 wave. In this way, the different-length scenes contribute power to the mid-range of the 1/f pattern.
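The idea of reading a slope off a power spectrum can be sketched as follows. This is deliberately a simpler, textbook substitute: a plain discrete Fourier transform over the whole sequence with a straight-line fit in log-log coordinates, not the moving-window, powers-of-2 sine fitting described above. The function names and the synthetic test sequence are hypothetical.

```python
import cmath
import math


def power_spectrum(values):
    """Power at frequencies k = 1 .. (n - 1) // 2, in cycles per sequence."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    powers = []
    for k in range(1, (n - 1) // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        powers.append(abs(coeff) ** 2 / n)
    return powers


def spectral_slope(values):
    """Negated least-squares slope of log(power) on log(frequency); about 1 for 1/f."""
    points = [(math.log(k), math.log(p))
              for k, p in enumerate(power_spectrum(values), start=1) if p > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx


# Hypothetical 64-point sequence constructed so that power is proportional to 1/f.
n = 64
demo = [sum(math.sin(2 * math.pi * k * t / n + 0.7 * k) / math.sqrt(k)
            for k in range(1, 32))
        for t in range(n)]
demo_slope = spectral_slope(demo)
```

For this constructed sequence the fitted slope should come out very close to 1.0, the value for a 1/f pattern; a sequence with no long-range structure would give a slope near 0.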
What we think is more striking about our AEHF results, however, is that there are such waves in film that are considerably larger than mean scene length. That is, for a 1/f pattern to emerge, there have to be waves of similar-length shots coursing through an entire film that are in the range of 256, 512, out to even 1024 shots apart. In contemporary films this can be in a range from 10 to as much as 60 minutes. This places these waves well beyond scene length and roughly puts them at the scale of film acts, endorsed in different ways by Syd Field (1979, Screenplay) and by Kristin Thompson (1999, Storytelling in the new Hollywood). Remember, we introduced our results above in terms of allocating attention to film over as much as an hour and more; this involves “tension and release” at very different scales.
In addition, Salt and others have highlighted our result that action films tend to approach a 1/f structure more than the other genres we explored (adventure, animation, comedy, and drama films). It is by no means the case, however, that action films always have close to a 1/f shot-length profile. We recently analyzed the James Bond film Quantum of Solace (2008). Despite its ASL of 1.71 s (trimming out the glossy credit sequence after the opening scenes), it doesn’t approach a 1/f structure. It fails to produce this structure precisely because it has few long-range similarities in shot-length patterns across the range of 512 to 1024 shots.
In summary and in response to Salt, (1) our power analysis is causally unrelated to ASL even though the two have developed more or less in parallel over the last 50 years or so, (2) we find no evidence of change in the shot distributions of popular films in our sample across time; they are all lognormal to a reasonable approximation, and (3) the ASL differences he found in scene-length structure are contained within the 1/f-like patterns that we found, but we also found evidence for longer, act-length structure. So, do we want to talk about the structure of film units – shots, scenes, and acts – or do we want to talk about 1/f structure? I would hope that there is room to talk about, and to learn from, both. I think that we can all endorse the idea that cinemetrics can be done in many ways.