DISCUSSION TOPIC
"SIDE BY SIDE: DATA ANALYSIS ACROSS FILMS"

Posted by: Keith Brisson Date: 2011-03-14

Recently, Yuri and I have been focusing our Cinemetrics efforts on two fronts.

The first, and more daunting issue, is that the data used on the website is of varying quality. That is, it is not necessarily completely precise, accurate, or complete. We seek to define what these three terms mean, in the context of the goal of the project and analysis of cutting rates. Unfortunately, this work has been moving very slowly, since it is not a trivial task.

The second issue we've been addressing is the exploration of new analysis techniques. Specifically, I have been working on ways to overcome a limitation of other analysis techniques used on the Cinemetrics data. Previous researchers have focused on one number when making comparisons between films or groups of films: the average shot length (ASL). This is not an unreasonable statistic to use for comparing two films; it provides a broad summary of how short or long the films' shots are. However, it tells us nothing about the distribution of short and long shots over the length of a film.

Other researchers--Barry Salt is a prominent example--have attempted to describe the relative proportions of different length shots within each film. That is, they have attempted to describe distribution of shot lengths by comparing histograms of shot lengths for various films to histograms produced by distributions, such as the lognormal distribution. Although this technique may have use, it still is summarizing data over the whole film. That is, it does not allow for comparison between two films at different points in their respective lengths. For example, we cannot compare two films using this technique and say "both films had approximately the same cutting rate in their first halves, but film A had significantly shorter shots in its second half than film B."

Analyzing films by looking at changes in shot lengths across their lengths has long been a goal of the Cinemetrics project. To this end, Yuri and Gunars developed their inverted shot length versus shot number graph. This graph has many useful properties, and the use of an inverted y-axis was innovative. They continued to develop this technique by introducing features such as a moving average graph, to smooth noise, and estimation using high-degree polynomials, in an effort to numerically describe the shape of these graphs and make comparisons.

There are two primary issues with the shot length versus shot number graph. The first is that it is not necessarily intuitive. Because the x-axis is shot number, not time, looking at the middle of the graph does not necessarily mean looking at the middle of the film. In fact, suppose we had a film of 100 shots. The first 99 shots occur within the first minute, and the 100th shot is 99 minutes long. Looking at the middle of the graph, we expect to be looking at minute 50 of this 100 minute film. However, we are actually looking at the 50th shot, which occurs somewhere in the first minute. Cinemetrics addresses this by labeling the x-axis by time, not shot number, and allowing the intervals to not be regular. That is, the non-linear time scale is clearly labeled. Unfortunately, using shot length versus shot number has another side effect: it underemphasizes the importance of long shots and overemphasizes the importance of short shots. In our hypothetical example, 99% of the graph will represent 1% of the film, while the only data point we have for the remaining 99% of the film is one shot length at the end of the graph.

The ideal solution is to somehow plot shot length versus time. This poses several problems. First: the Cinemetrics data is composed of shot number-timecode pairs. Thus we can calculate shot length and time code for each shot. Simply plotting these data points using a scatter plot is a potential way of visualizing this information. However, this still underemphasizes long shots. In our hypothetical example, we would also have no data points for minutes 2-99 of our film, though we intuitively know that it's a long shot, and somehow our data should represent that fact. The second problem with this technique is that films vary in length. It is difficult to make comparisons between a 50-minute film and a 100-minute film.

One potential solution I have devised is to generate a shot length versus fraction of film dataset. By partitioning each film into the same number of equal-length segments we can calculate the average shot length for each segment, regardless of whether or not there was a cut in that segment. For example, we can divide our hypothetical 100 minute film into 100 equal partitions. For each partition, we can calculate the number of shots that occur in it. In our example from above, there were 99 shots in the first minute, which in this case happens to correspond to the first partition. So, partition number 1 has 99 shots. We know that this partition is one minute long, so by dividing 60 seconds by 99 shots we calculate that the average shot length in the first partition is about 0.61 seconds. In this way, 0.61 seconds per shot becomes our data point for partition one. Similarly, we look at partition two. The 99-minute shot spans partitions 2 through 100, so 1/99th of a shot occurs in partition two. Therefore the average shot length during the second partition is 60 seconds divided by 1/99 of a shot, which equals 5,940 seconds per shot, the length of our 99 minute shot. Using this method of calculating fractional shots per partition we can continue through every partition in the film, yielding one ASL per partition.
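
The calculation described above can be sketched in Python. This is only an illustrative sketch, not the code used for Cinemetrics: the function name partition_asl and the input format (a plain list of shot lengths in seconds, in film order) are assumptions made for the example.

```python
from itertools import accumulate


def partition_asl(shot_lengths, n_partitions):
    """Return one average shot length (seconds per shot) per partition.

    shot_lengths: shot durations in seconds, in film order (assumed format).
    n_partitions: number of equal-length partitions to cut the film into.
    """
    ends = list(accumulate(shot_lengths))   # end time of each shot
    starts = [0.0] + ends[:-1]              # start time of each shot
    width = ends[-1] / n_partitions         # partition length in seconds
    profile = []
    for k in range(n_partitions):
        p_start, p_end = k * width, (k + 1) * width
        shots = 0.0                         # fractional shot count
        for s, e, length in zip(starts, ends, shot_lengths):
            overlap = min(e, p_end) - max(s, p_start)
            if overlap > 0:
                shots += overlap / length   # fraction of this shot inside
        profile.append(width / shots)
    return profile


# The hypothetical film from the post: 99 shots in the first minute,
# then a single 99-minute shot.
film = [60 / 99] * 99 + [99 * 60]
profile = partition_asl(film, 100)
print(round(profile[0], 2), round(profile[1]))
```

For this film the first value comes out at roughly 0.61 seconds and every later value at roughly 5,940 seconds, matching the arithmetic in the post. The nested loop is kept simple for clarity rather than speed.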

Suppose we have a second film of a different length. We can use the same method (partition, count shots, then calculate ASL) for each partition in the new film. As long as we use the same number of partitions, the values between the two films are directly comparable. That is, if we use 100 partitions, the 57th partition for each film always represents the time period 57% of the way through the film, regardless of film length. Therefore, using this method we can directly compare the shot lengths across the durations of two films of different lengths.

We do have a few reservations regarding this method. First, if we are comparing two films of dramatically different lengths, say 10 minutes and 100 minutes, then the shot length increasing significantly from one partition to the next means very different things in the context of each film. In the shorter film, it means that change occurred over 6 seconds; in the long film it means that change occurred over 60 seconds. That is a severe limitation of this method. Therefore, it is important to only use it to make statements such as "in both of these films, the shot lengths in their second halves are about twice the shot lengths in their first halves", and to not make statements like "the cutting rates of these films increase equally rapidly". Second, we don't know whether or not it is valid to calculate the average shot length for a partition in which no cuts occur. In one sense it is intuitively valid: even if a partition is entirely contained within a shot, the longer that shot is the "slower" it is, and our data should reflect that. Nevertheless, it is a concern.

Using this technique, we have generated ASL by partition data for every film in the Cinemetrics database. We are experimenting with comparisons, both between films and between groups of films. In fact, using this technique we can even generate a "curve" that represents shot length versus fraction of film complete for the average film. That is, given a group of films (or the whole database), we can average the ASL in each partition across all films in that group. Performing this for each partition yields an "Average ASL versus fraction of film" data set. Though these are discrete values, and not a continuous curve like the ones generated by the polynomial interpolation that Cinemetrics currently uses, the result is a novel description of how a group of films tends to change over time. Yuri has generated an example application and analysis and will be releasing that for comments soon.
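
The averaging step for a group of films can be sketched as follows. This is an illustration only: the function name average_profile is hypothetical, and it assumes each film has already been reduced to a list of per-partition values with the same number of partitions.

```python
def average_profile(profiles):
    """Average per-partition values across a group of films.

    profiles: one list per film, each holding one value per partition.
    All films must use the same number of partitions.
    """
    n = len(profiles[0])
    if any(len(p) != n for p in profiles):
        raise ValueError("all films must use the same number of partitions")
    # Arithmetic mean across films, taken partition by partition.
    return [sum(p[k] for p in profiles) / len(profiles) for k in range(n)]


# Two made-up films with two partitions each.
group = [[2.0, 4.0], [4.0, 8.0]]
print(average_profile(group))
```

A plain arithmetic mean is used here because that is what the post describes; with very long shots in a few films, a median or a mean of logarithms might be worth trying as well, but that is a separate choice.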

We have our own reservations regarding this technique and would appreciate any feedback. We believe, however, that this will open Cinemetrics' data to new analysis techniques, particularly those that rely on having sequences of data measured at uniform time intervals--i.e., time series and other advanced techniques. Input from statisticians on the validity of this method would be particularly helpful. Thanks!

Replied by:Barry Salt Date:2011-03-20

I think you should give this new sort of average shot length a special name, since it is quite different to the ordinary sense in which "average shot length" has been used up to this point. Maybe "Average partition shot length", or something else distinctive.

Replied by:Barry Salt Date:2011-03-20

Or maybe just "partition shot length".

And how do you calculate this quantity when there is one cut inside a partition, but neither of the shots before and after the cut are completely included in the partition? Do you add the fractional values for each of them together to get the final fraction for that partition?

And the partition graph for the long Griffith films in the labs looks a bit like white noise to me. Maybe you should check it for that.

 

Replied by:Keith Brisson Date:2011-03-21

I agree; a name would be good.  It's difficult to come up with an elegant, concise term for this.  However, for now I think that "partition shot length" as Barry suggested, or perhaps "partition ASL" would work.  It's something to think about.

The value for each partition can be fractional.  In the example Barry provided, where there is one cut entirely contained within the partition, and portions of the shot before and after, we would add 1 (for the shot entirely contained), the fraction of the previous shot contained (so a value between 0 and 1), and the fraction of the next shot contained (again a value between 0 and 1).

This can be generalized to when there is more than one shot within each partition.  The value for partition N is:

(# of shots entirely contained within partition N) + (fraction of shot that spans beginning of partition N that is contained within partition N) + (fraction of shot that spans end of partition N that is contained within partition N)

If one shot entirely contains the partition (ie, it's longer than the partition), then the value is the fraction of the shot that's contained within the partition.
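
All of these cases can be written as one expression. The notation here is introduced only for this illustration and is not part of the original description: partition N covers the time interval [a_N, b_N], shot i covers [s_i, e_i], c_N is the fractional shot count for the partition, and PSL_N is the resulting partition shot length.

```latex
c_N = \sum_{i} \frac{\max\bigl(0,\ \min(e_i, b_N) - \max(s_i, a_N)\bigr)}{e_i - s_i},
\qquad
\mathrm{PSL}_N = \frac{b_N - a_N}{c_N}
```

A shot lying entirely inside the partition contributes 1 to the sum, a shot that crosses a partition boundary contributes the fraction of its length that falls inside, and a shot longer than the whole partition contributes (b_N - a_N)/(e_i - s_i), so that PSL_N is then simply the length of that shot.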

So far, we have not ventured into more advanced analysis of the resulting data, like comparing the data to white noise.  I agree we should check for that.  We'll head towards that in the future, but for now we're working on the fundamentals.

Replied by:Yuri Tsivian Date:2011-03-21

This reply addresses the problem of unequal representation of long versus short shots on the X-axis which Keith raised. Yes, it is true that the shot lengths are represented as different on the Y-axis, while on the X-axis it appears as if different shots took the same amount of time. Here is an old example of how this problem was solved by Vsevolod Pudovkin in 1938 as he was editing his film Pobeda (Victory). This is Pudovkin's preliminary graph for one of the sequences in this film. As you can see, he represented longer shots "proportionally" to their actual length.

Replied by:Yuri Tsivian Date:2011-03-21

As we can see in the above chart, adduced in Kuleshov's 1940 handbook on film direction, the upper row is for shot numbers, the bottom row gives their length in seconds. The width of each column depends on the number of seconds allotted to each shot. This is something our graphs ignore; indeed, what would happen to trendlines if they didn't?

Replied by:Yuri Tsivian Date:2011-03-21

This reply is with regard to the name of the new method Keith proposes. Barry and Keith came up with "partition ASL" or "partition shot length." Are these terms adequate to what the method does? When we say ASL what we usually mean is the arithmetic mean of all the shot lengths in a film expressed as one number per film. For instance, Darby O'Gill and the Little People: (7) ASL 4.4, period. What the graph sparklinized in red shows is not ASL but rather the variations of ASL over the duration of the film; so each film represented on Cinemetrics is already a summary of an array of ASL's. Now, what the new method does is to create a summary of such summaries, a curve of curves, "partition curve," or "partition average curve" (PAC) perhaps?

Replied by:Yuri Tsivian Date:2011-03-21

A correction to the above: I said "each film represented on Cinemetrics is already a summary of an array of ASL's" which is incorrect. What I should have said is: each film represented on Cinemetrics is a trace of changes summarized as its ASL.

Replied by:Yuri Tsivian Date:2011-03-22

Why don't we call the method of averaging by partitions the "Brisson curve"? There is a Gauss (or Gaussian) curve; there is a tradition in statistics (as elsewhere) of calling methods by names of their inventors. I will be calling it this, just as I will be calling Barry Salt's discovery, described in the last section of Salt's essay "The Metrics in Cinemetrics," the "Salt distribution."

Replied by:Barry Salt Date:2011-03-22

Yuri, curb your enthusiasm! I know you meant well, but you MUST NOT call anything a "Salt Distribution". It would make me a laughing-stock among mathematicians and scientists (if they happened to notice it). I would be both embarrassed and angry. Because I have NOT discovered any new theoretical statistical distribution. I can only assume you are referring to the final graph showing what is an EMPIRICAL distribution for the lengths of the intertitles of one silent film. It is unique, as the intertitle length distributions for the other two films, which I did not illustrate, are different, though having a very vague similarity.

Replied by:Keith Brisson Date:2011-03-23

I agree with Barry: I would strongly prefer that this is *not* called the Brisson curve.  One possible approach to naming would be to consider that it is a profile of shot lengths.  So, something like the "discrete shot length profile" would be a very descriptive and accurate name.

 

Yuri, I'm glad you posted that chart by Pudovkin.  Stretching the x-axis by the shot length will result in a "curve" that is dramatically different than what Cinemetrics normally generates.  The results will look close to what the partitioning method generates.  In fact, as the number of partitions goes towards infinity, the graph will look more and more like Pudovkin's chart, with the shot length on the y-axis.  With infinite partitions, the y-axis at any point in the film's duration is the length of the shot at that point in the film.

 

It's interesting to note that at the other end of the spectrum is the one partition case, in which over the whole length of the film the y-axis would equal the film's average shot length.

Replied by:Barry Salt Date:2011-03-23

The Pudovkin graph is a diagram of what he would see looking down onto his editing bench while working on "Pobeda", with the picture track (top) and sound tracks (Music second from top) running left to right through a synchronizer. It is also the same as  the representation of the tracks on the screen of a Non-Linear Editing system. Keith Brisson's "partition plot" or "partition graph" does, as he says, go over into the top line of this as the number of partitions increases. However, there is an upper limit to the number of partitions (that make any sense) equal to the total number of frames in the film.

For Keith Brisson's graph of Griffith one-reelers in the Labs section, the initial steep rise of the aggregate graph is an artefact of the partition process, resulting from the fact that nearly all Griffith one-reelers have fewer than 100 shots in them. However the dip might be a real characteristic of the structure of the films themselves. This is interesting. However it eventually needs checking against a number of one-reelers from 1909-1913 made by other American film companies to see if it is unique to Griffith.

On naming of laws, equations, distributions, etc.: what happens is that after one of these things has proved its great usefulness over a considerable period of time, the grateful community of mathematicians and scientists start calling it by the name of the person who proposed it. Hence my comments above.

Replied by:Yuri Tsivian Date:2011-03-26

All right, all right, gentlemen. I won't take your names in vain.

In my discussions with Keith Brisson we did not exclude the possibility that the initial steep rise might have been a side-effect of the partition method itself, but the longer we look at it the less we think it is. At least, there is no obvious reason Keith has discovered so far for the method to misfire this way. That's why our next step is to recount a comparable sample of Griffith's Biographs with and without intertitles and see if perhaps expository titles at the beginning of the film may have caused what we at first perceived as an "anomaly."

Barry's "artefact" hypothesis mentions the fact that Griffith's one-reelers seldom exceed 100 shots. This may well be the fact; still, we need to know how this fact could have been the factor behind the steep "nose dive" of Brisson's graph.

Speaking of the "100 shots limit." Barry is right in saying that most of Griffith's Biographs are under 100 shots long, but looking at the Cinemetrics database we can make this claim more specific. There are no "three-digit" Biographs found on the database prior to 1911. Starting from 1911, the list is as follows (I did include the occasional two-reelers in this list):

1911
ADVENTURES OF BILLY, THE (1911, USA) 106
BATTLE, THE (1911, USA) 113
ENOCH ARDEN (1911, USA) 120
FIGHTING BLOOD (1911, USA) 101
SWORDS AND HEARTS (1911, USA) 114

1912
AN UNSEEN ENEMY (2ND ATTEMPT) (1912, USA) 130
BILLY'S STRATAGEM (1912, USA) 111
GIRL AND HER TRUST, THE (1912, USA) 138
LESSER EVIL, THE (1912, USA) 105
MENDER OF NETS (1912, USA) 108

1913
BATTLE AT ELDERBUSH GULCH, THE (1913, USA) 237
DEATH'S MARATHON (1913, USA) 122
MOTHERING HEART, THE (1913, USA) 163

 

Replied by:Keith Brisson Date:2011-03-26

I'd like to elaborate on what Yuri and Barry said regarding the possibility that the initial steep rise in the Griffith films is an anomaly.  I think one thing I didn't fully touch on in my initial explanation was the fact that what we are graphing is average shot length within that partition, not number of shots counted within a partition.  We do calculate the number of shots within each partition.  If we were to then graph this as number of shots per partition versus shot number, the y-axis would indeed change based on number of partitions.  However, what we are doing is calculating average shot length in each partition by dividing the partition length by number of shots in that partition.  This division makes the value invariant to partition length changes, although the value may change if the shots within the shorter or longer partition have a different average.

It's true that logically the shortest partition possible would be one frame.  However, since Cinemetrics uses a timecode based on tenths of a second, not frames, we're doing everything based on time, not frames.  Therefore the shortest logical partition would be one tenth of a second, but there's nothing in the math that limits it to that.

As Yuri mentioned, I'm open to the idea that the initial steep rise in Griffith is a result of the technique, but I see no obvious reason.  It's also worth noting that what the partitioning method generates for each film is a time series - a value collected at regular intervals.  We *might* therefore have the opportunity to use all of the techniques available from time series analysis.  I emphasize the might because I'm not sure if the partitioning method behaves in a way that is incompatible.  What would be useful is looking for a technique that finds common patterns among multiple time series.  Currently we just averaged the value across all the series for each time value to get the "Average ASL by partition" graphs, but there may be a better way to do this.

Replied by:Barry Salt Date:2011-03-27

One quick way to check whether the shape of partition shot length graph for Griffith one-reelers is an artefact of the technique is to randomize the sequence of shot lengths for one or more Griffith one-reelers, and then use your method on them. There are algorithms for producing a random permutation of a list, but the lazy man's way (mine) is to use the "List Randomizer" at www.random.org

Replied by:Barry Salt Date:2011-03-27

Wait --- there is a simple way to do it in Excel. Paste the list of shot lengths into the first column, and then fill the next column down to the last entry with "=RAND()", and then do a sort over the two columns using the values in the random number column.
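
The same randomization check can be sketched in Python. The function name shuffled_copy and the optional seed parameter are assumptions made for this illustration; random.shuffle from the standard library produces a random permutation, which has the same effect as sorting on a column of random numbers.

```python
import random


def shuffled_copy(shot_lengths, seed=None):
    """Return the shot lengths in random order, leaving the input untouched.

    seed: optional value that makes the shuffle repeatable (an added
    convenience for this sketch, not part of the suggestion above).
    """
    rng = random.Random(seed)
    shuffled = list(shot_lengths)   # copy, so the original order survives
    rng.shuffle(shuffled)           # in-place random permutation
    return shuffled


# Made-up shot lengths in seconds.
original = [12.4, 3.1, 7.8, 2.2, 9.5]
randomized = shuffled_copy(original, seed=1)
print(randomized)
```

The shuffled list keeps exactly the same shot lengths, so the film's ASL is unchanged; only the order differs. If the partition graph keeps its shape after shuffling, the shape is more likely to come from the technique than from the film.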

Replied by:Viktorija Eksta Date:2011-04-13

Hallo everybody! Sorry for a small interruption, but I wanted to comment on a possible weak point in this study - it is difficult to know the author's intended projection speed in the case of Griffith films. It could vary in different DVDs and even in different scenes within one film. This can produce a final sample with unreliable timing whereas Pudovkin's graph posted above refers to a sound film with a fixed speed. Maybe it is a good idea to test this method on a more reliable sample of sound films?

Replied by:Yuri Tsivian Date:2011-04-14

Point well taken, Victoria. "What was the projection speed for silent film?" is a hard question to answer, for answers vary from case to case. The best attempt to tackle this remains "What Was the Right Speed?" by Kevin Brownlow (1980); you may know this work.

In our case, however, the "right speed" problem matters less. What we are looking at is the way shot lengths oscillate within the duration of this or that film, and unless we assume that the projectionists changed the speed while projecting a single movie (which is not an absurd assumption by itself, see Brownlow's essay) we are on safe ground. Remember the relativity theory by Einstein? Its basic postulate applies to cinemetrics as well.

Replied by:Nick Redfern Date:2011-07-07

I've just added a post to my blog looking at 15 BBC news bulletins from the week beginning 11 April 2011.

It includes a first attempt to analyse shot length data of different motion pictures side by side, based on the clusters of short and long shots, that may go some way towards achieving what you are trying to do with Griffith without the hassle of calculating averages across different groups of films. Basically, it looks at where clusters occur and puts these side by side on a normalized horizontal axis. This makes it easier to see where similar features occur, and could be used, for example, to see if Griffith does have a cluster of rapidly edited shots at similar points in his films.

 

I have also added the shot length data for the 15 news bulletins to this post as an MS-Excel file.

Replied by:Barry Salt Date:2011-07-17

James Cutting and his collaborators have just published a paper that seems to have some sort of relation to the technique Keith Brisson is using.

How Act Structure Sculpts Shot Lengths and Shot Transitions in Hollywood Film

Authors: Cutting, James E.; Brunick, Kaitlin L.; Delong, Jordan E.

Source: Projections, Volume 5, Number 1, Summer 2011, pp. 1-16

While grappling with reproducing the algorithm for doing Brisson-type calculations, it occurred to me that what Keith is really doing is determining the number of shots per unit time, or rather normalized unit of film duration. This can be done more simply, and generates what could be best called "Shot density" (in time).

 

Replied by:Barry Salt Date:2011-07-17

To be cruder and more explicit, "Shot density" comes in "Shots per minute", and could be called the "Cutting rate", and is the inverse of the ASL (Average Shot Length).

 

Replied by:Barry Salt Date:2011-08-26

If you want to see this suggested new measure of "shot density" in action, I illustrate a graph of it for Westward Ho (1935), where it is calculated for each of 100 equal time partitions down the length of the film in the Keith Brisson manner. The value for each partition is simply the number of shots that end in that partition, so it is very simple to calculate from Cinemetrics type data. In this particular case each partition is 36 seconds long, but obviously the partition duration would vary for films of different lengths. I have added a 6th. degree trendline to the graph.
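
The counting described above can be sketched in Python. This is an illustration under stated assumptions, not the spreadsheet actually used: the function name shot_density is hypothetical, the input is a list of shot lengths in seconds, and a shot that ends exactly on a partition boundary is counted in the later partition (the last shot is counted in the final partition).

```python
from itertools import accumulate


def shot_density(shot_lengths, n_partitions):
    """Count how many shots end in each of n equal time partitions."""
    ends = list(accumulate(shot_lengths))   # end time of each shot
    total = ends[-1]                        # running time of the film
    counts = [0] * n_partitions
    for end in ends:
        # Index of the partition this shot ends in; the clamp keeps the
        # final shot, which ends at the film's end, in the last partition.
        k = min(int(end / total * n_partitions), n_partitions - 1)
        counts[k] += 1
    return counts


# A made-up 8-second "film" of four shots, split into two halves.
density = shot_density([1, 1, 2, 4], 2)
print(density)
```

Every shot is counted exactly once, so the counts always add up to the number of shots in the film.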


For comparison, here is a Cinemetrics-type graph of the shot lengths for this film, with the usual inverted y-axis and a 6th. degree trendline.


You can see that the shot density graph is quite close to the inverse of the Cinemetrics graph, with the highs and the lows in the same place and with the same relative size, allowing for their different x-axis resolution. (100 data points in the first, and 537 in the second.) This is also indicated by the almost identical shapes of the 6th. degree trendlines for both graphs. This close similarity happens because although the Cinemetrics graphs do not have an x-axis that is exactly linear in time, nevertheless it is fairly close to being so in general, with a drift from linearity of only several percent up and down their lengths. In using this method to get shot density, the number of partitions should be appreciably less than the number of shots in the film, otherwise you will get many zero values, which is not very informative or useful.
Keith Brisson's other idea for finding general shapes in shot length records by aggregating many films has been taken up in the new article mentioned above by James Cutting and his collaborators. This article contains a number of dubious ideas and results, so to check their results, I have used my new measure on the shot length data for the 50 films made from 1935 to 1955 in their results posted in the Cinemetrics database. In this case, I have used 200 partitions down the length of the films and normalized the resulting values by dividing by the average shot density for the film, and then added together the resulting values for each film in each of the partitions.
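
The normalizing and adding-together step can be sketched in Python. This is only an illustration: the function name aggregate_normalized is hypothetical, and it assumes each film has already been reduced to a list of per-partition shot counts, all with the same number of partitions.

```python
def aggregate_normalized(densities):
    """Sum per-partition shot densities over films after normalizing each.

    densities: one list of per-partition shot counts per film.
    Each film's counts are divided by that film's own mean count, so that
    fast-cut and slow-cut films contribute on the same scale.
    """
    n = len(densities[0])
    totals = [0.0] * n
    for counts in densities:
        if len(counts) != n:
            raise ValueError("all films must use the same number of partitions")
        mean = sum(counts) / n              # average shot density of the film
        for k, c in enumerate(counts):
            totals[k] += c / mean
    return totals


# Two made-up films with two partitions each.
combined = aggregate_normalized([[2, 2], [1, 3]])
print(combined)
```

After normalization each film's values average to 1, so the summed values average to the number of films in the group.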
The y-axis showing the shot density values has been inverted, so that slower cutting (longer shots) is higher, and faster cutting lower on the graph. A 6th. degree trendline has been added as usual. The result looks like this:-


There is very little large-scale structure apparent, with the exception of the slow beginning, a slowing down in the cutting around three-quarters of the way through, and then a speeding up towards the end. There is no sign of the sharp spikes of slow shots at one-quarter, one-half, and three-quarters of the way through the films detected (or perhaps created) by James Cutting's analysis.
A check on six Alfred Hitchcock films in the sample gives a similar graph:-

The Hitchcock films start off a little faster, and slow down a tiny bit more before the final accelerando than the general collection, but that is it. So this does not look like a good way to isolate authorial characteristics.

Barry Salt, 2011

