DISCUSSION TOPIC
"ONCE AGAIN ON THE ACCURACY OF DATA"

Posted by: Yuri Tsivian Date: 2013-09-13

The issue I propose to discuss here is, once again, that of data accuracy. Can we trust indiscriminately all data submitted to Cinemetrics? Emphatically, no. Which data are more trustworthy than others? This question crops up each time one of us wants to use data submitted by others. I trust my own submissions, or at least I know to what degree I can rely on their accuracy. I know Barry Salt, Charles O’Brien and their work; I may not know Armin Jaeger or Cid Vasconcelos personally, but the number and quality of their submissions make me confident in the reliability of their data. Our stock greeting to any newcomer to Cinemetrics is “Welcome aboard!” Yet when one comes across a submission like “2 and a half men: (7) ASL 4; Submitted by x on 2013-04-03” one starts wondering. The title is misspelt, the submitter is nameless, no metadata is added, and the running time is 5 minutes off; what shall we do with stowaway submissions like that?

There are measuring hands and measurement tools we trust more readily than others. One benchmark is the amount of time invested: the more time a contributor to Cinemetrics has spent on measuring a film’s SLs, the more credence we tend to give to the result. The Classic click-on-cut Cinemetrics Tool takes only the running time of the film to measure it, but the gain in work-time comes at a price: the human error factor (missing a cut or two) and the human reaction time between the cut and the click create a margin of error. When the Classic Tool is used, one trusts old hands (or one’s own) more than one does a novice’s.

Other measurement tools and methods take longer to complete. Typically, these will result in more error-proof data submissions than the Classic Tool would. Barry Salt’s set of submissions is a case in point: using an NLE (Non-Linear Editing) system and putting a mark on each cut to get the shot lengths takes Salt 3 to 12 hours per film, depending on the number of shots in it, according to his essay found here. Our own recently added F.A.C.T. (Frame Accurate Cinemetrics Tool) takes around 4 times the length of a film to measure it and submit the result to the Cinemetrics database.

The longest I know it has taken to measure the SLs of a set of films was the time spent by James E. Cutting, Jordan E. DeLong and Christine E. Nothelfer for their 2010 study “Attention and the Evolution of Hollywood Film”. As the article explains, the team used an elaborate three-stage procedure that combined an automatic system of cut detection with human time spent on verifying its output. [1] Finding cuts and verifying them took the Cutting team from about 15 to 36 hours per film.
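To give a feel for what the automatic first stage involves, here is a minimal Python sketch of frame-difference candidate-cut detection; it is not the team’s MATLAB system, and the threshold and the grayscale-frame input are illustrative assumptions:

    import numpy as np

    def candidate_cuts(frames, threshold=30.0):
        # frames: iterable of equal-sized 2-D grayscale numpy arrays
        # returns indices i where frame i differs sharply from frame i-1
        candidates = []
        prev = None
        for i, frame in enumerate(frames):
            cur = frame.astype(np.float64)
            if prev is not None:
                # mean absolute pixel difference between consecutive frames:
                # a sudden spike suggests a cut, a gradual rise a dissolve or fade
                if np.abs(cur - prev).mean() > threshold:
                    candidates.append(i)
            prev = cur
        return candidates  # stages two and three hand these to a human verifier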

In March 2011 Cutting and his collaborators kindly offered to Cinemetrics their set of data, consisting of SL measurements for 150 Hollywood films, 10 released in each of 15 years, every 5 years from 1935 to 2005. These are easy to find if you search the Cinemetrics database for “James Cutting” as a submitter. Since then more than one statistical study has been conducted using the “Cutting” set of data, among them studies by Salt and Mike Baxter for the “On Statistics” discussion on the Measurement Theory page of Cinemetrics, and by Nick Redfern, Baxter and DeLong in a set of papers published in Literary and Linguistic Computing. Since the “Cutting” data have been used intensively both by the Cutting team in their own studies (the “Attention” article and a series of later ones) and by other statisticians, it makes sense to assess the reliability of the “Cutting” data found on Cinemetrics. The urgency of this became clear to me in the course of a working conversation with Mike Baxter about the relevance of the coefficient of variation in film statistics (unpublished). As I noticed in the course of this exchange, there are two full-scale measurements of The Great Dictator on Cinemetrics: Cutting’s and Charles O’Brien’s, the latter made using Cinemetrics’ Classic Tool: Great Dictator, The: (7) ASL 11.7. Cutting’s measurement, The Great Dictator: (7) ASL 15, proved to be considerably off for some reason: the actual film is shorter by some 20 minutes, and Chaplin’s “final speech” shorter by 30 seconds. To what extent can the “Cutting” set of data be said to be accurate?

[1] “After selection, films were manipulated from files in *.avi format stripped of their audio track. Each frame was stored as a 256- × 256-pixel jpeg file. Excluding all trailing credits and beginning credits without scenic content, the mean film length was 114 min (SD = 26 min), entailing a mean of about 165,000 jpeg files. We needed to divide the films into shots, but we were unimpressed with purely digital methods. Cut-finding algorithms often confuse motion across frames within a shot with spatial discontinuities across shots. They also do poorly with fades, dissolves, and wipes, which are common in films made before 1960 (Carey, 1974). … Such performance was inadequate for our purposes, so we devised a three-stage MATLAB-based (MathWorks, Natick, MA) system. The first stage found candidate cuts and other transitions by tracking frame-to-frame changes … For each candidate transition, the second stage presented the user with an array of six static images—six images before and after a candidate cut or six images during a candidate dissolve, fade, or wipe. The user then accepted or rejected the candidate, and the process continued with the next. If the user felt that content of the six images was discontinuous from one candidate transition to the next, he or she flagged the region. The third stage allowed the user to inspect these flagged regions for possible missed transitions.” (James E. Cutting, Jordan E. DeLong and Christine E. Nothelfer, “Attention and the Evolution of Hollywood Film” (2010), pp. 2-3; see http://pss.sagepub.com/content/early/2010/02/04/0956797610361679)

Replied by: Mike Baxter Date: 2013-09-13

Given comments by Tsivian, Redfern and Salt about the reliability of some of the Cutting data, I began to wonder how prevalent the problems they identify might be. Redfern (and subsequently DeLong and I) worked with just 134 of the films in their database because of zero and, more worryingly, negative SLs. If, in addition, you have problems with The Great Dictator, you begin to wonder what other problems there might be.

Replied by: Yuri Tsivian Date: 2013-09-13

A simple (but longish) way to check this would be to compare the total length of each of Cutting's 150 data sets with the actual length of the corresponding title as given in IMDb. If mismatches like the one for The Great Dictator prevail, Cinemetrics users should be made aware of it. I think Gunars can write a "robot" program which will help do this fast, or someone might do it manually when he or she has time. There might have been a problem they did not realize with the difference between PAL and NTSC running speeds (25 and 24 fps).
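In outline, such a "robot" might do no more than the following (a Python sketch; the data format, the tolerance and the 24/25 test are my assumptions, not an existing Cinemetrics program):

    def check_submission(shot_lengths_sec, reference_sec, tolerance_sec=60):
        # compare the summed SLs of a submission with a reference running time
        total = sum(shot_lengths_sec)
        if abs(total - reference_sec) <= tolerance_sec:
            return "plausible"
        if abs(total / reference_sec - 24 / 25) < 0.01:
            return "suspect PAL speed-up (24/25 length ratio)"
        return "mismatch: measured %.1f min vs reference %.1f min" % (total / 60, reference_sec / 60)

    # e.g. a 124-minute film whose SLs sum to about 119 minutes fits the 24/25 pattern
    print(check_submission([119.0 * 60], 124 * 60))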

Replied by: Mike Baxter Date: 2013-09-13

I'm not sure that Yuri's idea of comparing lengths with the IMDb will work: Cutting et al. omit trailing credits and credits without scenic content, and I assume IMDb includes these. I've had a quick look at 36 films for which others have conducted analyses. There are four by Barry Salt, who usually states he is working from a PAL DVD corrected to 24 fps. Two of his analyses produce results very similar to Cutting's; two others, to the extent I can ‘match’ shots, seem to produce consistently lower values in Salt's analysis, by a factor of 24/25. There are 15/36 (42%) films where somebody has reported a length 4-7 minutes shorter than Cutting's, with a ratio of roughly 24/25. Another 8/36 (22%) films have lengths differing by 2-6 minutes (excluding The Great Dictator) with no particular pattern. Nick Redfern has noted on the Cinemetrics site that ‘Sunset Blvd.’ has a negative SL of -23.7 seconds, and has noted elsewhere the existence of other films with negative values. The negative value is followed by the longest recorded SL, 114.5 seconds.

There are seven such films in total, one with a negative SL of 3 seconds and three with negative SLs of 0.5 seconds or less. ‘Harry Potter and the Goblet of Fire’ has a negative shot of 39.5 seconds followed by the third largest recorded (43.5 seconds). ‘Dances with Wolves’ has two negative shots, of 31.7 and 831.8 seconds; the latter is followed by a ‘shot’ of 841.7 seconds.
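For what it is worth, anomalies of this kind are easy to screen for mechanically. A minimal Python sketch (the input format is an assumption) that flags non-positive SLs together with the suspiciously long shots that follow them:

    def screen_shot_lengths(sls):
        # sls: list of shot lengths in seconds; yields (index, problem) pairs
        for i, sl in enumerate(sls):
            if sl <= 0:
                note = "non-positive SL %s" % sl
                if i + 1 < len(sls):
                    note += "; next SL is %s (check for a mis-keyed pair)" % sls[i + 1]
                yield i, note

    # e.g. the Sunset Blvd. pattern: a -23.7 s shot followed by a 114.5 s one
    for idx, msg in screen_shot_lengths([12.3, -23.7, 114.5, 8.0]):
        print(idx, msg)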

Replied by: Yuri Tsivian Date: 2013-09-13

Is there perhaps a methodical way of assessing the overall accuracy of the “Cutting” set of data?

Replied by: Mike Baxter Date: 2013-09-13

Barry Salt, commenting on James Cutting’s analysis of ‘The Grapes of Wrath’ in contrast to his own, says ‘the marked difference in length and ASL to my recording is difficult to understand. It is as though James Cutting's figures have been given the 25/24 correction given to adjust a PAL copy unnecessarily’. Shot-by-shot matching between the two analyses is not straightforward but, for example, in the last 30 shot-lengths (SLs) in each analysis the ratio of the Cutting to Salt SLs is about 25/24 for the great majority of matched shots.

The film is one of 150 submitted by Cutting to the Cinemetrics database. The same phenomenon is apparent for ‘Sunset Blvd.’. Salt omits a long opening shot that Cutting includes, which affects a direct length comparison. Salt states that his ‘results align well with those of Cutting and collaborators’ but shot-by-shot matching (at the start and end of the film) suggests the same ‘25/24 correction’. Two films common to Salt’s and Cutting’s analyses, ‘The 39 Steps’ and ‘Anchors Aweigh’, produce similar results in terms of length and the summary statistics reported. For his analyses Salt notes that frame-accurate recording from a PAL DVD was used, with durations corrected to 24 fps.

There are (at least) 36 films in the Cutting database that have been analysed by others. If their recorded lengths are compared with Cutting's, I estimate that about 47% of the films produce fairly similar results, 42% produce shorter lengths roughly compatible with a ‘25/24 correction’, and 14% differ in a non-systematic way. There is some double counting here, because four films with multiple analyses have results both similar to and different from Cutting's. Where the ‘25/24 correction’ is a possibility, the difference in recorded lengths is generally in the range of 4-7 minutes.

It is impossible to be certain about what exactly these results mean, since not everyone emulates Salt in listing the recording used, or what exactly was omitted from the analysis. Nevertheless the possibility is raised that there might be something like a 40:60 split between films that have and have not had a ‘25/24 correction’ applied, ‘unnecessarily’ as Salt terms it.

Should this be the case, the question arises of whether a factor of 25/24 much matters when films with and without the ‘correction’ are compared. Readers of the On Statistics discussions on Cinemetrics will realise that for the purposes of comparison some analyses standardize for length, so the ‘correction’ is not problematic there. Where such standardization is not applied there is an issue. For example, for ‘The Grapes of Wrath’ the difference in the ASL is 0.4 seconds on an ASL of around 10; for ‘Sunset Boulevard’ it is about 0.6-0.7 seconds on an ASL of around 15. The judgment could well be made that differences of this magnitude are of no great consequence, but it is as well to be aware that they may exist.
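The arithmetic behind the ‘correction’ can be stated in a couple of lines (a Python sketch, assuming only that PAL playback at 25 fps shortens every SL of a 24 fps film by a factor of 24/25):

    def pal_to_film(asl_pal):
        # undo the PAL speed-up: durations measured at 25 fps are 24/25 as long
        return asl_pal * 25 / 24

    def film_to_pal(asl_film):
        return asl_film * 24 / 25

    print(film_to_pal(10.0))   # 9.6: the 0.4 s gap on an ASL of around 10
    print(pal_to_film(15.0))   # 15.625: a gap of about 0.6 s on an ASL of around 15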

Replied by: Nick Redfern Date: 2013-09-15

Editing software can be purchased cheaply (from Sony, Magix, or Corel), and there are some packages that can be downloaded and used (such as AVS Video Editor). I find it surprising that film journals do not provide reviews of editing software that could be useful as research tools. I don't make films, but I don't think I could analyse them properly without using editing software.

Analysing frame by frame is better because there are some cuts that are very easy to miss: for example, in the forest sequence of Rashomon Kurosawa cuts between shots of the bandit while Mifune is behind a bush, and these cuts can be difficult to spot.

Writing out the shot length data and then typing it up is perhaps no quicker than the method used by Cutting et al., but it is more accurate. It is much slower than using Cinemetrics, but it is more accurate. Plus, examining a film frame by frame really helps you understand how it is put together.

There are other problems to consider when judging the accuracy of shot length data: which version of the film are you using (e.g. there are four different release versions of Blade Runner)? When do fades, dissolves, and wipes begin and end? And how do you deal with composite or split-screen shots, in which part of the image changes while part remains the same?

Replied by: Armin Jaeger Date: 2013-09-29

I'm not sure that the difference is between reliable and less reliable measurements rather than between useful and useless ones. After all, no matter how seasoned the submitter may be and how well documented his entry is with its source, we still have to take his measurement at face value. Unless there is an obvious mistake, like a blatantly different running time à la Cutting's Great Dictator, there's no way to tell whether something went wrong. Somebody could submit another measurement, but even if we then find a significant difference, there's still the question of whom to trust.

To give an example, I compared my data on the Star Trek films with Barry Salt's measurements, and they differ by between 2 and 66 shots. The latter is the Abrams ultrafast cutting, so that might be understandable, but for the quieter first film we are still 40 shots apart, and this seems quite a lot to me. And then there's Generations, the seventh film and the one case where we are widely apart, because either I have 210 shots too many or Barry Salt 210 too few. A typo with the first digit wrong (then the difference is 63 shots), or the first two digits reversed (29 shots)?

Here's the data (Barry Salt's ASL, with the PAL-corrected value in parentheses, then mine, the difference between his corrected ASL and mine, and my shot count relative to his):

ST I: 5.60 (PAL-corrected 5.38) - mine 5.17 (theatrical), 5.23 (director's cut); difference 0.21 (theatrical), 0.15 (DC); shots: 53 more (theatrical), 40 more (DC)

ST II: 6.69 and 6.54 (6.42 and 6.28) - mine 6.27; difference 0.15 (slower measurement), 0.01 (faster); shots: 23 more (slower), 2 more (faster)

ST III: 5.18 (4.97) - mine 5.03; difference 0.06; shots: 12 fewer

ST IV: 6.19 (5.94) - mine 5.99; difference 0.05; shots: 9 fewer

ST V: 5.62 (5.40) - mine 5.50; difference 0.10; shots: 20 fewer

ST VI: 4.30 (4.13) - mine 4.26; difference 0.13; shots: 43 fewer

ST VII: 5.47 (5.25) - mine 4.48; difference 0.77; shots: 210 more (or 63, if there is a typo)

ST VIII: 4.40 (4.22) - mine 4.25; difference 0.03; shots: 10 fewer

ST IX: 4.60 (4.42) - mine 4.51; difference 0.09; shots: 26 fewer

ST X: 3.09 (2.97) - mine 2.96; difference 0.01; shots: 10 more

ST XI: 3.13 (3.00) - mine 3.09; difference 0.09; shots: 66 fewer

So for the data to be reliable we'd need two measurements for every film: if the running times fit well (or the difference is recognizable as a PAL/NTSC one) and the shot counts differ by no more than a dozen or so, then we have a reliable result. Until then it's just hoping for the best. As Barry Salt has already pointed out, a source field should be absolutely mandatory, not only in the later director and country addition, but already at the point of data submission.
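Such a cross-check could even be automated once a film has two submissions. A rough Python sketch, with all thresholds being my own guesses rather than anything Cinemetrics implements:

    def agree(m1, m2, shot_tol=12, time_tol_sec=30):
        # m1, m2: (total_seconds, shot_count) from two independent submissions
        t1, n1 = m1
        t2, n2 = m2
        ratio = min(t1, t2) / max(t1, t2)
        # running times must match, or differ by the PAL/NTSC factor 24/25
        times_ok = abs(t1 - t2) <= time_tol_sec or abs(ratio - 24 / 25) < 0.005
        shots_ok = abs(n1 - n2) <= shot_tol
        return times_ok and shots_ok

    # a 24/25 running-time ratio with shot counts 5 apart still counts as agreement
    print(agree((7560.0, 1460), (7257.6, 1455)))   # True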

Considering that we can't do better in matters of reliability, not to forget all the problems with long dissolves and composite shots that Nick Redfern mentions, which are serious and make data comparisons difficult, we should at least separate the useless data, or those only useful to the submitting person, from those which are useful for the broader film community. A split between full film measurements and excerpts would be a start. Arguably, Advanced and Simple mode submissions should somehow be easier to separate. And obviously all the trash cluttering the database has to go.

Replied by: Barry Salt Date: 2013-10-22

From my experience, negative lengths can arise when measuring lengths on a Steenbeck, or the like, and then writing them down on paper, then entering them in a spreadsheet or whatever. It is a matter of misreading the numbers in front of one, or mis-keying them in. This was of course long ago in the past.

If one is using a non-linear editor (or Gunars' new program), and then exporting the lengths as an EDL, and transferring the resulting file INSIDE the computer to another application for analysis, or conversion to a Cinemetrics file, then no negative lengths ever arise in my experience.
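The EDL route can be made concrete with a few lines of code. A Python sketch of deriving SLs from an EDL's record timecodes, assuming CMX3600-style event lines (real EDLs vary, so this is illustrative only):

    def tc_to_frames(tc, fps=24):
        # "HH:MM:SS:FF" timecode to a frame count
        h, m, s, f = (int(x) for x in tc.split(":"))
        return ((h * 60 + m) * 60 + s) * fps + f

    def shot_lengths_from_edl(lines, fps=24):
        sls = []
        for line in lines:
            parts = line.split()
            # event lines start with a number; record in/out are the last two fields
            if len(parts) >= 8 and parts[0].isdigit():
                frames = tc_to_frames(parts[-1], fps) - tc_to_frames(parts[-2], fps)
                sls.append(frames / fps)   # seconds; never negative if out >= in
        return sls

Because the durations never leave the computer, the misread and mis-keyed numbers that produce negative lengths simply cannot arise.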

Secondly, regarding the ASLs for Star Trek etc. that Armin Jaeger cites in the previous comment: he is referring to the ASLs in my database. The results in that database are NOT guaranteed frame-accurate, or even count-accurate, since the shot counting used to generate those ASLs was done on the fly, from a variable-speed video playback, rather in the same way that people use the basic Cinemetrics tool to generate their shot length data, which is certainly not guaranteed shot-accurate.

Replied by: Armin Jaeger Date: 2013-11-02

To sum up previous comments, for a proper database the following steps should be implemented:

1) working accents and umlauts. However difficult it may be, this has to be solved; otherwise there's no point in even beginning. It needn't be Cyrillic or Chinese, but displaying proper French and German, as the IMDb does, has to be possible.
2) arguably separate fields for original and English titles, though something like "Original title - English title" might also work
3) more flexible display choices than either 50 entries per page or all of them at once, which can mean up to 13,000 entries on a page
4) sorting by director, ASL and so on should carry over to the second and all further pages instead of being deactivated when you click "next page"
5) for film excerpts it should be mandatory to explain which part was measured
6) a source field also has to be mandatory, filled with PAL or NTSC info as well as TV/DVD/BluRay info
 

Replied by: Barry Salt Date: 2013-11-08

Further to Mike Baxter's last comment, I still find a good correspondence between the Cutting data for Sunset Blvd. and mine. Apart from the fact that most of the shot lengths agree exactly, the ASLs for the two sets only differ by 0.4%, not the 4% that would be expected from a 24/25 fps. difference. Some of the differences in shot lengths between the two sets of data appear to result from the Cutting group seeing a cut where I do not, and vice-versa. As far as The Grapes of Wrath is concerned, Jordan DeLong has informed me that they used NTSC disks exclusively, and I have re-checked my frames per second conversion. So this discrepancy remains a mystery. Altogether, that gives good enough agreement for three of the four films common to their and my results, and ours are the only results for this corpus claiming frame accuracy.

Replied by: Mike Baxter Date: 2013-11-13

My observation about Sunset Blvd. was based on analysis of the first and last 30 or so shots that I thought could be reasonably matched in the Salt and Cutting analyses. I’ve now undertaken a more thorough analysis across the whole film (omitting the first SL in both analyses; ignoring the negative SL and the one that follows it in the Cutting analysis; and amalgamating a few SLs where one or other analysis involves a cut not recorded in the other). Barry Salt’s SLs are fairly consistently shorter (97% of over 400 matched shots). If the Salt/Cutting ratio for each shot is averaged it comes out at 0.961 (24/25 = 0.96). The ratio of the ASLs from the original analyses is 14.9/15.6 = 0.96. I think, therefore, my original observation stands.
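For anyone wanting to repeat the exercise, the calculation amounts to the following (a Python sketch; the shot-by-shot alignment, including the amalgamations just described, has to be done beforehand):

    def mean_sl_ratio(sls_a, sls_b):
        # average the per-shot ratio over matched, positive SLs
        ratios = [a / b for a, b in zip(sls_a, sls_b) if a > 0 and b > 0]
        return sum(ratios) / len(ratios)

    # toy data: one set of SLs exactly 24/25 of the other
    cutting = [10.0, 25.0, 7.5, 40.0]
    salt = [sl * 24 / 25 for sl in cutting]
    print(mean_sl_ratio(salt, cutting))   # 0.96, i.e. 24/25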

Replied by: Barry Salt Date: 2013-11-14

No. Shot 156 in Cutting's data has a length of -237 deciseconds. His shot 157 is 1145 deciseconds long. Add them together, and you get 908 deciseconds. My shot 156 is 356 deciseconds long, and my shot 157 is 555 deciseconds long; if you add those two together, you get 911 deciseconds. My values for the lengths of these two shots are the correct lengths at 24 frames per second. Now, if you take my first shot and Cutting's first shot length out of the accounting, because my first shot is the tail end of his first shot, starting after the superimposed director credit, and then add up the totals of our respective shot lengths for the film (in deciseconds), you get 64775 for Cutting, and 64696 for my values. This is a difference of 79 deciseconds. That is only 8 seconds, or a difference of approx. 0.12%. This is far less than what would result from a 24/25 fps discrepancy, which is about 4%. This very small discrepancy is probably partially due to where Cutting's team and I put the end of the last shot during the fading in of the end title, and partly due to the fact that I was working with an off-air tape, which was interrupted by three commercial breaks.
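The check is easy to reproduce (a few lines of Python restating the figures above):

    cutting_ds = -237 + 1145   # Cutting's shots 156-157, in deciseconds
    salt_ds = 356 + 555        # my values for the same two shots
    print(cutting_ds, salt_ds)          # 908 vs 911 deciseconds
    print(abs(cutting_ds - salt_ds))    # 3 ds = 0.3 s: a local keying error, not a frame-rate effect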

Replied by: Barry Salt Date: 2013-11-16

Yes. Mike is right. My data for "Sunset Blvd." in the Cinemetrics database is incorrect. Despite what I noted in my comments there, it is NOT corrected to 24 fps; it is at 25 fps, hence the difference he found. Sorry about that. There is a bunch of about 20 films where I got it wrong in the same way. I will correct the situation as soon as possible. But my data for the film, when given the 25/24 fps correction, does line up with the Cutting data as I indicated.
 

Replied by: Yuri Tsivian Date: 2013-11-23

Barry: when you submit the corrected data under these titles, please let Gunars and me know so that he can remove the incorrect data from the database.