[NSRCA-discussion] Judging

Fri Oct 19 09:02:48 AKDT 2007

Derek, I followed your summary pretty well until the first sentence in the third paragraph. "system works very well and in the end the correct order of finish is picked- which is what we want" 
Where is "correct " quantified? 
Who says what is "correct"? 
What says that one judge being low on a maneuver is "wrong"?
Sounds more like the scenario previously mentioned as the "7 to 9 syndrome" is the safe way to score.
How many times have you scored a zero while others gave a score, in error, because the pilot rolled the wrong way? By the way, throwing out the old requirement that the judges confer on zeros has only helped get us into this "perceived" problem.
Sure sounds like a "dry lab" approach to me! Easy to massage data and the algorithms to "prove" the result desired. 
Several years ago I went through the data that was available on the "Judges ratings" (and it was not all of it), and got the same gut feeling.
What the pilot sees/flies, and his caller sees, and what each judge sees are seldom the same for a multitude of reasons. 
Also, as a judge, have you ever had a "watcher" come up after a flight and say how great the flight was, when you had just given several zeros/low scores? It is amazing how they change their perspective when you point out the obvious errors. That is assuming your memory hasn't already blanked out the past pilot while you are getting ready for the next.
Jerry 
  ----- Original Message ----- 
  From: Derek Koopowitz 
  To: 'NSRCA Mailing List' 
  Sent: Thursday, October 18, 2007 11:25 PM
  Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul-LONG

  TBL - scores go back to the pilots as is... raw.  TBL is only factored in once a round is finished and the results are calculated.  With TBL, scores are adjusted to the whole score and result in whole numbers.  Remember, the entire score isn't thrown out - just a maneuver score.  When a score is dropped, TBL is run on the entire set of scores again to recalculate the mean and std. deviation - it is an iterative process until all scores fall within the range that is calculated for the pilots/judges.

  TBL was used at the TOC for all years starting in 1999 onwards.  It has been used at the WC since 2001 (I think).

  Bottom line - the system works very well and in the end the correct order of finish is picked - which is what we want.  In all the years of scoring with TBL at the TOC I saw very few judges' scores discarded, which probably says more about the quality of judging than anything else.  I'd be very curious what the results have been at the F3A WCs since it was implemented.

  I would love to see TBL in use at the Nats for the Finals in both classes - we certainly have enough judges to support its use.

------------------------------------------------------------------------------
  From: nsrca-discussion-bounces at lists.nsrca.org [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of rcmaster199 at aol.com
  Sent: Thursday, October 18, 2007 8:51 PM
  To: nsrca-discussion at lists.nsrca.org
  Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul-LONG

  Earl,

  If I remember correctly, some of the results from the Judge Evaluation have been presented either in the KF or the website in past years. But I haven't seen any results posted lately. Perhaps you are right...posting might serve a purpose.

  Curious...in the TBL method, do the adjustments to a judge's scores come in whole or fractional numbers? How are the scoresheets that are given back to the pilots handled.....I mean, do they show adjusted scores or as judged? Or both?

  Koop, was TBL used in the TOC? 

  If we have access to the algorithm, I wouldn't mind taking a peek at how the 2007 F3A Nats Final could turn out. I would bet that it wouldn't make much difference on where the pilots placed.

  Matt

  -----Original Message-----
  From: Earl Haury <ejhaury at comcast.net>
  To: NSRCA Mailing List <nsrca-discussion at lists.nsrca.org>
  Sent: Thu, Oct 18 10:24 PM
  Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul -LONG

  We miss an opportunity for judges to evaluate their performance by not distributing the results of the analysis of that performance (at major meets). I, for one, would like the judges' performance data to be published in the K-Factor (after all - the pilots' scores are published). However, realizing that some are squeamish about this, I think that we should still provide each judge with the analysis of his / her specific performance. At least, each judge would then have an indication of their current skills and any variation over the years.

  Earl
    ----- Original Message ----- 
    From: Derek Koopowitz 
    To: 'NSRCA Mailing List' 
    Sent: Thursday, October 18, 2007 7:53 PM
    Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul -LONG

    I'll post my dissertation on TBL again since this issue seems to crop up time and time again...

    The Tarasov-Bauer-Long (TBL) Scoring method has been around since the 1970’s.  It has been used in the full size arena since 1978 and has been used at every full size IAC World Championship since 1980.  The TBL method applies proven statistical probability theory to the judges’ scores to resolve style differences and bias, and to avoid the inclusion of potentially faulty judgements in contest results.  Understanding just why we need TBL, and how it works, is of considerable importance to us all.  It is important to the pilots because it is there to reduce the prospect of unsatisfactory judgements affecting their results, and it is important for judges because it will introduce a completely new dimension of scrutiny into the sequence totals, and it will also discreetly engage the attention of the Chief Judge, or Contest Director, if a judge’s conclusions differ sufficiently from those of the other judges on the same panel.
    When people get together to judge how well a pre-defined competitive task is being tackled, the range of opinions is often diverse.  This is entirely natural among humans where the critique of any display of skill relies on the interpretation of rapidly changing visual cues.  In order to minimize the prospect of any “way out opinions” having too much effect on the result, it is usual to average the accumulated scores to arrive at a final assessment, which takes everybody’s opinion into account.  Unfortunately this averaging approach can achieve the opposite of what we really want, which is to identify, and where needed, remove those “way out opinions” because they are the ones most likely to be ill-judged and therefore should be discarded, leaving the rest to determine the more appropriate result.
    In aerobatics the process of judging according to the rulebook normally leads to a series of generally similar personal views.  However, one judge’s downgrading may be harsher or more lenient than the next, their personal feelings toward each competitor or aircraft type may predispose toward favor or dislike (bias), and they will almost certainly miss or see things that other judges do not.  How then can we “judge” the judges and so reach a conclusion, which has good probability of acceptance by all the concerned parties?
    The key word is probability: the concept of a perceived level of confidence in collectively viewed judgements has entered the frame.  What we really mean is that we must be confident that opinions pitched outside some pre-defined level of reasonable acceptability will be identified as such and will not be used.  This sort of situation is the daily bread and butter of well established probability theory which, when suitably applied, can produce a very clear cut analysis of numerically expressed opinions provided that the appropriate criteria have been carefully established beforehand.
    What has been developed through several previous editions is some arithmetic which addresses the judges’ raw scores in such a way that any which are probably unfair are discarded with an established level of confidence.  To understand the process you need only accept some quite simple arithmetic procedures, which are central to what is called “statistical probability”.
    The TBL scoring system in effect does the following:
    ·        Commonizes the judging styles.
    ·        Computes TBL scores
    ·        Publishes results

    Commonizing the judging styles involves remodeling the scores to bring all the judging styles to a common format and removing any natural bias between panel members.  Following some calculations, each judge’s set of scores is squeezed or stretched and moved en-bloc up or down so that the sets all show the same overall spread and have identical averages (bias).  Within each set the pilot order and score progression must remain unaltered, but now valid score comparisons are possible between all the panel judges on behalf of each pilot.
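    The squeeze/stretch and en-bloc shift described above can be sketched as a linear rescaling of each judge's scores. This is only an illustration, not the official TBL arithmetic: the function name `commonize`, the choice of the overall panel average as the common average, and the choice of the mean of the judges' own standard deviations as the common spread are all assumptions made for the example.

```python
from statistics import mean, stdev

def commonize(scores_by_judge):
    """Rescale each judge's scores to a shared average and spread.

    scores_by_judge maps a judge name to a list of scores, one per pilot,
    with the pilots in the same order for every judge.
    """
    all_scores = [s for scores in scores_by_judge.values() for s in scores]
    target_mean = mean(all_scores)
    # Assumption: the common spread is the mean of the judges' own spreads.
    target_sd = mean(stdev(scores) for scores in scores_by_judge.values())

    result = {}
    for judge, scores in scores_by_judge.items():
        m, sd = mean(scores), stdev(scores)
        if sd == 0:
            # A judge who scored every pilot identically gives no ordering.
            result[judge] = [target_mean] * len(scores)
            continue
        scale = target_sd / sd  # positive, so the pilot order is preserved
        result[judge] = [target_mean + (s - m) * scale for s in scores]
    return result

panel = {
    "A": [70.0, 80.0, 90.0],  # wide spread
    "B": [84.0, 86.0, 88.0],  # narrow spread, high average
    "C": [60.0, 65.0, 70.0],  # low average
}
common = commonize(panel)
```

    Because the rescaling is linear with a positive factor, each judge's pilot order and relative score progression stay the same, while every judge ends up with the same average and spread.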
    Computing the TBL score involves looking at the high and low scores in each pilot’s set and throwing out any that are too “far out” to be fair.  This is done by subtracting the average for the set from each score and dividing the result by the “sample standard deviation” - if the absolute value of this ratio is greater than 1.645 then according to statistical probability theory we can be at least 90% confident that the score is unfair, so it is discarded.  This calculation and the mathematically derived 1.645 criterion are the key to the correctness of the TBL process, and are based on many years of experience by the full size aerobatics organization with contest scores at all levels.  Discarding any score of course changes the average and standard deviation of that pilot’s remaining results, and so the whole process is repeated.  After several cycles any “unfair” scores will have gone, and those that remain will all satisfy the essential 90% confidence criterion.
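    The discard-and-repeat cycle can be sketched for one pilot's set of (already commonized) scores as follows. The 1.645 threshold is from the text; the function name `tbl_filter` and the choice to drop every over-threshold score in a pass, rather than only the worst one, are assumptions for the example.

```python
from statistics import mean, stdev

THRESHOLD = 1.645  # the 90% confidence criterion quoted in the text

def tbl_filter(scores, threshold=THRESHOLD):
    """Repeatedly drop scores lying more than `threshold` sample standard
    deviations from the mean, recomputing both after each pass."""
    kept = list(scores)
    dropped = []
    while len(kept) >= 2:  # stdev needs at least two values
        m, sd = mean(kept), stdev(kept)
        if sd == 0:
            break  # all remaining scores agree
        outliers = [s for s in kept if abs(s - m) / sd > threshold]
        if not outliers:
            break  # every remaining score is within the limit
        for s in outliers:
            kept.remove(s)
            dropped.append(s)
    return kept, dropped

kept, dropped = tbl_filter([8.0, 8.0, 8.5, 7.5, 3.0])
four_kept, four_dropped = tbl_filter([8.0, 8.0, 8.5, 3.0])
```

    With five scores the 3.0 is discarded and the remaining four are kept. With only four scores nothing is discarded: using the sample standard deviation, no score among n can sit more than (n-1)/sqrt(n) deviations from the mean, which is 1.5 for n = 4, so the 1.645 limit can only be exceeded with five or more scores.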
    The published results are derived by averaging each pilot’s scores.  The final TBL iteration therefore has any appropriate penalty/bonus values applied and the results are then sorted in descending order of total score to rank the pilots first to last.  These final scores may, or may not, be normalized to 1000 points, depending on the setting for the selected class.
    Educating and improving the judges is a useful by-product of this process in that it shows in full detail how each judge has performed by comparison with the overall judging panel average and when seen against the 90% level of confidence criterion.  The TBL system will produce an analysis showing each judge the percentage of scores accepted as “OK”, and a comparison with the panel style (spread of score) and bias (average).
    Unfortunately TBL, by definition, brings with it a 10% possibility of upsetting an honest judge’s day.  The trade-off is that we expect not only to achieve a set of results that we can be at least 90% confident are “fair” every time, but that the system also provides us with a wonderful tool to address our judging standards.  TBL will ensure that every judge’s opinion has equal weight, and that each sequence score by each judge is accepted only if it lies within an acceptable margin from the panel average.  TBL, however, by necessity takes the dominant judging panel view as the “correct” one and it can’t make right scores out of wrong ones.  If 6 out of 8 judges are distracted and make a mess out of one pilot’s efforts, then for TBL this becomes the controlling assessment of that pilot’s performance, and the other 2 diligent judges who got it right will see their scores unceremoniously zapped.  In practice this would be extremely unusual - from the judging line it is almost impossible to deliberately upset the final results without collusion between a majority of the judges, and if that starts to happen then someone is definitely on the wrong planet.
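    The final sort and optional normalization might look like the sketch below. It assumes that "normalized to 1000 points" means scaling so that the top total becomes 1000; the function name `rank_pilots` and the sample totals are made up for illustration, and penalties/bonuses are taken as already applied to the totals.

```python
def rank_pilots(totals, normalize=True):
    """Sort pilots by total score, highest first.

    totals maps a pilot name to that pilot's total after averaging and
    after any penalty/bonus values have been applied.
    """
    best = max(totals.values())
    if normalize and best > 0:
        # Assumption: normalizing scales the best total to 1000 points.
        totals = {p: 1000.0 * t / best for p, t in totals.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_pilots({"Pilot 1": 412.5, "Pilot 2": 440.0, "Pilot 3": 396.0})
```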

----------------------------------------------------------------------------

    From: nsrca-discussion-bounces at lists.nsrca.org [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of vicenterc at comcast.net
    Sent: Thursday, October 18, 2007 8:11 AM
    To: NSRCA Mailing List
    Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul

    Tony,

    Do you know if the TBL system eliminates the high and low scores?  I think that is a good solution but we cannot do it in local contests.  Probably we could in some contests since we have many Masters vs. F3A.  

    Do you know of a link where we can read about the TBL system?

    --
    Vicente "Vince" Bortone

      -------------- Original message -------------- 
      From: "Tony" <tony at radiosouthrc.com> 

      This TBL will find these problems and is in use at World Champs.  The problem is that you need at least 5 judges on a line.

      Tony Stillman, President
      Radio South, Inc.
      139 Altama Connector, Box 322
      Brunswick, GA  31525
      1-800-962-7802
      tony at radiosouthrc.com

--------------------------------------------------------------------------