[NSRCA-discussion] Judging

Sat Oct 20 06:37:15 AKDT 2007

Hi Vince..

    I agree with your observations.  I have also witnessed the same. 

    I have always been taught, and have read, that it is the pilot's responsibility to show the judge the maneuver correctly. As a judge, if I doubt that I truly saw it, then I downgrade. 

    I remember zeroing a well known FAI flyer once and I was questioned by some why I thought such a great flyer should be given a zero. I backed it up with the facts and they mostly agreed that they saw that but didn't think they could possibly give him a zero. They did downgrade. 

    It is for this reason I feel strongly that all judges need to be accountable for their scores. If a judge isn't doing the task correctly then work with the judge to fix the issue, rather than trying to use TBL or software to fix the problem. All judges want their efforts to be valued all the time. The pilots want honest judges who are knowledgeable and have the conviction to award the score the pilot deserved.

    Del
  ----- Original Message ----- 
  From: vicenterc at comcast.net 
  To: NSRCA Mailing List 
  Sent: Saturday, October 20, 2007 8:45 AM
  Subject: Re: [NSRCA-discussion] Judging

  Is one of the areas snap rolls?  I think a high percentage of us are not judging snap rolls according to the rule book.  No break in a snap earns a zero in F3A and a 5 point downgrade in AMA.  I believe that a high percentage of us (including me) are giving too much benefit of the doubt (a rule that is not in the rule book) on a very important element of this maneuver (the break at entry), and the maneuver has a higher k-factor than average. 

  Regards, 

  --
  Vicente "Vince" Bortone

    -------------- Original message -------------- 
    From: "Del K. Rykert" <drykert2 at rochester.rr.com> 

    I am not one to condone the tossing of one high and one low score either.  Why the need to toss anyone's score?  If they are biased judges you apparently have a system in place that addresses that if you choose to use it. 

    To say that anyone manning the judges chair isn't worthy of having their score count is a slap in the kisser to that judge. 

    TBL to the best of my knowledge has never been used to look for or address a judge who Santa Clauses,  or a judge who is a random number generator or worse the chronic 7/8 judge. 

    To enter into and promote data manipulation only addressing the low scores is wrong in my book. Where do people come up with the data that says that only the low number is the wrong score? 

    If you want to track all the data of all judges over a large window of time and across varied contests and use that data to assist teaching those judges to do a better job of judging, I'm all for it. To remove some judges' scores because someone decided that must be a bad score and therefore this will fix that problem is only data manipulation. It doesn't address the problem judge. 

    Course maybe I'm confused again and truly fixing the problem judges isn't the goal. I'm not referring to the human mistake when eyes and brains are fried and a judge goofs. I am talking about the flagrant and repeated judging performance that apparently is a problem in some areas. 

        Del
      ----- Original Message ----- 
      From: Derek Koopowitz 
      To: 'NSRCA Mailing List' 
      Sent: Friday, October 19, 2007 4:03 PM
      Subject: Re: [NSRCA-discussion] Judging

      If you used standard high/low discards then with a 5 judge panel - 2 judges' scores will be discarded on every maneuver.  Is that fair?  That is 40% of judges' scores being discarded... how would you feel about judging if your scores were being discarded on a regular basis like that?  With TBL the probability is that over 90% of judges' scores will be kept (on a 5 judge panel) - that makes the judge feel a little more appreciated, don't you think?
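
      The 40% figure above is simple arithmetic, and a minimal Python sketch makes it concrete. The function names here are hypothetical, and the sketch assumes exactly one high and one low score are dropped per maneuver; it is not taken from any scoring program.

```python
def discarded_fraction(panel_size: int, dropped_per_maneuver: int = 2) -> float:
    """Fraction of judges' scores lost when one high and one low are always dropped."""
    if panel_size <= dropped_per_maneuver:
        raise ValueError("panel must be larger than the number of dropped scores")
    return dropped_per_maneuver / panel_size


def high_low_average(scores):
    """Average of a maneuver's scores after dropping one high and one low (assumed rule)."""
    ordered = sorted(scores)
    kept = ordered[1:-1]  # drop the single lowest and single highest score
    return sum(kept) / len(kept)


if __name__ == "__main__":
    print(discarded_fraction(5))              # 2 of 5 scores dropped on every maneuver
    print(high_low_average([7, 8, 8, 9, 0]))  # the 0 and the 9 are dropped
```

      Note that in this made-up panel the lone zero is always dropped, whether or not the judge who gave it was the one who got it right.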

      Any judge that plays it "safe" is doing themselves a disservice as well as the pilots they are judging.

--------------------------------------------------------------------------
      From: nsrca-discussion-bounces at lists.nsrca.org [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Del K. Rykert
      Sent: Friday, October 19, 2007 11:22 AM
      To: NSRCA Mailing List
      Subject: Re: [NSRCA-discussion] Judging

      I feel the same way Jerry..  Having nailed someone for a top hat down upright, 2 spins for 3 turns, extra half roll..  not doing next proper maneuver, etc., etc.  Have zeroed 3 maneuvers in a row before pilot gets back in proper sequence.  We all know the drill.  But the judge who catches them, the one doing it right, is the one who is penalized..  Sure makes one want to man the judges chair doesn't it..  Having all your work and effort to do the job right thrown out because someone feels the judge who scored low is wrong is not sending the right message to those who try to be honest and accountable judges.  What is promoted is the old play it safe and throw out a 7 or 8 and all are happy.  Well except for the pilot who cares...  <tic agn> 

           Del
        ----- Original Message ----- 
        From: Jerry Stebbins 
        To: Discussion -NSRCA 
        Sent: Friday, October 19, 2007 1:03 PM
        Subject: [NSRCA-discussion] Judging

        Derek, I followed your summary pretty well until the first sentence in the third paragraph. "system works very well and in the end the correct order of finish is picked- which is what we want" 
        Where is "correct " quantified? 
        Who says what is "correct"? 
        What says that one judge being low on a maneuver is "wrong"?
        Sounds more like the scenario previously mentioned as the "7 to 9 syndrome" is the safe way to score.
        How many times have you scored a zero and others gave a score-in error- because the pilot rolled the wrong way? By the way throwing out the old requirement that the judges confer on zeros has only helped get us in this "perceived" problem.
        Sure sounds like a "dry lab" approach to me! Easy to massage data and the algorithms to "prove" the result desired. 
        I went through the data several years ago, that was available-and it was not all-on the "Judges ratings", and got the same gut feeling.
        What the pilot sees/flies, and his caller sees, and what each judge sees are seldom the same for a multitude of reasons. 
        Also as a judge, have you ever had a "watcher" come up after a flight and say how great the flight was- and you had just given several zeros/low scores. Amazing how they change their perspective when you point out the obvious errors. That is assuming your memory hasn't blanked out the past pilot yet, and now you are getting ready for the next.
        Jerry 
          ----- Original Message ----- 
          From: Derek Koopowitz 
          To: 'NSRCA Mailing List' 
          Sent: Thursday, October 18, 2007 11:25 PM
          Subject: Re: [NSRCA-discussion] D3 Championship - Scoring SystemOverhaul-LONG

          TBL - scores go back to the pilots as is... raw.  TBL is only factored once a round is finished and the results calculated.  With TBL scores are adjusted to the whole score and result in whole numbers.  Remember, the entire score isn't thrown out - just a maneuver score.  When a score is dropped then TBL is run on the entire set of scores again to recalculate the mean and std. deviation - it is an iterative process that repeats until all scores fall within the range calculated for the pilots/judges.

          TBL was used at the TOC for all years starting in 1999 onwards.  It has been used at the WC since 2001 (I think).

          Bottom line - the system works very well and in the end the correct order of finish is picked - which is what we want.  In all the years of scoring with TBL at the TOC I saw very few judges' scores discarded, which probably says more about the quality of judging than anything else.  I'd be very curious what the results have been at the F3A WC's since it was implemented.

          I would love to see TBL in use at the Nats for the Finals in both classes - we certainly have enough judges to support its use.

----------------------------------------------------------------------
          From: nsrca-discussion-bounces at lists.nsrca.org [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of rcmaster199 at aol.com
          Sent: Thursday, October 18, 2007 8:51 PM
          To: nsrca-discussion at lists.nsrca.org
          Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul-LONG

          Earl,

          If I remember correctly, some of the results from the Judge Evaluation have been presented either in the KF or the website in past years. But I haven't seen any results posted lately. Perhaps you are right...posting might serve a purpose.

          Curious...in the TBL method, do the adjustments to a judge's scores come in whole or fractional numbers? How are the scoresheets that are given back to the pilots handled.....I mean, do they show adjusted scores or as judged? Or both?

          Koop, was TBL used in the TOC? 

          If we have access to the algorithm, I wouldn't mind taking a peek at how the 2007 F3A Nats Final could turn out. I would bet that it wouldn't make much difference on where the pilots placed.

          Matt

          -----Original Message-----
          From: Earl Haury <ejhaury at comcast.net>
          To: NSRCA Mailing List <nsrca-discussion at lists.nsrca.org>
          Sent: Thu, Oct 18 10:24 PM
          Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul -LONG

          We miss an opportunity for judges to evaluate their performance by not distributing the results of the analysis of that performance (at major meets). I, for one, would like the judges' performance data to be published in the K-Factor (after all - the pilots' scores are published). However, realizing that some are squeamish about this, I think that we should still provide each judge with the analysis of his / her specific performance. At least, each judge would then have an indication of their current skills and any variation over the years.

          Earl
            ----- Original Message ----- 
            From: Derek Koopowitz 
            To: 'NSRCA Mailing List' 
            Sent: Thursday, October 18, 2007 7:53 PM
            Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul -LONG

            I'll post my dissertation on TBL again since this issue seems to crop up time and time again...

            The Tarasov-Bauer-Long (TBL) Scoring method has been around since the 1970's.  It has been used in the full size arena since 1978 and has been used at every full size IAC World Championship since 1980.  The TBL method applies proven statistical probability theory to the judges' scores to resolve style differences and bias, and to avoid the inclusion of potentially faulty judgements in contest results.  Understanding just why we need TBL, and how it works, is of considerable importance to us all.  It is important to the pilots because it is there to reduce the prospect of unsatisfactory judgements affecting their results, and it is important for judges because it will introduce a completely new dimension of scrutiny into the sequence totals, and it will also discreetly engage the attention of the Chief Judge, or Contest Director, if a judge's conclusions differ sufficiently from those of all the other judges on the same panel.
            When people get together to judge how well a pre-defined competitive task is being tackled, the range of opinions is often diverse.  This is entirely natural among humans where the critique of any display of skill relies on the interpretation of rapidly changing visual cues.  In order to minimize the prospect of any "way out opinions" having too much effect on the result, it is usual to average the accumulated scores to arrive at a final assessment, which takes everybody's opinion into account.  Unfortunately this averaging approach can achieve the opposite of what we really want, which is to identify, and where needed, remove those "way out opinions" because they are the ones most likely to be ill-judged and therefore should be discarded, leaving the rest to determine the more appropriate result.
            In aerobatics the process of judging according to the rulebook normally leads to a series of generally similar personal views.  However, one judge's downgrading may be harsher or more lenient than the next, their personal feelings toward each competitor or aircraft type may predispose toward favor or dislike (bias), and they will almost certainly miss or see things that other judges do not.  How then can we "judge" the judges and so reach a conclusion, which has good probability of acceptance by all the concerned parties?
            The key word is probability: the concept of a perceived level of confidence in collectively viewed judgements has entered the frame.  What we really mean is that we must be confident that opinions pitched outside some pre-defined level of reasonable acceptability will be identified as such and will not be used.  This sort of situation is the daily bread and butter of well established probability theory which, when suitably applied, can produce a very clear cut analysis of numerically expressed opinions provided that the appropriate criteria have been carefully established beforehand.
            What has been developed through several previous editions is some arithmetic which addresses the judge's raw scores in such a way that any which are probably unfair are discarded with an established level of confidence.  To understand the process you need only accept some quite simple arithmetic procedures, which are central to what is called "statistical probability".
            The TBL scoring system in effect does the following:
            ·        Commonizes the judging styles.
            ·        Computes TBL scores
            ·        Publishes results

            Commonizing the judging styles involves remodeling the scores to bring all the judging styles to a common format and removing any natural bias between panel members.  Following some calculations, each judge's set of scores is squeezed or stretched and moved en-bloc up or down so that the sets all show the same overall spread and have identical averages (bias).  Within each set the pilot order and score progression must remain unaltered, but now valid score comparisons are possible between all the panel judges on behalf of each pilot.
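
            The squeeze, stretch and shift described above can be sketched as a linear rescaling of each judge's set of scores. This is only an illustration under stated assumptions: the function name `commonize` is hypothetical, and the choice of target (the panel-wide mean, and the average of the judges' individual spreads) is an assumption, since the text does not say which common average and spread TBL actually uses.

```python
from statistics import mean, pstdev


def commonize(panel):
    """Rescale each judge's scores to a shared mean and spread.

    panel maps a judge name to that judge's scores, one per pilot,
    with the pilots in the same order for every judge.
    """
    all_scores = [s for scores in panel.values() for s in scores]
    target_mean = mean(all_scores)                                   # assumed common average
    target_spread = mean([pstdev(scores) for scores in panel.values()])  # assumed common spread

    result = {}
    for judge, scores in panel.items():
        judge_mean, judge_spread = mean(scores), pstdev(scores)
        if judge_spread == 0:
            # A judge who gave every pilot the same score has nothing to stretch.
            result[judge] = [target_mean] * len(scores)
        else:
            scale = target_spread / judge_spread  # positive, so pilot order is preserved
            result[judge] = [target_mean + (s - judge_mean) * scale for s in scores]
    return result


if __name__ == "__main__":
    raw = {"A": [6, 7, 8], "B": [4, 6, 8], "C": [8, 8.5, 9]}
    for judge, scores in commonize(raw).items():
        print(judge, [round(s, 2) for s in scores])
```

            Because the rescaling is linear with a positive scale factor, each judge's pilot order and relative score progression are left unaltered, as the description requires.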
            Computing the TBL score involves looking at the high and low scores in each pilot's set and throwing out any that are too "far out" to be fair.  This is done by subtracting the average for the set from each one and dividing the result by the "sample standard deviation" - if the result of this calculation is greater than 1.645 then according to statistical probability theory we can be at least 90% confident that it is unfair, so the score is discarded.  This calculation and the mathematically derived 1.645 criterion are the key to the correctness of the TBL process, and are based on many years of experience by the full size aerobatics organization with contest scores at all levels.  The discarding of any scores of course changes the average and standard deviation of that pilot's remaining results, and so the whole process is repeated.  After several cycles any "unfair" scores will have gone, and those that remain will all satisfy the essential 90% confidence criterion.
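
            The discard-and-repeat loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual TBL program: the function name `tbl_filter` is hypothetical, and it assumes that the test is applied to the absolute deviation, that only the single most extreme score is removed per cycle, and that the loop stops once fewer than three scores remain.

```python
from statistics import mean, stdev

THRESHOLD = 1.645  # the criterion quoted in the text


def tbl_filter(scores, threshold=THRESHOLD):
    """Return (kept, discarded) for one pilot's set of scores."""
    kept = list(scores)
    discarded = []
    while len(kept) >= 3:          # assumed floor so a spread can still be computed
        avg = mean(kept)
        sd = stdev(kept)           # sample standard deviation (n - 1 divisor)
        if sd == 0:
            break                  # all remaining scores agree
        worst = max(kept, key=lambda s: abs(s - avg))
        if abs(worst - avg) / sd <= threshold:
            break                  # every remaining score is within the criterion
        kept.remove(worst)         # drop it, then recompute mean and deviation
        discarded.append(worst)
    return kept, discarded


if __name__ == "__main__":
    print(tbl_filter([8, 8, 8, 8, 0]))  # the lone 0 exceeds the criterion and is dropped
    print(tbl_filter([7, 8, 8, 9, 5]))  # the 5 is low, but within the criterion, so all stay
```

            One property of this sketch worth noting: with the sample standard deviation, the largest possible ratio for n scores is (n-1)/sqrt(n), which is about 1.79 for five scores but only 1.5 for four. Under these assumptions nothing can ever be discarded from a set of four or fewer, which may be related to the remark elsewhere in this thread that at least 5 judges are needed.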
            Publishing the results involves averaging each pilot's scores.  The final TBL iteration therefore has any appropriate penalty/bonus values applied and the results are then sorted in descending order of total score to rank the pilots first to last.  These final scores may, or may not, be normalized to 1000 points, depending on the setting for the selected class.
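
            The ranking step can be sketched as below. The function name `rank_pilots` and the pilot names are hypothetical, and the sketch assumes that "normalized to 1000 points" means the top total is scaled to 1000 and the others in proportion; the text does not spell out the normalization rule.

```python
def rank_pilots(totals, normalize=True):
    """Sort pilots by total score, optionally scaling the best total to 1000.

    totals maps a pilot name to that pilot's total after averaging
    and after any penalty/bonus values have been applied.
    """
    best = max(totals.values())
    results = []
    for pilot, total in totals.items():
        if normalize and best > 0:
            score = 1000.0 * total / best  # assumed rule: winner gets 1000
        else:
            score = float(total)
        results.append((pilot, score))
    results.sort(key=lambda item: item[1], reverse=True)  # first place at index 0
    return results


if __name__ == "__main__":
    for place, (pilot, score) in enumerate(
        rank_pilots({"Pilot A": 412.0, "Pilot B": 515.0, "Pilot C": 463.5}), start=1
    ):
        print(place, pilot, round(score, 1))
```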
            Educating and improving the judges is a useful by-product of this process, in that it shows, with all the bells and whistles, how each judge has performed by comparison with the overall judging panel average and when seen against the 90% level of confidence criterion.  The TBL system will produce an analysis showing each judge the percentage of scores accepted as "OK", and a comparison with the panel style (spread of scores) and bias (average).
            Unfortunately TBL, by definition, brings with it a 10% possibility of upsetting an honest judge's day.  The trade-off is that we expect not only to achieve a set of results with at least 90% confidence that are "fair" every time, but that the system also provides us with a wonderful tool to address our judging standards.  TBL will ensure that every judge's opinion has equal weight, and that each sequence score by each judge is accepted only if it lies within an acceptable margin from the panel average.  TBL, however, by necessity takes the dominant judging panel view as the "correct" one and it can't make right scores out of wrong ones.  If 6 out of 8 judges are distracted and make a mess out of one pilot's efforts, then for TBL this becomes the controlling assessment of that pilot's performance, and the other 2 diligent judges who got it right will see their scores unceremoniously zapped.  In practice this would be extremely unusual - from the judging line it is almost impossible to deliberately upset the final results without collusion between a majority of the judges, and if that starts to happen then someone is definitely on the wrong planet.

--------------------------------------------------------------------

            From: nsrca-discussion-bounces at lists.nsrca.org [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of vicenterc at comcast.net
            Sent: Thursday, October 18, 2007 8:11 AM
            To: NSRCA Mailing List
            Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul

            Tony,

            Do you know if the TBL system eliminates the high and low scores?  I think that is a good solution but we cannot do it in local contests.  Probably we could in some contests since we have many Masters vs. F3A.  

            Do you know of a link where we can read about the TBL system?

            --
            Vicente "Vince" Bortone

              -------------- Original message -------------- 
              From: "Tony" <tony at radiosouthrc.com> 

              This TBL will find these problems and is in use at World Champs.  The problem is that you need at least 5 judges on a line.

              Tony Stillman, President
              Radio South, Inc.
              139 Altama Connector, Box 322
              Brunswick, GA  31525
              1-800-962-7802
              tony at radiosouthrc.com

------------------------------------------------------------------

------------------------------------------------------------------------

        _______________________________________________
        NSRCA-discussion mailing list
        NSRCA-discussion at lists.nsrca.org
        http://lists.nsrca.org/mailman/listinfo/nsrca-discussion
