[NSRCA-discussion] D3 Championship - Scoring System Overhaul -LONG
Earl Haury
ejhaury at comcast.net
Thu Oct 18 18:21:25 AKDT 2007
We miss an opportunity for judges to evaluate their performance by not distributing the results of the analysis of that performance (at major meets). I, for one, would like the judges' performance data to be published in the K-Factor (after all, the pilots' scores are published). However, realizing that some are squeamish about this, I think that we should still provide each judge with the analysis of his or her specific performance. At least each judge would then have an indication of their current skills and any variation over the years.
Earl
----- Original Message -----
From: Derek Koopowitz
To: 'NSRCA Mailing List'
Sent: Thursday, October 18, 2007 7:53 PM
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul -LONG
I'll post my dissertation on TBL again since this issue seems to crop up time and time again...
The Tarasov-Bauer-Long (TBL) Scoring method has been around since the 1970s. It has been used in the full size arena since 1978 and has been used at every full size IAC World Championship since 1980. The TBL method applies proven statistical probability theory to the judges' scores to resolve style differences and bias, and to avoid the inclusion of potentially faulty judgements in contest results. Understanding just why we need TBL, and how it works, is of considerable importance to us all. It is important to the pilots because it is there to reduce the prospect of unsatisfactory judgements affecting their results, and it is important for judges because it introduces a completely new dimension of scrutiny into the sequence totals. It will also discreetly engage the attention of the Chief Judge, or Contest Director, if a judge's conclusions differ sufficiently from those of the other judges on the same panel.
When people get together to judge how well a pre-defined competitive task is being tackled, the range of opinions is often diverse. This is entirely natural among humans where the critique of any display of skill relies on the interpretation of rapidly changing visual cues. In order to minimize the prospect of any "way out opinions" having too much effect on the result, it is usual to average the accumulated scores to arrive at a final assessment, which takes everybody's opinion into account. Unfortunately this averaging approach can achieve the opposite of what we really want, which is to identify, and where needed, remove those "way out opinions" because they are the ones most likely to be ill-judged and therefore should be discarded, leaving the rest to determine the more appropriate result.
In aerobatics the process of judging according to the rulebook normally leads to a series of generally similar personal views. However, one judge's downgrading may be harsher or more lenient than the next, their personal feelings toward each competitor or aircraft type may predispose toward favor or dislike (bias), and they will almost certainly miss or see things that other judges do not. How then can we "judge" the judges and so reach a conclusion, which has good probability of acceptance by all the concerned parties?
The key word is probability: the concept of a perceived level of confidence in collectively viewed judgements has entered the frame. What we really mean is that we must be confident that opinions pitched outside some pre-defined level of reasonable acceptability will be identified as such and will not be used. This sort of situation is the daily bread and butter of well established probability theory which, when suitably applied, can produce a very clear cut analysis of numerically expressed opinions, provided that the appropriate criteria have been carefully established beforehand.
What has been developed through several previous editions is some arithmetic which addresses the judges' raw scores in such a way that any which are probably unfair are discarded with an established level of confidence. To understand the process you need only accept some quite simple arithmetic procedures, which are central to what is called "statistical probability".
The TBL scoring system in effect does the following:
· Commonizes the judging styles.
· Computes TBL scores.
· Publishes results.
Commonizing the judging styles involves remodeling the scores to bring all the judging styles to a common format and removing any natural bias between panel members. Following some calculations, each judge's set of scores is squeezed or stretched and moved en bloc up or down so that the sets all show the same overall spread and have identical averages (bias). Within each set the pilot order and score progression must remain unaltered, but now valid score comparisons are possible between all the panel judges on behalf of each pilot.
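The squeeze/stretch/shift step described above can be sketched in a few lines of Python. This is only an illustration, not the official TBL arithmetic: the function name commonize, the input layout (one list of sequence totals per judge, same pilot order in every list), and the choice of the panel's average mean and average spread as the common target are all assumptions made for the example.

```python
from statistics import mean, stdev

def commonize(scores_by_judge):
    """Rescale each judge's scores to a shared mean (bias) and spread (style).

    scores_by_judge: dict mapping judge -> list of sequence totals,
    one per pilot, in the same pilot order for every judge.
    The target mean/spread used here (panel averages) is an assumption.
    """
    judge_mean = {j: mean(s) for j, s in scores_by_judge.items()}
    judge_sd = {j: stdev(s) for j, s in scores_by_judge.items()}
    target_mean = mean(list(judge_mean.values()))
    target_sd = mean(list(judge_sd.values()))

    commonized = {}
    for j, s in scores_by_judge.items():
        # A judge who gave everyone the same score has no spread to rescale.
        scale = target_sd / judge_sd[j] if judge_sd[j] > 0 else 1.0
        # Linear map with positive scale: pilot order within the set is unchanged.
        commonized[j] = [target_mean + (x - judge_mean[j]) * scale for x in s]
    return commonized

raw = {"A": [300, 350, 400], "B": [380, 400, 420]}
print(commonize(raw))
```

Because each judge's scores go through a straight-line mapping with a positive scale factor, a wide-spread, low-scoring judge and a narrow-spread, high-scoring judge end up directly comparable without either one's ranking of the pilots changing.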
Computing the TBL score involves looking at the high and low scores in each pilot's set and throwing out any that are too "far out" to be fair. This is done by subtracting the average for the set from each score and dividing the result by the "sample standard deviation". If the size of that result, ignoring its sign, is greater than 1.645, then according to statistical probability theory we can be at least 90% confident that the score is unfair, so it is discarded. This calculation, with its mathematically derived 1.645 criterion, is the key to the correctness of the TBL process, and is based on many years of experience by the full size aerobatics organization with contest scores at all levels. Discarding any scores of course changes the average and standard deviation of the pilot's remaining results, so the whole process is repeated. After several cycles any "unfair" scores will have gone, and those that remain will all satisfy the essential 90% confidence criterion.
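For anyone who wants to see the discard loop in action, here is a minimal Python sketch for one pilot's set of (already commonized) scores. The function name tbl_filter, the decision to drop every offending score in a pass before recomputing, and the stop at two remaining scores are assumptions for illustration; the real TBL software may handle those details differently.

```python
from statistics import mean, stdev

Z_LIMIT = 1.645  # two-sided 90% point of the normal distribution

def tbl_filter(scores, z_limit=Z_LIMIT):
    """Repeatedly discard scores lying more than z_limit sample standard
    deviations from the average of the scores still in play."""
    kept = list(scores)
    discarded = []
    while len(kept) > 2:          # assumed floor: never go below two scores
        avg = mean(kept)
        sd = stdev(kept)          # sample standard deviation (n - 1 divisor)
        if sd == 0:
            break                 # all remaining scores agree exactly
        outliers = [x for x in kept if abs(x - avg) / sd > z_limit]
        if not outliers:
            break                 # everything left passes the 1.645 test
        kept = [x for x in kept if abs(x - avg) / sd <= z_limit]
        discarded.extend(outliers)
    return kept, discarded

print(tbl_filter([400, 405, 410, 395, 402, 300]))
# One judge's 300 is dropped on the first pass; the other five survive.
```

One thing this makes visible: with the sample standard deviation, a single score can never sit more than (n-1)/sqrt(n) deviations from the average of n scores. For n = 4 that ceiling is 1.5, so with four or fewer judges nothing can ever exceed 1.645 and no score would be discarded by this test.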
The published results are derived by averaging each pilot's remaining scores. After the final TBL iteration, any appropriate penalty/bonus values are applied and the results are then sorted in descending order of total score to rank the pilots first to last. These final scores may, or may not, be normalized to 1000 points, depending on the setting for the selected class.
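As a rough illustration of that last step, the Python sketch below averages the scores that survived, applies a per-pilot adjustment, and ranks the pilots. The name rank_pilots, the adjustments dictionary, and the normalization rule (top pilot scaled to 1000, assuming the top total is positive) are assumptions for the example, not a description of the actual scoring program.

```python
def rank_pilots(kept_scores, adjustments=None, normalize=True):
    """kept_scores: dict pilot -> list of scores that survived the TBL test.
    adjustments: optional dict pilot -> penalty (negative) or bonus (positive).
    Returns a list of (pilot, score) pairs, best first."""
    adjustments = adjustments or {}
    totals = {
        p: sum(s) / len(s) + adjustments.get(p, 0.0)
        for p, s in kept_scores.items()
    }
    if normalize:
        # Assumed rule: the best total becomes 1000, the rest scale with it.
        top = max(totals.values())
        totals = {p: 1000.0 * t / top for p, t in totals.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

kept = {"P1": [400, 410], "P2": [380, 390, 400], "P3": [300, 310]}
for pilot, score in rank_pilots(kept, adjustments={"P2": -10}):
    print(pilot, round(score, 2))
```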
Educating and improving the judges is a useful by-product of this process, in that it shows in detail how each judge has performed by comparison with the overall judging panel average and when seen against the 90% level of confidence criterion. The TBL system will produce an analysis showing each judge the percentage of scores accepted as "OK", and a comparison with the panel style (spread of score) and bias (average).
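A per-judge summary along those lines could look something like the following Python sketch. The report fields (pct_ok, bias, style), the function name judge_report, and the use of the panel's average mean and average spread as the reference are assumptions for illustration only.

```python
from statistics import mean, stdev

def judge_report(raw_by_judge, accepted_by_judge):
    """raw_by_judge: dict judge -> list of raw sequence totals (one per pilot).
    accepted_by_judge: dict judge -> list of booleans, True where that
    judge's score for the pilot survived the 1.645 test."""
    panel_mean = mean([mean(s) for s in raw_by_judge.values()])
    panel_sd = mean([stdev(s) for s in raw_by_judge.values()])
    report = {}
    for judge, scores in raw_by_judge.items():
        flags = accepted_by_judge[judge]
        report[judge] = {
            "pct_ok": 100.0 * sum(flags) / len(flags),  # share of scores kept
            "bias": mean(scores) - panel_mean,           # + generous, - harsh
            "style": stdev(scores) / panel_sd,           # >1 wide, <1 narrow
        }
    return report

raw = {"A": [300, 350, 400], "B": [380, 400, 420]}
accepted = {"A": [True, True, False], "B": [True, True, True]}
print(judge_report(raw, accepted))
```

Handing each judge only his or her own row of such a report would give the private feedback without publishing anyone's numbers.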
Unfortunately TBL, by definition, brings with it a 10% possibility of upsetting an honest judge's day. The trade-off is that we expect not only to achieve a set of results that we can be at least 90% confident are "fair" every time, but also to gain a wonderful tool to address our judging standards. TBL will ensure that every judge's opinion has equal weight, and that each sequence score by each judge is accepted only if it lies within an acceptable margin from the panel average. TBL, however, by necessity takes the dominant judging panel view as the "correct" one and it can't make right scores out of wrong ones. If 6 out of 8 judges are distracted and make a mess out of one pilot's efforts, then for TBL this becomes the controlling assessment of that pilot's performance, and the other 2 diligent judges who got it right will see their scores unceremoniously zapped. In practice this would be extremely unusual: from the judging line it is almost impossible to deliberately upset the final results without collusion between a majority of the judges, and if that starts to happen then someone is definitely on the wrong planet.
------------------------------------------------------------------------------
From: nsrca-discussion-bounces at lists.nsrca.org [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of vicenterc at comcast.net
Sent: Thursday, October 18, 2007 8:11 AM
To: NSRCA Mailing List
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul
Tony,
Do you know if the TBL system eliminates the high and low scores? I think that is a good solution but we can not do it in local contests. Probably we could in some contests since we have many Masters vs. F3A.
Do you know of a link where we can read about the TBL system?
--
Vicente "Vince" Bortone
-------------- Original message --------------
From: "Tony" <tony at radiosouthrc.com>
This TBL will find these problems and is in use at World Champs. The problem is that you need at least 5 judges on a line.
Tony Stillman, President
Radio South, Inc.
139 Altama Connector, Box 322
Brunswick, GA 31525
1-800-962-7802
tony at radiosouthrc.com
----------------------------------------------------------------------------
From: nsrca-discussion-bounces at lists.nsrca.org [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of vicenterc at comcast.net
Sent: Thursday, October 18, 2007 10:44 AM
To: NSRCA Mailing List; NSRCA Mailing List
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul
I agree with George. If I remember right from statistics courses 20 years ago, this type of problem follows the normal distribution or bell-shaped curve. In order to have any significant precision in the scoring system we would need to have at least 33 judges per round. We all know that it is impossible for us to have 33 judges per round. I also agree with George that at the end of the contest the winner is usually the best pilot, and if you ask around a very high percentage will agree with the results.
Well, we need a PhD in statistics to help us out.
--
Vicente "Vince" Bortone
-------------- Original message --------------
From: <glmiller3 at suddenlink.net>
> I've said this before and most people don't "get it", but we are asking more
> precision from our scoring system than is mathematically possible.
>
> In the case of FAI scores, the raw scores are rounded to one significant digit
> (whole numbers only from 1-10). This means that after any mathematical
> manipulation, the result is only "Accurate" to one digit and the result should
> be rounded to that digit using some standardized "rounding algorithm".... We are
> manipulating a single digit of significance and basing outcomes on up to EIGHT
> digits (1234.5678 ). Mathematically speaking any "normalized score" between
> 950.0000 and 1000.0000 is the same score.....900 to 949; 850-900; etc. All
> other problems of judging inconsistency, bias, averaging, etc pale in
> comparison.
>
> As I've said before, I'm astounded that a system that is so mathematically
> flawed can provide results that are as good as they are....for my part, I
> usually feel like I have been ranked pretty fairly compared to the other pilots
> in my class even though the ranking is based on statistically meaningless
> numbers.
>
> George
>
>
> ---- "Woodward wrote:
> > Guys -
> >
> >
> >
> > 1. Why do we average judge's scores together?
> >
> > a. The whole system is predicated on judges being
> > competent/consistent/correct/un-biased. If this first rule is violated
> > ranging from small to 100 point raw score difference, the idea of now
> > "averaging" the score also has no validity.
> > b. Why not just let each judge's score stand as is,
> > unaltered, and produce two normalized scores per round?
> > c. Averaging the scores together may be doing a disservice
> > to everyone.
> > d. You would basically have two sets of scores per round.
> > IE you may end up with 1000 points on one card, and an 800 on the other.
> >
> > e. This would identify immediately any cause for concern.
> > f. Would it provide an immediate training tool back to our
> > system of pilots, judges, and CD?
> > g. Attempting to "un-average" the scores to determine what
> > happened takes place anyway on the flight line, after the scores are
> > printed.
> >
> > 2. Are we asking too much from judges?
> >
> > a. Is applying downgrades, then counting backwards from 10,
> > in the context of "turnaround" pattern where maneuvers can happen
> > back-to-back quickly, too difficult across the full spectrum of
> > competitor/judges?
> >
> > 3. Dropping Rounds:
> >
> > a. Is this still a good idea?
> > b. I wouldn't mind dropping one round, but it was explained
> > to me last night that this is an artifact from the days of when people
> > would break a prop on touch-n-goes, and in general lower equipment
> > reliability.
> > c. In the age of higher equipment reliability, is the
> > 'round-drop' scenarios still good, left as is?
> >
> > 4. Dropping Rounds - Take 2:
> >
> > a. In the context of point #1, maybe we should be allowed
> > to drop the lowest scored judge from each round, versus the entire
> > round.
> > b. Why should the pilot drop the entire round, when one
> > judge may have scored him 1000 points, and the other 800?
> > c. If you end up with a tie at the end, you just keep
> > counting "1000's" until the other pilot runs out - tie is now "untied."
> >
> >
> >
> > I hope some smart guys can chime in on potential over-hauling idea.
> >
> >
> >
> >
> >
> > ________________________________
> >
> > From: nsrca-discussion-bounces at lists.nsrca.org
> > [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Lisa &
> > Larry
> > Sent: Wednesday, October 17, 2007 10:27 PM
> > To: 'NSRCA Mailing List'
> > Subject: Re: [NSRCA-discussion] D3 Championship
> >
> >
> >
> > OK...I probably shouldn't start this, but I will...
> >
> >
> >
> > I haven't read all the threads, but I have read the ones below
> > in this series.
> >
> >
> > The NSRCA has already set the standard and a method to determine
> > judging bias and has held a NSRCA member accountable to this standard
> > this year and the AMA sanctioned the individual. This is fact....Agree
> > with the method to determine bias (or not) it was used to impose an AMA
> > sanction on a member.
> >
> >
> >
> > IMHO this discussion suggests that bias has occurred in the D3
> > championship or possibly another at the same level FAI. If this is the
> > case the NSRCA must review this and apply the same discipline using the
> > same measurables to provide for the same sanctions.
> >
> >
> >
> > If the NSRCA is unwilling to investigate or isn't willing to use
> > the same method to determine bias, then clearly we (the NSRCA and AMA)
> > have disenfranchised a NSRCA member and should rethink his sanction.
> >
> >
> >
> > Our rules and penalties must check and balance. Then they must
> > be applied to all members equally regardless of status in membership.
> > This is the only way to reduce / eliminate bias. I'm also unwilling to
> > entertain the thought the District Championship is any less important than
> > the NATS. They are both sanctioned contests run by a CD accountable to
> > the AMA.
> >
> >
> >
> > Flame suit on...
> >
> >
> >
> > Larry Diamond
> >
> >
> >
> > ----- Original Message -----
> >
> > From: Mike Hester
> >
> > To: NSRCA Mailing List
> >
> >
> > Sent: Wednesday, October 17, 2007 2:24 PM
> >
> > Subject: Re: [NSRCA-discussion] D3 Championship
> >
> >
> >
> > He's not alone. Although he probably should work on the delivery
> > ;)
> >
> >
> >
> > I would support any of the 5 proposals that Ryan listed. Judging
> > FAI can be frustrating enough, but to be told you're not getting it
> > right when you're already doing everything you know how to do, that's a
> > hard pill to swallow regardless of the statement's accuracy.
> >
> >
> >
> > You guys out there do need to realize these guys can fly...and
> > are very good...problem is they're flying against this Jason dude,
> > travels a lot, flies all the time, might even have a national title or 2
> > along the way, not sure. I'm sure you know the type. *ahem*
> >
> >
> >
> > Because my wife generally keeps scores in D3, we have some
> > pretty good access to each and every score entered. I can tell you guys
> > without a doubt at times there are some SERIOUS differences in scores
> > between judges on the same round. I don't mean a little, I mean like 100
> > points on the RAW score. Even if this Jason character was flying
> > straight 10s, the differences if you work them out mean the others are
> > barely flying a straight line....and that's not the case. I have no
> > doubt these guys don't think they should be beating Jason in a 6 round
> > contest where 2 of the rounds are "F" rounds, but I am sure most people
> > would agree the scoring could use some improvements.
> >
> >
> >
> > Being one of these evil incompetent D3 masters judges *ahem* I
> > would certainly support more of a cooperative effort than some kind of
> > protest. I have been very supportive of all the FAI guys and especially
> > the scoring, and am usually the guy everybody throws something at
> > during a judging seminar because I'm trying to clarify something that
> > affects mainly FAI.
> >
> >
> >
> > I think to identify the "problem" will take a willingness to
> > recognize that the situation is caused by a LOT of factors, not any one
> > or two. If anyone's interested, I'll outline the ones I see clearly.
> >
> >
> >
> > I'm not sure if this will all have the intended effect that Jim
> > was looking for in the end, but if nothing else it does draw some
> > attention to a situation and we should have a closer look.
> >
> >
> >
> > As for me, soon I'll be practicing, bracing for the onslaught of
> > FAI pilots come to masters to punish me =)
> >
> >
> >
> > -Mike
> >
> > ----- Original Message -----
> >
> > From: McLaughlin, Ryan (FRS.JAX)
> >
> >
> > To: NSRCA Mailing List
> >
> >
> > Sent: Wednesday, October 17, 2007 1:50 PM
> >
> > Subject: Re: [NSRCA-discussion] D3 Championship
> >
> >
> >
> > I didn't want you to stand alone in this...it's too
> > important.
> >
> > -----Original Message-----
> > From: nsrca-discussion-bounces at lists.nsrca.org
> > [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Woodward,
> > Jim
> > Sent: Wednesday, October 17, 2007 1:31 PM
> > To: NSRCA Mailing List
> > Subject: Re: [NSRCA-discussion] D3 Championship
> >
> > Ryan M.,
> >
> > I think this takes the cake as a first time
> > nsrca-list email. Thank you for the support.
> >
> > Jim W.
> >
> >
> >
> >
> >
> >
> >
> > ________________________________
> >
> >
> > From: nsrca-discussion-bounces at lists.nsrca.org
> > [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of
> > McLaughlin, Ryan (FRS.JAX)
> > Sent: Wednesday, October 17, 2007 1:19 PM
> > To: nsrca-discussion at lists.nsrca.org
> > Subject: [NSRCA-discussion] D3 Championship
> >
> > This is my first post to the NSRCA list as I am
> > a bit 'internet shy', but I thought I might be able to add some value to
> > the FAI judging discussion Jim W started. Although I tend to err on the
> > side of diplomacy : ), I believe the feelings Jim expressed are
> > legitimate and shared by many FAI competitors throughout the country.
> > As a long time participant, I realize that bias is not a new problem but
> > I do not think we should accept this as a "fact of life" and move on.
> > I think we have an excellent opportunity here and we should make the
> > most of it.
> >
> > The primary issue to address in my opinion is
> > not disparity in judging standards between judges, though as Earl points
> > out, this is important. Rather, it is the different standard applied to
> > pilots within one score set, i.e., scoring a pilot lower or higher based
> > on who he is. Our penchant for creating "superstars" is the most
> > discouraging aspect of FAI competition. To remedy this, we must all
> > make a conscious decision to change a long established tradition in our
> > sport. Are we ready to take this on?
> >
> > Complaining isn't the answer and neither is
> > staying quiet, a mistake that has made the FAI competitors as
> > responsible as anyone else for the situation. To this end, I submit for
> > your review the following ideas to specifically target the FAI bias
> > issue:
> >
> > 1. Sacrifice one FAI round per contest to serve
> > as an "open" round for all contestants expected to judge FAI during the
> > event. Allow everyone to compare notes and use this as a coaching
> > opportunity.
> >
> > 2. Drop one FAI pilot to Masters at each
> > contest to serve as a judge for all rounds and use volunteers from other
> > classes to serve as the others. This would have to be an agreement made
> > among FAI pilots.
> >
> > 3. Extend the pilots meeting to go over
> > specific issues, maybe a new one or two every meet rather than just
> > pointing out the landing zone, etc. Make a "mini" judging seminar
> > mandatory each contest.
> >
> > 4. Certify judges for FAI on a volunteer basis
> > and only use "certified" judges in the contest.
> >
> > 5. Utilize peer judging, in other words, have
> > FAI pilots judge themselves. If a pilot is not flying, he is judging
> > his fellow competitors.
> >
> > Some of this may seem radical, but I believe
> > there is room for a bit of this. Pattern belongs to us right? I
> > welcome any ideas or critique anyone can offer. I will clarify any of
> > the above upon request.
> >
> > Thank you for your consideration.
> >
> > Ryan McLaughlin
> > Eustis, Florida
> >
> >
> >
> >
> > ________________________________
> >
> >
> >
> > ________________________________
> >
> >
> >
> >
> >
--------------------------------------------------------------------------
> >
> > _______________________________________________
> > NSRCA-discussion mailing list
> > NSRCA-discussion at lists.nsrca.org
> > http://lists.nsrca.org/mailman/listinfo/nsrca-discussion
> >
> >
> > ________________________________
> >
> >
> >
>
------------------------------------------------------------------------------