[NSRCA-discussion] D3 Championship - Scoring System Overhaul - LONG

Derek Koopowitz derekkoopowitz at gmail.com
Thu Oct 18 16:53:50 AKDT 2007


I'll post my dissertation on TBL again since this issue seems to crop up
time and time again...
 
The Tarasov-Bauer-Long (TBL) Scoring method has been around since the
1970s.  It has been used in the full size arena since 1978 and has been
used at every full size IAC World Championship since 1980.  The TBL method
applies proven statistical probability theory to the judges' scores to
resolve style differences and bias, and to avoid the inclusion of potential
faulty judgements in contest results.  Understanding just why we need TBL,
and how it works, is of considerable importance to us all.  It is important
to the pilots because it is there to reduce the prospect of unsatisfactory
judgements affecting their results, and it is important for judges because
it will introduce a completely new dimension of scrutiny into the sequence
totals, and it will also discreetly engage the attention of the Chief Judge,
or Contest Director, if a judge's conclusions differ sufficiently from
those of the other judges on the same panel.

When people get together to judge how well a pre-defined competitive task is
being tackled, the range of opinions is often diverse.  This is entirely
natural among humans where the critique of any display of skill relies on
the interpretation of rapidly changing visual cues.  In order to minimize
the prospect of any "way out opinions" having too much effect on the result,
it is usual to average the accumulated scores to arrive at a final
assessment, which takes everybody's opinion into account.  Unfortunately
this averaging approach can achieve the opposite of what we really want,
which is to identify, and where needed, remove those "way out opinions"
because they are the ones most likely to be ill-judged and therefore should
be discarded, leaving the rest to determine the more appropriate result.

In aerobatics the process of judging according to the rulebook normally
leads to a series of generally similar personal views.  However, one judge's
downgrading may be harsher or more lenient than the next, their personal
feelings toward each competitor or aircraft type may predispose toward favor
or dislike (bias), and they will almost certainly miss or see things that
other judges do not.  How then can we "judge" the judges and so reach a
conclusion, which has good probability of acceptance by all the concerned
parties?

The key word is probability: the concept of a perceived level of confidence
in collectively viewed judgements has entered the frame.  What we really
mean is that we must be confident that opinions pitched outside some
pre-defined level of reasonable acceptability will be identified as such and
will not be used.  This sort of situation is the daily bread and butter of
well established probability theory which, when suitably applied, can
produce a very clear cut analysis of numerically expressed opinions provided
that the appropriate criteria have been carefully established beforehand.

What has been developed through several previous editions is some arithmetic
which addresses the judges' raw scores in such a way that any which are
probably unfair are discarded with an established level of confidence.  To
understand the process you need only accept some quite simple arithmetic
procedures, which are central to what is called "statistical probability".

The TBL scoring system in effect does the following:

  * Commonizes the judging styles.
  * Computes TBL scores.
  * Publishes results.

Commonizing the judging styles involves remodeling the scores to bring all
the judging styles to a common format and removing any natural bias between
panel members.  Following some calculations, each judge's set of scores is
squeezed or stretched and moved en bloc up or down so that the sets all show
the same overall spread and have identical averages (bias).  Within each set
the pilot order and score progression must remain unaltered, but now valid
score comparisons are possible between all the panel judges on behalf of
each pilot.
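
A minimal Python sketch of the squeeze/stretch and shift described above,
for illustration only.  It assumes a simple linear rescale in which every
judge's set is moved to the panel-wide average and to the mean of the
judges' spreads.  The function name `commonize`, the dict layout and that
choice of target values are assumptions, not the published TBL arithmetic.

```python
from statistics import mean, pstdev


def commonize(panel):
    """Rescale each judge's scores to a common average and spread.

    panel: dict mapping judge name -> list of raw scores, one per pilot,
           with the pilots in the same order for every judge (assumed layout).
    """
    all_scores = [s for scores in panel.values() for s in scores]
    target_mean = mean(all_scores)  # common "bias"
    # Common "style": here simply the mean of the judges' spreads (assumption).
    target_sd = mean([pstdev(scores) for scores in panel.values()])

    commonized = {}
    for judge, scores in panel.items():
        m, sd = mean(scores), pstdev(scores)
        if sd == 0:
            # A judge who gave everyone the same score carries no ranking
            # information; place the whole set on the common average.
            commonized[judge] = [target_mean] * len(scores)
        else:
            # Linear map: keeps pilot order and relative score progression.
            commonized[judge] = [
                target_mean + (s - m) * target_sd / sd for s in scores
            ]
    return commonized


panel = {"Judge A": [300, 400, 500], "Judge B": [350, 400, 450]}
print(commonize(panel))
```

Because the map is linear with a positive scale factor, the pilot order
within each judge's set is unchanged; only the average and spread move.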

Computing the TBL score involves looking at the high and low scores in each
pilot's set and throwing out any that are too "far out" to be fair.  This is
done by subtracting the average for the set from each one and dividing the
result by the "sample standard deviation" - if the result of this calculation
is greater than 1.645 then according to statistical probability theory we can
be at least 90% confident that it is unfair, so the score is discarded.
This calculation and the mathematically derived 1.645 criterion are the key
to the correctness of the TBL process, and are based on many years of
experience by the full size aerobatics organization with contest scores at
all levels.  Discarding any scores of course changes the average and
standard deviation of that pilot's remaining results, and so the whole
process is repeated.  After several cycles any "unfair" scores will have
gone, and those that remain will all satisfy the essential 90% confidence
criterion.
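
The discard loop can be sketched in Python as follows.  This is a hedged
illustration, not the official implementation: the function name
`tbl_filter`, the use of the absolute deviation (so that both high and low
scores are tested) and the stop at fewer than three remaining scores are
assumptions.  The 1.645 limit and the sample standard deviation come from
the description above.

```python
from statistics import mean, stdev

Z_LIMIT = 1.645  # the 90% confidence limit quoted in the text


def tbl_filter(scores, z_limit=Z_LIMIT):
    """Repeatedly discard scores lying too far from the set average.

    scores: one pilot's (already commonized) scores, one per judge.
    Returns (kept, dropped).
    """
    kept = list(scores)
    dropped = []
    while len(kept) >= 3:  # assumed floor; too few scores to test below this
        avg = mean(kept)
        sd = stdev(kept)  # sample standard deviation (n - 1 divisor)
        if sd == 0:
            break  # all remaining scores agree exactly
        outliers = [s for s in kept if abs(s - avg) / sd > z_limit]
        if not outliers:
            break  # every remaining score passes the test
        kept = [s for s in kept if abs(s - avg) / sd <= z_limit]
        dropped.extend(outliers)  # then repeat with the new average and spread
    return kept, dropped


kept, dropped = tbl_filter([410, 405, 400, 395, 390, 300])
print(kept, dropped)  # the 300 is discarded on the first pass
```

One property of this arithmetic: with n scores, no single score can lie
more than (n-1)/sqrt(n) sample standard deviations from the average.  That
bound is 1.5 for four scores and about 1.79 for five, so in this sketch
nothing can ever be discarded with fewer than five judges, which is
consistent with the remark elsewhere in this thread that TBL needs at least
5 judges on a line.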

Publishing the results starts by averaging each pilot's remaining scores.
The final TBL iteration then has any appropriate penalty/bonus values
applied, and the results are sorted in descending order of total score to
rank the pilots first to last.  These final scores may, or may not, be
normalized to 1000 points, depending on the setting for the selected
class.
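
As a rough Python sketch of this last step: the function name
`rank_pilots`, the `adjustments` dict (penalties negative, bonuses
positive) and the normalization rule (top score scaled to 1000, the others
in proportion) are assumptions made for illustration.

```python
def rank_pilots(kept_scores, adjustments=None, normalize=True):
    """Average each pilot's surviving scores, adjust, and rank.

    kept_scores: dict pilot -> list of scores left after the TBL discards.
    adjustments: optional dict pilot -> signed penalty/bonus (hypothetical).
    Returns a list of (pilot, total) pairs, best first.
    """
    adjustments = adjustments or {}
    totals = {
        pilot: sum(scores) / len(scores) + adjustments.get(pilot, 0)
        for pilot, scores in kept_scores.items()
    }
    if normalize:
        best = max(totals.values())
        if best > 0:
            # Assumed rule: scale so that the top pilot receives 1000 points.
            totals = {p: 1000 * t / best for p, t in totals.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


kept = {"Pilot A": [400, 410], "Pilot B": [380, 400], "Pilot C": [300, 310]}
for pilot, total in rank_pilots(kept, adjustments={"Pilot B": -10}):
    print(f"{pilot}: {total:.1f}")
```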

Educating and improving the judges is a useful by-product of this process in
that it shows in detail how each judge has performed by comparison with the
overall judging panel average and when seen against the 90% level of
confidence criterion.  The TBL system will produce an analysis showing each
judge the percentage of scores accepted as "OK", and a comparison with the
panel style (spread of score) and bias (average).
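
A per-judge summary along these lines could be sketched in Python as
below.  The function name `judge_report`, its inputs and the way style and
bias are expressed (a ratio of spreads and a difference of averages) are
assumptions for illustration; the actual TBL analysis sheet may present
these figures differently.

```python
from statistics import mean, pstdev


def judge_report(raw_scores, accepted, panel_mean, panel_sd):
    """Summarize one judge against the panel.

    raw_scores: the judge's raw scores, one per pilot.
    accepted:   list of booleans, True where the score survived the discards.
    panel_mean, panel_sd: average and spread for the whole panel.
    """
    return {
        # Share of this judge's scores accepted as "OK".
        "percent_ok": 100 * sum(accepted) / len(accepted),
        # Bias: how far the judge's average sits from the panel average.
        "bias": mean(raw_scores) - panel_mean,
        # Style: the judge's spread relative to the panel spread
        # (above 1 means a wider range of scores than the panel).
        "style": pstdev(raw_scores) / panel_sd,
    }


print(judge_report([300, 400, 500], [True, True, False], 410, 60))
```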

Unfortunately TBL, by definition, brings with it a 10% possibility of
upsetting an honest judge's day.  The trade-off is that we expect not only
to achieve a set of results with at least 90% confidence that are "fair"
every time, but that the system also provides us with a wonderful tool to
address our judging standards.  TBL will ensure that every judge's opinion
has equal weight, and that each sequence score by each judge is accepted
only if it lies within an acceptable margin from the panel average.  TBL,
however, by necessity takes the dominant judging panel view as the "correct"
one and it can't make right scores out of wrong ones.  If 6 out of 8 judges
are distracted and make a mess out of one pilot's efforts, then for TBL this
becomes the controlling assessment of that pilot's performance, and the other
2 diligent judges who got it right will see their scores unceremoniously
zapped.  In practice this would be extremely unusual - from the judging line
it is almost impossible to deliberately upset the final results without
collusion between a majority of the judges, and if that starts to happen
then someone is definitely on the wrong planet.

 
 
 
  _____  

From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of
vicenterc at comcast.net
Sent: Thursday, October 18, 2007 8:11 AM
To: NSRCA Mailing List
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul


Tony,
 
Do you know if the TBL system eliminates the high and low scores?  I think
that is a good solution but we cannot do it in local contests.  Probably we
could in some contests since we have many Masters vs. F3A.  
 
Do you know of a link where we can read about the TBL system?
 
--
Vicente "Vince" Bortone
 

-------------- Original message -------------- 
From: "Tony" <tony at radiosouthrc.com> 


This TBL will find these problems and is in use at World Champs.  The
problem is that you need at least 5 judges on a line.

 

Tony Stillman, President

Radio South, Inc.

139 Altama Connector, Box 322

Brunswick, GA  31525

1-800-962-7802

tony at radiosouthrc.com


  _____  


From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of
vicenterc at comcast.net
Sent: Thursday, October 18, 2007 10:44 AM
To: NSRCA Mailing List; NSRCA Mailing List
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul

 

I agree with George.  If I remember right from statistics courses 20 years
ago, this type of problem follows the normal distribution or bell-shaped
curve.  In order to have any significant precision in the scoring system we
would need to have at least 33 judges per round.  We all know that it is
impossible for us to have 33 judges per round.  I also agree with George
that at the end of the contest the winner is usually the best pilot and if
you ask around a very high percentage will agree with the results.

 

Well, we need a PhD in statistics to help us out.

 

--
Vicente "Vince" Bortone

 

-------------- Original message -------------- 
From: <glmiller3 at suddenlink.net> 

> I've said this before and most people don't "get it", but we are asking more
> precision from our scoring system than is mathematically possible.
>
> In the case of FAI scores, the raw scores are rounded to one significant digit
> (whole numbers only from 1-10). This means that after any mathematical
> manipulation, the result is only "Accurate" to one digit and the result should
> be rounded to that digit using some standardized "rounding algorithm"....We are
> manipulating a single digit of significance and basing outcomes on up to EIGHT
> digits (1234.5678 ). Mathematically speaking any "normalized score" between
> 950.0000 and 1000.0000 is the same score.....900 to 949; 850-900; etc. All
> other problems of judging inconsistency, bias, averaging, etc pale in
> comparison.
> 
> As I've said before, I'm astounded that a system that is so mathematically
> flawed can provide results that are as good as they are....for my part, I
> usually feel like I have been ranked pretty fairly compared to the other pilots
> in my class even though the ranking is based on statistically meaningless
> numbers.
> 
> George 
> 
> 
> ---- "Woodward wrote: 
> > Guys - 
> > 
> > 
> > 
> > 1. Why do we average judge's scores together? 
> > 
> > a. The whole system is predicated on judges being 
> > competent/consistent/correct/un-biased. If this first rule is violated 
> > ranging from small to 100 point raw score difference, the idea of now 
> > "averaging" the score also has no validity. 
> > b. Why not just let each judge's score stand as is, 
> > unaltered, and produce two normalized scores per round? 
> > c. Averaging the scores together may be doing a disservice 
> > to everyone. 
> > d. You would basically have two sets of scores per round. 
> > IE you may end up with 1000 points on one card, and an 800 on the other.
> > 
> > e. This would identify immediately any cause for concern. 
> > f. Would it provide an immediate training tool back to our 
> > system of pilots, judges, and CD? 
> > g. Attempting to "un-average" the scores to determine what 
> > happened takes place anyway on the flight line, after the scores are 
> > printed. 
> > 
> > 2. Are we asking too much from judges? 
> > 
> > a. Is applying downgrades, then counting backwards from 10, 
> > in the context of "turnaround" pattern where maneuvers can happen 
> > back-to-back quickly, too difficult across the full spectrum of 
> > competitor/judges? 
> > 
> > 3. Dropping Rounds: 
> > 
> > a. Is this still a good idea? 
> > b. I wouldn't mind dropping one round, but it was explained 
> > to me last night that this is an artifact from the days of when people 
> > would break a prop on touch-n-goes, and in general lower equipment 
> > reliability. 
> > c. In the age of higher equipment reliability, is the 
> > 'round-drop' scenario still good, left as is? 
> > 
> > 4. Dropping Rounds - Take 2: 
> > 
> > a. In the context of point #1, maybe we should be allowed 
> > to drop the lowest scored judge from each round, versus the entire 
> > round. 
> > b. Why should the pilot drop the entire round, when one 
> > judge may have scored him 1000 points, and the other 800? 
> > c. If you end up with a tie at the end, you just keep 
> > counting "1000's" until the other pilot runs out - tie is now "untied." 
> > 
> > 
> > 
> > I hope some smart guys can chime in on potential over-hauling idea. 
> > 
> > 
> > 
> > 
> > 
> > ________________________________ 
> > 
> > From: nsrca-discussion-bounces at lists.nsrca.org 
> > [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Lisa & 
> > Larry 
> > Sent: Wednesday, October 17, 2007 10:27 PM 
> > To: 'NSRCA Mailing List' 
> > Subject: Re: [NSRCA-discussion] D3 Championship 
> > 
> > 
> > 
> > OK...I probably shouldn't start this, but I will... 
> > 
> > 
> > 
> > I haven't read all the threads, but I have read the ones below 
> > in this series. 
> > 
> > 
> > 
> > The NSRCA has already set the standard and a method to determine 
> > judging bias and has held a NSRCA member accountable to this standard 
> > this year and the AMA sanctioned the individual. This is fact....Agree 
> > with the method to determine bias (or not) it was used to impose an AMA 
> > sanction on a member. 
> > 
> > 
> > 
> > IMHO this discussion suggests that bias has occurred in the D3 
> > championship or possibly another at the same level FAI. If this is the 
> > case the NSRCA must review this and apply the same discipline using the 
> > same measurables to provide for the same sanctions. 
> > 
> > 
> > 
> > If the NSRCA is unwilling to investigate or isn't willing to use 
> > the same method to determine bias, then clearly we (the NSRCA and AMA) 
> > have disenfranchised a NSRCA member and should rethink his sanction. 
> > 
> > 
> > 
> > Our rules and penalties must check and balance. Then they must 
> > be applied to all members equally regardless of status in membership. 
> > This is the only way to reduce / eliminate bias. I'm also unwilling to 
> > entertain the thought the District Championship is any less important than
> > the NATS. They are both sanctioned contests run by a CD accountable to 
> > the AMA. 
> > 
> > 
> > 
> > Flame suit on... 
> > 
> > 
> > 
> > Larry Diamond 
> > 
> > 
> > 
> > ----- Original Message ----- 
> > 
> > From: Mike Hester 
> > 
> > To: NSRCA Mailing List 
> > 
> > Sent: Wednesday, October 17, 2007 2:24 PM 
> > 
> > Subject: Re: [NSRCA-discussion] D3 Championship 
> > 
> > 
> > 
> > He's not alone. Although he probably should work on the delivery 
> > ;) 
> > 
> > 
> > 
> > I would support any of the 5 proposals that Ryan listed. Judging 
> > FAI can be frustrating enough, but to be told you're not getting it 
> > right when you're already doing everything you know how to do, that's a 
> > hard pill to swallow regardless of the statement's accuracy. 
> > 
> > 
> > 
> > You guys out there do need to realize these guys can fly...and 
> > are very good...problem is they're flying against this Jason dude, 
> > travels a lot, flies all the time, might even have a national title or 2 
> > along the way, not sure. I'm sure you know the type. *ahem* 
> > 
> > 
> > 
> > Because my wife generally keeps scores in D3, we have some 
> > pretty good access to each and every score entered. I can tell you guys 
> > without a doubt at times there are some SERIOUS differences in scores 
> > between judges on the same round. I don't mean a little, I mean like 100
> > points on the RAW score. Even if this Jason character was flying 
> > straight 10s, the differences if you work them out mean the others are 
> > barely flying a straight line....and that's not the case. I have no 
> > doubt these guys don't think they should be beating Jason in a 6 round 
> > contest where 2 of the rounds are "F" rounds, but I am sure most people 
> > would agree the scoring could use some improvements. 
> > 
> > 
> > 
> > Being one of these evil incompetent D3 masters judges *ahem* I 
> > would certainly support more of a cooperative effort than some kind of 
> > protest. I have been very supportive of all the FAI guys and especially 
> > the scoring, and am usually the guy everybody throws something at 
> > during a judging seminar because I'm trying to clarify something that 
> > affects mainly FAI. 
> > 
> > 
> > 
> > I think to identify the "problem" will take a willingness to 
> > recognize that the situation is caused by a LOT of factors, not any one 
> > or two. If anyone's interested, I'll outline the ones I see clearly. 
> > 
> > 
> > 
> > I'm not sure if this will all have the intended effect that Jim 
> > was looking for in the end, but if nothing else it does draw some 
> > attention to a situation and we should have a closer look. 
> > 
> > 
> > 
> > As for me, soon I'll be practicing, bracing for the onslaught of 
> > FAI pilots come to masters to punish me =) 
> > 
> > 
> > 
> > -Mike 
> > 
> > ----- Original Message ----- 
> > 
> > From: McLaughlin, Ryan (FRS.JAX) 
> > 
> > 
> > To: NSRCA Mailing List 
> > 
> > 
> > Sent: Wednesday, October 17, 2007 1:50 PM 
> > 
> > Subject: Re: [NSRCA-discussion] D3 Championship 
> > 
> > 
> > 
> > I didn't want you to stand alone in this...it's too 
> > important. 
> > 
> > -----Original Message----- 
> > From: nsrca-discussion-bounces at lists.nsrca.org 
> > [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Woodward,
> > Jim 
> > Sent: Wednesday, October 17, 2007 1:31 PM 
> > To: NSRCA Mailing List 
> > Subject: Re: [NSRCA-discussion] D3 Championship 
> > 
> > Ryan M., 
> > 
> > I think this takes the cake as a first time 
> > nsrca-list email. Thank you for the support. 
> > 
> > Jim W. 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > ________________________________ 
> > 
> > From: nsrca-discussion-bounces at lists.nsrca.org 
> > [mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of 
> > McLaughlin, Ryan (FRS.JAX) 
> > Sent: Wednesday, October 17, 2007 1:19 PM 
> > To: nsrca-discussion at lists.nsrca.org 
> > Subject: [NSRCA-discussion] D3 Championship 
> > 
> > This is my first post to the NSRCA list as I am 
> > a bit 'internet shy', but I thought I might be able to add some value to
> > the FAI judging discussion Jim W started. Although I tend to err on the 
> > side of diplomacy : ), I believe the feelings Jim expressed are 
> > legitimate and shared by many FAI competitors throughout the country. 
> > As a long time participant, I realize that bias is not a new problem but
> > I do not think we should accept this as a "fact of life" and move on.
> > I think we have an excellent opportunity here and we should make the 
> > most of it. 
> > 
> > The primary issue to address in my opinion is 
> > not disparity in judging standards between judges, though as Earl points 
> > out, this is important. Rather, it is the different standard applied to 
> > pilots within one score set, i.e. scoring a pilot lower or higher based
> > on who he is. Our penchant for creating "superstars" is the most 
> > discouraging aspect of FAI competition. To remedy this, we must all 
> > make a conscious decision to change a long established tradition in our 
> > sport. Are we ready to take this on? 
> > 
> > Complaining isn't the answer and neither is 
> > staying quiet, a mistake that has made the FAI competitors as 
> > responsible as anyone else for the situation. To this end, I submit for 
> > your review the following ideas to specifically target the FAI bias 
> > issue: 
> > 
> > 1. Sacrifice one FAI round per contest to serve 
> > as an "open" round for all contestants expected to judge FAI during the 
> > event. Allow everyone to compare notes and use this as a coaching 
> > opportunity. 
> > 
> > 2. Drop one FAI pilot to Masters at each 
> > contest to serve as a judge for all rounds and use volunteers from other
> > classes to serve as the others. This would have to be an agreement made 
> > among FAI pilots. 
> > 
> > 3. Extend the pilots meeting to go over 
> > specific issues, maybe a new one or two every meet rather than just 
> > pointing out the landing zone, etc. Make a "mini" judging seminar 
> > mandatory each contest. 
> > 
> > 4. Certify judges for FAI on a volunteer basis 
> > and only use "certified" judges in the contest. 
> > 
> > 5. Utilize peer judging, in other words, have 
> > FAI pilots judge themselves. If a pilot is not flying, he is judging 
> > his fellow competitors. 
> > 
> > Some of this may seem radical, but I believe 
> > there is room for a bit of this. Pattern belongs to us right? I 
> > welcome any ideas or critique anyone can offer. I will clarify any of 
> > the above upon request. 
> > 
> > Thank you for your consideration. 
> > 
> > Ryan McLaughlin 
> > Eustis, Florida 
> > 
> > 
> > 
> > 
> > ________________________________ 
> > 
> > 
> > 
> > ________________________________ 
> > 
> > 
> > 
> > 
> > 


  _____  



> > 
> > _______________________________________________ 
> > NSRCA-discussion mailing list 
> > NSRCA-discussion at lists.nsrca.org 
> > http://lists.nsrca.org/mailman/listinfo/nsrca-discussion 
> > 
> > 
