[NSRCA-discussion] World F3A contest

Tue Aug 20 03:46:10 AKDT 2013

Sticking with something a little more abstract.

Does something need to be broken in order to improve it?  

There will be a K-Factor article coming soon on TBL, so if you don't know how
it works, you will soon! I was torn about responding before the article
appears, as I would like to save the discussion for after you've all had a
chance to take it in. But maybe the stage can be set for things to think
about.

No changes are needed to use TBL at the F3A Nats; it's already in the
rulebook. The question is whether we should use it. Dropping the high and low
scores discards 40% of the judges' scores when 5 judges are used (2 of 5);
TBL will discard less than 10%.

The only AMA event that can make use of TBL is the Nats Masters Finals, as a
minimum of 5 judges is needed.

Judging at the Nats is very good and we don't see the large spreads like
they do at the Worlds. This means TBL will not change the overall standings
one iota. So why not run it? In the off chance that one judge is deemed
biased, his score will be removed and all can be made aware. Bias can be
completely unintentional, so it should not be taken as a negative (unless it
happens repeatedly!)

What I propose will make the process very transparent so everyone can see
how the results were calculated. There is no black-box conspiracy, and there
is nothing artificial, contrived or random about TBL.

From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of
rcmaster199 at aol.com
Sent: Monday, August 19, 2013 12:45 PM
To: nsrca-discussion at lists.nsrca.org
Subject: Re: [NSRCA-discussion] World F3A contest

Scott Smith recently reached out to Don Ramsey and me regarding TBL and its
first-born, FPS (Fair Play System), which is used in IAC competitions. TBL
essentially changes the scores of whole flights in toto for any given judge,
while FPS takes it down a step to the individual maneuver and changes a
judge's given scores, maneuver for maneuver, as assessed by the statistical
package.

To me, it is fundamentally wrong to artificially change a judge's score
based on statistics, just to "normalize" the whole panel of judges and
reduce the scoring variation, judge to judge and judge panel to judge panel.
In the examples we discussed, pilots all placed the same with straight-up
scores and with TBL applied. But I am not convinced that that will always be
true. In one example, in fact, not only did the average score each judge
gave become constant, so did each judge's standard deviation. TBL is that
powerful and that augmenting.....

I want to keep an open mind about statistical augmentation of the scores a
judge gives. I am leaning towards FPS more but still, it will be judging by
statistics so to speak. BTW---For this to work, Perceived Zeroes will
require input from the Chief Judge in every case before the zeroes become
Hard Zeroes, for all judges. Part of the existing Judge's Rules will require
revision....National Comps are one thing, and local comps something
else??.....

I think there should be lots more conversation before any changes are
implemented. The main question to answer is "Is the present system broken?"

Regards

MattK

-----Original Message-----
From: Ryan Smith <smaragdz at comcast.net>
To: 'General pattern discussion' <nsrca-discussion at lists.nsrca.org>
Sent: Sun, Aug 18, 2013 7:51 pm
Subject: Re: [NSRCA-discussion] World F3A contest

TBL are initials, standing for Tarasov, Bauer, Long, the people who came up
with it. Below is an explanation taken from a post/email that Derek
Koopowitz wrote a while back.

The Tarasov-Bauer-Long (TBL) scoring method has been around since the
1970s.

It has been used in the full-size arena since 1978 and has been used at
every full-size IAC World Championship since 1980. The TBL method applies
proven statistical probability theory to the judges' scores to resolve style
differences and bias, and to avoid the inclusion of potentially faulty
judgements in contest results.

Why we need TBL

Understanding just why we need TBL, and how it works, is of considerable
importance to us all. It is important to the pilots because it is there to
reduce the prospect of unsatisfactory judgements affecting their results,
and it is important for judges because it will introduce a completely new
dimension of scrutiny into the sequence totals, and it will also discreetly
engage the attention of the Chief Judge, or Contest Director, if a judge's
conclusions differ sufficiently from those of all the other judges on the
same panel.

When people get together to judge how well a pre-defined competitive task is
being tackled, the range of opinions is often diverse. This is entirely
natural among humans where the critique of any display of skill relies on
the interpretation of rapidly changing visual cues. In order to minimize the
prospect of any "way out opinions" having too much effect on the result, it
is usual to average the accumulated scores to arrive at a final assessment,
which takes everybody's opinion into account.

Unfortunately this averaging approach can achieve the opposite of what we
really want, which is to identify, and where needed, remove those "way out
opinions" because they are the ones most likely to be ill-judged and
therefore should be discarded, leaving the rest to determine the more
appropriate result. In aerobatics the process of judging according to the
rulebook normally leads to a series of generally similar personal views.
However, one judge's downgrading may be harsher or more lenient than the
next, their personal feelings toward each competitor or aircraft type may
predispose them toward favor or dislike (bias), and they will almost
certainly miss or see things that other judges do not.

How then can we "judge" the judges and so reach a conclusion which has a good
probability of acceptance by all the concerned parties? The key word is
probability: the concept of a perceived level of confidence in collectively
viewed judgements has entered the frame. What we really mean is that we must
be confident that opinions pitched outside some pre-defined level of
reasonable acceptability will be identified as such and will not be used.
This sort of situation is the daily bread and butter of well-established
probability theory which, when suitably applied, can produce a very clear-cut
analysis of numerically expressed opinions, provided that the appropriate
criteria have been carefully established beforehand.

What has been developed through several previous editions is some arithmetic
which addresses the judges' raw scores in such a way that any which are
probably unfair are discarded with an established level of confidence. To
understand the process you need only accept some quite simple arithmetic
procedures, which are central to what is called "statistical probability".
The TBL scoring system in effect does the following:

* Communizes the judging styles.

* Computes TBL scores.

* Publishes results.

Communizing the judging styles involves remodelling the scores to bring all
the judging styles to a common format and removing any natural bias between
panel members. Following some calculations, each judge's set of scores is
squeezed or stretched and moved en bloc up or down so that the sets all show
the same overall spread and have identical averages (bias). Within each set
the pilot order and score progression must remain unaltered, but now valid
score comparisons are possible between all the panel judges on behalf of
each pilot.
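A minimal sketch of the communizing step, in Python, assuming each judge scores every pilot once. The text does not say which common spread and average the sets are mapped to, so the target used here (the mean of the judges' averages and the mean of their sample standard deviations) is a hypothetical choice, as is the function name `communize`.

```python
from statistics import mean, stdev


def communize(raw: dict[str, list[float]]) -> dict[str, list[float]]:
    """Rescale every judge's scores to one common average and spread.

    raw maps a judge's name to that judge's scores, one per pilot, with the
    pilots in the same order for every judge (at least two pilots).
    """
    # Assumption: the common target is the mean of the judges' averages and
    # the mean of their sample standard deviations.
    target_mean = mean(mean(s) for s in raw.values())
    target_sd = mean(stdev(s) for s in raw.values())
    out = {}
    for judge, scores in raw.items():
        m, sd = mean(scores), stdev(scores)
        if sd == 0:
            # A judge who gave every pilot the same score carries no ranking
            # information; place all of that judge's scores at the target mean.
            out[judge] = [float(target_mean)] * len(scores)
            continue
        # Positive scale factor plus a shift: squeezes or stretches the set
        # and moves it en bloc, so the pilot order within the set is unchanged.
        out[judge] = [target_mean + (x - m) * target_sd / sd for x in scores]
    return out
```

For example, judge A scoring three pilots 400, 420 and 440 and judge B scoring them 300, 350 and 400 both come out as 350, 385 and 420: same order, same spread, same average. Because each set is only scaled by a positive factor and shifted, no judge's ranking of the pilots can change.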

Computing the TBL score involves looking at the high and low scores in each
pilot's set and throwing out any that are too "far out" to be fair. This is
done by subtracting the set's average from each score and dividing the
result by the "sample standard deviation". If the magnitude of this result
is greater than 1.645, then according to statistical probability theory we
can be at least 90% confident that the score is unfair, so it is discarded.
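Written out, the test described above reads as follows, where x_1, ..., x_n are one pilot's communized scores from the n judges still in the set. Testing the absolute value, so that both high and low scores are caught, is an interpretation of the text. For reference, 1.645 is the point of the standard normal distribution that leaves 5% in each tail, so a band of plus or minus 1.645 standard deviations covers 90% of a normal population.

```latex
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2}, \qquad
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\text{discard } x_i \text{ if } \lvert z_i \rvert > 1.645
```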

This calculation and the mathematically derived 1.645 criterion are the key
to the correctness of the TBL process, and are based on many years of
experience by the full-size aerobatics organization with contest scores at
all levels.

Discarding any scores of course changes the average and standard deviation
of the pilot's remaining scores, so the whole process is repeated. After
several cycles any "unfair" scores will have gone, and those that remain
will all satisfy the essential 90% confidence criterion.
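The discard-and-repeat loop might be sketched like this. It is an illustration under stated assumptions, not the official TBL software: only the current high and low scores are tested on each pass, as the text describes, and the loop stops once fewer than three scores remain (a guard added here, since a spread cannot be measured meaningfully from one or two scores). The names `tbl_filter` and `Z_LIMIT` are hypothetical.

```python
from statistics import mean, stdev

Z_LIMIT = 1.645  # criterion quoted in the text


def tbl_filter(scores: list[float], z_limit: float = Z_LIMIT) -> list[float]:
    """Repeatedly drop the high or low score while it lies more than
    z_limit sample standard deviations from the mean of the scores left."""
    kept = list(scores)
    while len(kept) >= 3:  # assumption: stop below three scores
        m, s = mean(kept), stdev(kept)
        if s == 0:
            break  # all remaining scores identical, nothing is "far out"
        # Only the current extremes are examined on each pass.
        candidates = {min(kept), max(kept)}
        outliers = [x for x in candidates if abs(x - m) / s > z_limit]
        if not outliers:
            break  # every remaining score passes; the result is final
        for x in outliers:
            kept.remove(x)  # removes one occurrence of that value
    return kept
```

With five judges scoring a pilot 400, 410, 405, 395 and 300, the first pass finds the 300 about 1.78 standard deviations below the mean and drops it; on the second pass the remaining four scores all lie within 1.2 standard deviations of their new mean, so the loop ends.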

The published results are derived by averaging each pilot's remaining
scores. Any applicable penalty/bonus values are applied to the final TBL
iteration, and the results are then sorted in descending order of total
score to rank the pilots from first to last.
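Publishing could then look like the following sketch. The `adjust` mapping (negative values for penalties, positive for bonuses), the function name `rank_pilots` and the pilot names in the example are hypothetical; the text only says that penalty/bonus values are applied before sorting.

```python
from statistics import mean
from typing import Optional


def rank_pilots(kept: dict[str, list[float]],
                adjust: Optional[dict[str, float]] = None) -> list[tuple[str, float]]:
    """Average each pilot's surviving scores, apply any penalty (negative)
    or bonus (positive) value, and sort from highest to lowest total."""
    adjust = adjust or {}
    totals = {pilot: mean(s) + adjust.get(pilot, 0.0) for pilot, s in kept.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For instance, with surviving scores of 400, 410, 405 and 395 for Pilot A, 420, 415 and 418 for Pilot B, 390 and 392 for Pilot C, and a 30-point penalty for Pilot B, the order comes out A, C, B.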

These final scores may, or may not, be normalized to 1000 points, depending
on the setting for the selected class. Educating and improving the judges is
a useful by-product of this process, in that it provides all the bells and
whistles showing how each judge has performed compared with the overall
judging panel average and against the 90% level-of-confidence criterion.
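The text leaves the normalization rule to the class settings. One common form, assumed here, gives the top score 1000 points and scales every other score in proportion to it; `normalize_to_1000` is a hypothetical name, and the input is assumed to be already ranked with a positive top score.

```python
def normalize_to_1000(ranked: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Scale scores so the first (highest) becomes 1000 and the rest keep
    their ratio to it. Assumes ranked is sorted high to low, top score > 0."""
    if not ranked:
        return []
    top = ranked[0][1]
    return [(pilot, 1000.0 * score / top) for pilot, score in ranked]
```

Under this rule a pilot scoring 300 behind a winner's 400 would be published as 750.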

The TBL system will produce an analysis showing each judge the percentage of
their scores accepted as "OK", together with a comparison against the panel's
style (spread of scores) and bias (average). Unfortunately TBL, by
definition, brings with it a 10% possibility of upsetting an honest judge's
day. The trade-off is that we expect not only to achieve, every time, a set
of results that we can be at least 90% confident are "fair", but also that
the system provides us with a wonderful tool to address our judging
standards. TBL will ensure that every judge's opinion has equal weight, and
that each sequence score by each judge is accepted only if it lies within an
acceptable margin of the panel average.
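The per-judge report could be sketched as below. The input shapes are assumptions: one (judge, accepted) record per score for the acceptance percentage, and each judge's scores across the pilots for style and bias. Bias is taken as the judge's average minus the panel average and style as the judge's spread relative to the panel's average spread; both definitions and both function names are illustrative, not taken from the TBL rules.

```python
from collections import Counter
from statistics import mean, stdev


def acceptance_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Percentage of each judge's scores that survived the discard step.

    decisions holds one (judge, accepted) pair for every score a judge gave.
    """
    given: Counter = Counter()
    accepted: Counter = Counter()
    for judge, ok in decisions:
        given[judge] += 1
        if ok:
            accepted[judge] += 1
    return {judge: 100.0 * accepted[judge] / given[judge] for judge in given}


def style_and_bias(raw: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """(bias, style) for each judge relative to the whole panel.

    bias  = judge's average minus the panel average (0.0 = no bias)
    style = judge's sample spread divided by the panel's mean spread
            (1.0 = same spread as the panel)
    """
    panel_mean = mean(mean(s) for s in raw.values())
    panel_sd = mean(stdev(s) for s in raw.values())
    return {
        judge: (mean(s) - panel_mean, stdev(s) / panel_sd)
        for judge, s in raw.items()
    }
```

A judge with a bias near 0 and a style near 1.0 is scoring in step with the panel; large departures in either figure are what the report is meant to surface.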

TBL, however, by necessity takes the dominant judging panel view as the
"correct" one, and it can't make right scores out of wrong ones. If 6 out of
8 judges are distracted and make a mess of one pilot's efforts, then for TBL
this becomes the controlling assessment of that pilot's performance, and the
other 2 diligent judges who got it right will see their scores unceremoniously
zapped. In practice this would be extremely unusual: from the judging line it
is almost impossible to deliberately upset the final results without collusion
between a majority of the judges, and if that starts to happen then someone is
definitely on the wrong planet.

From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Jeff and
Claire
Sent: Sunday, August 18, 2013 6:47 PM
To: 'General pattern discussion'
Subject: Re: [NSRCA-discussion] World F3A contest

What does TBL stand for?

From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Jon Lowe
Sent: Sunday, August 18, 2013 3:54 PM
To: nsrca-discussion at lists.nsrca.org
Subject: Re: [NSRCA-discussion] World F3A contest

After all pilots have flown in front of each set of judges on day 4 of
prelims, I think. That would be part of the normalization process. At least
that is how I remember it from previous WCs.

Jon

-----Original Message-----
From: John Fuqua <johnfuqua at embarqmail.com>
To: 'General pattern discussion' <nsrca-discussion at lists.nsrca.org>
Sent: Sun, Aug 18, 2013 4:40 pm
Subject: Re: [NSRCA-discussion] World F3A contest

When do they do the TBL?

From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Jon Lowe
Sent: Sunday, August 18, 2013 3:33 PM
To: nsrca-discussion at lists.nsrca.org
Subject: Re: [NSRCA-discussion] World F3A contest

A click on the Team USA logo on the NSRCA home page takes you to the Team
USA website. That has a link to Cindy Wickhizer's page:

https://2013worldsteamusa.shutterfly.com/, 

which has been where Mark is sending info.  He and I both previously
announced that on this list.

Jon

-----Original Message-----
From: Gordon Seeling <gseeling at q.com>
To: nsrca-discussion <nsrca-discussion at lists.nsrca.org>
Sent: Sun, Aug 18, 2013 2:35 pm
Subject: [NSRCA-discussion] World F3A contest

Is there a computer man in the NSRCA? If there is, please post the
results on the NSRCA website so the rank and file will know what is going on.
_______________________________________________
NSRCA-discussion mailing list
NSRCA-discussion at lists.nsrca.org
http://lists.nsrca.org/mailman/listinfo/nsrca-discussion