[NSRCA-discussion] Judging
Derek Koopowitz
derekkoopowitz at gmail.com
Fri Oct 19 11:58:50 AKDT 2007
Jerry,
All good points and as you know with any event that is scored in a
subjective manner the results are always correct as scored. Right? Correct
means that the right people ended up in their rightful order. No one is
denigrating anyone who scores low or high but statistics prove that if one
has enough scores from a number of judges (the more judges the merrier) then
a definite mean/average can be determined for a particular maneuver and
essentially one could arrive at what a correct score should be.
Since everything a judge does is subjective the only "control" we have over
them is to ensure they are trained properly and can apply the correct
deductions. Earl's earlier note made this very clear - more training,
preferably interactive, is the only way we can ensure that judges become
better judges. TBL, or any other scoring program/mechanism/judge evaluation,
cannot and will not create better judges, but it will point out problem
judges who can hopefully be corrected with continued training.
-Derek
_____
From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of Jerry
Stebbins
Sent: Friday, October 19, 2007 10:04 AM
To: Discussion -NSRCA
Subject: [NSRCA-discussion] Judging
Derek, I followed your summary pretty well until the first sentence in the
third paragraph. "system works very well and in the end the correct order of
finish is picked- which is what we want"
Where is "correct " quantified?
Who says what is "correct"?
What says that one judge being low on a maneuver is "wrong"?
Sounds more like the scenario previously mentioned, the "7 to 9 syndrome",
is the safe way to score.
How many times have you scored a zero and others gave a score - in error -
because the pilot rolled the wrong way? By the way, throwing out the old
requirement that the judges confer on zeros has only helped get us into this
"perceived" problem.
Sure sounds like a "dry lab" approach to me! Easy to massage data and the
algorithms to "prove" the result desired.
I went through the data several years ago - what was available, and it was
not all - on the "Judges ratings", and got the same gut feeling.
What the pilot sees/flies, and his caller sees, and what each judge sees are
seldom the same for a multitude of reasons.
Also as a judge, have you ever had a "watcher" come up after a flight and
say how great the flight was- and you had just given several zeros/low
scores. Amazing how they change their perspective when you point out the
obvious errors. That is assuming your memory hasn't blanked out the past
pilot yet, and now you are getting ready for the next.
Jerry
----- Original Message -----
From: Derek Koopowitz
To: 'NSRCA Mailing List'
Sent: Thursday, October 18, 2007 11:25 PM
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System
Overhaul - LONG
TBL - scores go back to the pilots as is... raw. TBL is only factored once
a round is finished and the results calculated. With TBL, the adjustments
are applied to the whole score and result in whole numbers. Remember, the
entire score isn't thrown out - just a maneuver score. When a score is
dropped, TBL is run on the entire set of scores again to recalculate the
mean and standard deviation - it is an iterative process that continues
until all scores fall within the range calculated for the pilots/judges.
TBL was used at the TOC for all years starting in 1999 onwards. It has been
used at the WC since 2001 (I think).
Bottom line - the system works very well and in the end the correct order of
finish is picked - which is what we want. In all the years of scoring with
TBL at the TOC I saw very few judges scores discarded which probably says
more about the quality of judging than anything else. I'd be very curious
what the results have been at the F3A WC's since it was implemented.
I would love to see TBL in use at the Nats for the Finals in both classes -
we certainly have enough judges to support its use.
_____
From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of
rcmaster199 at aol.com
Sent: Thursday, October 18, 2007 8:51 PM
To: nsrca-discussion at lists.nsrca.org
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System
Overhaul-LONG
Earl,
If I remember correctly, some of the results from the Judge Evaluation have
been presented either in the KF or the website in past years. But I haven't
seen any results posted lately. Perhaps you are right...posting might serve
a purpose.
Curious...in the TBL method, do the adjustments to a judge's scores come in
whole or fractional numbers? How are the scoresheets that are given back to
the pilots handled.....I mean, do they show adjusted scores or as judged? Or
both?
Koop, was TBL used in the TOC?
If we have access to the algorithm, I wouldn't mind taking a peek at how the
2007 F3A Nats Final could turn out. I would bet that it wouldn't make much
difference on where the pilots placed.
Matt
-----Original Message-----
From: Earl Haury <ejhaury at comcast.net>
To: NSRCA Mailing List <nsrca-discussion at lists.nsrca.org>
Sent: Thu, Oct 18 10:24 PM
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul
-LONG
We miss an opportunity for judges to evaluate their performance by not
distributing the results of the analysis of that performance (at major
meets). I, for one, would like the judges' performance data to be published
in the K-Factor (after all - the pilots' scores are published). However,
realizing that some are squeamish about this, I think that we should still
provide each judge with the analysis of his / her specific performance. At
least, each judge would then have an indication of their current skills and
any variation over the years.
Earl
----- Original Message -----
From: Derek Koopowitz
To: 'NSRCA Mailing List'
Sent: Thursday, October 18, 2007 7:53 PM
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul
-LONG
I'll post my dissertation on TBL again since this issue seems to crop up
time and time again...
The Tarasov-Bauer-Long (TBL) Scoring method has been around since the
1970's. It has been used in the full size arena since 1978 and has been
used at every full size IAC World Championship since 1980. The TBL method
applies proven statistical probability theory to the judge's scores to
resolve style differences and bias, and to avoid the inclusion of potential
faulty judgements in contest results. Understanding just why we need TBL,
and how it works, is of considerable importance to us all. It is important
to the pilots because it is there to reduce the prospect of unsatisfactory
judgements affecting their results, and it is important for judges because
it will introduce a completely new dimension of scrutiny into the sequence
totals; it will also discreetly engage the attention of the Chief Judge,
or Contest Director, if one judge's conclusions differ sufficiently from
those of the other judges on the same panel.
When people get together to judge how well a pre-defined competitive task is
being tackled, the range of opinions is often diverse. This is entirely
natural among humans where the critique of any display of skill relies on
the interpretation of rapidly changing visual cues. In order to minimize
the prospect of any "way out opinions" having too much effect on the result,
it is usual to average the accumulated scores to arrive at a final
assessment, which takes everybody's opinion into account. Unfortunately
this averaging approach can achieve the opposite of what we really want,
which is to identify, and where needed, remove those "way out opinions"
because they are the ones most likely to be ill-judged and therefore should
be discarded, leaving the rest to determine the more appropriate result.
In aerobatics the process of judging according to the rulebook normally
leads to a series of generally similar personal views. However, one judge's
downgrading may be harsher or more lenient than the next, their personal
feelings toward each competitor or aircraft type may predispose toward favor
or dislike (bias), and they will almost certainly miss or see things that
other judges do not. How then can we "judge" the judges and so reach a
conclusion, which has good probability of acceptance by all the concerned
parties?
The key word is probability: the concept of a perceived level of confidence
in collectively viewed judgements has entered the frame. What we really
mean is that we must be confident that opinions pitched outside some
pre-defined level of reasonable acceptability will be identified as such and
will not be used. This sort of situation is the daily bread and butter of
well established probability theory which, when suitably applied, can
produce a very clear cut analysis of numerically expressed opinions provided
that the appropriate criteria have been carefully established beforehand.
What has been developed through several previous editions is some arithmetic
which addresses the judges' raw scores in such a way that any which are
probably unfair are discarded with an established level of confidence. To
understand the process you need only accept some quite simple arithmetic
procedures, which are central to what is called "statistical probability".
The TBL scoring system in effect does the following:
- Commonizes the judging styles
- Computes the TBL scores
- Publishes the results
Commonizing the judging styles involves remodeling the scores to bring all
the judging styles to a common format and removing any natural bias between
panel members. Following some calculations, each judge's set of scores is
squeezed or stretched and moved en bloc up or down so that the sets all show
the same overall spread and have identical averages (bias). Within each set
the pilot order and score progression must remain unaltered, but now valid
score comparisons are possible between all the panel judges on behalf of
each pilot.
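As an illustration, the squeeze/stretch-and-shift described above amounts to a linear remapping of each judge's scores onto a common mean and spread. This is a hypothetical sketch: the function name `commonize`, the choice of the pooled panel statistics as the target, and the data layout are my assumptions, not the official TBL specification.

```python
import statistics

def commonize(scores_by_judge):
    """Remap each judge's scores so all sets share one mean and spread.

    scores_by_judge: {judge: [one score per pilot]} -> remapped copy.
    Assumes each judge scored at least two pilots with non-identical
    scores (so the sample standard deviation is defined and non-zero).
    """
    # Target statistics: the pooled mean and spread of the whole panel
    # (an assumption; any common target works for the comparison step).
    all_scores = [s for scores in scores_by_judge.values() for s in scores]
    target_mean = statistics.mean(all_scores)
    target_sd = statistics.stdev(all_scores)
    remapped = {}
    for judge, scores in scores_by_judge.items():
        m = statistics.mean(scores)
        sd = statistics.stdev(scores)
        # Stretch/squeeze to the common spread, then shift to the common
        # mean. A linear map with positive slope is monotonic, so each
        # judge's pilot order and score progression are preserved.
        remapped[judge] = [(s - m) / sd * target_sd + target_mean
                           for s in scores]
    return remapped
```

After this step every judge's set has the same average and the same spread, so score differences between judges for the same pilot become directly comparable.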
Computing the TBL score involves looking at the high and low scores in each
pilot's set and throwing out any that are too "far out" to be fair. This is
done by subtracting the average for the set from each one and dividing the
result by the "sample standard deviation" - if the result of this sum is
greater than 1.645 then according to statistical probability theory we can
be at least 90% confident that it is unfair, so the score is discarded.
This calculation and the mathematically derived 1.645 criterion are the key
to the correctness of the TBL process, and are based on many years of
experience by the full-size aerobatics organization with contest scores at
all levels.
Discarding any scores of course changes the average and standard deviation
of a pilot's remaining results, and so the whole process is repeated. After
several cycles any "unfair" scores will have gone, and those that remain will
all satisfy the essential 90% confidence criterion.
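The subtract-the-mean, divide-by-the-sample-standard-deviation test and the repeat-until-stable loop can be sketched as below. This is illustrative only: `tbl_filter` and its details (dropping one worst score per pass, stopping at two scores) are my assumptions; the 1.645 threshold is the one stated in the text.

```python
import statistics

def tbl_filter(scores, z_crit=1.645):
    """Iteratively discard scores from one pilot's set of judge scores
    that lie more than z_crit sample standard deviations from the mean.
    """
    scores = list(scores)
    while len(scores) > 2:  # keep at least two scores (an assumption)
        m = statistics.mean(scores)
        sd = statistics.stdev(scores)
        if sd == 0:
            break  # all remaining scores agree; nothing to discard
        # Find the score farthest from the mean in standard-deviation
        # units; per the text, beyond 1.645 we are at least 90% confident
        # it is unfair.
        worst = max(scores, key=lambda s: abs(s - m))
        if abs(worst - m) / sd > z_crit:
            scores.remove(worst)  # discard, then recompute on next pass
        else:
            break  # every remaining score passes the confidence test
    return scores
```

For example, with four judges near 7.5 and one at 2.0, the outlier is dropped on the first pass and the remaining four scores all pass the test.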
The published results are derived by averaging each pilot's remaining
scores. In the final TBL iteration any appropriate penalty/bonus values are
applied, and the results are then sorted in descending order of total score
to rank the pilots first to last. These final scores may, or may not, be
normalized to 1000 points, depending on the setting for the selected class.
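The publishing step, as described, might look roughly like this. It is a sketch under assumptions: `publish`, the subtractive penalty handling, and normalizing against the top pilot's total are illustrative choices, not the official implementation.

```python
def publish(filtered_scores, penalties=None, normalize_to=1000.0):
    """Average each pilot's surviving scores, apply penalties, rank.

    filtered_scores: {pilot: [surviving judge scores]} -> list of
    (pilot, normalized total), best first.
    """
    penalties = penalties or {}
    # Average the surviving scores and subtract any penalty
    # (penalty handling here is an assumption).
    totals = {pilot: sum(s) / len(s) - penalties.get(pilot, 0.0)
              for pilot, s in filtered_scores.items()}
    # Normalize so the best pilot gets normalize_to points, then sort
    # in descending order of total to rank first to last.
    top = max(totals.values())
    return sorted(((pilot, t / top * normalize_to)
                   for pilot, t in totals.items()),
                  key=lambda pt: pt[1], reverse=True)
```

With two pilots averaging 8.5 and 6.5, the leader is normalized to 1000 and the second pilot to about 765.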
Educating and improving the judges is a useful by-product of this process,
in that it shows in detail how each judge has performed by comparison with
the overall judging panel average, seen against the 90% level of confidence
criterion. The TBL system will produce an analysis
showing each judge the percentage of scores accepted as "OK", and a
comparison with the panel style (spread of score) and bias (average).
Unfortunately TBL, by definition, brings with it a 10% possibility of
upsetting an honest judge's day. The trade-off is that we expect not only
to achieve a set of results with at least 90% confidence that are "fair"
every time, but that the system also provides us with a wonderful tool to
address our judging standards. TBL will ensure that every judge's opinion
has equal weight, and that each sequence score by each judge is accepted
only if it lies within an acceptable margin from the panel average. TBL,
however, by necessity takes the dominant judging panel view as the "correct"
one and it can't make right scores out of wrong ones. If 6 out of 8 judges
are distracted and make a mess of one pilot's efforts, then for TBL this
becomes the controlling assessment of that pilot's performance, and the other
2 diligent judges who got it right will see their scores unceremoniously
zapped. In practice this would be extremely unusual - from the judging line
it is almost impossible to deliberately upset the final results without
collusion between a majority of the judges, and if that starts to happen
then someone is definitely on the wrong planet.
_____
From: nsrca-discussion-bounces at lists.nsrca.org
[mailto:nsrca-discussion-bounces at lists.nsrca.org] On Behalf Of
vicenterc at comcast.net
Sent: Thursday, October 18, 2007 8:11 AM
To: NSRCA Mailing List
Subject: Re: [NSRCA-discussion] D3 Championship - Scoring System Overhaul
Tony,
Do you know if the TBL system eliminates the high and low scores? I think
that is a good solution, but we cannot do it in local contests. Probably we
could in some contests, since we have many Masters vs. F3A pilots.
Do you know of a link where we can read about the TBL system?
--
Vicente "Vince" Bortone
-------------- Original message --------------
From: "Tony" <tony at radiosouthrc.com>
This TBL will find these problems and is in use at World Champs. The
problem is that you need at least 5 judges on a line.
Tony Stillman, President
Radio South, Inc.
139 Altama Connector, Box 322
Brunswick, GA 31525
1-800-962-7802
tony at radiosouthrc.com
_____