The mathematician behind the Elo rating

Creator of the Elo rating, Prof. Arpad Elo

Arpad Elo was a chess master and an active member of the United States Chess Federation (USCF), which was founded in 1939. The USCF used a numerical rating system, devised by Kenneth Harkness, that let members track their individual progress in terms other than wins and losses. The Harkness system was reasonably fair, but in some situations it produced ratings that many observers considered inaccurate. On behalf of the USCF, Elo devised a new system with a sounder statistical basis.

Elo's system replaced earlier systems of competitive rewards with a system based on statistical estimation. Rating systems in many sports award points according to subjective evaluations of the 'greatness' of particular achievements. For example, winning an important golf tournament might be worth five times as many points as winning a lesser tournament.

A statistical approach, by contrast, uses a model that relates game results to underlying variables representing the ability of each player.

Elo's central assumption was that each player's chess performance in each game is a normally distributed random variable. A player may perform much better or worse from one game to the next. Elo nonetheless assumed that the mean of a player's performances changes only slowly over time. He regarded a player's true skill as the mean of that player's performance random variable.
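The normal-performance assumption can be made concrete with a small simulation. The Python sketch below draws each player's per-game performance from a normal distribution centred on that player's rating. It then counts how often A outperforms B and compares the count with the closed-form probability. The per-game standard deviation of 200 points (`PERF_SD`) and the function names are illustrative assumptions, not values taken from the text. With that spread, a 200-point gap gives an expected score of about 0.76, close to the 0.75 Elo suggested.

```python
import math
import random

# Assumption (not from the text): per-game performance spread of 200 rating points.
PERF_SD = 200.0


def simulate_expected_score(r_a: float, r_b: float, games: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of A's expected score when each game's performance
    is drawn from a normal distribution centred on the player's rating.
    Draws are not modelled: the higher performance simply wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        if rng.gauss(r_a, PERF_SD) > rng.gauss(r_b, PERF_SD):
            wins += 1
    return wins / games


def normal_expected_score(r_a: float, r_b: float) -> float:
    """Closed form: P(perf_A > perf_B). The difference of the two performances
    is normal with mean r_a - r_b and standard deviation PERF_SD * sqrt(2)."""
    z = (r_a - r_b) / (PERF_SD * math.sqrt(2))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


print(simulate_expected_score(1700, 1500), normal_expected_score(1700, 1500))
```

The simulation and the closed form should agree to about two decimal places. Changing `PERF_SD` rescales what a given rating gap means.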

Mathematical Theory

Ability cannot be measured absolutely; it can only be inferred from wins, losses, and draws against other players. A player's rating depends on the ratings of their opponents and on the results scored against them. The difference in rating between two players determines an estimate of the expected score between them. Both the average and the spread of the ratings can be chosen arbitrarily. Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean the stronger player has an expected score of about 0.75 (the expected score is essentially an expected average score). The USCF initially aimed for the average club player to have a rating of 1500.
A player's expected score is their probability of winning plus half their probability of drawing. Thus an expected score of 0.75 could represent a 75% chance of winning, a 25% chance of losing, and a 0% chance of drawing. At the other extreme, it could represent a 50% chance of winning, a 0% chance of losing, and a 50% chance of drawing. The Elo system does not specify the probability of a draw as opposed to a decisive result. Instead, a draw is counted as half a win and half a loss.
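The definition of expected score is plain arithmetic. This small Python sketch checks that both scenarios in the paragraph above come to 0.75. The function name `expected_score` is only an illustrative choice.

```python
def expected_score(p_win: float, p_draw: float) -> float:
    """Expected score = P(win) + 0.5 * P(draw); a loss contributes nothing."""
    if p_win < 0 or p_draw < 0 or p_win + p_draw > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return p_win + 0.5 * p_draw


decisive = expected_score(0.75, 0.0)   # 75% win, 25% loss, 0% draw
drawish = expected_score(0.50, 0.50)   # 50% win, 0% loss, 50% draw
print(decisive, drawish)  # 0.75 0.75
```

Because only this combination enters the rating, the two very different result profiles are indistinguishable to the Elo system.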

Original text:

If Player A has true strength R_A and Player B has true strength R_B, the exact formula (using the logistic curve) for the expected score of Player A is

E_A = \frac 1 {1 + 10^{(R_B - R_A)/400}}.

Similarly the expected score for Player B is

E_B = \frac 1 {1 + 10^{(R_A - R_B)/400}}.
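Both formulas translate directly into a minimal Python sketch. The function name `expected_score` and the sample ratings are illustrative choices.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B on the logistic curve
    (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


e_a = expected_score(1613, 1609)
e_b = expected_score(1609, 1613)
print(round(e_a, 3), round(e_b, 3))  # 0.506 0.494
```

Swapping the arguments gives Player B's formula, so one function covers both expressions.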

This could also be expressed by

E_A = \frac{Q_A}{Q_A + Q_B}

and

E_B = \frac{Q_B}{Q_A + Q_B}

where Q_A = 10^{R_A/400} and Q_B = 10^{R_B/400}. Note that in this latter form the same denominator applies to both expressions. By comparing only the numerators, we find that the expected score of Player A is Q_A / Q_B times the expected score of Player B. It follows that for every 400 rating points of advantage over the opponent, a player's expected score is ten times the opponent's expected score.
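The shared-denominator form also translates directly into code. This sketch checks numerically that a 400-point advantage makes A's expected score ten times B's. The helper names `q` and `expected_pair` are illustrative, not part of any established API.

```python
def q(rating: float) -> float:
    """Q = 10^(R/400), the term that appears in both numerators."""
    return 10 ** (rating / 400)


def expected_pair(r_a: float, r_b: float) -> tuple[float, float]:
    """Return (E_A, E_B) using the common denominator Q_A + Q_B."""
    q_a, q_b = q(r_a), q(r_b)
    total = q_a + q_b
    return q_a / total, q_b / total


e_a, e_b = expected_pair(1900, 1500)   # a 400-point advantage for A
print(round(e_a / e_b, 6))             # 10.0
```

The ratio depends only on the rating difference, because the common factor in Q_A and Q_B cancels.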

Also note that E_A + E_B = 1. In practice, since the true strength of each player is unknown, the expected scores are calculated using the players' current ratings.

When a player’s actual tournament scores exceed his expected scores, the Elo system takes this as evidence that player’s rating is too low, and needs to be adjusted upward. Similarly when a player’s actual tournament scores fall short of his expected scores, that player’s rating is adjusted downward. Elo’s original suggestion, which is still widely used, was a simple linear adjustment proportional to the amount by which a player overperformed or underperformed his expected score. The maximum possible adjustment per game (sometimes called the K-value) was set at K = 16 for masters and K = 32 for weaker players.

Suppose Player A was expected to score E_A points but actually scored S_A points. The formula for updating his rating is

R_A^\prime = R_A + K(S_A - E_A).

This update can be performed after each game or each tournament, or after any suitable rating period. An example may help clarify. Suppose Player A has a rating of 1613, and plays in a five-round tournament. He loses to a player rated 1609, draws with a player rated 1477, defeats a player rated 1388, defeats a player rated 1586, and loses to a player rated 1720. His actual score is (0 + 0.5 + 1 + 1 + 0) = 2.5. His expected score, calculated according to the formula above, was (0.506 + 0.686 + 0.785 + 0.539 + 0.351) = 2.867. Therefore his new rating is (1613 + 32· (2.5 − 2.867)) = 1601, assuming that a K factor of 32 is used.

Note that while two wins, two losses, and one draw may seem like a par score, it is worse than expected for Player A because his opponents were lower rated on average. Therefore he is slightly penalized. If he had scored two wins, one loss, and two draws, for a total score of three points, that would have been slightly better than expected, and his new rating would have been (1613 + 32· (3 − 2.867)) = 1617.
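This Python sketch reproduces the tournament example above, applying the update once over the whole tournament with K = 32. The function names are illustrative. The alternative result list is one hypothetical arrangement of "two wins, one loss, two draws"; only its total of three points affects the update.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Logistic expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def updated_rating(rating: float, opponents: list[float], scores: list[float], k: float = 32) -> float:
    """Apply R' = R + K(S - E) once over a whole rating period (here, one tournament)."""
    if len(opponents) != len(scores):
        raise ValueError("need exactly one score per opponent")
    expected = sum(expected_score(rating, opp) for opp in opponents)
    return rating + k * (sum(scores) - expected)


opponents = [1609, 1477, 1388, 1586, 1720]
actual = [0, 0.5, 1, 1, 0]          # loss, draw, win, win, loss = 2.5 points
# Hypothetical arrangement of two wins, one loss, two draws (3 points).
alternative = [0, 0.5, 1, 1, 0.5]

total_expected = sum(expected_score(1613, opp) for opp in opponents)
print(round(total_expected, 3))                              # 2.867
print(round(updated_rating(1613, opponents, actual)))        # 1601
print(round(updated_rating(1613, opponents, alternative)))   # 1617
```

Calling `updated_rating` with a single opponent and score performs a per-game update instead.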

This updating procedure is at the core of the ratings used by FIDE, USCF, Yahoo! Games, the ICC, and FICS. However, each organization has taken a different route to deal with the uncertainty inherent in the ratings, particularly the ratings of newcomers, and to deal with the problem of ratings inflation/deflation. New players are assigned provisional ratings, which are adjusted more drastically than established ratings.

The principles used in these rating systems can also be applied to other competitions, for instance international football matches.

Elo ratings have also been applied to games in which draws are impossible. They have also been applied to games whose result has a magnitude (a small or large margin) in addition to its outcome (win or loss). See the Wikipedia article on Go rating with Elo for more.

Mathematical issues

There are three main mathematical concerns with Professor Elo's original work: the correct curve, the correct K-factor, and the crude calculations used during the provisional period.

Most accurate distribution model

The first mathematical concern addressed by the USCF was the use of the normal distribution. The USCF found that it did not accurately represent actual results, particularly those of lower-rated players. It therefore switched to a logistic distribution model, which it found fitted actual results better. FIDE still bases its rating calculations on the normal distribution, as Elo himself suggested.
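The two models can be compared by tabulating the stronger player's expected score at several rating gaps. This sketch does that under an assumption that is not from the text: a per-game performance standard deviation of 200 points in the normal model. Under that assumption, the logistic curve has heavier tails, so at large gaps it gives the lower-rated player a larger expected score than the normal model does.

```python
import math

# Assumption (not from the text): normal model with a per-game performance SD of 200,
# so the performance difference has SD 200 * sqrt(2).
DIFF_SD = 200 * math.sqrt(2)


def normal_expected(diff: float) -> float:
    """Stronger player's expected score under the normal model."""
    return 0.5 * (1 + math.erf(diff / (DIFF_SD * math.sqrt(2))))


def logistic_expected(diff: float) -> float:
    """Stronger player's expected score under the logistic (base 10, 400-point) curve."""
    return 1 / (1 + 10 ** (-diff / 400))


for diff in (0, 200, 400, 600, 800):
    print(f"{diff:4d}  normal={normal_expected(diff):.3f}  logistic={logistic_expected(diff):.3f}")
```

Near a 200-point gap the two curves almost coincide. They diverge in the tails, which is where a model's treatment of lower-rated players matters most.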

Most accurate K-factor

The second major concern is the correct "K-factor". The chess statistician Jeff Sonas believes that the original value of K = 10 (for players rated above 2400) in Elo's work is inaccurate. If the K-factor is set too large, a large number of points changes hands in each game, so ratings become overly sensitive to a few recent events. If it is set too low, sensitivity is minimal and the system does not respond quickly enough to changes in a player's actual level of performance.
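The responsiveness side of this trade-off can be illustrated with a small sketch. In a hypothetical scenario, a player rated 1500 has genuinely improved to 1700 strength and keeps playing 1500-rated opponents. Each game feeds in the average score that the true strength implies, so game-to-game noise, and with it the over-sensitivity of a large K, is deliberately left out. The function names and all numbers are illustrative assumptions.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Logistic expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def track_rating(k: float, start: float, true_strength: float, opponent: float, games: int) -> float:
    """Apply R' = R + K(S - E) for `games` games, where S is the average
    score the player's true strength implies (no random noise)."""
    rating = start
    for _ in range(games):
        score = expected_score(true_strength, opponent)
        rating += k * (score - expected_score(rating, opponent))
    return rating


# Hypothetical: rated 1500, truly 1700 strength, 20 games against 1500-rated opponents.
slow = track_rating(k=10, start=1500, true_strength=1700, opponent=1500, games=20)
fast = track_rating(k=32, start=1500, true_strength=1700, opponent=1500, games=20)
print(round(slow), round(fast))
```

With K = 32 the rating closes much more of the 200-point gap in the same number of games. Adding random results would show the other side of the trade-off: bigger swings after each game.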

Elo’s original K-factor estimation was made without the benefit of huge databases and statistical evidence. Sonas indicates that a K-factor of 24 (for players rated above 2400) may be more accurate both as a predictive tool of future performance, and also more sensitive to performance.

Certain Internet chess sites seem to avoid staggering the K-factor across three rating levels. For example, the ICC seems to use a global K = 32 except in games against provisionally rated players. The USCF, which uses a logistic rather than a normal distribution, staggers the K-factor across three main rating ranges:

  • Players below 2100 -> K factor of 32 used
  • Players between 2100 and 2400 -> K factor of 24 used
  • Players above 2400 -> K factor of 16 used
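The USCF bands above map to a simple lookup. The list does not say which band the exact boundary ratings belong to. This sketch assumes that 2100 through 2400 inclusive use K = 24; the function name is illustrative.

```python
def uscf_k_factor(rating: float) -> int:
    """K-factor by rating band, as listed above.

    Assumption: the boundaries 2100 and 2400 both fall in the middle band (K = 24).
    """
    if rating < 2100:
        return 32
    if rating <= 2400:
        return 24
    return 16


print([uscf_k_factor(r) for r in (1500, 2100, 2400, 2500)])  # [32, 24, 24, 16]
```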

FIDE uses the following ranges:[13]

  • K = 25 for a player new to the rating list until he has completed events with a total of at least 30 games.
  • K = 15 as long as a player’s rating remains under 2400.
  • K = 10 once a player’s published rating has reached 2400, and he has also completed events with a total of at least 30 games. Thereafter it remains permanently at 10.
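Unlike the USCF bands, the FIDE rules depend on a player's history as well as the current rating. This sketch encodes the three rules exactly as listed here; FIDE has revised its K-factors over time. It tracks the player's highest published rating so that K = 10 stays permanent once 2400 has been reached. The parameter names are illustrative assumptions.

```python
def fide_k_factor(games_played: int, highest_published_rating: float) -> int:
    """K-factor following the three FIDE rules listed above.

    games_played: total games completed in rated events.
    highest_published_rating: peak published rating; once it reaches 2400
    (with at least 30 games), K stays at 10 permanently.
    """
    if games_played < 30:
        return 25   # new to the rating list: fewer than 30 games so far
    if highest_published_rating >= 2400:
        return 10   # has reached 2400 with 30+ games: permanently 10
    return 15       # established player who has never reached 2400


print(fide_k_factor(12, 1800), fide_k_factor(45, 2100), fide_k_factor(45, 2410))  # 25 15 10
```

A player who reached 2410 and later dropped to 2350 is still called with `highest_published_rating=2410`, so K remains 10.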

In over-the-board chess, staggering the K-factor is important for keeping rating inflation at the top end of the spectrum to a minimum. In theory, this reasoning applies equally to an online chess server and to a standard over-the-board organisation such as FIDE or the USCF. Reducing a player's K-factor once they pass 2400 should make it harder for them to climb much higher. However, the ICC's help page on K-factors suggests that the choice of opponents may be what lets 2800+ players keep raising their ratings quite easily. This seems to hold if one analyses, for example, the games of a grandmaster on the ICC: one can find a string of games against opponents who are all rated over 3100. In over-the-board chess, such a player could find a steady stream of 2700+ opponents only in very high-level all-play-all events of at least FIDE category 15. By comparison, a category 10 FIDE event is one whose players' average rating lies between 2476 and 2500. If the player entered normal Swiss-paired open tournaments instead, he would regularly meet many opponents rated below 2500 FIDE. A single loss or draw against one of them would knock the GM's FIDE rating down significantly.

Even with a K-factor of 16, a player who defeated a 3100+ opponent several games in a row would still see his rating rise significantly within a short time. Blitz games are fast, so many of them can be played within a few days. The K-factor would arguably only slow down the gain from each win. The evidence in the ICC K-factor article relates to its auto-pairing system, where the highest ratings reached are only about 2500. It therefore seems that random pairing, rather than selective pairing, is the key to combating rating inflation at the top of the rating spectrum. A slightly lower K-factor for players rated above 2400 possibly helps, but only to a much lesser extent. (Source: Wikipedia.)
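This sketch gives a rough sense of scale for the claim that a rating still climbs quickly with K = 16. It takes a hypothetical 2800-rated player who beats a 3100-rated opponent ten times in a row and applies the update after every game. The ratings, streak length, and function names are illustrative assumptions.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Logistic expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rating_after_win_streak(rating: float, opponent: float, wins: int, k: float) -> float:
    """Apply R' = R + K(1 - E) after each of `wins` consecutive wins."""
    for _ in range(wins):
        rating += k * (1 - expected_score(rating, opponent))
    return rating


# Hypothetical: 2800-rated player, ten straight wins against a 3100-rated opponent.
streak_k16 = rating_after_win_streak(2800, 3100, wins=10, k=16)
streak_k32 = rating_after_win_streak(2800, 3100, wins=10, k=32)
print(round(streak_k16), round(streak_k32))
```

Each win against a much higher-rated opponent is worth close to the full K. Even at K = 16, ten such wins add well over 100 points, and halving K only roughly halves the gain.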

About percasintb

Board of the All-Indonesia Chess Association (Percasi), West Nusa Tenggara (NTB) Province. General Chair: M. Ikhsan Gepala Putra. General Secretary: Vidi Eka Kusuma, S.IP., M.Si., WNP
This entry was posted in Sport.
