User:ChenzwBot

From Wikipedia, the free encyclopedia
This user is a bot run by Chenzw (talk).

It is used to make many small changes that would take a person a long time to do alone.
Administrators: if this bot isn't working right or is doing bad things, please stop it.


ChenzwBot: this user is a bot
ChenzwBot (Talk · Contribs)
ChenzwBot patrols the sea of recent changes.
Operator: Chenzw
Flagged? Yes (11 April 2008)
Edit rate: Variable (anti-vandalism)
Edit period(s): Always
Automatic or manual? Automatic
Programming language(s): PHP and Python
Source code published? https://gitlab.com/antivandalbot-ng (partial)
Emergency shutoff-compliant? No

This bot reverts vandalism on the wiki. It is a rewrite of the code used by User:GoblinBot4 (and User:Chris G Bot), and is powered by the revscoring library.

History

  • Since 2010: Bot performs anti-vandalism work, at first using Chris G Bot's code. Vandalism was detected by matching edits against regular expressions, an approach prone to low detection rates and high false-positive rates.
  • Around December 2015: Bot begins using the revscoring library (which powers ORES) to extract features of each edit (e.g. the number of characters added or removed). The probability that an edit is vandalism is predicted by a random forest classifier.
  • Mid-2016: Bot core rewritten in line with the reactor design pattern.
  • 7 May 2018: Bot core rewritten (again) in Python; the new implementation is vastly more efficient than the original PHP one. Classifier changed to XGBoost.
  • Mid-October 2019: Classifier changed to LightGBM, with substantial improvements to how each diff is evaluated. Words added by editors are converted into tf–idf vectors and fed into a separate Bayesian classifier. These words are also tagged by part of speech (noun, verb, pronoun, etc.), and the count for each category becomes an additional input to the main LightGBM classifier.
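
The October 2019 feature pipeline can be sketched in Python as below. This is a minimal, hypothetical illustration, not the bot's actual code. It rests on several assumptions. The training data is a toy set. The "separate Bayesian classifier" is assumed to be a multinomial naive Bayes model trained on tf–idf weights. The part-of-speech tagger is a tiny hand-written lexicon standing in for a real tagger. Names such as `TfidfNaiveBayes`, `pos_counts` and `edit_features` are invented for illustration. The sketch stops once it has built the feature row; in the bot, that row would go to the main LightGBM classifier, which is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy training set: (words added by an edit, label), 1 = vandalism, 0 = good faith.
TRAIN = [
    (["you", "are", "stupid", "lol"], 1),
    (["lol", "lol", "poop"], 1),
    (["he", "is", "stupid"], 1),
    (["the", "city", "is", "in", "france"], 0),
    (["she", "won", "the", "election"], 0),
    (["the", "river", "flows", "north"], 0),
]


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}


def tfidf(doc, idf):
    """Raw term counts weighted by idf; words never seen in training are dropped."""
    return {w: c * idf[w] for w, c in Counter(doc).items() if w in idf}


class TfidfNaiveBayes:
    """Multinomial naive Bayes trained on tf-idf weights instead of raw counts (assumed model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        self.vocab = set(self.idf)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            weights = Counter()
            for d in class_docs:
                weights.update(tfidf(d, self.idf))  # sum tf-idf weight per word
            total = sum(weights.values()) + self.alpha * len(self.vocab)
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.log_likelihood[c] = {
                w: math.log((weights[w] + self.alpha) / total) for w in self.vocab
            }
        return self

    def predict_proba(self, doc):
        """Return {class: posterior probability} for one list of added words."""
        vec = tfidf(doc, self.idf)
        scores = {
            c: self.log_prior[c] + sum(x * self.log_likelihood[c][w] for w, x in vec.items())
            for c in self.classes
        }
        top = max(scores.values())  # subtract the max before exp for numerical stability
        z = sum(math.exp(s - top) for s in scores.values())
        return {c: math.exp(s - top) / z for c, s in scores.items()}


# Hypothetical stand-in for a real part-of-speech tagger.
POS_LEXICON = {
    "you": "PRON", "he": "PRON", "she": "PRON", "i": "PRON",
    "city": "NOUN", "river": "NOUN", "election": "NOUN", "france": "NOUN",
    "is": "VERB", "are": "VERB", "won": "VERB", "flows": "VERB",
}


def pos_counts(words):
    """Count added words per part-of-speech category (unknown words become OTHER)."""
    tags = Counter(POS_LEXICON.get(w, "OTHER") for w in words)
    return {f"n_{tag.lower()}": tags[tag] for tag in ("NOUN", "VERB", "PRON", "OTHER")}


def edit_features(added_words, chars_added, chars_removed, nb):
    """Assemble one feature row for the main classifier (hypothetical feature names)."""
    features = {"chars_added": chars_added, "chars_removed": chars_removed}
    features["nb_vandal_prob"] = nb.predict_proba(added_words)[1]
    features.update(pos_counts(added_words))
    return features


nb = TfidfNaiveBayes().fit([d for d, _ in TRAIN], [y for _, y in TRAIN])
row = edit_features(["you", "are", "stupid"], chars_added=15, chars_removed=0, nb=nb)
print(row)
```

In this sketch, the naive Bayes probability condenses the whole tf–idf vector into a single number, so the main classifier receives a compact summary of the wording alongside the character counts and part-of-speech counts.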

Summary of algorithm

Write-up in progress.