Tardis:What SpellBot actually corrects

Because even the most conscientious editors occasionally make spelling errors, a bot is needed to enforce the spelling policy. A comprehensive list of the differences between British and American spellings has been compiled, and is being coded for bot use as of the second week of June 2011. This page will be updated heavily throughout that week as the list is fully coded.

Following is the raw code of that bot routine, so that all users can see exactly what the bot is checking for.

Words the bot will not check for
Some words are beyond the capability of the bot, because they are valid spellings (even if of different words) in British English. This list includes:
 * Check. Americans use this word not only as a verb meaning to investigate, but also as a noun for a financial instrument, which the British spell cheque.  Because the first meaning is valid in BrEng, the bot can't be programmed to correct the other usage.  We'd end up with sentences like:
 * The Doctor chequed on Sarah Jane in her hospital room before going to the pathology lab.


 * Tire. Both sides of the Atlantic use tire as a verb.  It's again the noun that's problematic.  Americans view tire as the correct spelling for what the British would call a tyre.  The bot can't figure this one out, so it doesn't even try.
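
To see why such words are skipped, here is a minimal sketch of what a blanket rule would do. The `naive_fix` function and the sample sentences are hypothetical and are not part of SpellBot; the sketch simply assumes a plain search-and-replace with Python's `re` module.

```python
import re

def naive_fix(text):
    # Hypothetical blanket rule, NOT part of SpellBot:
    # swap every "check" for "cheque", regardless of meaning.
    return re.sub(r'check', 'cheque', text)

# The noun is corrected as hoped...
noun_case = naive_fix("Donna paid by check.")
# ...but the perfectly valid verb is damaged by the same rule.
verb_case = naive_fix("The Doctor will check on Sarah Jane.")

print(noun_case)
print(verb_case)
```

A pattern can only see letters, not meaning, so the verb and the noun look identical to it. The same reasoning applies to tire and tyre.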

How to read the code
The code works by telling the bot to look for the word described before the comma. Then it replaces it with the word after the comma. The most basic expression would be:
 * (u'color',u'colour')

This looks for the American "color", then replaces it with the British "colour".
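
As a rough sketch of what such a pair does, assuming the bot applies it much like Python's `re.sub` (the sample sentences are made up for illustration):

```python
import re

# The pair is read as (American spelling, British spelling).
american, british = (u'color', u'colour')

# A lowercase match is replaced...
lower = re.sub(american, british, u"The scarf changed color.")
# ...but a capitalised "Color" is not matched by this simple pair.
upper = re.sub(american, british, u"Color drained from his face.")

print(lower)
print(upper)
```

The second sentence comes back unchanged, which shows the gap a plain pair leaves for capitalised words.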

Because typing every permutation of a word, including all words that share the same root and capitalised variants, would be very time-consuming, most of the code won't work in such a simplistic way. Most of it uses a "regular expression" — or regex — to find a lot of hits with just one line. Here's an explanation of the regex used in this code:
 * The expression ([Cc]) means "look for either the capitalised or the lowercase version of the letter C"
 * (.?) means, "You, Mr. Fancy Computer bot thing, might find another letter to the right of this point. If you do, grab it." It only needs that one letter, because everything after it is left exactly as it was.
 * \1 means, "take whatever is in the first parentheses and put it here"
 * \2 means, "take whatever is in the second parentheses and put it here"
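
The pieces above can be inspected directly in Python. This is only a sketch: the pattern `([Cc])olor(.?)` is a hypothetical one chosen for illustration, not a line from the bot's list.

```python
import re

# Hypothetical pattern built from the two pieces described above.
pattern = re.compile(r'([Cc])olor(.?)')

captured = []
for word in ["Colors", "colored", "color"]:
    match = pattern.search(word)
    # group(1) is what ([Cc]) found; group(2) is what (.?) found.
    captured.append((match.group(1), match.group(2)))

print(captured)
```

Note that (.?) picks up at most one character, and nothing at all when the word ends there. Any remaining letters are not part of the match, so they stay in place when the replacement is made.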

Thus, if we have the expression,
 * (r'([Cc])apitaliz(.?)', r'\1apitalis\2')

It means, roughly,
 * Look for all words, beginning with either a capital or lowercase C, which are followed by the letters "apitaliz" plus the next letter, if there is one. Then, keep the form of the letter C that you found, stick on "apitalis", and add back the letter you originally found after the "z". The rest of the word is left as it was.

In other words, find "Capitaliz-", keep the C capitalised, switch the z to an s, then stick on "-e", "-ing", "-ed", or "-ation", as appropriate.
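
Here is a short sketch of that rule at work, using the capitalise entry as it appears in the code list further down and assuming it is applied the way Python's `re.sub` would apply it. The word list is made up for illustration.

```python
import re

# (pattern, replacement) as written in the code list.
rule = (r'([Cc])apitaliz(.?)', r'\1apitalis\2')

words = ["capitalize", "Capitalizing", "capitalized", "Capitalization"]
# One rule covers the whole word family and keeps the original case of the C.
converted = [re.sub(rule[0], rule[1], word) for word in words]

print(converted)
```

Both strings are written with the r prefix so that Python passes \1 and \2 through to the regex engine instead of treating them as escape codes.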

Many differences in British/American spelling come down to just this sort of one-letter-before-the-suffix switchout. Some are more complicated, and have to be dealt with on a more individual, and less automated, basis.
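
One case that seems to need individual care is a pair whose American form sits inside the British form, such as analog and analogue. The sketch below assumes the pair is applied as a plain `re.sub`; how the bot framework itself handles this is not confirmed here, and the `\b` word-boundary version is only one possible remedy, not the bot's actual code.

```python
import re

already_british = "an analogue clock"

# Plain pair: "analog" is found inside "analogue" and expanded again.
naive = re.sub(r'analog', 'analogue', already_british)

# Assumed remedy: \b anchors the match to whole words only.
bounded_existing = re.sub(r'\banalog\b', 'analogue', already_british)
bounded_american = re.sub(r'\banalog\b', 'analogue', "an analog clock")

print(naive)
print(bounded_existing)
print(bounded_american)
```

With the word boundaries in place, text that is already British is left alone while the American spelling is still converted.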

The code
The following code will change over time, as more words are added. The last word, alphabetically, that has a British/American difference is yogurts. Once you see that word on this list, you'll know the bot is fully programmed.

fixes['spelling'] = {
    'regex': True,
    'recursive': True,
    'msg': { 'en':u'Enforcing spelling policy.' },
    # Each pair is (American spelling or pattern, British replacement).
    'replacements': [
        (u'accessorize', u'accessorise'), (u'accessorized', u'accessorised'), (u'accessorizes', u'accessorises'), (u'accessorizing', u'accessorising'),
        (u'acclimatization', u'acclimatisation'), (u'acclimatize', u'acclimatise'), (u'acclimatized', u'acclimatised'), (u'acclimatizes', u'acclimatises'), (u'acclimatizing', u'acclimatising'),
        (u'accouterments', u'accoutrements'),
        (u'eon', u'aeon'), (u'eons', u'aeons'),
        (u'aerogram', u'aerogramme'), (u'aerograms', u'aerogrammes'),
        (u'esthete', u'aesthete'), (u'esthetes', u'aesthetes'), (u'esthetic', u'aesthetic'), (u'esthetically', u'aesthetically'), (u'esthetics', u'aesthetics'),
        (u'etiology', u'aetiology'),
        (u'aging', u'ageing'),
        (u'aggrandizement', u'aggrandisement'),
        (u'agonize', u'agonise'), (u'agonized', u'agonised'), (u'agonizes', u'agonises'), (u'agonizing', u'agonising'), (u'agonizingly', u'agonisingly'),
        (u'almanac', u'almanack'), (u'almanacs', u'almanacks'),
        (u'aluminum', u'aluminium'),
        (u'amortizable', u'amortisable'), (u'amortization', u'amortisation'), (u'amortizations', u'amortisations'), (u'amortize', u'amortise'), (u'amortized', u'amortised'), (u'amortizes', u'amortises'), (u'amortizing', u'amortising'),
        (u'amphitheater', u'amphitheatre'), (u'amphitheaters', u'amphitheatres'),
        (u'anemia', u'anaemia'), (u'anemic', u'anaemic'),
        (u'anesthesia', u'anaesthesia'), (u'anesthetic', u'anaesthetic'), (u'anesthetics', u'anaesthetics'),
        (u'anesthetize', u'anaesthetise'), (u'anesthetized', u'anaesthetised'), (u'anesthetizes', u'anaesthetises'), (u'anesthetizing', u'anaesthetising'),
        (u'anesthetist', u'anaesthetist'), (u'anesthetists', u'anaesthetists'),
        (u'analog', u'analogue'), (u'analogs', u'analogues'),
        (u'analyze', u'analyse'), (u'analyzed', u'analysed'), (u'analyzes', u'analyses'), (u'analyzing', u'analysing'),
        (u'anglicize', u'anglicise'), (u'anglicized', u'anglicised'), (u'anglicizes', u'anglicises'), (u'anglicizing', u'anglicising'),
        (u'annualized', u'annualised'),
        (u'antagonize', u'antagonise'), (u'antagonized', u'antagonised'), (u'antagonizes', u'antagonises'), (u'antagonizing', u'antagonising'),
        (u'apologize', u'apologise'), (u'apologized', u'apologised'), (u'apologizes', u'apologises'), (u'apologizing', u'apologising'),
        (u'appall', u'appal'), (u'appalls', u'appals'),
        (u'appetizer', u'appetiser'), (u'appetizers', u'appetisers'), (u'appetizing', u'appetising'), (u'appetizingly', u'appetisingly'),
        (u'arbor', u'arbour'), (u'arbors', u'arbours'),
        (u'archeological', u'archaeological'), (u'archeologically', u'archaeologically'), (u'archeologist', u'archaeologist'), (u'archeologists', u'archaeologists'), (u'archeology', u'archaeology'),
        (u'ardor', u'ardour'),
        (u'armor', u'armour'), (u'armored', u'armoured'), (u'armorer', u'armourer'), (u'armorers', u'armourers'), (u'armories', u'armouries'), (u'armory', u'armoury'),
        (u'artifact', u'artefact'), (u'artifacts', u'artefacts'),
        (u'authorize', u'authorise'), (u'authorized', u'authorised'), (u'authorizes', u'authorises'), (u'authorizing', u'authorising'),
        (u'ax', u'axe'),
        (u'backpedaled', u'backpedalled'), (u'backpedaling', u'backpedalling'),
        (u'banister', u'bannister'), (u'banisters', u'bannisters'),
        (u'baptize', u'baptise'), (u'baptized', u'baptised'), (u'baptizes', u'baptises'), (u'baptizing', u'baptising'),
        (u'bastardize', u'bastardise'), (u'bastardized', u'bastardised'), (u'bastardizes', u'bastardises'), (u'bastardizing', u'bastardising'),
        (u'battleax', u'battleaxe'),
        (u'balk', u'baulk'), (u'balked', u'baulked'), (u'balking', u'baulking'), (u'balks', u'baulks'),
        (u'bedeviled', u'bedevilled'), (u'bedeviling', u'bedevilling'),
        (u'behavior', u'behaviour'), (u'behavioral', u'behavioural'), (u'behaviorism', u'behaviourism'), (u'behaviorist', u'behaviourist'), (u'behaviorists', u'behaviourists'), (u'behaviors', u'behaviours'),
        (u'behoove', u'behove'), (u'behooved', u'behoved'), (u'behooves', u'behoves'),
        (u'bejeweled', u'bejewelled'),
        (u'belabor', u'belabour'), (u'belabored', u'belaboured'), (u'belaboring', u'belabouring'),
        (u'belabors', u'belabours'),
        (u'beveled', u'bevelled'),
        (u'bevies', u'bevvies'), (u'bevy', u'bevvy'),
        (u'biased', u'biassed'), (u'biasing', u'biassing'),
        (u'binging', u'bingeing'),
        (u'bougainvillea', u'bougainvillaea'), (u'bougainvilleas', u'bougainvillaeas'),
        (u'bowdlerize', u'bowdlerise'), (u'bowdlerized', u'bowdlerised'), (u'bowdlerizes', u'bowdlerises'), (u'bowdlerizing', u'bowdlerising'),
        (u'breathalyze', u'breathalyse'), (u'breathalyzed', u'breathalysed'), (u'breathalyzer', u'breathalyser'), (u'breathalyzers', u'breathalysers'), (u'breathalyzes', u'breathalyses'), (u'breathalyzing', u'breathalysing'),
        (u'brutalize', u'brutalise'), (u'brutalized', u'brutalised'), (u'brutalizes', u'brutalises'), (u'brutalizing', u'brutalising'),
        (u'busses', u'buses'), (u'bussing', u'busing'),
        (u'cesarean', u'caesarean'), (u'cesareans', u'caesareans'),
        (u'caliber', u'calibre'), (u'calibers', u'calibres'),
        # Patterns with \1 and \2 must be raw strings (r'...'), or Python
        # reads the backslash sequences as control characters.
        (r'([Cc])aliper(.?)', r'\1alliper\2'),
        (r'([Cc])alisthenics', r'\1allisthenics'),
        (u'canalize', u'canalise'), (u'canalized', u'canalised'), (u'canalizes', u'canalises'), (u'canalizing', u'canalising'),
        (r'([Cc])ancelation', r'\1ancellation'), (r'([Cc])ancelations', r'\1ancellations'), (r'([Cc])anceled', r'\1ancelled'), (r'([Cc])anceling', r'\1ancelling'),
        (r'([Cc])andor', r'\1andour'),
        (r'([Cc])annibaliz(.?)', r'\1annibalis\2'), (r'([Cc])anibaliz(.?)', r'\1annibalis\2'), (r'([Cc])anibalis(.?)', r'\1annibalis\2'),
        (r'([Cc])anoniz(.?)', r'\1anonis\2'),
        (r'([Cc])apitaliz(.?)', r'\1apitalis\2'),
        (r'([Cc])arameliz(.?)', r'\1aramelis\2'),
        (r'([Cc])arboniz(.?)', r'\1arbonis\2'),
        (r'([Cc])aroled', r'\1arolled'), (r'([Cc])aroling', r'\1arolling'),
        (r'([Cc])atalog', r'\1atalogue'), (r'([Cc])atalogs', r'\1atalogues'), (r'([Cc])ataloged', r'\1atalogued'), (r'([Cc])ataloging', r'\1ataloguing'),
        (r'([Cc])atalyz(.?)', r'\1atalys\2'),
        (r'([Cc])ategoriz(.?)', r'\1ategoris\2'),
        (r'([Cc])auteriz(.?)', r'\1auteris\2'),
        (r'([Cc])avil(.?)', r'\1avill\2'),