User:CzechOut/Sandbox

Useful bot syntax
To strip a page entirely of its HTML, use pywikipedia bots like this:

 python replace.py -cat:'18th century years' -regex "<.*>" "" -summary:'Getting rid of HTML table at bottom' -always

Note that this works only when the HTML begins and ends with an angle bracket.

The next one is a nice and easy way to replace text that includes quotation marks and pipes. The text I was trying to replace was 1em" | which I was then trying to exchange for 1em" |'', so as to quickly add italics to a series of similarly formatted pages.

 python replace.py -cat:'whatever that may be' -regex "1em\" \|" "1em\" |''" -summary:'italics per MOS'
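A rough sketch of what those two patterns do, assuming replace.py hands them to Python's re module more or less unchanged. The sample page text and table cell are made up for illustration.

```python
import re

# Hypothetical page text, invented for this example.
page = "Intro text\n<table><tr><td>1700</td></tr></table>\nOutro text"

# "<.*>" is greedy and "." stops at a newline by default, so on each line
# everything from the first "<" to the last ">" goes, text in between included.
stripped = re.sub(r"<.*>", "", page)
print(repr(stripped))

# Hypothetical table cell, invented for this example.
cell = 'style="width: 1em" |Title'

# In a pattern, an unescaped | means "or", so the pipe is written as \|.
# In the replacement text, | is just an ordinary character.
italic = re.sub(r'1em" \|', "1em\" |''", cell)
print(italic)
```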

The \ acts as an escape switch, allowing you to specify things like double quotes, which would otherwise parse as a command. Note that the first pipe did need to be escaped, but the second didn't. I'm not quite sure why, yet.

This one comes largely from user:sulfur, and is a way to change a specific element of an infobox, in this case the race variable:

 python replace.py -subcat:'Time-Space Visualiser' -summary:'human --> Human repair, in infoboxen only, per request of User:Tangerineduel' -regex 'race( *)=( *)\[\[human\]\]' 'species\1=\2Human'

The expression ( *) means "look for a space, and find everything from 0 to an infinite number of spaces". Hence this will find everything from race=[[human]] to race         =           [[human]].

 python replace.py -cat:'Doctor Who seasons' -summary:'Getting rid of wiki markup table at bottom - Do not replace - use template instead' -regex "(\{+\|+class)([\S\s]*?)(\}+)" ""
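Here is a minimal sketch of the last two patterns above, run through Python's re module on invented sample text. The function names are hypothetical, and it assumes replace.py treats the patterns the same way re.sub does.

```python
import re

def fix_race(text):
    # \1 and \2 put back whatever spacing was found around the "=".
    return re.sub(r"race( *)=( *)\[\[human\]\]", r"species\1=\2Human", text)

def drop_table(text):
    # [\S\s]*? matches any character, newlines included, as few as possible,
    # so the match runs from "{|class" to the first "}" that follows.
    return re.sub(r"(\{+\|+class)([\S\s]*?)(\}+)", "", text)

print(fix_race("|race=[[human]]"))
print(fix_race("|race   =  [[human]]"))

# Hypothetical page with a small wiki markup table.
page = 'Text above\n{|class="wikitable"\n|-\n| Season 1\n|}\nText below'
print(repr(drop_table(page)))
```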

This neatly gets rid of any wiki markup table (i.e., one that is enclosed within {| markup) that appears on a page. Note that the word "class" is the thing which follows {| . Note that [\S\s]*? is particularly important, as that's how you match a multi-line table. Sadly, though, this regex leaves a big gap where the table used to be. There's probably a cleverer way to do this, but the extra whitespace can then be eliminated by running the bot a second time to specifically remove the whitespace:

 python replace.py -cat:'Doctor Who seasons' -summary:'Getting rid of blank space between ext links and season temp' -regex "(\]\s\s+)" "]\n"

This seems to do a decent job of scrubbing blank lines from pages, especially where the blank lines occur after a closing bracket (]).

 python replace.py -cat:'Dates' -summary:'Making HTML table at bottom standard format so I can later remove it in preparation for implementing template:DayNav' -regex "(\]\s+\<)" "]<"

This takes a table that has a sloppy structure, with whitespace or line breaks between a closing bracket (such as the end of the January link) and the HTML tag that follows it, and closes up the gap so that the tag comes straight after the bracket. This then makes it much easier to delete the whole table with

 -regex "<.*>" ""

This regex, however, does not take into account whether there are any line breaks after the HTML commands. If there is whitespace after the commands, and you use the above regex, then you'll end up with a lot of extra, empty space. To take care of this, you need to add a "space scrubber" command like this:

 -regex "<.*>\s+" ""

The \s+ matches any run of whitespace after the tag, line breaks included, but there has to be at least one whitespace character there for the scrubbing to take place, as that's what + means in regex-speak.

 -regex "\*( *)\[.*\]\s+" ""

This gets rid of stuff in this format:

 * [[Blah]]

It says: look for something that starts with an *. Then look for any number of spaces after the *, including no spaces at all. Then look for everything between [ and ]. Then look for at least one whitespace character or line break. Then delete the lot.
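The whitespace scrubbers and the list stripper above can be tried out in plain Python before letting a bot loose. This is only a sketch: the function names and sample strings are invented, and it assumes replace.py applies the patterns the way re.sub does.

```python
import re

def tidy_after_bracket(text):
    # "]" followed by two or more whitespace characters becomes "]" + one newline.
    return re.sub(r"(\]\s\s+)", "]\n", text)

def close_gap_before_tag(text):
    # Whitespace between "]" and the next "<" is removed.
    return re.sub(r"(\]\s+\<)", "]<", text)

def strip_tags_and_space(text):
    # Only matches when at least one whitespace character follows the tag.
    return re.sub(r"<.*>\s+", "", text)

def strip_link_list(text):
    # "*", optional spaces, a bracketed link, then at least one whitespace character.
    return re.sub(r"\*( *)\[.*\]\s+", "", text)

print(repr(tidy_after_bracket("[[Link]]\n\n\n{{Season}}")))
print(repr(close_gap_before_tag("[[January]] \n <td>")))
print(repr(strip_tags_and_space("<br>\n\nText")))
print(repr(strip_tags_and_space("<br>Text")))  # no whitespace after the tag
print(repr(strip_link_list("* [[Blah]]\n*[[Other]]\nAfter")))
```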
The tiny problem with this syntax is that if it encounters something like this:

 * [[Abraham Lincoln]] was the 16th President of the United States

it'll strip the sentence of Abraham Lincoln. So you have to use it with some care. But it's good for stripping simple lists from an article.

Now, let's imagine I had the following: [[DWBIT 10|10]] and I wanted to convert it to just [[DWBIT 10]]. The useful regex for stripping a pipetrick is this:

 -regex "(DWBIT \d+)\|\d+\]\]" "\1]]"

To strip quotation marks from something that is also italicized:

 -regex "\"('+)" "\1"

(for one side) and

 -regex "('+)\"" "\1"

(for the other).

To strip a DEFAULTSORT from a page also containing a NameSort:

 -regex "(\{\{[Dd][Ee][Ff][Aa][Uu].*\}\})\s+(\{\{NameSort\}\})" "\2"
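A quick way to see the Abraham Lincoln problem, and to check the pipetrick, quotation mark and DEFAULTSORT patterns, is to run them in Python first. This is a sketch with made-up function names and sample text, assuming the patterns behave under re.sub as they do in the bot.

```python
import re

def strip_link_list(text):
    return re.sub(r"\*( *)\[.*\]\s+", "", text)

def strip_pipe_trick(text):
    # Keeps "DWBIT <number>" and drops the "|<number>" display text.
    return re.sub(r"(DWBIT \d+)\|\d+\]\]", r"\1]]", text)

def strip_quotes_around_italics(text):
    # One pass for the opening quotation mark, one for the closing one.
    text = re.sub(r"\"('+)", r"\1", text)
    return re.sub(r"('+)\"", r"\1", text)

def drop_defaultsort(text):
    # \2 keeps only the NameSort template.
    pattern = r"(\{\{[Dd][Ee][Ff][Aa][Uu].*\}\})\s+(\{\{NameSort\}\})"
    return re.sub(pattern, r"\2", text)

# The pitfall: the bullet and the linked name go, the rest of the sentence stays.
print(strip_link_list("* [[Abraham Lincoln]] was the 16th President of the United States"))
print(strip_pipe_trick("[[DWBIT 10|10]]"))
print(strip_quotes_around_italics("\"''The Daleks''\""))
print(drop_defaultsort("{{DEFAULTSORT:Ash, William}}\n{{NameSort}}"))
```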

user-fixes.py

 * see starwars: User:Xwing328/Pywikipedia

Neat tricks with parser functions
This little bit of code will find the space in a page title, then return the first four letters after it. When added to a DEFAULTSORT instruction, it should allow for autosorting by the first four letters of the last name. Gotta test it on names that have more than one space (i.e. names where the middle name/initial is included).

Yep, this works fine on names that are shorter than 4 letters (used on William Ash, for instance, it returned "Ash"), and it worked well on Russell T Davies, where it returned "Davi".

This little beauty searches for a letter in a title, then returns a yes or a no depending on whether it finds it:
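The parser function code itself isn't reproduced here. As a rough Python analogue of the two behaviours described above (not wikitext, and not the original code), the logic might look like this. Both function names are hypothetical, and using the last space in the title is an assumption based on the "Russell T Davies" result.

```python
def sort_prefix(title, length=4):
    # Assumption: split on the LAST space, which is what gives "Davi"
    # for "Russell T Davies". A title with no space is used whole.
    _, _, last_word = title.rpartition(" ")
    return last_word[:length]

def has_letter(title, letter):
    # Returns "yes" or "no" depending on whether the letter is in the title.
    return "yes" if letter in title else "no"

print(sort_prefix("William Ash"))
print(sort_prefix("Russell T Davies"))
print(has_letter("William Ash", "A"))
```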

Infoboxen
I've been quite unsatisfied with the infoboxes here for a while. They require you to type in way more than you need to in order to put up a picture, they produce a lot of blank space if you don't fill in every variable and they're "thin black line"-heavy. So here's a side-by-side comparison of the exact same variables in two different styles of infobox. The one on the left is my proposal of what we should change to (although colors can obviously be changed easily).

Note the differences between the two boxes. The new one has no border around the whole box, but a fully colored interior that frames the picture.

The new version also automatically links the picture. To get the picture up, all I did was type in image=Tegan.jpg. Simple, easy, no worries about inverted brackets, missing punctuation or anything else. As long as you know the simple name of the file, the picture appears. More importantly, it gives editors no choice as to the width. All pictures using this infobox are 250px, period. This will achieve uniformity across the site, something another thread has been complaining about.

But you can see the major downside. It'll obviously mean that every single existing infobox will have to have its brackets edited out. Controversial, to be sure, but ultimately massively beneficial. Yes, the new box can be rewritten to handle the existing format, but we'd lose the ability to set the width automatically.

But here's the unambiguously better bit. Variables not entered do not produce a blank line, or try to substitute a word like "unknown". The exception is "mentioned in" and "appeared in", because it actually is useful to have the box positively assert that there are no mentions or appearances, I think. But this can be easily changed so that the lines beneath the "Appearances" subhead can also be set to disappear when someone hasn't filled them out.

This sort of thing can be done for every infobox on the wiki. We could get real control over our infoboxes by putting a whole lot more style into them. We can even put little icons into the infoboxes, if we wanted, so that on series episode pages there's the logo of the programme in question. Maybe a small TARDIS for DW, and the faces of Captain Jack and SJS for their series. Check out the SMDM episode template for an idea of what that might look like.  Czech Out  ☎ | ✍  07:42, 14 May 2009 (UTC)
