Monday, August 30, 2010

MediaWiki search and replace automation in Python

As part of my translation work for Fedora, I noticed a page with a stylistic issue in an Italian word: 'obbiettivo', which is better spelled 'obiettivo'. I quickly edited that page and the few others found by searching for the term, then realized that many more pages used the plural form, so fixing them by hand was not practical.

Fortunately, MediaWiki (the software powering Fedora's wiki) has a very good API that is accessible from Python. In particular, Fedora ships python-mwclient, and it turned out I could search and replace all occurrences in 10 lines of code. Magic!

In case anyone is wondering, here is the script I used:

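#!/usr/bin/python
# Wiki Search And Replace

import re

import mwclient

site = mwclient.Site('fedoraproject.org')
site.login('login', 'password')  # placeholders: use your own credentials

# Full-text search for pages containing the plural form
pages = site.search('obbiettivi', what='text')

for pagedata in pages:
    page = site.Pages[pagedata['title']]
    # As used here, edit() returns the page's current wikitext
    text = page.edit()
    # [oO] captures the initial letter so its case is preserved
    newtext = re.sub(r'([oO])bbiettivi', r'\1biettivi', text)
    # Save only the pages where the substitution changed something
    if newtext != text:
        page.save(newtext, summary='Obbiettivo->Obiettivo', minor=True)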
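
The substitution itself can be tried offline before touching any wiki page. The following is only a sketch, not part of the script I ran: the function name fix_spelling and the sample sentence are made up for illustration, and the pattern is widened (as an assumption about what one would want) to cover both the singular and the plural form.

```python
import re

# Matches "obbiettivo" (singular) and "obbiettivi" (plural), with either
# a lowercase or an uppercase initial letter.
PATTERN = re.compile(r'\b([oO])bbiettiv([oi])\b')


def fix_spelling(text):
    """Return a (new_text, number_of_replacements) tuple.

    Hypothetical helper for illustration; it only rewrites a string
    and never talks to the wiki.
    """
    return PATTERN.subn(r'\1biettiv\2', text)


if __name__ == '__main__':
    sample = 'Obbiettivi del progetto: un obbiettivo alla volta.'
    fixed, count = fix_spelling(sample)
    print(fixed)
    print(count)
```

Returning the replacement count makes it easy to skip saving pages where nothing matched.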


I found the mwclient documentation a bit lacking, so the hard part was figuring out the values for some parameters in the function calls. In the end, I mostly relied on the help(mwclient) output, available from the Python interpreter, in conjunction with the MediaWiki API reference you get by pointing a browser to http://fedoraproject.org/w/api.php
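
To give an idea of how the two sources relate, here is a rough sketch of the raw API query that a full-text search appears to correspond to. It only builds the URL and sends nothing. The helper name build_search_url and the limit of 50 are my own choices for illustration; the parameter names (list=search, srsearch, srwhat, srlimit) are the ones listed in the MediaWiki API reference, and I am assuming that mwclient's what='text' ends up as srwhat=text.

```python
from urllib.parse import urlencode

# API endpoint mentioned in the post
API_URL = 'http://fedoraproject.org/w/api.php'


def build_search_url(term, what='text', limit=50):
    """Build a MediaWiki full-text search URL (hypothetical helper).

    No request is made; the function only assembles the query string.
    """
    params = {
        'action': 'query',
        'list': 'search',
        'srsearch': term,   # the search term
        'srwhat': what,     # 'text' searches page content, 'title' page titles
        'srlimit': limit,   # maximum number of results per request
        'format': 'json',
    }
    return API_URL + '?' + urlencode(params)


if __name__ == '__main__':
    print(build_search_url('obbiettivi'))
```

Pasting the resulting URL into a browser is a quick way to check what a given parameter value does before using it in a script.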
