Latest stable version 1.9.9+
There were 3 commits to the stable branch during the last development week (Tue 28 to Mon 4). ♦ Andrew Zoltay spotted a bug in the login form on sites with the “Use HTTPS for logins” setting turned on and provided a patch, which Petr Skoda included (MDL-24225). ♦ The other two commits are just trivial code cleanups. Rossiani Wijaya removed whitespace in the code she had committed the week before. ♦ Andrew Davis fixed a table comment in one of the XMLDB files, spotted by Eloy Lafuente.
Moodle 2.0 RC1
There were 106 commits to the future release branch during the last week. The main community site http://moodle.org was upgraded to the 2.0 engine over the weekend, which helped the core developers discover some forgotten bugs and incompatible customizations.
Quote of the week
“Raarrrrrrrr fixed way old bug from dml conversion”
– Sam Hemelryk really enjoys his bug-fixing rampage
Parsing uploaded string files in AMOS
I was working on a new AMOS feature that allows users (language pack maintainers and translation contributors) to upload their translations to AMOS and either include them in the main repository or offer them for inclusion. The first supported format is the standard Moodle string file, which is valid PHP code defining the associative array $string. The problem was that, for obvious security reasons, I cannot just let anonymous users upload arbitrary PHP code and execute it. Imagine what would happen if an attacker uploaded a file like
<?php
global $CFG;
$string['something'] = $CFG->dbpass;
If I just executed such a file via include(), the user would get access to sensitive server configuration data. So it was clear I had to write my own parser that extracts the array from the file without actually executing it.
I initially tried to solve it by searching the file contents for patterns using regular expressions. That worked pretty well in simple cases, but the unit tests (of course I started with them) quickly failed for complex samples that included commented-out lines, block comments or strings that contained the interesting patterns themselves.
<?php
// $string['lalala'] = 'Knock knock';
/*
$string['grrrr'] = 'Who\'s there?';
*/
$string['nasty'] = '$string[\'nasty\'] = \'Funny heh?\';';
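To make the failure concrete, here is a minimal sketch of the regular expression approach run against the sample above. The pattern is a hypothetical one written for illustration, not the one originally tried. It has no notion of comments, so it reports the two commented-out keys alongside the real one.

```php
<?php
// The sample file from above, kept verbatim in a nowdoc so nothing is interpolated.
$sample = <<<'EOT'
<?php
// $string['lalala'] = 'Knock knock';
/*
$string['grrrr'] = 'Who\'s there?';
*/
$string['nasty'] = '$string[\'nasty\'] = \'Funny heh?\';';
EOT;

// Hypothetical naive pattern: $string['key'] = 'value'; where the value may
// contain backslash-escaped characters.
$pattern = '/\$string\[\'([a-z]+)\'\]\s*=\s*\'((?:[^\'\\\\]|\\\\.)*)\';/';

preg_match_all($pattern, $sample, $matches);

// $matches[1] holds the captured keys: lalala, grrrr and nasty.
// Only "nasty" is a real definition; the other two are inside comments.
print_r($matches[1]);
```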
So I realized regexps are not suitable for this kind of task. Another way to deal with comment blocks would be stream processing of the file contents, with a lot of flags like “inside a comment”, “after line comment mark” or “waiting for variable name”. But that was clearly reinventing the wheel, which would end up with a square shape anyway. PHP itself has to do this boring job when analysing the source, so I just had to learn how to use its results.
The wise may already have guessed that I ended up with the tokenizer extension. The tokenizer functions provide an interface to the PHP tokenizer embedded in the Zend Engine. Using these functions, you can write your own PHP source analysis or modification tools without having to deal with the language specification at the lexical level. The parser method I finally implemented calls token_get_all() to get an array of tokens found in the uploaded file and picks out the patterns that are considered valid string definitions.
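For readers who have not used the tokenizer before, the following small sketch shows what token_get_all() returns for a two-line string file. The sample source and the $lines variable are made up for the illustration. The exact split between comment and whitespace tokens can differ between PHP versions, so no full listing is given here.

```php
<?php
// A tiny, made-up string file held in a PHP string; it is never executed.
$source = "<?php\n// a comment\n\$string['hello'] = 'Hi';\n";

$lines = array();
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array form: [0] token id, [1] matched text, [2] line number.
        $lines[] = token_name($token[0]) . ' ' . json_encode($token[1]);
    } else {
        // Single-character tokens such as [ ] = ; are returned as plain strings.
        $lines[] = 'char ' . json_encode($token);
    }
}

echo implode("\n", $lines), "\n";
```

The listing contains entries such as T_VARIABLE for $string and T_CONSTANT_ENCAPSED_STRING for each quoted literal, while the comment shows up as a single T_COMMENT token that is easy to skip.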
For the purpose of uploading a string file into AMOS, a valid definition is something like ‘T_VARIABLE $string followed by [ followed by T_CONSTANT_ENCAPSED_STRING followed by ] followed by the assignment = followed by T_CONSTANT_ENCAPSED_STRING followed by a semicolon’. All other tokens like T_WHITESPACE, T_INLINE_HTML, T_COMMENT or T_DOC_COMMENT are just ignored.
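The rule above can be sketched roughly as follows. This is an illustration only, not the actual AMOS code: the function names extract_strings() and decode_literal() are hypothetical, skipping T_OPEN_TAG and T_CLOSE_TAG is an assumption, and for brevity the sketch accepts single-quoted literals only.

```php
<?php
/**
 * Illustrative sketch: extracts $string['key'] = 'value'; pairs from PHP
 * source without executing it. Throws on anything that breaks the pattern.
 */
function extract_strings($source) {
    // Tokens dropped before matching (the two tag tokens are an assumption).
    $ignored = array(T_OPEN_TAG, T_CLOSE_TAG, T_WHITESPACE, T_INLINE_HTML,
                     T_COMMENT, T_DOC_COMMENT);

    // Normalise every remaining token to array(id, text); single-character
    // tokens are plain strings and get a null id.
    $tokens = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if (!in_array($token[0], $ignored, true)) {
                $tokens[] = array($token[0], $token[1]);
            }
        } else {
            $tokens[] = array(null, $token);
        }
    }

    // The only accepted statement shape; a null text means "any text".
    $shape = array(
        array(T_VARIABLE, '$string'),
        array(null, '['),
        array(T_CONSTANT_ENCAPSED_STRING, null),
        array(null, ']'),
        array(null, '='),
        array(T_CONSTANT_ENCAPSED_STRING, null),
        array(null, ';'),
    );

    $strings = array();
    for ($i = 0; $i < count($tokens); $i += count($shape)) {
        foreach ($shape as $offset => $expected) {
            $actual = isset($tokens[$i + $offset]) ? $tokens[$i + $offset] : null;
            if ($actual === null || $actual[0] !== $expected[0]
                    || ($expected[1] !== null && $actual[1] !== $expected[1])) {
                throw new Exception('Unexpected token at position ' . ($i + $offset));
            }
        }
        $strings[decode_literal($tokens[$i + 2][1])] = decode_literal($tokens[$i + 5][1]);
    }
    return $strings;
}

/**
 * Turns a single-quoted PHP literal (quotes included) into its value.
 */
function decode_literal($literal) {
    if ($literal[0] !== "'") {
        throw new Exception('Only single-quoted literals are supported in this sketch');
    }
    // In single-quoted strings only \\ and \' are escape sequences.
    return strtr(substr($literal, 1, -1), array('\\\\' => '\\', "\\'" => "'"));
}

// Usage: the commented-out definition is ignored, the real one is returned.
$sample = "<?php\n// \$string['old'] = 'Gone';\n\$string['hello'] = 'Hi there';\n";
print_r(extract_strings($sample));
```

Because every accepted statement must match the seven-token shape exactly, a file that uses global, reads $CFG or calls a function is rejected before any of it could run.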
As you can see this means that code like
$string['greeting'] = 'Hello ' . 'world';
is considered a syntax error for AMOS import even though it is valid PHP code. But I am sure that is OK, as there is no real reason to support every conceivable way of defining a string (heredoc etc.).