Monday, February 27, 2012

PHP fun with numbers

An integer in PHP is a number from the set Z (..., -2, -1, 0, 1, 2, ...). On a typical 32-bit machine integers are signed 32-bit values, so the range runs from -2^31 to 2^31 - 1, giving a maximum of 2,147,483,647 (roughly 2 billion). You can get the size of an int in bytes from the constant PHP_INT_SIZE, which on my machine is 4. Another constant, PHP_INT_MAX, holds the maximum value given above. In PHP, any integer result that goes beyond that limit is automatically converted to a float.
$x = PHP_INT_MAX; 
var_dump($x); // gives int(2147483647)

$y = PHP_INT_MAX + 1; 
var_dump($y); // gives float(2147483648)
If you cast a float to an integer and the value is within the bounds of an integer, it casts as expected (the fractional part is dropped, i.e. the value is always rounded towards zero).
$x = 34.5;
$y = (int)$x;
var_dump($y); // gives int(34)
Now comes the funny business. If you cast a float that is outside the bounds of an integer you get... well, officially the result is undefined, with no warnings and no errors. In my experiments it is always zero.
$x = PHP_INT_MAX + 1;
$y = (int)$x;
var_dump($y); // gives 0. Nice...
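Since the result of an out-of-range cast is undefined, one option is to check the range before casting. The helper below is only a sketch: floatToIntOrNull is a hypothetical name, not a PHP built-in, and its bounds are deliberately conservative (the boundary values themselves are rejected).

```php
<?php
// Hypothetical helper: returns the int value, or null when the float
// cannot be safely represented as an integer on this build.
function floatToIntOrNull($value)
{
    $max = (float) PHP_INT_MAX;
    // Conservative bounds: (float) PHP_INT_MAX is not exactly representable
    // on 64-bit builds, so the boundary values are treated as out of range.
    if (is_nan($value) || $value >= $max || $value <= -$max) {
        return null;
    }
    return (int) $value;
}

var_dump(floatToIntOrNull(34.5));            // int(34)
var_dump(floatToIntOrNull(PHP_INT_MAX + 1)); // NULL
```

Returning null forces the caller to decide what an out-of-range value should mean, rather than silently carrying on with whatever the cast produced.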
Here's an even crazier one. The official docs recommend NEVER casting an unknown fraction to an int or you will suffer the consequences and hell shall rise through the earth and buildings will fall and all will perish. Or something similar.
$x = (0.1+0.7) * 10;
var_dump($x); // gives float(8). Okay.

$y = (int)$x;
var_dump($y); // gives int(7). WTF?!
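If what you actually want is the nearest whole number rather than truncation, one way around this is to call round() before casting. A minimal sketch, assuming rounding to nearest is acceptable for your data:

```php
<?php
$x = (0.1 + 0.7) * 10;

// (int) truncates towards zero, so a value just below 8 becomes 7
$truncated = (int) $x;

// round() moves to the nearest whole number first, then the cast is safe
$rounded = (int) round($x);

var_dump($truncated); // int(7)
var_dump($rounded);   // int(8)
```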
And ALWAYS be careful when comparing floating point values. Because a decimal (base 10) fraction is stored internally in binary (base 2), many values such as 0.1 cannot be represented exactly, and you can get some quirky results.
$x = (0.1+0.7)*10;
$y = 8.0;
var_dump($x); // gives float(8)
var_dump($y); // gives float(8)
So they are equal right? Wrong.
var_dump($x == $y); // gives bool(false)
Internally, $x is probably stored as something like 7.9999999999999991118. From the official php.net documentation:

"Floating point numbers have limited precision. Although it depends on the system, PHP typically uses the IEEE 754 double precision format, which will give a maximum relative error due to rounding in the order of 1.11e-16"

The workaround is to provide an epsilon value that provides an allowable difference for two floats to still be considered equal.
$epsilon = 0.00001; // allowable difference
if (abs($x - $y) < $epsilon) { echo "equal enough"; }
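The epsilon check can be wrapped in a small function so the tolerance lives in one place. This is a sketch: floatsEqual and its default tolerance are hypothetical choices, not something PHP provides.

```php
<?php
// Hypothetical helper: true when $a and $b differ by less than $epsilon
function floatsEqual($a, $b, $epsilon = 0.00001)
{
    return abs($a - $b) < $epsilon;
}

$x = (0.1 + 0.7) * 10;
$y = 8.0;

var_dump($x == $y);            // bool(false)
var_dump(floatsEqual($x, $y)); // bool(true)
```

A fixed absolute epsilon like this suits values of modest size. For very large or very small numbers a tolerance relative to the magnitude of the operands is usually a better fit.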
Also, as a footnote, converting a string to an int also results in zero if the string does not start with a number.
$x = "peter";
$y = (int)$x;
var_dump($y); // gives int(0)
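If you need to tell a genuine zero apart from a string that was never a number, one option is to validate before converting. The sketch below uses filter_var() with FILTER_VALIDATE_INT, which returns false instead of quietly producing a number:

```php
<?php
// A plain cast never fails; it just takes whatever leading digits it finds
var_dump((int) "peter"); // int(0)
var_dump((int) "12abc"); // int(12)

// filter_var() rejects anything that is not a whole integer string
var_dump(filter_var("peter", FILTER_VALIDATE_INT)); // bool(false)
var_dump(filter_var("12abc", FILTER_VALIDATE_INT)); // bool(false)
var_dump(filter_var("42", FILTER_VALIDATE_INT));    // int(42)
```

Compare the result with === false, because a valid "0" comes back as int(0), which is also falsy.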
:)

Thursday, February 23, 2012

JavaScript variable scope and closures

I wrote a bug recently. I didn't mean to, it just happened. I thought the code looked good and two other code reviewers did too. Of course none of us are JavaScript programmers, and clearly none of us understood closures and variable scope properly. This is one of the basic things about JavaScript that seems to catch people out all the time. THERE IS NO BLOCK SCOPE - ONLY FUNCTION SCOPE (for variables declared with var)! I blame the C-like syntax for giving people a false sense of security.

Take the following contrived example. There are three buttons called One, Two and Three, and we add a click event handler to each that simply displays the selector of the element. The following code DOES NOT WORK. Some people (me included, not so long ago) would have expected that clicking One would show '#one', Two would show '#two' and Three would show '#three'. But they all show '#three'.

Why? Because scope is function-based in JavaScript. 'url' is declared inside the for loop, but it belongs to the enclosing document.ready function, so there is only one 'url' variable and all three handlers share it. By the time any button is clicked the loop has finished, so every handler displays the last value assigned to url (i.e. '#three').

$(document).ready(function() {
    var links = ['#one', '#two', '#three'];
    for(var i=0; i<links.length; i++) {
       // oops! The scope of url is the function - not the for loop block
       var url = links[i]; 
       $(url).click(function(e) {
           alert(url); 
       }); 
    }
});

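To fix this, we move the handler setup into its own function. Each call to that function creates a new scope with its own url, and the click handler closes over that copy: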

$(document).ready(function() {
    var links = ['#one', '#two', '#three'];
    for(var i=0; i<links.length; i++) {
       displayId(links[i]);
    }
});

// Declared as a function statement so it is hoisted and always available to the ready handler
function displayId(url) {
    $(url).click(function(e) {
        alert(url); // now the scope of url is within this function
    });
}
So lessons learned.
  • Have a good grasp of the fundamentals of a language before ploughing ahead coding in it
  • Only do code reviews with people who UNDERSTAND THE LANGUAGE!
  • Normal advice of declaring variables as late as possible can be BAD ADVICE in JavaScript - declare at the top of a function instead.

Wednesday, February 22, 2012

Modifying an item in an array while iterating over it in PHP

So you want to iterate over an array in PHP and change each item as you do.
In PHP5 there are two ways of doing this.

1) Pass by reference
Note the ampersand in the declaration of the foreach loop before the variable $i, passing each item by reference and allowing it to be changed directly.
$arr = array('a', 'b', 'c');
foreach($arr as &$i) {
    $i = 'changed';
}
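One thing to watch with the by-reference form: after the loop, the loop variable is still a reference to the last element, so reusing that variable name later changes the array. A short sketch of the usual precaution, which is to unset() the variable straight after the loop:

```php
<?php
$arr = array('a', 'b', 'c');
foreach ($arr as &$i) {
    $i = strtoupper($i);
}
unset($i); // break the reference to the last element

// Without the unset() above, this assignment would overwrite $arr[2]
$i = 'something else';

print_r($arr); // the elements are A, B, C
```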
2) Using the key of each item
This is the classic way of achieving the same, using the key of the item
$arr = array('a', 'b', 'c');
foreach($arr as $k=>$v) {
    $arr[$k] = 'changed';
}

Monday, February 20, 2012

Weighted Random Selection

I recently came across an interesting problem. Given an array of items where each item has a name and a weight (integer value), select a random item from the array based on the weight. So items with a larger weight value are more likely to be returned.

One solution I thought of was to create a new array where the weight determines how many times an element appears in the new array. You then just pick an item from that array at random. However this is limited due to the alarming size of the new array when the weights and the number of items in the original increase.

So here is my solution to the problem - there are many ways of solving this problem but I think this one is easy to understand.

Essentially the algorithm is
1. Add up all the weights.
2. Pick a number at random between 1 and the sum of the weights.
3. Iterate over the items, decrementing the random number by the weight of the current item.
4. Compare the result to zero: if it is less than or equal to zero, pick the current item and stop; otherwise keep iterating.

The key is that the larger the weight, the more likely the running value drops to zero or below when that weight is subtracted from a random number between 1 and the sum of the weights. Remember that the random number is decremented on each pass through the array of items, so it will always reach zero or below by the last item, i.e. it is guaranteed that something will be picked every time.

I will use PHP for this (now my primary language day-in-day-out) so I'm sure there are improvements to be made in this code - I'm very new to PHP still!

In this example say the array contains 4 foods and each has a weight. This code demonstrates the distribution over 1000 iterations.

// We don't like fruit in this example, so give a low weighting
$foods = array('vegetables'=>4, 'fruit'=>1, 'chips'=>8, 'burgers'=>10);
$sumOfWeights = array_sum($foods);

// hold the results of a thousand random selections. 
$count = array('vegetables'=>0, 'fruit'=>0, 'chips'=>0, 'burgers'=>0);

for($i=0; $i < 1000; $i++) {
    // choose a random between 1 and the sum of the weights.
    $random = rand(1, $sumOfWeights); 
    foreach($foods as $name => $weighting) {
        // ***The next two lines are the heart of this algorithm***
        // decrement the random by the current weighting.
        $random -= $weighting;        
        // The larger the weighting, the more likely random is now zero or below.
        if($random <= 0) {
            $count[$name]++; 
            break;  
        }
    }
}

foreach($count as $name => $result) {
   echo($name."=".$result."\n");
}


This will give results similar to
  • vegetables=210
  • fruit=157
  • chips=278
  • burgers=355
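As a sanity check on any run, the expected count for each item is simply its weight divided by the sum of the weights, multiplied by the number of runs. A small sketch using the same weights as the code above:

```php
<?php
$foods = array('vegetables' => 4, 'fruit' => 1, 'chips' => 8, 'burgers' => 10);
$sumOfWeights = array_sum($foods); // 23
$runs = 1000;

$expected = array();
foreach ($foods as $name => $weighting) {
    // each item should be picked (weight / sum of weights) of the time
    $expected[$name] = (int) round($runs * $weighting / $sumOfWeights);
    echo $name . "=" . $expected[$name] . "\n";
}
// prints vegetables=174, fruit=43, chips=348, burgers=435 (one per line)
```

A real run will scatter around these values. The sample figures listed above sit some way from them (fruit in particular), so they are best read as illustrative.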

Changing the weighting value for each item shifts the distribution accordingly. For example, increasing the weighting for vegetables to 80 and fruit to 60 (let's be healthy) makes those two items dominate (their expected shares become 80/158 and 60/158, about 51% and 38%), with results similar to
  • vegetables=553
  • fruit=421
  • chips=19
  • burgers=7
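
The selection loop can also be pulled out into a reusable function. The version below is a sketch, not part of the original code: weightedPick is a hypothetical name, it swaps rand() for mt_rand(), and the optional second parameter exists only so a caller or a test can supply the random number.

```php
<?php
// Hypothetical helper. $weights maps item name => positive integer weight.
// $random may be supplied for testing; otherwise one is drawn with mt_rand().
function weightedPick(array $weights, $random = null)
{
    $sum = array_sum($weights);
    if ($sum < 1) {
        return null; // nothing to pick from
    }
    if ($random === null) {
        $random = mt_rand(1, $sum);
    }
    foreach ($weights as $name => $weight) {
        // subtract each weight in turn until the running value hits zero or below
        $random -= $weight;
        if ($random <= 0) {
            return $name;
        }
    }
    return null; // only reached if a supplied $random exceeded the sum
}

$foods = array('vegetables' => 4, 'fruit' => 1, 'chips' => 8, 'burgers' => 10);
echo weightedPick($foods), "\n";    // a weighted random food
echo weightedPick($foods, 5), "\n"; // fruit
```

With these weights, the numbers 1 to 4 select vegetables, 5 selects fruit, 6 to 13 select chips and 14 to 23 select burgers, which is exactly the 4:1:8:10 split.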