Peter Kelly

Friday, May 9, 2014

Is TDD dead?

Was it ever alive? Who actually does/did TDD exactly as prescribed?

Write a test before you even write the class or method in the class under test
Run the tests, test fails
Do enough, and only enough, to make the test pass
Run the tests, test passes
Write more tests and continue to improve the design of the code

Not one developer I have ever worked with in any team in 10 years truly followed TDD as prescribed. I tried it for a few months and while it never slowed me down too much (a common complaint from people who have never attempted it) I felt it resulted in absolutely no benefit. I also felt I was doing something wrong when I wrote code without a test: I gave out to myself, deleted the code and started again. I write pretty good code, I think, and I know what bad code looks like.

Don't get me wrong. I write tests and almost every developer in every team I have worked in wrote unit tests. We typically did this as we wrote the code, probably towards the end after writing some feature and verifying it through a browser manually. The unit tests would check expected output and some boundary conditions not likely to show up in a quick check in the browser. All good? Code review, check-in, verify CI build runs and all tests pass. The key is that "all tests pass" includes unit tests and integration tests.

I never experienced the promised enlightenment of better design through TDD. Honestly, I did not. I like unit tests and well-written tests that run all the time help me refactor later and have spotted bugs being introduced very early. But writing tests first, before code and only in baby-steps seems (and always has seemed to me) to be overkill. It could be a useful training exercise for mentoring junior developers but for experienced developers and teams there is likely to be more value in writing code, adding some unit tests for sure, building out a continuous integration environment, having integration tests using servers, VMs, databases, browsers and whatever else and running integration tests anytime code is committed, checking user-focused behaviour.

One thing I have realised is that having a Test Engineer building automated integration tests can be frustrating while development is churning and APIs might be changing. So we came up with a rule recently in a team I worked in: if a developer writes some code that breaks the automation tests (developers can run these locally before committing; it takes about 1 minute for 200+ tests), the dev fixes the integration test and sends a pull request to the Test Engineer. This frees the Test Engineer to work on new integration tests and to maintain the test and integration environment. We also have tasks that generally go no longer than 2 days, so a developer will code away for no more than 2 days and then add some tests, check the integration tests, then commit and move on. It is not perfect but it works great: we ship regularly and have a very low bug count. People over process, with the team figuring out what works best for the team. That is the true spirit of agile, right?

Thursday, January 16, 2014

Python **kwargs appear immutable in method

This is an interesting one.

You can pass an arbitrary number of keyword arguments to a method in Python using **kwargs.
Inside the method, kwargs is a dictionary and you can iterate over items(), keys() and values() and do all the usual dict stuff.
However, if you, say, delete an item from the dictionary inside the method, the dict that was passed in does not change.

Example with ordinary dict.


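def filter_args(kwargs):
    # Iterate over a copy of the items so entries can be deleted safely
    # (deleting while iterating raises a RuntimeError in Python 3).
    for k, v in list(kwargs.items()):
        if v > 10:
            del kwargs[k]

args = {"a": 12, "b": 2}
filter_args(args)
print(args)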

This prints the following


{'b': 2}

Now an example with **


def filter_args(**kwargs):
    # Iterate over a copy of the items so entries can be deleted safely.
    for k, v in list(kwargs.items()):
        if v > 10:
            del kwargs[k]

args = {"a": 12, "b": 2}
filter_args(**args)
print(args)

This prints


{'a': 12, 'b': 2}

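It only changed the dict inside the method. That is because ** unpacks the caller's dict into individual keyword arguments, and Python then builds a brand-new dict for kwargs on every call, so the method never sees the original dict object.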
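A minimal sketch of the behaviour and one way around it: since the method only ever gets its own copy, it can return the filtered dict instead of deleting in place. The names filtered_args and is_same_dict are made up for this illustration.

```python
def filtered_args(**kwargs):
    # kwargs is a new dict built for this call, not the caller's dict,
    # so return the filtered result rather than mutating in place.
    return {k: v for k, v in kwargs.items() if v <= 10}


def is_same_dict(original, **kwargs):
    # Identity check: is kwargs the very same object the caller passed?
    return kwargs is original


args = {"a": 12, "b": 2}
print(filtered_args(**args))       # {'b': 2}
print(args)                        # {'a': 12, 'b': 2}
print(is_same_dict(args, **args))  # False
```

The caller then rebinds the name if it wants the filtered version, e.g. args = filtered_args(**args).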

Tuesday, January 14, 2014

Scaling a Web App - my experience

At a previous company I worked for we had 30 million users with over 2.5 million uniques a month and 1 million downloads a day. Not a massive scale but big enough - mid-level web scale? Anyway, it was a read-intensive website composed of PHP, Apache, many back-end services (in PHP and Java), memcached, ActiveMQ and MySQL. This is a short blog post about some of the ways in which scaling was handled.

Load-Balancing

F5 load balancer splitting traffic around 30 web servers.
Nothing stored in session anywhere allowing horizontal scaling at the app level.

Caching

20 memcached servers (with 32GB RAM each) split into different clusters (web, mobile etc.) caching result of anything that was slow e.g. database queries or web service calls. Generally split into clusters of 4 servers giving >100GB memory for caching.
We also cached generated HTML (non-dynamic HTML like headers and footers) on file system on each web server.
We cached nothing in memory on the web servers so the memory is only used by the web app and not taken up by cache.
We used a consistent hashing algorithm for memcached so that losing or adding a cache server only remapped a fraction of the keys rather than nearly all of them.
Cache control headers set to allow browser caching for certain service calls and requests.
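To illustrate the consistent hashing point above, here is a toy hash ring in Python. It is only a sketch under assumptions: it is not the memcached client that was actually used, and the class name HashRing, the host names and the number of virtual points per server are all invented for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative only)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        # Each server gets many virtual points so keys spread evenly.
        for i in range(self.replicas):
            point = (self._hash("%s#%d" % (server, i)), server)
            bisect.insort(self._ring, point)

    def remove(self, server):
        self._ring = [p for p in self._ring if p[1] != server]

    def get(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]


servers = ["cache1", "cache2", "cache3", "cache4"]  # hypothetical hosts
ring = HashRing(servers)
keys = ["user:%d" % i for i in range(1000)]
before = {k: ring.get(k) for k in keys}
ring.remove("cache4")
after = {k: ring.get(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys that lived on the removed server change owner.
print(all(before[k] == "cache4" for k in moved))  # True
```

With plain modulo hashing (hash(key) % number_of_servers), removing one server out of four would remap most keys, which is exactly the cache-miss storm consistent hashing avoids.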

Database

We used a Master/Slave architecture (MySQL) where Master is a single DB you write to and is then replicated to multiple Slaves for reading. Lag was normally less than 100ms. Works well for read-intensive services.
We also used sharding when writes became too numerous in certain cases essentially taking that Master/Slave model and replicating it to N shards – you know which master to write to based on user_id e.g. users 0 – 5m to one server, 5m – 10m to another etc. Each shard kept a similar size dataset making queries predictable. This is a pain in MySQL 5.1.
There was also some use of the then-new MySQL NDB (or MySQL Cluster), which has auto-sharding and other big-scale goodness.
DBAs code reviewed all stored procedures and gave feedback on indexing and optimisations.
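The user_id range sharding described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code that ran in production: the host names, the SHARDS structure and the helpers shard_for and host_for are hypothetical, and the shard size simply follows the "0 - 5m, 5m - 10m" example.

```python
SHARD_SIZE = 5000000  # users per shard, following the 0-5m, 5m-10m example

# Hypothetical host names: one master and its read slaves per shard.
SHARDS = [
    {"master": "db-master-0", "slaves": ["db-slave-0a", "db-slave-0b"]},
    {"master": "db-master-1", "slaves": ["db-slave-1a", "db-slave-1b"]},
]


def shard_for(user_id):
    # Range-based routing: the shard index is derived from the user_id range.
    index = user_id // SHARD_SIZE
    if user_id < 0 or index >= len(SHARDS):
        raise ValueError("no shard configured for user_id %d" % user_id)
    return SHARDS[index]


def host_for(user_id, write=False):
    shard = shard_for(user_id)
    if write:
        return shard["master"]  # all writes go to the shard's master
    # Spread reads across the slaves of that shard.
    return shard["slaves"][user_id % len(shard["slaves"])]


print(host_for(42, write=True))  # db-master-0
print(host_for(7500000))         # db-slave-1a
```

Because reads come from slaves that can lag slightly behind the master, code that must read its own write straight away would ask for the master with write=True.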

Message System

We used JMS on ActiveMQ for firing off events for Business Intelligence and other consumers (100s of events a second).

Static Assets

These were all served from lighttpd servers and not from the web servers themselves.
These were also then pushed out to a Content Delivery Network (Akamai) to make sure they are served as close to the requester as possible.

Offline Processing

We tended towards crons and daemons for any processing that could be done offline and not slow down a web request e.g. daemon to pick up new stuff in a table and fire off web requests per row to auditing service rather than calling the audit service as part of the original web request.
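A rough sketch of that polling pattern, in Python for brevity. Everything specific here is an assumption: sqlite3 stands in for MySQL so the example runs on its own, the audit_queue table and its columns are invented, and send_to_audit is a placeholder for whatever makes the real web request to the auditing service.

```python
import sqlite3


def process_pending(conn, send_to_audit, batch_size=100):
    """Send unprocessed rows to the audit service and mark them done."""
    rows = conn.execute(
        "SELECT id, payload FROM audit_queue WHERE processed = 0 "
        "ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, payload in rows:
        send_to_audit(payload)  # one request per row, off the web request path
        conn.execute(
            "UPDATE audit_queue SET processed = 1 WHERE id = ?", (row_id,)
        )
        conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_queue ("
    "id INTEGER PRIMARY KEY, payload TEXT, processed INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO audit_queue (payload) VALUES (?)",
    [("login",), ("download",)],
)
conn.commit()

sent = []  # stands in for the audit service
print(process_pending(conn, sent.append))  # 2
print(process_pending(conn, sent.append))  # 0
```

A cron job would call process_pending once per run, while a daemon would call it in a loop with a short sleep between passes. The original web request only has to insert a row, which is cheap.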

Deployments

Dev, QA, Integration, Staging and Production deployments mostly driven through Jenkins. Staging and Production were near-identical environments - firewall rules etc. to reduce surprises and rollbacks when deploying to prod
Half the production cluster is taken out of rotation from the load balancer and the code is deployed and smoke tested there first, then the cluster is flipped and deployed to other half. The idea is to have no downtime.

Downloads

Served from dedicated download servers and not from web cluster

Load Testing

We had multiple JMeter servers for load testing the websites and services. We did this testing on lower spec environments than production and our thought process was "if it is good enough there, it will be at least as good on production". This mantra held true.

Overall a pretty solid architecture and we worked hand-in-hand with a brilliant Ops team (sys admins, DBAs, NOC) all the time on this. It was not without problems and the odd firefight but it was pretty good. We never properly addressed the dog-pile effect when cache was flushed and the databases were overloaded but that was a problem with the way memcached was (ab)used early on.

Wednesday, March 20, 2013

Reset Layout in Eclipse

Just a super-quick simple tip to fix something that drove me mad.
I am not a power-user of Eclipse; in fact, I rarely use it. When I do, I tend to mess up the perspectives pretty quickly. To reset the layout...

Click on Window in the top nav, then click on Open Perspective, then click on Other, then click on Java EE (or you could use Java). Then click on Window again, and click on Reset Perspective.

Monday, September 17, 2012

Putting on your Test Hat

I have noticed a real divide between development and test in every company I have worked in. Perhaps this is an Irish attitude? I say this because I don't see much about it coming out of the US in any blogs I follow etc. The tell-tale sign I most often see is a really poor attitude from developers towards the QA team. There is often arrogance and border-line bullying taking place. It can be summarised as "Devs are better than test" and "test should do what we say". There is a real under-appreciation for the role of QA and SDET and a belief that QA are somehow below Developers - because if they were any good they'd all be developers right? This attitude completely stinks.

I have worked in a bank, automotive diagnostics, enterprise software and consumer web and I have found this attitude present in all industries to varying degrees. "Throw it over the wall" is commonly heard in teams like these. Devs write some code, check it quickly in whatever version of whatever browser they happen to have open and throw it over to QA to test and move on to their next task. They are then surprised when there are several bugs opened against the code. I often see devs try to influence QA or weasel out of responsibilities. "It's not a bug really", "We don't have time to fix that now" etc. This is typical. There is quite a lot of time wasted logging bugs and screenshots, communicating and tracking issues and the actual re-work to fix issues that could have been prevented if the developers cared more about testing (not just unit testing, although that would be a start...). Testing is not something that is solely the responsibility of the QA team. If you are a developer and you do not test your code then I do not want to work with you.

I found it very interesting to read in a post on the Google Testing Blog that Google developers perform testing to a certain standard before handing over testing to the Software Engineers in Test. They call this "testing 1.0", using tools such as WebDriver and FIO. This raises the bar on quality and reduces the amount of re-work necessary (the schedule-killer...). I love this idea. We are all one team and we are all responsible for creating the best product we can to the highest standard we can. There is no "throwing it over the wall" in a team like that. I will push to add some testing 1.0 to our criteria for committing new code. The developers will benefit from putting on their QA hats and gain an appreciation of the subtleties of testing, not to mention reducing the number of bugs they will have to work on.

We run fast in the company I work for. We run fast and sometimes quality suffers. Sometimes bad things happen. We fix them and move on. Keep moving forward, keep shipping. The most difficult part of introducing the type of changes that would improve quality is convincing people who enjoy the fast-paced nature of the culture that they need to slow down a little. This is a tough sell to some developers and senior managers. I would say the best chance of succeeding to introduce change in an environment like this is not to criticize the current process or try to formalize things too much. Just a little incremental improvement here and there and try to shift attitudes a little. For example, write a wiki article on how you installed and configured tool-X to help with testing your code. Get your fellow devs to read the article and make it as simple as possible for them to follow you. Measure. Measure the bug count against you in the previous project against the bug count using a higher standard of pre-commit testing in a new project. Everything is easier to sell with hard data to back it up. I have thought long and hard about how to influence and introduce change from the bottom-up in companies. I at one time came to the conclusion that you simply can't. It's too hard. That it has to come from the top down. Well maybe it does. But if I can influence people even a tiny bit in my team and our little bubble improves quality maybe we can influence other teams and managers that this is an approach worth following.

Monday, May 21, 2012

Unit Testing - Is 100% Code Coverage Useful?

I recently had a short Twitter debate with @unclebobmartin of Clean Code fame about 100% code coverage. He is of course much smarter than I am, is vastly more experienced and enjoys a well-earned reputation in the software community that I could only dream of. However, I felt I had a point. I disagree with the 100% dogma as far as Unit Tests go. In fact, I disagree with it strongly. I argue it is more important to unit test some of your code, ship it quickly and iterate - certainly in web development. That does not mean your application is not thoroughly tested by the QA Engineers through automated and manual testing on several environments before hitting production - we are talking about Unit Testing here.

In my experience it can be quite difficult to achieve 100% code coverage in the real world (remembering what a unit test actually is and keeping in mind when it stops being a unit test and starts being a system test - database, file system, network). Think of all the plumbing and infrastructure code involved, think of all the getters and setters, think of all the external interaction with databases etc. Developers, if they are tasked with 100% code coverage, will probably find a way to hit the mark. Will those tests be useful? Does 100% actually guarantee anything? Should the tests not focus on key behavior and key user scenarios at the function level? Unit tests should cover some of the code; a good company will have great test engineers who write great automated system tests and do great manual testing, and a great company will have a constant feedback loop with product owners, demonstrating the software as it progresses.

Of all the code out there running some of the biggest sites and most popular applications in the world, does anybody really think they have 100% unit test code coverage? Of course there are some situations when perhaps 100% code coverage may make perfect sense such as extremely critical systems in finance or medicine. And of course it also can depend on the culture and size of the company - startups are going to focus less on code coverage and more on just shipping code and features whereas enterprises are going to focus on quality more.

But I question the engineering fixation on 100% and its actual value and ROI. It is NOT something I would blindly follow or suggest for every situation.

P.S. I like Kent Beck's answer on Quora.

Sunday, March 18, 2012

Delete a row from a table in Objective-C

Here's a gotcha I came across while deleting a row from a UITableView in iOS. Basically, the order in which you delete the record from the view and from the underlying data source can be crucial.

For this example, I'm going to assume your delegate and datasource are both set to be your TableViewController. This would be pretty common anyway. Giving your tableview delete/insert support is actually very straight-forward.

Simply implement the tableView:commitEditingStyle:forRowAtIndexPath: method (on the data source - in this case the controller).
You must then send deleteRowsAtIndexPaths:withRowAnimation: or insertRowsAtIndexPaths:withRowAnimation: to the table view to direct it to adjust its presentation.
And then update the corresponding data-model array by either deleting the referenced item from the array or adding an item to the array.

This is described in the official documentation here

BUT WAIT! If you attempt to remove the item from the view first and then from the data source, you may run into an error. Something like this (for example if you tried to delete 1 row from a list of 10)...

Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Invalid update: invalid number of rows in section 0. The number of rows contained in an existing section after the update (10) must be equal to the number of rows contained in that section before the update (10), plus or minus the number of rows inserted or deleted from that section (0 inserted, 1 deleted).

Internally, deleteRowsAtIndexPaths:withRowAnimation: ends up asking the data source for tableView:numberOfRowsInSection:. If the data source has not changed then this will return the same count as before the delete (or insert for that matter). The runtime does not like this and tells you as much in the exception - it expected something to change.

So, simply remove the item from the underlying data source first and then call deleteRowsAtIndexPaths:withRowAnimation:. This is the other way around from the order in which the steps are listed above from the Apple docs (which are usually excellent). Here's some code...

- (NSInteger)tableView:(UITableView *)tableView numberOfRowsInSection:(NSInteger)section
{
    // Return the number of rows in the section.
    return [recentImages count];
}

- (void)tableView:(UITableView *)tableView commitEditingStyle:(UITableViewCellEditingStyle)editingStyle forRowAtIndexPath:(NSIndexPath *)indexPath {

    // Only act on delete requests
    if (editingStyle == UITableViewCellEditingStyleDelete)
    {
        // Delete from underlying data source first!
        // (recentImages is assumed to be an NSMutableArray;
        // removeObjectAtIndex: returns void, so there is nothing to assign.)
        [recentImages removeObjectAtIndex:indexPath.row];

        // Then perform the action on the tableView
        [tableView beginUpdates];
        [tableView deleteRowsAtIndexPaths:[NSArray arrayWithObject:indexPath]
                         withRowAnimation:UITableViewRowAnimationFade];
        [tableView endUpdates];
    }
    
    // Finally, reload data in view
    [self.tableView reloadData];
}