Monday, July 18, 2016

Ansible and DigitalOcean: setting up

The playbook in my previous blog gives an idea of what I do, but it doesn't get you to a working setup, and it misses out lots of configuration. Let's fill in some gaps by looking at the infrastructure around the playbook. But before that, some rationale, so you can see the reasons for my decisions.

When I need to do precise work over and over again, I look around for a tool to help me do that work faster, more accurately and more repeatably*. 

Plenty of tools exist to help in setting up servers; here's Wikipedia's big list. Chef and Puppet are common choices. I've chosen Ansible over Chef or Puppet because it instructs servers over ssh, so it doesn't need client software installed on the target before it can communicate. I'm told that Ansible is also easier to learn.

I've leant heavily on other guides – chiefly 

Crucial gaps were filled in by 

View them as definitive, and my account below as flaky.


Most of the following is done in the terminal of my MacBook under OS X 10.11.5 (El Capitan), with an admin user. I used Atom as my primary editor.

First, we'll need Ansible. I want to run it locally.

I used homebrew to install Ansible, which gave me 2.1.0.0:
brew install ansible

Irritatingly, there's a bug in the tool that enables Ansible's communication with DigitalOcean. While working on the playbooks, you may see the error NameError: name 'DoError' is not defined.

To get over the hump, I needed to downgrade dopy (the DigitalOcean Python library) from the version that comes with Ansible (0.3.7 for me) to 0.3.5:
sudo pip install 'dopy>=0.3.5,<=0.3.5'
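If you want to check which dopy is actually in play (my own sanity check, not part of the guides I followed), pip will tell you:
pip show dopy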


I set up a base directory for this project, deep in my home folder. I've not set any special permissions on the new directory. When I run an Ansible command, I run it from that directory, and I keep all my Ansible configuration, playbooks and templates there**.

In that base directory, I've put the following ansible.cfg file, to instruct Ansible to use nearby files to read inventory and write logs.
[defaults]
inventory = ./hosts
log_path = ./ansible.log

Please note: here, and below, I'm using ./ to indicate explicitly to the system (and to you, reader) that we're starting from whatever directory we're in – which is the base directory, most of the time.

The inventory file holds information about the servers you're asking Ansible to manage. I'm using Ansible locally, so my inventory file ./hosts looks like this: 
[local]
localhost ansible_connection=local

You might imagine that I'd need to put all my DigitalOcean servers into this inventory file. I can't, because I don't know their addresses until DigitalOcean has created them. We'll need to use Ansible's "in-memory inventory" instead.
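For a flavour of what that means, here's a minimal sketch of the kind of task involved – the names (newServers, droplet_details) match the playbook in my previous post, and the real version sets a few more variables:
- name: add each new droplet to the in-memory group newServers
  add_host:
    groupname: newServers
    hostname: "{{ item.droplet.ip_address }}"
  with_items: "{{ droplet_details.results }}"
Nothing gets written to ./hosts – the group exists only for the life of the run.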

So that's Ansible set up. Let's hope. 
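One quick check I use (my habit, not something the guides demand): ask Ansible to ping the one host it knows about so far.
ansible local -m ping
A "pong" back from localhost suggests the config and inventory are being read.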


Ansible has a concept of playbooks – short, readable files that basically link a list of hosts with a set of instructions about what to do with them. These playbooks are written in YAML – they're readable, but not as easy to build as to read. I've got a short posting coming on .yml files for Ansible. Ansible's example playbooks use roles – which is good for reuse, but harder to read.

My playbooks share three sets (currently) of configuration information. Each set is in its own file, so that I can make changes in one place only, and so that I can keep the information out of change control. They're all together in a ./vars directory.
./vars/sensitive.yml – the API key (which I don't want to share)
./vars/sshInfo.yml – the ssh information (which I want to share temporarily)
./vars/droplets.yml – the list of servers to build (which will include different configuration options).


Let's get the bits together to allow my setup to identify itself to DigitalOcean as a valid account owner, and to the servers as a valid controlling account.

The DigitalOcean API key will identify my account from any playbook that sets up (or destroys) servers. I got it from DigitalOcean - API Tokens. I don't want to share it with anyone, ever. If I do, I need to revoke it and get a new one. 
./vars/sensitive.yml looks like:
---
  sensitive:
    do_token: « 64 characters of hex signal that looks like noise but isn't. »
...

My ssh key information is used in any playbook that communicates with servers. If a server has the public key, and I have the private key, Ansible can log in over ssh without a password. 
./vars/sshInfo.yml looks like this:
---
  sshInfo:
    do_ssh_key_name: TestLabEuroSTAR2016
    local_private_ssh_key: ~/.ssh/TestLabEuroSTAR2016
...

This needs a bit of matching infrastructure, and a rationale. I want the option of sharing the key with TestLab people, so it needs to be separate from my usual keys – which means making a new key, naming it for this task, and specifying it explicitly.

I made a custom-named key like this:
ssh-keygen -t rsa -f ~/.ssh/TestLabEuroSTAR2016
which builds the two files in ~/.ssh/ for my key.
The public part is TestLabEuroSTAR2016.pub and the private part is TestLabEuroSTAR2016.
Note: this command will ask for a passphrase. Don't forget the passphrase; OS X (not Ansible) may ask you to enter it if you're re-using the key after a couple of days.

I uploaded the public part of the key to DigitalOcean (via: DigitalOcean - Settings ), so that DigitalOcean can put it on any new server. I gave the key a name on upload, and it's that name I'm using in do_ssh_key_name above. When Ansible sets up a new server, it will ask DigitalOcean to add this uploaded public key to the server as it's being made.

When my tool communicates with the new server to load and configure software, it will use the private key ~/.ssh/TestLabEuroSTAR2016 to get in via ssh. If I don't specify this in my playbook, Ansible will quietly default to ~/.ssh/id_rsa, and that won't get in.
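If I want reassurance that the key works, I can log in by hand the same way Ansible will (the « » bit is a placeholder for a real droplet's IP address):
ssh -i ~/.ssh/TestLabEuroSTAR2016 root@« droplet IP »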

While we're considering shared information, here's one more. I want to have a script that destroys the servers I set up (and only those servers), so I'll need to share information about those, too.

My ./vars/droplets.yml file looks like:
---
  droplets:
  - name: TestLab01
  - name: TestLab02
...
I can add more servers as I need them. I'm currently limited to 50 droplets. I expect that I'll add more details to each server (an indented list for each - name: line) as I differentiate my servers.
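To give an idea of what that might look like – entirely hypothetical for now, since today's playbook hard-codes size and region and ignores everything here but name:
---
  droplets:
  - name: TestLab01
    size_id: 512mb
  - name: TestLab02
    size_id: 1gb
...
If I went this way, I'd also change the make-droplets task to read size_id={{item.size_id}} rather than the fixed 512mb.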

When I get my servers set up, I want each to be doing something that differentiates it from its neighbours. I'd also prefer not to be faffing around with IP addresses. I've set up a template to build a web page for each server. Each server's page will have its own name at the top of the page, and a list of named links to the others.
I've put my template at ./siteStuff/index.html, and it looks like this:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Basic HTML Template</title>
</head>
<body>
  <h1>James's TestLab stuff for {{WPL_server_info}}</h1>
  <ul>
  {% for item in otherServers %}
   <li><a href="http://{{ item.droplet.ip_address }}">{{ item.droplet.name }}</a></li>
  {% endfor %}
  </ul>
  <p>Ansible set up this index from a template</p>
</body>
</html>

This is a jinja2 template, and will make bare HTML. The "set up index" task in my playbook will generate an index.html file for each of the hosts we set up in the newServers group in the in-memory inventory. Look back to see that I set up a bunch of host variables for those servers – the template substitutes the stuff in {curly brackets} with those host variables. It uses the server name (which came from the names in the list of desired droplets), then builds a list of named links to all the servers in the group. The playbook uploads the built page to each server.


When I run the makeDroplets.yml playbook with ansible-playbook makeDroplets.yml, I get plenty of information about what's happening. I won't paste it here. Occasionally, one of the post-server-creation steps fails – I may need to add a wait_for (there's a sketch below). However, the playbook's tasks are idempotent, so if a step fails I can simply run the playbook again, and it should fill in the gaps.
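If I do add that wait, I imagine it would slot into the first play, straight after the make-droplets task, and look something like this (a sketch – I've not needed it yet):
  - name: wait for ssh on each new droplet
    wait_for: >
      host={{item.droplet.ip_address}}
      port=22
      delay=10
      timeout=180
    with_items: "{{droplet_details.results}}"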

Be aware:
- Ansible and DigitalOcean take about a minute to set up each server.
- Each of these smallest-possible servers costs $0.007 an hour to run. Which is piddling, until you fire up 50 and forget to destroy them.

I can check my handiwork by browsing to one of the IP addresses. I hope to see a page with links to all my newly-minted servers – and when I click through, I hope to observe that the server name changes – and so does the IP address.




* And with my soul intact (but my yaks shaved).
** here's an edited listing to give you an idea of the shape

./ansible.cfg
./ansible.log
./hosts
./destroyDroplets.yml
./makeDroplets.yml
./siteStuff/index.html
./vars
./vars/sensitive.yml
./vars/sshInfo.yml
./vars/droplets.yml


Saturday, July 16, 2016

Using Ansible and DigitalOcean to provision TestLab servers

Here's an Ansible playbook that I use to spin up and provision DigitalOcean droplets.

There's a longer article to follow, if you're interested – but the salient points are:

- Spin up the droplets with Ansible's DigitalOcean module
- Put their details into Ansible's "in-memory inventory" with Ansible's add_host module
- Use those details when you provision the droplets with the apt module and more.

I used homebrew to install Ansible 2.1 on my OS X 10.11 MacBook. I needed to revert to dopy 0.3.5 (there's a bug in the 0.3.7 version that comes with Ansible 2.1).


The playbook below
- uses a custom ssh key where necessary
- keeps the ssh keys and the API key out of the main file
- takes an external file of names for the hosts
- avoids irritating known-host checking by setting the following variable for each new server: ansible_ssh_common_args='-o StrictHostKeyChecking=no'
- sets up apache / php / git on each server, and uses a jinja2 template to make a unique-ish page on each host.
- takes about 90 seconds per server
- goes with a matching "destroyDroplets.yml" playbook

---
- name: provision servers

  hosts: local

  vars_files:
    - ./vars/droplets.yml
    - ./vars/sensitive.yml
    - ./vars/sshInfo.yml

  tasks:
  - name: Get DigitalOcean's ID of ssh key
    digital_ocean:  #note avoidance of = signs...
      command: ssh
      state: present
      name: "{{ sshInfo.do_ssh_key_name}}"
      api_token: "{{ sensitive.do_token }}"
    register: my_DO_ssh_key
    #
  - name: make droplets, if they don't exist already
    digital_ocean: >
      state=present
      command=droplet
      name={{item.name}}
      unique_name=yes
      size_id=512mb
      region_id=lon1
      image_id=ubuntu-14-04-x64
      ssh_key_ids={{ my_DO_ssh_key.ssh_key.id }}
      api_token={{ sensitive.do_token }}
      wait=yes
    with_items: "{{droplets}}"
    register: droplet_details
    #
  - name: Add named droplet to group newServers # variables set user (needed), use right key, stop wretched dialog with known_hosts
    add_host: >
      groupname=newServers
      hostname="{{ item.droplet.ip_address }}"
      ansible_user=root
      ansible_private_key_file="{{sshInfo.local_private_ssh_key}}"
      ansible_ssh_common_args='-o StrictHostKeyChecking=no'
      WPL_server_info="{{item.droplet.name}}"
      otherServers="{{droplet_details.results}}"
    with_items: '{{droplet_details.results}}'
#
- name: set up servers
  hosts: newServers
  tasks:
  - name: install packages
    apt: >
      name={{item}}
      state=present
      update_cache=yes
    with_items:
      - apache2
      - libapache2-mod-php5
      - git
  - name: remove existing web stuff
    file: >
      path=/var/www/html/index.html
      state=absent
  - name: set up index
    template: src=./siteStuff/index.html dest=/var/www/html/index.html force=yes
  - name: start Apache
    service: name=apache2 state=started enabled=yes

...
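I've not pasted destroyDroplets.yml, but a minimal sketch of its shape might look like this – it leans on unique_name=yes so the digital_ocean module can find each droplet by name rather than by numeric id. Treat it as a guess at the shape, not the real file:

---
- name: destroy servers
  hosts: local
  vars_files:
    - ./vars/droplets.yml
    - ./vars/sensitive.yml
  tasks:
  - name: destroy droplets by name, if they exist
    digital_ocean: >
      state=absent
      command=droplet
      name={{item.name}}
      unique_name=yes
      api_token={{ sensitive.do_token }}
    with_items: "{{droplets}}"
...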

If you want to use this, you'll need a DigitalOcean account (get yours here), a DigitalOcean API key, a public/private key pair for ssh (you'll upload the public one for DigitalOcean to use as you set up), a bunch of configuration files that can be inferred from the playbook, and a template for a web page. Wait about and I'll post them.


Sunday, August 04, 2013

Black Box Machines: Puzzles 7 and 8

Just like busses, you wait for ages, then two turn up at once. Here are Puzzles 7 and 8. The similarities with Puzzle 6 / 6a are entirely intentional.

Email or DM me your simple-as-possible description of what it's doing. If it's close to reasonable, I'll send out a kudos tweet to the world. I'll stop after a few.

You'll note that there is a new "Camera" icon. This isn't an intentional part of the puzzle (though I expect there's an unexpected interaction with something in Puzzle 8). I've added a facility that I'll be using at my upcoming EuroSTAR tutorial: it lets you have a different perspective on what you've been doing. I'd be delighted to hear your views and discoveries. It'll need some improvement before it hits primetime, and I don't expect to get it right without your help.

Cheers -
James




Here's a big Puzzle 7 and a big Puzzle 8, if you prefer.

Monday, March 11, 2013

New Black Box Machine – Puzzle 6a

Puzzle 6 seems to have been a bit "easy" for some.

Here, then, is Puzzle 6a. Two more buttons.

Again, email or DM me your simple-as-possible description of what it's doing. If it's close to reasonable, I'll send out a kudos tweet to the world. I'll stop after a few. Note: I won't be online much over the next few days.

Puzzle 6 had good descriptions from David Greenlees (@MartialTester, martialtester.wordpress.com), Jahira Banu (@ajbanu), and Vince Seese.

My Exploratory Testing Workshop in the Netherlands went rather well. I'm wondering where to do the next, and I'm considering June in Eastern Europe or July in London or Oxford. If you want to tell me where to go next (and get a hefty discount when registration opens), tell me over here.

 Cheers - James


Here's a big one, if you prefer.

Monday, February 25, 2013

New Black Box machine (Puzzle 6)

I've cobbled together a new black box machine, for your amusement.

If you email or DM me your simple-as-possible description of what it's doing, and if it's right, I'll send out a kudos tweet to the world. I'll stop after a few.

Report bugs widely, but please include me on their distribution. I know of at least three: one with fonts, another on resize, and the differing UI on the logo/? buttons.


Here's a big one, if you prefer.

I think I may have finally got far enough with the transition to AS3 (which may make it simpler to port to HTML5).

If you like this, get the last seat on my Amsterdam workshop. Orange button, top left of the blog.

Cheers - James

P.S. Apologies for the formatting problems which I assume will turn up here. MacJournal's decided to change how it blogs, so I'm using Blogger directly. Yikes. Formatting problems, default behaviour troubles, "can't save" warnings. Shocker.

Friday, December 14, 2012

Modelling super powers

tl;dr – it's not just the tactics that matter


In this scenario, I've modelled five testers. Each has a super power.

  • One logs bugs more easily than the others – effectively, they log as many bugs as they can see. The others can only log 10 bugs for each bit of budget they consume.
  • One only logs big bugs – bugs with a cost of 10 or more. The others log any bug they find.
  • One learns three times more effectively than the others.
  • One switches tactic twice as often.
  • One finds it easier to retain their skills after switching tactic.
What difference might each of these qualities make?

Run the exercise a few times. Make some changes. You may find it easier fullscreen, and of course the XML is available to play with. Does the model match your experience?

More to the point, perhaps, how are you comparing the different testers in the model? How do you compare real testers on your team?

Friday, December 07, 2012

Diversity matters, and here's why

tl;dr – It ain't what you do, it's the way that you do it


We've got a model which tells us we have a hopeless problem. I promised some perspective.

Let's try throwing people at our problem. In the exercise below, we're using five testers. If a bug has a 1:100 chance of being found by one tester in one cycle, surely five testers should have a better chance.*

How much better? Run the thing and find out.


Less than impressed? That's because hard-to-find bugs are still hard to find, even with a few more people. Your one-in-five-million shot is not going to jump into your lap if you've only managed to make the chance of finding it one-in-a-million.
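For the analytically minded: if each of n testers has an independent chance p of finding a given bug in a cycle, the chance that at least one of them finds it in that cycle is 1 - (1-p)^n. Feed in the one-in-five-million bug and five testers and you get 1 - (1 - 1/5,000,000)^5, which is almost exactly one in a million – better, but hardly lap-jumping territory.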

There's a key quality I've not changed in this model. We've said that some bugs are harder to find than others. We've not yet mentioned, or modelled, that my Mum merrily finds problems that have eluded me. The way that you don't see my bugs on your machine. The way that performance testing jiggles bugs by the bucketload out of systems which seemed to be working just fine, or the way that unit testing and usability studies find (mostly) entirely different collections of bugs.

Our model should reflect that any individual bug** should have lots of different likelihoods of being found. For this model, we're going to make the choice of likelihood depend on the tactic that is in use. Indeed, for this model, that's what differentiates and defines "tactic" – changing tactic changes the distribution of likelihoods across the collection of bugs.

Below, you'll find an exercise which again has five testers working in parallel. This time, each tester has their own individual profile of chances, and a bug that one finds easily may be much less likely to be found by another.

In the model, we do this by setting up tactics. Each tester has one tactic, which they use exclusively. Each tactic is set up in the same way across the full population of bugs – it's just a distribution of probabilities. If you were to look at one bug, you'd find it has five separate probabilities of being found. Have a play.


The difference is clear.

Diversity matters***. In this model, it matters a lot; more than budget, more than the number of testers.

For those of you who prefer analysis over play, it's also clear if you think about the chances of finding an individual bug. Tactic 1's million-to-one chance bug may be a billion-to-one for tactic 2, too, but tactic 3 might well see it as a hundred-to-one. Ultimately, the no-chance-for-that-bug tactic would continue to have no chance whatever your budget (or patience), but by having many tactics, one increases the chance of having a technique that will find that particular bug easily.

QED – but I hope the toys help the demonstrandum at least as much as the argument.

Note that a key assumption hidden in this model of diverse approaches is that the different tactics are utterly different. In the real world, that's hard. There's plenty of refinement to do to our model to make it a more accurate reflection of the world. However, the central idea remains: in this model of discovery, you get much more from changing your variety than from changing your effort.

This then is the perspective – in this exploratory environment, persistence is what leads to hopelessness. Variety gets you closer. Just for fun, here's a model with the five tactics, and just one tester – but this tester can switch tactics. I'll be mean, so they switch randomly, and each time they switch, their skill slides backwards. Look at the poor beggar ticking away; hardly ever gets over 50%.

See how well this works with just one tester.


One random tester does better***** than five monotonic testers? You're surprised by that conclusion? Enough with the rhetoricals: I have (metaphorical) knobs for you to play with.

The sharp-eyed will notice an extra button – I've finally given you a reset. Indeed, this is a rather more interactive machine than you've had so far – you can change the number of bugs and the cost model. You can also give (not entirely reliably) the machine a (not entirely reliable) "seed" to start from as it builds the model, which lets you replay scenarios. Be aware that I've not sorted out a fully-intuitive workflow around set/start/stop/change/reset, nor have I tested this well (it's mine, and I'm too close to do a job to be proud of). I'd appreciate any feedback – be aware that behaviours may change in the near future.

If you want to dig deeper into the model, I've made a change that allows you to play with the machine offline. Download the .swf, and the Exercise.xml file from the same directory. Bung them in the same folder on your own machine, and the .swf will pick up your local copy. Have a play with Exercise.xml and see what you can come up with. I'll share interesting exercises and conclusions here, and you're welcome to post them on your own site. I'd like to hear about your postings, because then I'll know who to send updated machines to. I'll open-source this sometime.

There's lots further one can go with this model, and over the next few posts, we'll explore and illustrate the effects of some common constraints and balances.

It looks like I'll be teaching exploratory testing in Amsterdam early next year. I'm just about to set the dates. If you want 30% off the price for simply telling me you're interested, you've got a couple of days to catch the opportunity.

Cheers -

James



* maths people will know: 1 - (1 - 0.01)^5 ≈ 4.9%, which is just a tad more unlikely than a 1:20 chance.
** for the purposes of this explanation, let's assume we can identify at least one bug.
*** this panders directly to my prejudices, so I'm pleased to reach this conclusion, and find it hard to argue against effectively. I'd be grateful**** if you felt able to disagree.
**** through gritted teeth, but still grateful.
***** better?

Wednesday, November 28, 2012

Enumeration hell

tl;dr some bugs are beyond imagining


"Rational people don't count bugs."

There's a rash statement. Let's say that rational people who do count bugs ought to count other, less pointless, more meaningful things, too.

Bugs* are rotten to count. There are plenty of posts** about this, and I won't go over the same ground here. Counting bugs is a bit like counting holes – superficially obvious until someone takes a shovel to your cheese.

But the big problem with a bug count is that it summarises a potentially useful collection of information into a number that is mostly meaningless. A single nasty that makes the wheels fall off is worth any number of niggles that make the horn too loud. Unless you're driving a clown car.

In our idealised model, we're counting surprises because it's interesting to see how many are left. None is none on any scale, and if there's none, we're done. We're still not done if we've got one left, because that one might be a stinker.

You've noticed that I've only given you one knob to twiddle*** on these toys. You only get to change the budget – you don't get to change the context****. This is a cheap manipulation on my part, because I've been asking you to concentrate on where you might set that budget to feel reasonably confident that the thing is tested.

So far, we've not considered bug stink in our model. It's time that changed.

In the same way that our model gives each bug a chance of being found, it gives each bug a quality I'll call cost. That's probably not the best word, but it's the one I've chosen for now*****. I'll give it a local meaning. Cost is the amount by which the value of the system goes down when it contains the bug. Quality is value to someone. Trouble makes that value go down. Cost, here, is not the cost of fixing the bug. It's the cost of leaving it in, and it's the cost to the end users.

Bugs aren't made equal, so we'll need to consider a distribution again, this time of (our local definition of) cost. Experience leads me to believe that most bugs have low cost, some bugs have higher cost, and a very few (so few that they might not exist in a given system) have astronomically large costs that outweigh the value of the system.

In earlier examples, each bug had the same cost. The distribution I've chosen to use in this model, to match my experience, is called a "power law" distribution. Power law distributions fit lots of things observed in the real world, such as city sizes, distribution of wealth, and the initial mass of stars. Power law maths underlie the Pareto Principle (aka the 80:20 rule), and Taylor's Law****** (and, more incomprehensibly, phase changes). If you want to dive into this, set your head up with this handy note comparing the similarities of Power/Zipf/Pareto in a real (if rather antique) context.
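For concreteness: a power-law distribution is one where the chance of seeing a value bigger than x falls away as a power of x – roughly P(cost > x) ∝ x^-k for some exponent k – so small costs are everywhere, larger costs are scarcer, and enormous costs are rare but never quite impossible.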

Why have I picked this distribution? Because it feels right. Instinct is no justification, so you can expect that we'll have a look at other distributions later. For now, though, here's a fourth assumption:

4        The cost of a bug to (all the end users over the life of a product) has a power law distribution.

Enough of the hand-waving. Let's play.

Below you should find an identical machine to last time's closing toy, but with costs set to match a Pareto-style distribution. You'll quickly see that there are two "stuff found" numbers, and that the size of the yellow dot is related to the cost. Run this a few times.


Don't be surprised if, occasionally, you see a simply huge yellow dot. Try hovering over the top right of the square set of 400 circles, and click on the ? you see to reveal a god-like understanding of how much trouble this system is hiding. Know that, generally, you'll see the total trouble is around 1000*******. If you see around 2000, expect that one of the bugs has a cost of 1000. If you happen to see around 11000, you've probably got a fat 10K bug hiding away.

In our most recent outing, I hope you got a feel for why it's hard to use a bug rate to say that you're done testing. If you play with the models in this posting, you may get an idea for how 'not done' feels in terms of the cost of what you've left behind.

I hope you're still considering where your omnicognisant self would set a reasonable budget so you could say with confidence that you'd done enough. Have a look at the left-hand graph of what's been found. It's still very front-loaded, but you'll see the occasional big spike as a particularly troublesome bug is revealed.

Let's rack up the difficulty another notch. I set up the model above so that the budget and the bug distribution meant that you got to find most of the bugs in a relatively brief exercise. Of course, that's no use at all. Here's another; more bugs, smaller budget. Crucially though, in this model plenty of the bugs are very hard to find indeed. You're not going to find the lot, so that's what this model looks like.


Hopeless, isn't it? If the real world looks anything like our model, how can anyone be bothered to give a sensible answer when asked to set out a budget?

Next time, all being well, we'll approach these frustrations sideways on. We won't find clarity, but we may find perspective.


* I'm not going to define "bug", because it's a vague word, and therein lies its power. But if there's a scale that runs through vague to countable, then I suggest these two ideas are at opposite ends.
** Try Michael Bolton's Another Silly Quantitative Model and Elisabeth Hendrickson's What Metrics do you use in Agile.
*** there's lots more interactivity to come. For now though, mull on how it must feel to be a leader whose only effective control is over budget-setting, then be nicer to your poor distant Chief Money Officer next time.
**** suggestions accepted, but without any guarantee they'll be used.
***** "Law" appears to be used by some scientists in a similarly-imprecise way to the way some lawyers use "Proof". Business people naturally choose to use both words with abandon. I would treat the word "Law" here with as much scepticism as you might treat it in Moore's Law. They're empirical laws, and describe, rather than necessarily account for, system behaviour.
******* 1000 what? I don't care. Stop your whining and go count the number of things in this list.

Monday, November 19, 2012

Models, lies and approximations

tl;dr – Some of these bugs are not like the others


Here's hoping you've enjoyed playing with trucks and bowls and your imaginations. If we're going to be able to use our model as an illustration of much value, we have to recognise that in relation to software testing it contains a useful approximation, and a misleading lie.

There's a limited collection of things to find. This can be a useful approximation for exploration in software testing – if one assumes that a system has a finite collection of possible behaviours, then the set of possible but undesirable behaviours is limited too (as compared with the vast set of rubbish things that it doesn't actually do). This is good to consider when told "there's always a bug" by an idiot*.

You might further refine this by adjusting your view from the large mass of observably rotten behaviour to the smaller selection of practical changes that make the system more desirable. You'll also recognise that the collection, while limited, is only fixed if everything else is fixed. In our model, the collection of bugs is fixed – so we need to be clear that the approximation and the model assume that, just for now, no one's changing stuff**.

The rate of finding things falls, as the number of things that can be found falls. This is obviously true, but is perversely also a misleading lie***. Idiots (sometimes, the same idiots who believe "there's always a bug") assume, because of the statement's obvious truth, that when the rate of finding bugs drops, the system is close to clean. Bonnnnng.

Sometimes, and it feels like often, it's because the people testing it have run out of imagination. While we may understand more as we reveal more, and while a system may become cleaner as it gets fixed, a dropping bug rate certainly does not imply you've found all the bugs.

Some testing is done by an unbending list of dull asserts, which run clear and green when they've not managed to either trigger or observe any trouble. Michael Bolton reasonably calls these "checks" rather than tests. Some testers, limited by corporate focus or personal blandness, don't do much better, demonstrating simply**** that a system meets expectations.

As any fule kno, some bugs are harder to find than others. If you find you've run out of bugs, it's likely you've run out of bugs that you're set up to find. Sometimes, that's OK. But sometimes, a bug that's hard for you to find is easy for someone else to find. If that someone else isn't a paid tester, but is heaven forfend, a paying customer, we get the "why didn't you find that" conversation.

So, then. A couple of approximations for this model.

1        Some bugs are harder to find than others.

I'll model this by giving some bugs a high chance of being found, and others a low chance. The way that easy-to-hard works amongst the bugs is called a probability distribution. We can pick a distribution. Our earlier example, the trucks and bowls, would be a fixed, or flat, distribution, where everything has the same chance, because we assume that trucks/bowls are effectively equal within the model. That's the stupid but easy assumption that lies under the misleading lie. Bugs are different.

2        We don't have a find-fix-retest cycle in our model. Nothing is being changed.

This makes the model easier to understand, because we're not looking at two things going on at the same time. Of course it's inaccurate. The trick is to use the imagination to wonder how that inaccuracy might appear. Models should be visceral, so go with your emotion if you don't want to be analytical.

Finally, a wrinkle. Exploring is about learning. As we discover more, we get better at discovering, not worse. We start out rubbish, and one trick that distinguishes good testers is how quickly they get better (not how good they start). This leads us to

3        Everything's harder to find early on.

In our model, we have a tester. The chance of them finding any given bug starts at a particular value (0, say) and increases. In this model, it increases over time, and it's much easier to go from nothing to halfway good than it is to go from halfway good to perfect. There are lots of different ways of modelling this – again, use your imagination to think how the model might change.

So – here's a model of a tester discovering bugs. I've fixed it so that there are 400 bugs to find, but some are harder than others. The tester gets better over time.


* Not that they'll listen. Indeed, that's close-on the definition of an idiot, which is apparently from a Latin word meaning "ignorant person". Clearly they are, if they're ignoring you.
** I'm aware that this is impossible and in many ways undesirable in the real world. My model, my rules. Just making them explicit.
*** something to have in mind whenever someone says "obviously"
**** but oh, in such a complicated way

Friday, November 02, 2012

Broken Trucks

tl;dr – you still need your imagination, even with real-life examples


Temporary note – the truck graphic has gone, the graphs are back. I'll remove this note when I restore the graphics...

Magic pennies? Pshaw.

Let me put this another way.

This problem has been put to me frequently in my testing life. Here's one close-to-actual situation.

My client has a hundred trucks. Each has a bit of kit, and I've been told that the bit of kit needs to be replaced occasionally. Actually, not so occasionally – it's new kit, and I'm told that it's likely to fail at least once in the first hundred days' use.

So, how many trucks will experience that failure in their first hundred days? All of them? Also, how long should we test for? How many rigs should we use? How reliable is that suspiciously-round 1 in 100 figure?

As it happens, there's a bit of maths one can do. If the chance of a truck failing is 1%, then the chance of it not failing is 99%. The chance of it not failing for 2 days in a row is 99% * 99% (just over 98%). For 3 days, 99% * 99% * 99% (a tad over 97%).

Can you see where I'm going? The chance of a truck not failing for 10 days in a row is 99% * [99% another 9 times]. That's 99%^10.

For 100 days in a row, it's 99% ^ 100. Which is about 37%*.

So after a hundred days, I'm likely to still have 37 trucks, more or less, that haven't failed yet.

Which makes around 63 trucks that I need to go and mend**.
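(The whole calculation in one line: each truck has a 1 - 0.99^100 ≈ 63% chance of failing at least once in its first hundred days, so across a hundred trucks I should expect about 100 × 0.63 ≈ 63 to need attention.)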

The maths is satisfying, but it doesn't tell me any more than the question I was first asked. Nonetheless, we know that all good testers have a practically unlimited supply of extra questions to ask, so we're probably not completely satisfied.

However, if I go grab my hi-viz jacket and get to work on the trucks, I'll get a better idea of what happens. I'll find that some days everything works as well as it did yesterday, and occasionally three new trucks phone in failed. I'll get an idea that I'll see more failures when there are more things that work – so as the period goes on, I'll see fewer and fewer. Some trucks could go on for ages (I'm sure that you've all heard of immortal lightbulbs, too. Survivorship bias – mostly.)

Working on the trucks allows a visceral, complex experience. It takes a while to get, it's not terribly transferrable, and it's hard to forget. You know it deeply and in many different ways. You are "experienced". The maths approach is different; the result is ephemeral, and you may remember the method more easily. To imagine its implications, you'll have to think hard. You are "expert"***, and because you can remember the method, you might be able to re-apply it in a different context.

In between these two, there are models and simulations. Models aren't reality, but neither are they primarily symbolic (at least, not on the outside). I hope that the right model might engender something between experience and expertise. For what it's worth, I think that asking "How long should I test for to be confident that I'm not going to see problem X much in real life" is a fair question, and I think that "It depends" is a rotten answer without some idea on what "it" might depend.

I've given you three machines below. 10 trucks, 100 trucks, 1000 trucks. I've knocked out various noisy bits, but it's otherwise the same simulation. Have a play. You can change the budgets. Think about what the frequency of failure tells you, especially over time. While you play, just have in the back of your mind the ways that this kind of failure differs from the failures that we discover when exploring...

Right now, I'm posting this from EuroSTAR – it's looking good! Follow @esconfs on twitter, or watch for the #esconfs hashtag. And @TheTestLab, of course.


 
* We're assuming here that a once-broken truck is no more likely (or less likely) to break down again. We're also assuming that the non-broken trucks are at no greater chance of breaking. In one of the cases I'm thinking of, the "broken" truck was entirely functional as far as most people were concerned, so the broken trucks didn't get less use, and the working trucks didn't get more use. If you're thinking of an un-enlargeable fleet of trucks with broken axles, we've got different models.
** If I'm swift to mend, some of these probably will have needed to be mended more than once.
*** Nobody said that being experienced and being expert were mutually exclusive. You can be both, you can be either, most of us are neither outside our fields of interest.



Thursday, November 01, 2012

An experiment with probability

It's been a busy day.

So, just for now, here's a very abstract experiment (and I'll give you the kit to play with the experiment).

Imagine you've got a hundred bowls in front of you.

In each bowl, you've put a hundred pennies – 99 dull ones, and one magic one.

Every day, you get to look at one random penny from each bowl. You drop the penny back in its bowl when you've looked.

If you had a hundred days, how many magic pennies might you see?

If you wanted to see all the magic pennies, how long would you plan to spend to be reasonably confident? How confident is reasonably?

If you want to work this out, do. You could find out empirically and viscerally, too, but you don't necessarily need a hundred quid in coppers and a couple of seasons: play with the thing below. There are a hundred purple circles that go yellow with a chance of 1:100 every tick of the red number. You can change the red number before you start. Press the arrow in the circle to set things going. Treat "work done" as "days spent" for now – no one thinks finding magic pennies is a real job.

Cheers -

James

Wednesday, October 31, 2012

Resurfacing...

...not in the sense of putting a new veneer over an old worktop, but in the sense of one's head breaking into fresh air after a long, deep, dive.

There may be a bit of bobbing up and down and gasping for a while.

I have a daughter. A new one. The first one, for me. Indeed, the first child of any flavour for either my wife or me. Very new. Very lovely. We're all happy and healthy, all's good. Those of you who want to know more may know that I have pictures on tumblr, and some of you already have the password. Thank you for all the good wishes. Buzz me (directly, not here) if you want access. I'm trying to keep specific details off the internet until the whole identity/privacy thing shakes itself down. Or until she's old enough to vote for herself. Whichever comes first.

Anyway, excuses aside, adult life comes back in with a crunch next week.

I'm on the program committee for EuroSTAR. Along with the other committee members*, I'll need to be visible and available, and therefore I'll need to be in Amsterdam from Monday to Thursday. That'll be a shock. One of the reasons I'm here is to get the dust off the testing neurones – I've spent a few weeks mainly thinking about babystuff.

We were very enthusiastic about the program in Galway in March. The EuroSTAR elves have been working like crazy to put together the actual conference, and it's going to be excellent to see it happen. These days, I tend to spend my time at conferences in @TheTestLab. I can't do two things, so Bart is running it with Martin Janson (Martin is one of the TestEye bloggers, worked with us on the EuroSTAR TestLab in 2010, and helmed a fantastic TestLab at Let's Test last May). I can't think of better hands for it to be in. The TestLab will be, we're told, central to the conference, and (at last!) easy to find. I'll be spending time there, if you want to come and find me, but I'll also be going to tracks and sessions, having conversations in corridors and bars, and basically making a nuisance of myself.

It's worth noting that, after the conference theme "Innovate: Renovate" was announced, both Shmuel and I told our colleagues that our wives were due in the weeks before the event. Indeed, I think they had the same due date, so both of us knew while we set the theme, but neither of us could say. Shmuel's done the Dad thing rather more (hats off to him) so I'll be the one looking more startled. Of course, the other difference is that he has a beard. It's harder to look startled with a beard.

Then, for those interested in my life in a furry hat, I charge back to London and change costume and language. After singing on one of this week's top-ten soundtracks (we're on Halo 4) on Friday 9th, the London Bulgarian Choir will hit the ground running for their big gig of 2012. It's also the first gig for the choir after their leader had a baby**, and we have a whole new way of doing a show. Stories throughout the first half, a wedding in the second half, presumably breastfeeding in the interval. We're at a gorgeous 500-seat venue in central London. In 2010, we sold it out to the last chair. This time, who knows... Anyway, here's a facebook event (one of at least four doing the rounds). Invite yourself.

Last thing on my list – I'm playing with systems again. I should have something a bit special for you, starting tomorrow. For now, let's see if I can get flash to embed here...

Cheers -

James





* Zeger van Hese is our program chair, and Julian Harty and Shmuel Gershon are the other committee people. Here, look.
** Same baby as my baby? Same baby. Our baby. Blimey.