Saturday, October 11, 2014

What is an Appropriate Software Testing Standard?

Recently, I wrote a blog post at the Software Testing Club on why I didn't sign the petition opposing the ISO 29119 standard. In that post, I laid out my objection to the manner in which ISO developed and published the standard. The process should have been far more open, with free drafts of the standard available for public review. As a result, I took the stance that I would simply not review the standard, either by paying for draft copies or by reading second-hand descriptions. That left me in the difficult position of being unwilling (and not ethically in a position) to oppose it.

The ensuing discussions in the comments were lively and interesting. I invited - and actually promoted - criticism of my decision in order to gain some insights into the discussion. Many of the answers focused on the mechanics of getting me (and thus others) to sign the petition, mainly based on the limited information freely available. I agreed with many of the arguments and said several times that, from all indications, I would oppose using the standard once it was actually published. All of the comments were valuable, and I thanked everyone for contributing.

One extremely interesting topic that I think hasn't gotten much attention is whether testing can ever be standardized (or regulated) in any way. In fact, I asked that question explicitly a couple of times. In general, no one was willing to tackle it directly; a common answer was that testing is a purely intellectual process. That answer is helpful in separating the content of software testing from its goals or intent. One person did say that they felt software testing could never be standardized. I laid some bait by stating the opposite stance: I had no problem with industries such as automobile manufacturers, militaries, and hospitals standardizing their software testing. (Note the lack of any qualifiers on my part!) Regrettably, nobody took the bait. :( That could have led to an interesting discussion that shed some light on the boundaries of the topic.

I think the question of standardization is important for two reasons. First, many industries are already heavily regulated, because governments and the industries themselves feel compelled to regulate them in general. Second, automation (such as drones and self-driving cars) and machine intelligence will inevitably lead to the regulation of software content. Will that regulation spill over into mandating testing for machine intelligence or control? How does that differ from mandating testing practices? Where do they overlap?

One point I would like to make here is that I don't differentiate between industry standards, government regulations, and internal corporate guidelines. Each addresses a given context (an important point that I will expand on later), and all of them mandate the behavior of teams (developers and testers) in some way. The only difference between them is the authority behind the mandates. As a result, for the purpose of this blog, I will refer to them all as "standards" in a generic sense.

There are many blogs opposing ISO 29119 that give some interesting insights into the boundaries of standards. For the most part, the authors don't oppose standards as such. One interesting point was that many appropriate standards deal with interfaces and not content, which parallels the description of software testing as an intellectual process mentioned earlier. That is a very important point: almost every standard I could think of dealt only with the interface, not with mandates of content or behavior. A good example is the US government's Section 508 accessibility standards. Although they contain a shopping list of development coding practices (such as using the alt-text fields in web development), these are treated as development suggestions, and the overall implementation concentrates on functional testing of the products with actual users.

Based on what I have determined so far, here is a list that (to me) measures whether a proposed software testing standard would be appropriate (any criticisms or suggestions for expanding the list would be appreciated):

  • Measurable - The standard should clearly outline the problem or issue it is attempting to resolve. It should be stated in a way that allows the creation of one or more metrics that can be used to determine its effectiveness.
  • Context - To be effective, a standard has to have a clear context that shows the limits of the intended domain of the problem described above. If the standard is later extrapolated to other contexts, a statement of the new context and problem description should be created. An example would be the US military adopting, for weapon safety, a standard originally written to address patient safety in a hospital outpatient setting. This implies that a "universal" context is inappropriate for a standard.
  • Interface vs Content - The standard should address target goals and end effects, not mandate specific processes or methods. An example would be mandating sufficient testing to ensure that personally identifiable information (PII) is protected in a banking environment. As with the Section 508 standards, it may suggest methods or practices that help achieve the goals, but implementing those practices should not be used to measure compliance with the standard.
  • Temporal vs Indefinite - Another sign of an acceptable standard is a mandate to review it at predefined intervals (e.g. every four years or so) to evaluate its effectiveness and potential changes. That gives the targeted industry time to prepare potential changes and collect metrics to support them.

Based on what I know of the proposed ISO 29119 standard, it violates every one of the criteria I have outlined. Is that enough for me to finally sign the petition opposing the standard? I'll have to think about that.

Saturday, February 22, 2014

Top Ten Rules for Software Testing

This was the subject of a serious post at the Software Testing Club. Instead of piling on to the discussion, I decided to vent here:
  1. There are only 10 types of testers, those that know binary and …
  2. Never marry a tester for their money.
  3. Never work for a company that hires you for your test case writing skill (akin to “Never get involved in a land war in Asia.”).
  4. All developers know where the lines are.
  5. Always test outside the lines.
  6. There are no lines.
  7. All of the real rules are written in pencil in a plain black notebook in a corner shelf in a back room of your company. On the door is a sign that says “Beware of the Leopard”.
  8. Never trust a tester who follows those rules.
  9. Water pistols are not just useful for training cats, but are also effective on people who utter terms such as defect-free, always, never, complete, 100%, and “testing end date”.

    And of course …

  10. Never ask a tester what they would do in any general case, because it always depends on the context and they end up not taking it seriously.

Saturday, January 11, 2014

Regression: A Test By Any Other Name ...

I recently posted a blog on how context influenced my analysis of a book I had read: . In the post, I referred to a discussion of context I had read long ago in a book on web services. (I think it was titled "Web Services", but weren't they all?) At that time, the state of web services was immature, to say the least: many implementations were essentially the same old stove-piping repackaged in a new wrapper. The author referred to these as "the Legacy" and went on to identify the other contexts he would discuss in the book:

  • The Legacy refers to the way things were done in the past. ("We used to ...")
  • The Now refers to the way things are currently done. ("We are ...")
  • The Future refers to the way things may be done in the future. ("We (eventually) plan to ...")
  • The Ideal refers to the ultimate concept of how it should be (ideally) done. ("We should ...")
The author made the point that mixing these contexts in the same discussion can lead to misunderstanding and confusion. I have found that to be a good way to analyze arguments both online and offline. When someone starts switching contexts in the middle of an argument, it is time to call them out on it before proceeding.

In that post, I associated software regression testing with the Legacy and test planning with the Future. After thinking about it, I decided that test planning actually belongs to the Now, reasoning that anything that can be conceived of Now belongs to the Now. I added a comment to the post that the Future belongs in the area of mind maps and test plans that should be labeled "Beyond Here Lies Dragons", to borrow a 16th-century map legend.

Then I looked at regression testing. Does regression testing belong in the Now? It is certainly part of test planning, but what exactly differentiates it from "normal" testing? Why even use the term if it is simply "testing"?

There are probably twice as many definitions of regression testing as there are testers in the world. As a result, I will be talking about what I have referred to as regression testing on previous software efforts. Specifically, I define regression testing as having the following characteristics:

  1. It consists of test procedures that have previously been identified as useful for automated checks. These were once tests that were converted to checks for specific procedural paths, what I like to call "trip wires".
  2. These automated checks have been incorporated into one or more suites of automated checks that are run on a regular basis during feature development for the application under test.
  3. The automated checks are run "as is" with no metrics associated with them. In other words, a successful run of the automated checks is not used to determine the relative maturity of the current feature development. (Hence the "trip wire" label.)
  4. An unsuccessful run of the automated checks is investigated to determine the reason for the failure. Once the cause is determined, one of the following actions is performed:
    • The check is re-run to determine if the failure is intermittent.
    • The check is modified to make it more robust or to change the procedure to reflect recent (approved) changes in the software.
    • The check is retired as no longer useful in the context of the current software business rules. This is done when the changes would essentially create a new check. Instead, the new check is created and the old check is dumped.
  5. The population of automated checks is controlled to retain only the most useful ones and is culled enough to support maintainability and run-time requirements (in the case of GUI automation checks).
Note that even though each individual script or automated procedure is technically a "check", the interactive investigation of check failures described in characteristic #4 makes the overall process a "test". As a result, I refer to them as "checks" or as "tests" depending on the context of the discussion, sometimes using both terms in the same discussion.
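To make characteristics #3 through #5 a little more concrete, here is a minimal Python sketch of how such a trip-wire suite might be handled. It is a hypothetical illustration under stated assumptions, not a description of any particular tool: the names `Check`, `run_suite`, `triage`, and `cull` are invented for this example, each check is modeled as a function returning True when it passes, and the `usefulness` score and run-time budget are placeholders, since the characteristics above don't define how usefulness is measured. Only the re-run step of #4 is automated; modifying or retiring a check stays a human decision.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Action(Enum):
    """Outcome of the automated part of triage (characteristic #4)."""
    INTERMITTENT = "re-run passed: failure looks intermittent"
    INVESTIGATE = "failure reproduced: modify or retire the check"


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the trip wire is NOT tripped
    usefulness: int          # hypothetical ranking; no formula is defined
    runtime_s: float         # hypothetical measured run time in seconds


def run_suite(checks: List[Check]) -> List[str]:
    """Run each check once and return the names of the failures.

    A clean run yields an empty list and nothing else: no maturity
    metric is derived from it (characteristic #3).
    """
    return [c.name for c in checks if not c.run()]


def triage(check: Check, reruns: int = 2) -> Action:
    """Re-run a failed check to see whether the failure is intermittent.

    If any re-run passes, report it as intermittent; otherwise a person
    must decide whether to modify or retire the check.
    """
    for _ in range(reruns):
        if check.run():
            return Action.INTERMITTENT
    return Action.INVESTIGATE


def cull(checks: List[Check], budget_s: float) -> List[Check]:
    """Keep the most useful checks that fit in a run-time budget (#5).

    Greedy by usefulness: a simplification, not an optimal selection.
    """
    kept: List[Check] = []
    total = 0.0
    for c in sorted(checks, key=lambda c: c.usefulness, reverse=True):
        if total + c.runtime_s <= budget_s:
            kept.append(c)
            total += c.runtime_s
    return kept


if __name__ == "__main__":
    flaky = itertools.cycle([False, True])  # fails once, then passes
    checks = [
        Check("login", lambda: True, usefulness=9, runtime_s=30),
        Check("report export", lambda: next(flaky), usefulness=5, runtime_s=60),
        Check("legacy wizard", lambda: False, usefulness=2, runtime_s=120),
    ]
    for name in run_suite(checks):
        check = next(c for c in checks if c.name == name)
        print(name, "->", triage(check).value)
    print([c.name for c in cull(checks, budget_s=100)])
```

Run as a script, the example flags "report export" as intermittent, sends "legacy wizard" to a person for investigation, and keeps only "login" and "report export" within the 100-second budget. The greedy cull is just one simple way to respect a run-time limit; in practice, deciding which checks are "most useful" is a judgment call rather than a single number.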

Finally, because the automated checks retain legacy procedures in the current environment, they belong firmly in the context of the Legacy. Essentially, they answer the question "Does the current application build act the same as the legacy version in the context of the legacy test procedures?" Their use in the test planning of the Now makes it necessary to separate them from other tests based on this Legacy context.