Monday, March 31, 2014

Some Thoughts Triggered By Yet Another Bug-Ridden Applicant Tracking System

Around 2008, I became interested in Applicant Tracking Systems (ATS) or, rather, in what I thought back then an ATS should do and how it should do it. I even toyed with the idea of developing my own, but the interest must not have been strong enough for the idea to go beyond a modest in-house working prototype used to train students. I am still interested in this type of software, an interest that now manifests itself mostly in what I call a recurring irresistible itch to find bugs in it (Once a Software Tester, Always a Software Tester). So, here is one for your... amusement.



Let's say you are an employer using an ATS from a reputable SaaS (software as a service) provider.

A job seeker visits your web site, goes to its career section and from there is taken to your ATS.
    Note: Technically, it isn't really your ATS since you just "rent" a "slice" of a multi-tenant ATS provided by a SaaS vendor. The applicant may or may not be aware that he/she is using third-party software, which depends, among other things, on how tightly the ATS is integrated into your web site and how familiar he/she is with this type of system.
The candidate registers, begins the job application submission process and, a few minutes later, sees something like this:


Screenshot 1


In case you didn't get it, let me show you another one. The screenshot below is from the site of another company, but the ATS SaaS provider and the bug are the same (pay attention to where the red arrows are pointing):


Screenshot 2




Here is what happened (the visible part):

When parsing an uploaded resume, the ATS populated the date fields in the "Education" section of the on-line job application incorrectly. The user corrected the values in the "Graduation Date" fields manually and submitted the HTML form. The form returned a validation error stating that the "Graduation Date" is before the "Start Date". Of course, the requirement that graduation dates must be greater than start dates would be perfectly logical if it weren't for one little problem: there are no "Start Date" fields on the above forms.

The only ways around this turned out to be
  • either to delete the pre-populated education records and re-create them manually
  • or to clear the values in the "Graduation Date" fields (luckily, they are not required fields) and save the records without the dates.
Neither of those two workarounds is obvious unless the user happens to be a compulsive software tester like yours truly.



If you are curious what happened under the hood, read on, but first, without getting too technical (and, to some extent, intentionally oversimplifying things), let me explain how these software applications work.

1. There is absolutely no "rocket science" behind any software that collects data submitted by users (that is precisely what an application-submission part of any ATS does). It's just a bunch of input controls (e.g., text boxes, text areas, drop-down lists, radio buttons, check boxes, etc.) that a user can fill in, check/uncheck, select/deselect, etc. They are grouped into one or more HTML forms. Each control instance is mapped to a database column. When a user fills out, say, a registration form and clicks "Submit", the values from the fields on the form are written to the corresponding columns of a database table.

For example, a user fills out something like this:

    First name:

    Last name:

    E-mail address:

    I accept the terms and conditions... blah-blah...

When he/she clicks "Submit", a SQL script similar to the one below
    INSERT INTO candidate (firstname, lastname, email, acceptterms) VALUES ('John', 'Smith', 'jsmith@example.com', 1);
runs on the database server and populates the candidate table with the values from the HTML form (we assume here that id values are generated automatically and the "1" in the acceptterms column stands for "yes"):
    id     firstname  lastname  email               acceptterms
    12345  John       Smith     jsmith@example.com  1
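
For those who would rather see the whole round trip than a lone INSERT statement, here is a minimal sketch in Python using the standard sqlite3 module and an in-memory database. The table layout follows the example above. The function name save_candidate and the dictionary standing in for the submitted form are my own assumptions for illustration and are not taken from any real ATS.

```python
import sqlite3

# In-memory database standing in for the ATS back end (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE candidate (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        firstname   TEXT NOT NULL,
        lastname    TEXT NOT NULL,
        email       TEXT NOT NULL,
        acceptterms INTEGER NOT NULL  -- 1 stands for "yes", 0 for "no"
    )
    """
)


def save_candidate(conn, form):
    """Write the submitted form values to the candidate table; return the new id."""
    cur = conn.execute(
        "INSERT INTO candidate (firstname, lastname, email, acceptterms) "
        "VALUES (?, ?, ?, ?)",
        (
            form["firstname"],
            form["lastname"],
            form["email"],
            1 if form["acceptterms"] else 0,
        ),
    )
    conn.commit()
    return cur.lastrowid


# Hypothetical values, as they might arrive from the HTML form.
form = {
    "firstname": "John",
    "lastname": "Smith",
    "email": "jsmith@example.com",
    "acceptterms": True,
}
new_id = save_candidate(conn, form)
row = conn.execute(
    "SELECT firstname, lastname, email, acceptterms FROM candidate WHERE id = ?",
    (new_id,),
).fetchone()
print(row)
```

The "?" placeholders pass the values separately from the SQL text, so the database driver takes care of quoting instead of the application gluing strings together.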

2. Most ATS today allow some level of customization. As you may have noticed, the above screenshots (representing two different companies that use the same ATS) look very similar, but they are not exactly the same: the second one has some fields that are not present on the first.

In order to provide clients, also referred to as "tenants", with the ability to customize a multi-tenant application, the most common design approach is to create a hugely redundant data model that can accommodate all (or, at least, the most common) business practices of all (or, at least, the majority of) prospective clients. Simply put, it means that the database behind an ATS can store almost any data: human names, company names, e-mail addresses, phone numbers, dates, social security numbers, academic degrees, GPA, security clearances... hair colors... numbers of missing teeth... (I am slightly exaggerating, of course). In addition to the pre-defined columns, there may be some "spare columns" to store something the software designers never thought of. All those columns are mapped to corresponding input fields. Customization basically means that each client can choose which of the pre-defined fields to use. On some systems clients may be able to rename pre-defined fields. Some systems may also provide user-defined fields that correspond to the "spare columns" mentioned above. Some systems allow clients to modify the default grouping of fields.

So, whatever the minor differences in how this customization is done on specific platforms, in general, clients "assemble" job applications from what their ATS SaaS vendor makes available to them. Fields that are not used simply do not appear on the HTML forms. The data model behind all client-customized interfaces on the same ATS is exactly the same. In most cases, the actual physical database is shared by all clients as well.
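
To make the "assembling" idea a bit more tangible, here is a rough sketch in Python. The column names come from the education table used later in this post. The tenant names, their field choices, and the helper functions build_form and hidden_columns are hypothetical; a real ATS will certainly store its configuration differently.

```python
# One shared data model for every tenant (column names follow the post's
# education table; the tenant names and their choices are hypothetical).
EDUCATION_COLUMNS = [
    "institution", "major", "degree", "gpa",
    "startdate", "graddate", "graduated",
]

# Each tenant only chooses which of the pre-defined fields appear on its form.
TENANT_VISIBLE_FIELDS = {
    "company_a": ["institution", "major", "degree", "graddate"],
    "company_b": ["institution", "major", "degree", "gpa", "graddate", "graduated"],
}


def build_form(tenant):
    """Return the fields a tenant's form displays, in data-model order."""
    visible = set(TENANT_VISIBLE_FIELDS[tenant])
    unknown = visible - set(EDUCATION_COLUMNS)
    if unknown:
        raise ValueError(f"not in the shared data model: {sorted(unknown)}")
    return [col for col in EDUCATION_COLUMNS if col in visible]


def hidden_columns(tenant):
    """Columns that exist in the database but never appear on the tenant's form."""
    shown = set(build_form(tenant))
    return [col for col in EDUCATION_COLUMNS if col not in shown]


print(build_form("company_a"))
print(hidden_columns("company_a"))
```

The point of the sketch is the second function: a column that a tenant hides does not go away. It still exists in the shared table and can still hold data.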

3. Another thing that is shared by all clients is the ATS resume parsing engine.

Most ATS today claim to be able to parse resumes and automatically populate on-line job application forms. Since about 2008, when I became interested in web-based ATS, I have checked out dozens of them, and I have not seen even one that does it well, but that is not the topic of this post (besides, to be fair, I must say that it is not as easy to do as some may think).

Anyway, the resume parsing script of an ATS tries to split uploaded resumes into elements that fit the database structure underneath the ATS, which, again, is the same for all clients. The parser is not aware of any customization made by individual clients.

Usually, an ATS will allow candidates to fill out forms manually without uploading their resumes, but most candidates are likely to prefer that at least some of the work be done "automagically".

Because all ATS vendors know that their resume parsing scripts are far from perfect, they pre-populate application forms as best as they can and let candidates edit them before final submission.

4. Before input values are written to the database, validation occurs.

Its primary and most important purpose is to make the ATS (or any other application that involves user input) secure and to prevent malicious code from being injected via HTML forms (we are not going to discuss security here).

The second most important purpose of validation is to make sure that the data collected make sense (at least, formally). For example, it is supposed to make sure that e-mail addresses actually are e-mail addresses and not just some useless strings of characters, that social security numbers contain nine digits, that there are no dates like "February 31", etc.

If a validation script finds invalid input values, it is supposed to show the user a meaningful error message that informs him/her how to correct the problem. Obviously, the above screenshots show very poor examples of such error messages.
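
Here is a minimal sketch, in Python, of the kind of formal checks mentioned above. The function names, the deliberately loose e-mail pattern, and the wording of the messages are my assumptions; each function returns None when the value passes and a human-readable message when it does not.

```python
import re
from datetime import date

# Deliberately loose pattern: something@something.tld. Real-world e-mail
# validation is more involved; this is only a formal sanity check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(value):
    """Return None if the value looks like an e-mail address, else a message."""
    if EMAIL_RE.match(value):
        return None
    return "Please enter a valid e-mail address, e.g. name@example.com."


def validate_ssn(value):
    """Accept nine digits, tolerating spaces and dashes such as 123-45-6789."""
    digits = re.sub(r"[\s-]", "", value)
    if re.fullmatch(r"[0-9]{9}", digits):
        return None
    return "A social security number must contain exactly nine digits."


def validate_date(year, month, day):
    """Reject impossible calendar dates such as February 31."""
    try:
        date(year, month, day)  # raises ValueError for non-existent dates
    except ValueError:
        return f"{year:04d}-{month:02d}-{day:02d} is not a valid calendar date."
    return None


print(validate_email("jsmith@example.com"))
print(validate_ssn("12345"))
print(validate_date(2014, 2, 31))
```

Note that every message names the field's expectation, so the user knows what to change.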

Writing validation rules is a pretty tedious job, so, in order to keep their clients happy, ATS SaaS vendors tend to provide most input fields with default validation scripts that make sense to the software designers.



Now that you understand the basics of how these software applications work, I will try to explain what happened under the hood.

Let's say a user uploads a resume. Under "Education", it contains two entries. The ATS resume parser parses those entries as best it can and stores the "raw" data (I assume here that the values are written to a table, but they may just as well be put into a variable; it doesn't really matter), but it flips the dates of the two entries:
    id   candidateid  institution                major              degree  gpa  startdate   graddate    graduated
    123  12345        University of Hof          Political Science  B.S.         1999-01-01  2000-01-01  1
    124  12345        Hospitality School of Var  Tour Guide                      1992-01-01  1996-01-01  1

Next, the "raw" values are sent back to the user's web browser and displayed in an editable HTML form. However, because whoever "assembled" the form decided not to include the "Start Date" field, the "startdate" value is not displayed. It is still there; it's just not shown and, therefore, cannot be edited. So, the user/candidate/applicant corrects the "Graduation Date" values and submits the form, but the update fails because one of the new dates violates the hard-coded validation rule according to which the value in the "Graduation Date" field must be greater than that in the invisible "Start Date" field. Had the update not failed, the database table rows would look kind of like this (the updated values are those in the graddate column):
    id   candidateid  institution                major              degree  gpa  startdate   graddate    graduated
    123  12345        University of Hof          Political Science  B.S.         1999-01-01  1996-01-01  1
    124  12345        Hospitality School of Var  Tour Guide                      1992-01-01  2000-01-01  1
The incorrect "raw" startdate values are still in the table, as they cannot be updated through the HTML form because the "Start Date" fields are not on it.
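
The scenario can be reproduced in a few lines of Python. This is only my guess at the logic, sketched under the assumptions stated in this post: the record is row 123 as pre-populated by the parser, the form hides "startdate", and the validation rule is hard-coded. The names VISIBLE_FIELDS, validate, and submit are hypothetical.

```python
from datetime import date

# The tenant's form hides startdate, so the user can see and edit only
# the fields listed here (hypothetical configuration).
VISIBLE_FIELDS = {"institution", "major", "degree", "graddate"}

# Row 123 as pre-populated by the resume parser.
stored = {
    "institution": "University of Hof",
    "major": "Political Science",
    "degree": "B.S.",
    "startdate": date(1999, 1, 1),  # wrong, and invisible to the user
    "graddate": date(2000, 1, 1),   # wrong, but visible and editable
}


def validate(record):
    """Hard-coded rule: it never asks whether startdate is on the form."""
    errors = []
    start, grad = record.get("startdate"), record.get("graddate")
    if start is not None and grad is not None and grad <= start:
        errors.append("Graduation Date is before the Start Date.")
    return errors


def submit(stored, submitted):
    """Merge the visible fields into the stored record, then validate."""
    merged = dict(stored)
    merged.update({k: v for k, v in submitted.items() if k in VISIBLE_FIELDS})
    return merged, validate(merged)


# The user corrects the only date he/she can see.
_, errors = submit(stored, {"graddate": date(1996, 1, 1)})
print(errors)

# Clearing the optional Graduation Date makes the rule silent.
_, errors_cleared = submit(stored, {"graddate": None})
print(errors_cleared)
```

The first submission fails with a message about a field the user cannot see. The second one passes, but only because the rule has nothing left to compare.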

As I stated above, there are two ways users can bypass the meaningless error message the ATS returns.

One is to delete the pre-populated record that triggers the validation error and create a new one instead. In this case the table rows may look like this:
    id   candidateid  institution                major              degree  gpa  startdate   graddate    graduated
    124  12345        Hospitality School of Var  Tour Guide                      1992-01-01  2000-01-01  1
    125  12345        University of Hof          Political Science  B.S.                     1996-01-01  1
In reality, since it is not obvious to the user which record causes the error, he/she will most likely have to recreate all of them:
    id   candidateid  institution                major              degree  gpa  startdate   graddate    graduated
    125  12345        University of Hof          Political Science  B.S.                     1996-01-01  1
    126  12345        Hospitality School of Var  Tour Guide                                  2000-01-01  1
By the way, this is the only case in which incorrect values are not stored in the database table.

The other is to clear the "Graduation Date" field (which in this case, luckily, is not a required field) of the offending record. This will make the database table rows look kind of like this:
    id   candidateid  institution                major              degree  gpa  startdate   graddate    graduated
    123  12345        University of Hof          Political Science  B.S.         1999-01-01              1
    124  12345        Hospitality School of Var  Tour Guide                      1992-01-01  2000-01-01  1
Again, since it is not obvious to the user which record causes the error, he/she will most likely have to clear the "Graduation Date" field in all of them:
    id   candidateid  institution                major              degree  gpa  startdate   graddate    graduated
    123  12345        University of Hof          Political Science  B.S.         1999-01-01              1
    124  12345        Hospitality School of Var  Tour Guide                      1992-01-01              1



Thoughts, lessons, conclusions...

i. Design of software applications of the "Swiss-Army-knife type" is always based on some assumptions about what users really need and how they are going to use your software product. "Imaginary" requirements are dangerous, but, with such products, there is no way to eliminate them completely. Of course, you should minimize them by interviewing as many end users and stakeholders as possible in order to have mostly "real" requirements.

ii. Speaking of "imagination", software testers should think of different, even seemingly improbable, scenarios of how things may go wrong. Let me give you an example.
    Here are "imaginary" (and intentionally oversimplified) requirements:
      R1. Administrator shall be able to hide any one or all of the following input fields on the "Education" HTML form: "Start Date", "Graduation Date".

      R2. Upon submission of the "Education" HTML form by Candidate, validation script shall check whether the value of the "Graduation Date" field is greater than the value of the "Start Date" field.
    Of course, in real life, the two requirements will most likely be in different places, and it may not be so obvious that they may negatively affect each other.

    And here are the corresponding tests (again, "imaginary" and intentionally oversimplified):
      T1. Logged in as Administrator, open the "Education" form in Form Design view. Verify that any one or all of the "Start Date" and "Graduation Date" fields can be hidden and the form can be saved. Logged in as Candidate, verify that the hidden field/s is/are not displayed.

      T2. Logged in as Candidate, fill out the "Education" form. The value in the "Graduation Date" field must be set to a date earlier than the date in the "Start Date" field. Submit the form. Verify that the entered values are not written to the database and that the appropriate error message is displayed.

    Again, just like the requirements, the two tests are likely to be in different places, and it may not be so obvious that there are some serious "holes".
The major problem with the above example is that the tests are based on the assumption that the requirements are correct, complete, unambiguous, and logically consistent, which in real life is rarely, if ever, the case. In fact, properly implemented requirements-based testing must first validate that the requirements are indeed correct, complete, unambiguous, and logically consistent. If they are not, the requirements must be revised, and only then should tests be created from the revised requirements.

Someone at some point should have figured out that disabling one or both of the date fields should also include disabling of the validation script that compares values in those fields.
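
In code, the idea could look something like the Python sketch below. It is one possible way to tie the rule to the form configuration, not a description of how any vendor does (or should) implement it. The function name validate_education and the visible_fields parameter are my inventions.

```python
from datetime import date


def validate_education(record, visible_fields):
    """Apply the date-order rule only when both dates are on the form."""
    errors = []
    if {"startdate", "graddate"} <= set(visible_fields):
        start, grad = record.get("startdate"), record.get("graddate")
        if start is not None and grad is not None and grad <= start:
            errors.append("Graduation Date must be later than Start Date.")
    return errors


record = {"startdate": date(1999, 1, 1), "graddate": date(1996, 1, 1)}

# Both fields shown: the rule applies, and the user can fix either date.
both_shown = validate_education(record, ["startdate", "graddate"])

# Start Date hidden: the rule is skipped, so no unfixable error appears.
start_hidden = validate_education(record, ["graddate"])

print(both_shown)
print(start_hidden)
```

This only removes the dead end for the user. The stale "startdate" value written by the parser would still sit in the table, so a fuller fix would probably also have to decide what to do with parsed values for hidden fields.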

Unfortunately, the approach in which tests are "mechanically" generated from imperfect requirements/specifications is more common than one might think.

iii. The more customization you allow, the more chances there are that things will break. To predict all such scenarios is impossible. Therefore, there should be a balance between trying to make your clients happy by letting them customize all things imaginable and the risk of making them unhappy when all kinds of weird bugs start popping up. In this specific case, there is hardly any reason to let clients disable one date field in a pair of dependent date fields. Either disable both or none. Job applicants these days fill in an absolutely insane number of "boxes". One extra date field is unlikely to make them any more frustrated.

iv. About two years ago I wrote a post called Unless You Really Want to Look Like a Fool, Don't Save on Testing. The buggy application in question there was developed in-house, so the organization itself was solely responsible for the poor quality of the software. However, if your company uses third-party commercial software as a service, things are not as clear-cut.

Should you just assume that the vendor has done enough testing to weed out most of the bugs? Whom are your users going to blame when they find a bug? The answer to the first question is simple: never assume that software is bug-free - it never is. As to the second question, I think that, even if you make it explicitly clear to your users that they are going to be redirected to a third party, anything that goes wrong will most likely be blamed on you anyway, but I may be wrong (ask your marketing/branding people).

v. On such systems, clients/tenants have very little room for testing their customized interfaces. In this specific case, I see no way they could intentionally reproduce the bug I described. On second thought, that's not entirely true. They could enable both date fields, enter test data, disable the "Start Date" field, set the value in the "Graduation Date" field to an earlier date, and try to save. But, let's be honest, that is too complicated for a non-technical user.

What is not complicated, however, is to pay attention to user feedback. Whenever I identify a bug "in the wild", I always report it. This one was no exception. I never heard back from either of the two companies or from the software vendor.



Just in case I failed to make it clear, let me say that I did not write this to bash these companies (both of which, by the way, happen to be huge businesses of global caliber) or this ATS SaaS vendor (also one of the market leaders worldwide). It was just a pretext to share some of my thoughts about quality in the age when marketing budgets hugely exceed those of QA.
