User Study

In the AWA project, we developed a technique that automatically generates workarounds to overcome failures in Web applications by exploiting the intrinsic redundancy of software systems. The technique takes advantage of the interactive nature of Web applications by assuming that users report the failures they perceive.

When users interact with Web applications, they may be exposed to faulty behaviors, but this does not mean that all users perceive such behaviors as faulty. For example, when asking for the best route from a location A to a location B, users may not perceive a non-optimal route as a wrong one.

The goal of this qualitative user study is to understand to what extent the users perceive failures as such, to validate the hypothesis of our study.

The experiment

We conducted the experiment according to the guidelines of Interaction Design: Beyond Human-Computer Interaction by Rogers, Sharp and Preece (Wiley 2011): we identified a target population, prepared a questionnaire and a list of tasks, asked the participants to accomplish the tasks, and analyzed the results. Below, we report the details of the experiment and the results.

Experimental Setting

Website

We took the Website of our research group, which uses jQuery to provide some functionality, and produced a faulty version of it by replacing the current, correct version of the jQuery library with an obsolete, faulty version containing 6 bugs that affect the results of 6 different pages. The faulty Website can be accessed at http://star.inf.usi.ch/test/star.

This is the list of the bugs considered for the study with a link to the bug tracker and a short description:

  • 4088: A ui.draggable item (sortable item) cannot be removed (using the call remove()) immediately after the drop callback.
  • 7141: The call show() does not make an element visible if it was previously hidden with the call hide().
  • 5316: When a set of elements is selected and replaced with another set of elements using the call replaceAll(), only the first new element is rendered visible.
  • 6264: The call submit() silently fails if the form has a <input name="submit"> or <input id="submit">.
  • 8: The call set('disabled', false) fails to enable a disabled element.
  • 118: It is not possible to set an attribute to boolean false with the call attr('disabled', false).

These bugs affect the following pages of the Website:

  • Bug 4088 affects the Home page and prevents the news from 2013 from being removed when one tries to drag them to the trash bin area.
  • Bug 7141 affects the Research page and prevents the div labeled "Software Testing and Analysis" from showing up, upon a click, with the information about the research area.
  • Bug 5316 affects the People page. When a user clicks on the "Show only PhDs" button, instead of the entire list of Ph.D. students (7 people), only one Ph.D. student is shown; moreover, the other 2 buttons break down and stop working.
  • Bug 6264 affects the Publications page. The "download" button does not work.
  • Bug 8 affects the page of the software ARMOR. When a user writes a comment in the text area the "Send" button should get enabled, but the fault prevents this behavior.
  • Bug 118 affects the print-friendly version of the page of the software ARMOR: it fails to reactivate the CSS of the page upon a click on the link "Browser Version".

Some of the described failures can be avoided: a user may browse a faulty page without ever exercising the bug. In this case, the user interacts with a faulty page but never sees a failure.

The evaluation methodology

We designed a list of tasks and asked the participants of the study to perform them on the faulty version of the Website and report the final outcome. The questionnaire is composed of 11 tasks: each task is an action that has to be performed on the Website and requires a closed-ended answer. A participant is required to read the task, perform it, and answer the question with one of two possible outcomes:

  • Done: when the participant feels that he or she has been able to accomplish the task completely and obtain the result that he or she expected by reading the task.
  • Not Done: when the participant feels that he or she was not able to accomplish the task or when the result was different from the one that the participant expected.

The participant is free to navigate through the entire Website in the attempt to accomplish the tasks. The interviewer observes the participant's behavior but does not interfere with the participant's operations. The participant is encouraged to comment on the final outcome of each task, with particular emphasis on those that he or she considers as Not Done.

The 6 previously described bugs produce a failure in 6 out of the 11 tasks. This means that 6 out of 11 tasks may not be completed correctly, because a failure may arise and prevent the user from proceeding or from obtaining the expected result. The remaining 5 out of 11 tasks are failure-free, and the participants should be able to complete them correctly.

Here we report the complete list of tasks:

Please, open this webpage: http://star.inf.usi.ch/test/star/ and complete the following tasks. When you think you have correctly achieved the task, tick the checkbox Done, when you think you cannot complete or achieve the task as expected, tick the checkbox Not Done.
 
Tasks (each with a Done and a Not Done checkbox):

  1. Sort the news from the newest to the oldest, newest on top
  2. Remove the news older than 2014
  3. Read about the details of the different subareas of the two research areas carried on by the group
  4. Open the webpage of a group member
  5. Identify the total number of the group members and the number of members by role
  6. Read the Bibtex information of a paper published by the group
  7. Download a paper of the group
  8. Look at the tutorial for the software ARMOR and send a comment to the developers
  9. Look at the rewriting rules for Guava and Joda-Time in the page of the software ARMOR
  10. Look for the print-friendly version of the page of the software ARMOR and return to the browser version
  11. Read the research focus on self-healing systems

The failure-free tasks are: 1, 4, 6, 9, 11.
A failure may arise in the tasks: 2, 3, 5, 7, 8, 10.
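
The split between failure-free and potentially failing tasks can be written down as a small lookup, which makes it easy to check that the two lists cover the 11 tasks exactly once and to derive the expected answer for each task. The following Python sketch is illustrative only: the names `FAILURE_FREE`, `MAY_FAIL`, and `expected_answer` are hypothetical and are not part of the study material.

```python
# Task numbers as listed in the questionnaire.
FAILURE_FREE = {1, 4, 6, 9, 11}
MAY_FAIL = {2, 3, 5, 7, 8, 10}
ALL_TASKS = set(range(1, 12))


def expected_answer(task: int) -> int:
    """Return 1 if a failure may arise in the task, 0 otherwise."""
    if task not in ALL_TASKS:
        raise ValueError(f"unknown task: {task}")
    return 1 if task in MAY_FAIL else 0


# The two lists must cover every task exactly once.
assert FAILURE_FREE | MAY_FAIL == ALL_TASKS
assert not (FAILURE_FREE & MAY_FAIL)

expected = [expected_answer(t) for t in sorted(ALL_TASKS)]
print(expected)  # [0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
```

The resulting list corresponds to the "Expected answer" column used in the result tables.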

We validated the user study and the questionnaire with other researchers in our research group who were not involved in this work.

Participants

We ran the study with two different groups of participants: a first study with people with no IT background and a second study with computer science students. We collected and analyzed the results of the two categories of participants separately.

For the first study, we selected 20 participants, trying to cover different age ranges, educational backgrounds, and professions. The goal of the study is to understand whether users can perceive a failure while interacting with a Web application; to avoid any bias that a technical background may introduce, we selected only people with no IT education or professional background.

The following table summarizes the characteristics of the 20 participants we selected:

Gender                    Age Range
Male            13        18-24                    4
Female           7        25-34                   10
                          35-44                    6

Education Level           IT Expertise
Bachelor         4        Office suite            19
Master          14        Graphics                 8
Ph.D.            2        Use smartphone          20
                          Internet and emails     20

Profession
Education        4        Publishing               2
Construction     1        Entertainment            3
Student          2        Government               2
Unemployed       1        Finance                  1
Engineering      1        Marketing                1
Health Care      2

For the second study we asked 7 computer science graduate students to take the same questionnaire.

Results

In this section, we report a detailed summary of the results of the two user studies. The raw data (the questionnaires filled out by the participants) can be downloaded below. We summarize the results in two tables that follow the conventions below:

  • 0: The participant did not perceive a failure while performing the task.
  • 1: The participant perceived a failure while performing the task.
  • Expected answer: Indicates whether a fault is present in the page that must be visited to complete the task. If the value is 0, we expect the participant to perceive no failure and complete the task; if the value is 1, we expect the participant to perceive and report a failure.
  • Total failures uncovered: Total number of participants who uncovered a failure while performing the task.
  • Total failures perceived: Total number of failures that each participant perceived in the list of tasks.
Place the mouse over the numbers with the yellow background to read additional information and comments that the participant provided while performing the task.
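
In terms of the 0/1 matrix, "Total failures uncovered" is the sum of each row (one row per task) and "Total failures perceived" is the sum of each column (one column per participant). A minimal Python sketch of the two totals follows; the helper names are hypothetical and the small matrix is made up for illustration, it is not study data.

```python
from typing import List


def failures_uncovered(matrix: List[List[int]]) -> List[int]:
    """Per task (row): how many participants perceived a failure."""
    return [sum(row) for row in matrix]


def failures_perceived(matrix: List[List[int]]) -> List[int]:
    """Per participant (column): how many tasks were perceived as failing."""
    return [sum(col) for col in zip(*matrix)]


# Made-up example: 3 tasks (rows) x 4 participants (columns).
toy = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]
print(failures_uncovered(toy))  # [0, 3, 4]
print(failures_perceived(toy))  # [2, 2, 1, 2]
```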



In the following we report the summary of the user study that involves participants with non-IT background:

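                          Participants
Task            Expected  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  Total failures
                answer                                                                uncovered
1               0         0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2               1         1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  20
3               1         1  1  1  0  1  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  18
4               0         0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5               1         0  0  1  1  0  0  0  1  1  1  1  1  1  0  1  1  1  0  1  1  13
6               0         0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7               1         1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  20
8               1         1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  20
9               0         0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
10              1         0  1  1  0  1  1  1  0  1  1  1  0  1  0  1  0  0  0  0  1  11
11              0         0  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0  0  0  0  0  3
Total failures  6         4  5  6  4  5  5  5  5  6  7  6  5  7  4  6  5  5  4  5  6
perceived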


In the following we report the summary of the user study that involves computer science graduate students:

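                          Students
Task            Expected  1  2  3  4  5  6  7  Total failures
                answer                         uncovered
1               0         0  0  0  0  0  0  0  0
2               1         1  1  1  1  1  1  1  7
3               1         1  1  0  1  0  1  1  5
4               0         0  0  0  0  0  0  0  0
5               1         1  1  1  1  1  1  0  6
6               0         0  0  0  0  0  0  0  0
7               1         1  1  1  1  1  1  1  7
8               1         1  1  1  1  1  1  1  7
9               0         0  0  0  0  0  0  0  0
10              1         0  1  0  0  0  0  1  2
11              0         0  0  0  0  0  0  0  0
Total failures  6         5  6  4  5  4  5  5
perceived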
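
To compare the two groups at a glance, the "Total failures uncovered" columns of the two tables can be aggregated separately over the tasks where a failure is expected and over the failure-free tasks. The Python sketch below does only this arithmetic. The function name `summarize` is hypothetical, and the aggregate figures are derived here from the tables; they are not results stated elsewhere in this report.

```python
# Expected answer per task (tasks 1..11), as in the result tables.
EXPECTED = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

# "Total failures uncovered" per task, copied from the two result tables.
NON_IT = [0, 20, 18, 0, 13, 0, 20, 20, 0, 11, 3]  # 20 participants
STUDENTS = [0, 7, 5, 0, 6, 0, 7, 7, 0, 2, 0]      # 7 participants


def summarize(uncovered, n_participants):
    """Return (reports on faulty tasks, maximum possible,
    reports on failure-free tasks, maximum possible)."""
    faulty = [u for u, e in zip(uncovered, EXPECTED) if e == 1]
    clean = [u for u, e in zip(uncovered, EXPECTED) if e == 0]
    return (sum(faulty), len(faulty) * n_participants,
            sum(clean), len(clean) * n_participants)


for name, data, n in [("non-IT", NON_IT, 20), ("students", STUDENTS, 7)]:
    hit, hit_max, extra, extra_max = summarize(data, n)
    print(f"{name}: {hit}/{hit_max} failures reported on faulty tasks, "
          f"{extra}/{extra_max} reports on failure-free tasks")
```

With the figures from the tables, this gives 102 out of 120 reports on faulty tasks for the participants with no IT background and 34 out of 42 for the students, with 3 out of 100 and 0 out of 35 reports on failure-free tasks, respectively.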


Raw data: