Big Data Model as a Means of Testing Web-Based Applications

Anton Bykau, A1QA

Testing is the process of executing a program in order to detect defects. The Rational Unified Process, a widely accepted methodology for iterative software development, calls for a complete test pass in each development iteration. Testing not only new code but also code written during previous iterations is called regression testing. Automated tools are advisable for this type of testing because they simplify the tester's work, which makes test automation an integral part of the testing process.

Requirements formulation is the most important process in software development. The V-model is a convenient model for developing information systems and has become a standard for government and defense projects in the United States. Its basic principle is that each stage of application development and requirements refinement has a corresponding testing task. One of the greatest challenges of this development model is system and acceptance testing. Typically, this type of testing follows the black box strategy and is difficult to automate, because automated tests have to use the application's user interface rather than an API. "Capture and replay" is one of the most widely used technologies for black box test automation of web applications today. With this technology, the testing tool records the user's actions in an internal language and generates automated tests from them.

Practice shows that automated tests are developed most effectively with modern software development methods: the quality of the test code has to be analyzed, and duplicated test code has to be merged into a library that is itself documented and tested. All of this requires a significant investment of time, and the test automation engineer needs the skills of a developer. This raises several questions: how to combine the recording of user actions with the manual development of automated tests, how to organize the verification of automated tests, and whether an application and its automated tests can be developed in parallel following test-driven development (TDD).

There are systems capable of determining which tests should be run first. Such systems require the tester to manually link automated tests to changes in the source files of the application under test. However, the connection between the source code and the tests can also be expressed in terms of conditional probabilities. The probabilistic networks used in artificial intelligence could be used to define these relations automatically from test result statistics.
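As a rough illustration of expressing the link between source changes and tests as conditional probabilities, the sketch below estimates P(test fails | file changed) by simple counting. The record layout, the function name `failure_given_change`, and the sample file and test names are assumptions made for this example, not part of the described system.

```python
from collections import defaultdict


def failure_given_change(history):
    """Estimate P(test fails | file changed) from past runs.

    `history` is a list of (changed_files, failed_tests) pairs, one per run.
    The layout is an assumption made for this sketch.
    """
    changed_count = defaultdict(int)
    failed_with_change = defaultdict(int)
    for changed_files, failed_tests in history:
        for source_file in changed_files:
            changed_count[source_file] += 1
            for test in failed_tests:
                failed_with_change[(test, source_file)] += 1
    # Relative frequency of a failure among the runs where the file changed.
    return {
        (test, source_file): count / changed_count[source_file]
        for (test, source_file), count in failed_with_change.items()
    }


# Hypothetical regression history: which files changed, which tests failed.
history = [
    ({"login.js"}, {"test_login"}),
    ({"login.js", "menu.js"}, {"test_login", "test_menu"}),
    ({"menu.js"}, set()),
    ({"login.js"}, set()),
]
probs = failure_given_change(history)
```

Pairs with a high estimate are candidates for an automatic link between a source file and a test, replacing the manual association.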

By using Big Data networks, we can link interface operations with test data, which reduces the complexity of automation.

Key Elements of Proposed Testing Technologies

For testing automation, we could use a Big Data network that has the following structure:

The first level network, shown in Fig. 1, consists of two layers that determine the location of graphical controls on the web page. Top-level nodes (Fig. 1.1) are either pages or states of a page of the tested application, such as the user authentication page. Lower-level nodes are templates used to identify GUI elements (Fig. 1.3). Some nodes are GUI container templates (Fig. 1.3). Fig. 1.4 shows the properties of a selected node, in this case the template for the password field. Graphic elements that occur on more than one page, such as the menu items in Fig. 1.5, can be moved to a block shared by multiple pages. To simplify the visualization of the network for testers, Fig. 1 shows only the connections between the shared block and the common elements of the page.

Fig. 1 GUI elements composition

Having GUI templates and web interface states in the model makes it possible to monitor how much of the application interface the tests cover, and to adapt automated tests efficiently to new versions of the tested application.

The main goal of the second level network is to describe the workflow of the program as a set of interconnected rules covering program states and GUI actions (Fig. 2). The network consists of two layers with two types of nodes: nodes for all possible states of the program (Fig. 2.1) and nodes for all possible program actions (Fig. 2.2).

The links of the network describe the state transitions that result from GUI actions. A page can be linked to data (Fig. 2.3) to describe the state of a page containing dynamic elements (a table with dates, for example). The data layer consists of nodes that store the state of the tested application and the operations that modify the data. Fig. 2.3 describes the results table, which is used in Fig. 2.4. Each table row should include a reference to additional information; the lower part of the table should contain three more references (Fig. 2.4), while the search box should contain the search phrase (Fig. 2.5). To simplify automation, the state of some graphical elements is not preserved in the data layer (Fig. 2.6).

Fig. 2 Program Algorithm


The test automation system continuously analyzes the state of the application interface while a test is being recorded. If the same sequence of actions is repeated on multiple pages, the system offers to merge it into a common block (Fig. 1.5). Recorded actions and states are not duplicated: when the second and subsequent tests are recorded, the system adds only the states and operations that are not yet known. Although the interface model can be split into separate files, this does not prevent the system from linking blocks common to several pages. Automated tests often complicate automation because the test code was decomposed poorly. A single model of the whole tested interface helps to avoid duplication and to refactor the recorded tests. If a previously unknown combination of actions is performed between known states during recording, the system determines the appropriate relationship between those states.
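The idea that later recordings add only unknown states and operations can be sketched with a shared transition table. The class `InterfaceModel`, its `record` method, and the page and action names are hypothetical; the sketch shows only the deduplication step, not the merging of common blocks.

```python
class InterfaceModel:
    """Single shared model of states and the actions that connect them."""

    def __init__(self):
        # Maps (state, action) to the resulting state.
        self.transitions = {}

    def record(self, steps):
        """Add a recorded test and return only the steps that were new."""
        added = []
        for state, action, next_state in steps:
            key = (state, action)
            if key not in self.transitions:
                self.transitions[key] = next_state
                added.append((state, action, next_state))
        return added


model = InterfaceModel()
first = model.record([
    ("login_page", "submit_credentials", "home_page"),
    ("home_page", "open_search", "search_page"),
])
second = model.record([
    ("login_page", "submit_credentials", "home_page"),  # already known
    ("home_page", "open_settings", "settings_page"),    # new transition
])
```

Here the second recording contributes a single new transition, because the login step is already part of the model.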

The third level network describes the tests and defects of the tested program. The top layer describes the set of written tests (Fig. 3.1) and is connected to the page nodes (Fig. 3.2). Each test case describes which action to perform and which graphic elements to check (Fig. 3.3). The system then finds the preliminary steps needed to perform one or more tests, using a path finding algorithm over the state graph proposed by S. Russell. The link between a test and a page node can be split by a bug node that describes a defect (Fig. 3.4). The defect node, which turns a positive test into a negative one (Fig. 3.5), can be in one of the following states:

  • presence of an undocumented and uncorrected defect (the defect node is absent)
  • expectation of a described but uncorrected defect (the defect node has been created and verifies that the defect is reproduced)
  • absence of the expected defect (the defect node cannot reproduce the defect)
  • confirmed absence of the described defect (the defect node verifies that the defect is absent)

The test system displays test results differently for developers and for testers. This makes it possible to evaluate the correctness of the automated tests and to assess the quality of the tested application independently. Modeling the defect life cycle integrates the defect tracking system with automated testing. A priority value is associated with each test node. This value is the probability that the test result will be incorrect; for example, that a bug will not be reproduced or that the expected page will not load properly. The higher the probability of failure, the more important it is to run the test, fix the problem, and increase the stability of testing. The priority of a test run can be set manually by the tester, or it can be calculated statistically from changes in the status of the associated defect, from changes in the associated source code, or from the results of the same test for the same controls on other pages. Typically, such tests are associated with blocks of common elements (Fig. 3.6).
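One simple way to turn run statistics into a priority is sketched below. The function `estimate_priority`, the use of Laplace smoothing, and the sample test names are assumptions of this example; the article does not state which estimator the system uses.

```python
def estimate_priority(results, manual=None):
    """Return the priority of a test as a failure probability estimate.

    `results` lists past runs (True means the run gave an unexpected result).
    A manually set priority, if given, takes precedence. Laplace smoothing is
    an assumption of this sketch, so a test with no history starts at 0.5.
    """
    if manual is not None:
        return manual
    return (sum(results) + 1) / (len(results) + 2)


# Hypothetical history of three tests.
tests = {
    "test_login": [False, False, False, False],
    "test_search": [True, False, True],
    "test_menu": [],
}

# Run the most uncertain tests first.
run_order = sorted(
    tests, key=lambda name: estimate_priority(tests[name]), reverse=True
)
```

With this history the search test runs first, the untested menu test second, and the stable login test last.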

The most important testing task is to measure how test results depend on the internal state of the application or on previous operations. The main difficulty of such measurements is the extremely large number of conditions that the test system has to measure. The whole history of the automated testing system is preserved, and each performed action is associated with a corresponding network node.

Fig. 3 Description of Tests and Defects

The fourth level network describes knowledge about testing goals (Fig. 4). It consists of nodes that represent a testing goal (Fig. 4.1), each associated with one or more tests (Fig. 4.2). A goal can be, for example, a single page or a group of pages of the tested program interface (Fig. 4.3).

Fig. 4 Description of Test Purposes


Calculation Network Algorithm

The network relies on two algorithms: the network calculation algorithm and the path finding algorithm. The calculation algorithm determines the state of the tested application using GUI element templates and calculates the priority of test runs by analyzing which associated source files have changed and which defects have been fixed. The path finding algorithm finds the sequence of preparatory steps needed to perform a test and selects a sequence of tests that reduces the total test time.
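The path finding step can be illustrated with a breadth-first search over the state graph. Breadth-first search is used here as one simple choice; the article does not specify which search algorithm the system applies. The function `find_steps` and the state and action names are hypothetical.

```python
from collections import deque


def find_steps(transitions, start, goal):
    """Breadth-first search for the shortest action sequence from start to goal.

    `transitions` maps a state to a dict of {action: next_state}.
    Returns a list of actions, or None if the goal cannot be reached.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, next_state in transitions.get(state, {}).items():
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, path + [action]))
    return None


# Hypothetical state graph of a small web application.
transitions = {
    "login_page": {"submit_credentials": "home_page"},
    "home_page": {"open_search": "search_page", "log_out": "login_page"},
    "search_page": {"run_query": "results_page"},
}
steps = find_steps(transitions, "login_page", "results_page")
```

The returned list is the sequence of preparatory actions that brings the application to the page a test needs.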

The test system uses a modified version of the Bayesian network calculation algorithm proposed by R. Schechter. The modified algorithm can evaluate the network even in the presence of the following features:

  • Probabilistic network links can be directed or undirected.
  • Probabilistic network links can contradict each other.

The first level network must be recalculated even when it contains contradictions, because the program interface can be wrong: graphic elements may not work properly, requirements may be outdated, or the tester can make mistakes. The goal of the test system is to detect these mistakes. A node of a probabilistic network can take multiple values, each characterized by the probability that the node actually takes that value. Each such value of a node, together with its probability, is called a characteristic. The probabilities of all characteristics of a multivalued node sum to 1.


Network links may contradict each other. Contradictions arise when there is a problem in the tested program. The algorithm has to take the mutual influence of links into account and produce an approximate solution. On the other hand, the system can independently adjust its work if it loses control of the tested application.

To illustrate the algorithm, consider how the characteristics are calculated for two simple networks. For simplicity, every link connects only two nodes, the characteristics are binary, and the conditional probabilities equal 1 or 0. We will use Bayes’ formula to calculate the characteristic of the required node:
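For reference, the standard form of Bayes' formula for two linked nodes is given below, together with the law of total probability that carries a known distribution of one node over to its neighbor. The node names A and B are chosen here for illustration, and this is the textbook form, not necessarily the exact expression the author used.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad
P(A = a) = \sum_{b \in \{0, 1\}} P(A = a \mid B = b)\, P(B = b)
```

When the conditional probabilities are restricted to 0 or 1 and the link actually depends on B, the sum reduces to copying or inverting the probability of the neighboring node.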


In the example below, the links are in conflict. Let’s suppose that we know that:

Figure 5. Contradictory Conditions


Looking at Fig. 5, we can consider the links C-A and B-A independent, so the probability of node A is calculated as the probability of two independent events:


Another difficulty is the presence of cycles in the network. Let’s add the link C-B to the network in Fig. 5 and calculate the values of the characteristics of B and C given the value of node A.

Figure 6. Contradictory Dependencies


The network in Fig. 6 contains an apparent contradiction: the links from node A assign different states to nodes B and C, but the link C-B requires both nodes to have the same value. We could resolve the contradiction by reducing the trust in some links, but we cannot do that until we know the correct values. As a temporary solution, we construct the set of spanning trees of the network, giving equal confidence to every link and taking the value of node A as known. The network in Fig. 6 has three spanning trees. It is easy to calculate the node values for each spanning tree. Finally, we average each characteristic over all spanning trees. The solution is:

P(C=1) = P(B=0) = 0.333, P(C=0) = P(B=1) = 0.667
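The averaging over spanning trees can be reproduced with a short sketch. It assumes that A = 1 is known, that the link A-B copies the value, that the link A-C inverts it, and that the link C-B requires equal values; these concrete relations are assumptions chosen to match the result above, since the figure itself is not reproduced here.

```python
from itertools import combinations


def same(value):
    return value


def opposite(value):
    return 1 - value


# Undirected links as (node_u, node_v, relation). Both relations are
# symmetric, so they can be applied in either direction.
links = [
    ("A", "B", same),      # assumed: A forces B to the same value
    ("A", "C", opposite),  # assumed: A forces C to the opposite value
    ("C", "B", same),      # C-B requires identical values
]
nodes = {"A", "B", "C"}


def propagate(tree, known):
    """Spread known values along the tree; None if it does not span the graph."""
    values = dict(known)
    changed = True
    while changed:
        changed = False
        for u, v, relation in tree:
            if u in values and v not in values:
                values[v] = relation(values[u])
                changed = True
            elif v in values and u not in values:
                values[u] = relation(values[v])
                changed = True
    return values if set(values) == nodes else None


def average_over_spanning_trees(links, known):
    """Average node values over every spanning tree, with equal trust in links."""
    results = []
    for tree in combinations(links, len(nodes) - 1):
        values = propagate(tree, known)
        if values is not None:
            results.append(values)
    return {n: sum(r[n] for r in results) / len(results) for n in sorted(nodes)}


p = average_over_spanning_trees(links, {"A": 1})
```

Under these assumptions `p["B"]` is 2/3 and `p["C"]` is 1/3, which agrees with P(B=1) = 0.667 and P(C=1) = 0.333.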

The advantage of the algorithm is that a link can combine more than two characteristics, and the logic by which a link converts values can be defined manually by the programmer. A link may be represented as a function of several variables that returns a value to the node at which the link is directed, and it can be written in any programming language. A bidirectional link between two characteristics can be described by two oppositely oriented links.
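A link defined as a function of several variables might look like the sketch below. The class `Link`, the conversion functions, and the characteristic names are hypothetical and serve only to show the shape of such a definition.

```python
class Link:
    """Directed link that computes the target characteristic from its sources."""

    def __init__(self, sources, target, func):
        self.sources = tuple(sources)
        self.target = target
        self.func = func

    def apply(self, values):
        """Evaluate the link and store the result under the target node."""
        values[self.target] = self.func(*(values[s] for s in self.sources))
        return values[self.target]


def both(p_first, p_second):
    # Assumed conversion logic: two independent conditions must both hold.
    return p_first * p_second


def same(p):
    return p


values = {"logged_in": 1.0, "query_sent": 0.5}

# A link that combines several characteristics into one target.
results_link = Link(("logged_in", "query_sent"), "results_shown", both)
results_link.apply(values)

# A double-direction relation written as two oppositely oriented links.
forward = Link(("logged_in",), "menu_visible", same)
backward = Link(("menu_visible",), "logged_in", same)
forward.apply(values)
```

Keeping the conversion logic in an ordinary function lets the programmer replace it without touching the rest of the network.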

Automation Process

The statistics network for application testing can be created with a “record and play” tool. This method is useful when the testing system has little knowledge of the tested application. During recording, the test system stores the sequence of application states and interface actions. After the test is recorded, the test automation system asks the tester some questions. The result of the recording is a network diagram of transitions between states. To define a test case, the tester creates a test node and describes the data needed for the test. The tester can also create a set of tolerance values for each GUI element of the page (Fig. 2.3). In that case, the test meets the black box coverage criterion “covering the tolerance range,” which is based on testing criteria for classes of input and output data.

The network for application testing can also be created from the tester's answers to questions about the interface. This approach is effective when the model already contains enough knowledge about the tested program. The system tests the application in the background, and if there is a problem, it asks the tester without stopping the execution of other tests. Both the system and the tester start from some initial page and state of the tested application. This state is evaluated, and if it does not match the GUI templates, the system suggests adding a new state to the model. To simplify the dialogue with the user, every question is reduced to confirming a change or, in case of an error, choosing the right solution.

For example, if the test system reliably identifies all the basic controls, it simply asks you to confirm the page layout. Next, the system selects the highest priority operation for testing, performs it, and analyzes the resulting state. In case of a conflict, such as unexpected behavior or appearance of the tested application, the system proposes creating a characteristic that describes the defect.

To Sum It Up

Test automation based on probabilistic networks uses generic templates of interface graphics to analyze the interface of the tested program. This makes it possible to test applications under the “black box” criterion of covering the tolerance range, based on testing criteria for classes of input and output data. The developed measures allow the execution order of tests for related modules to be varied by analyzing test results for the current or previous versions of the application, and they can serve as a new measure of the relation between test results and the various modules of the program with respect to its overall functionality.

The defect detection mechanism, which I designed and tested, can be used to evaluate the correctness of the automated tests and to assess the quality of the tested application independently. This technology was tested in the WebCP project to automate the testing of an Ajax interface, and it proved effective and convenient compared with writing GUI unit tests.

Using a Big Data model in the form of learned probabilistic networks makes it possible to optimize test automation. The basic testing process runs tests not only by test suite groups, but also by priority, that is, by the uncertainty of the test result. Big Data algorithms produce such estimates automatically from regression testing statistics. Priority evaluation is also useful for regression analysis, as it helps to identify problems in a web application's interface that are observable but not yet clearly identified.

Anton Bykau is a QA Automation Engineer with A1QA. He has broad experience in software development and test automation. Anton explores Big Data algorithms not only at work but also in his spare time.

© Copyright 2000 - 2021, by Engineers Edge, LLC
All rights reserved