Formal methods in computer science research have sometimes produced results so far abstracted from the problem domain that they can be neither applied nor validated.
What is Research?
Good research practice suggests that we should start by defining our terms. The Concise Oxford Dictionary defines research as:
research. 1.a. the systematic investigation into and study of materials, sources, etc., in order to establish facts and reach new conclusions. b. an endeavour to discover new or collate old facts, etc., by the scientific study of a subject or by a course of critical investigation.
This definition is useful because it immediately focuses on the systematic nature of investigation. In other words, the very meaning of the term implies a research method, and these methods or systems essentially provide a model or structure for logical argument.
The dialectic of research
The highest level of logical argument can be seen in the structure of debate within a particular field. Each contribution to this debate falls into one of three categories:
Thesis
This presents the original statement of an idea. However, very few research contributions can claim total originality. Most borrow ideas from previous work, even if that work was conducted in another discipline.
Antithesis
This presents an argument that challenges an earlier thesis. Typically, such an argument is based on new sources of evidence, and such challenges are a normal sign of progress within a field.
Synthesis
This seeks to form a new argument from existing sources. Typically, a synthesis could resolve the apparent contradiction between a thesis and an antithesis.
Problems in the Application of the Standard Approach in Computer Science
There are many problems with the standard approach of scientific empiricism when it is applied to computer science. The main objection is that many aspects of computing challenge the use of probabilistic measures in the analysis of empirical test results.
For example, many statistical measures depend on independence between successive tests of a hypothesis. Clearly, such techniques cannot be used to measure the performance of any system that optimizes its behavior over time; this rules out load-balancing algorithms and similar adaptive systems.
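To make the point concrete, here is a minimal Python sketch (the memoized function and trial count are invented for illustration) showing how a system that "learns" from earlier runs breaks the independence assumption behind most significance tests:

```python
import timeit
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(n):
    # stand-in for a computation whose result the system learns to reuse
    return sum(i * i for i in range(n))

# Five "identical" trials: the first pays the full cost, the rest hit the
# cache, so the samples are not independent and a naive t-test is invalid.
samples = [timeit.timeit(lambda: expensive(100_000), number=1)
           for _ in range(5)]
print(samples)
```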
Second, it can be difficult to impose standard experimental conditions on computer systems. If a program behaves in one way under one set of operating conditions, there is no guarantee that it will behave the same way under another. These conditions can extend down to the level of alpha particles striking memory chips.
Third, any attempt to prove that a program always satisfies some property is almost certainly doomed to failure using standard experimental techniques. The number of possible execution paths, even through simple code, makes it impossible to test a property against every path.
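A back-of-the-envelope Python sketch (branch counts chosen arbitrarily) shows how quickly the number of paths outruns any conceivable test suite:

```python
# Each independent two-way branch doubles the number of execution paths,
# so a program with n such branches has 2**n paths to cover.
for n in (10, 20, 30, 64):
    print(f"{n} branches -> {2**n:,} possible paths")
# At 64 branches there are ~1.8e19 paths; testing a property on every
# path is hopeless even at billions of tests per second.
```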
Hypothesis and Prototyping
A good example of this form of dialectic is the debate over prototyping. Some authors have argued that prototypes are a useful means of generating and evaluating new designs in the early stages of development (thesis) (Fuchs, 1992).
Others have presented evidence against this hypothesis by suggesting that users often commit to features of the prototyping environment without considering possible alternatives (antithesis) (Hayes and Jones, 1989). A third group of researchers has therefore developed techniques aimed at reducing bias towards the features of prototyping environments (synthesis) (Gravell and Henderson, 1996). Research in a field progresses by applying methods to test, refute, and re-evaluate arguments in this way.
This approach has been used to support many different strands of research within computer science. For example, Boehm, Gray, and Seewaldt (1984) used it to compare the effectiveness of specification and prototyping techniques in software engineering. Others have used it to compare the efficiency of searching and sorting algorithms. Information-retrieval researchers have even developed standardized methods, built around well-known test collections, for assessing the performance gains of new search engines.
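As a minimal sketch of such an empirical comparison (the algorithms, input size, and trial count are assumptions made for illustration), one might declare a method up front and time two sorting implementations on identical inputs:

```python
import random
import timeit

def bubble_sort(xs):
    # deliberately naive O(n^2) baseline
    xs = xs[:]
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

data = [random.random() for _ in range(2000)]
# Explicit method: same input, fixed number of trials, both results reported.
t_bubble = timeit.timeit(lambda: bubble_sort(data), number=3)
t_builtin = timeit.timeit(lambda: sorted(data), number=3)
print(f"bubble sort: {t_bubble:.3f}s  built-in sort: {t_builtin:.3f}s")
```

Note how even this toy sketch embodies the empirical stages described under Empiricism below: an explicit hypothesis (the built-in sort is faster), a repeatable method, and reported results.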
Argument models
A more detailed level of logical argument can be seen in the discourse structures used to support individual works of thesis, antithesis, or synthesis.
Proof by demonstration?
Perhaps the most intuitive research model is to build something and then let that artifact stand as an example for a more general class of solutions. There are numerous examples of this approach in computing. It can be argued that the problems of implementing multi-user operating systems were solved more through the application and growth of UNIX than through any more measured process of scientific research.
However, there are many reasons why this is an unsatisfactory research model. The main objection is that it carries high risks: the artifact may fail long before we learn anything about the conclusion it was meant to support. Indeed, this approach often postpones the formation of any clear hypothesis or conclusion until after the artifact has been built.
Lack of clear hypothesis
The lack of a clear hypothesis need not be the barrier it may seem. The proof-by-demonstration approach has much in common with current engineering practice: iterative refinement can be used to move an implementation gradually towards some desired solution.
The key problem is that iterative development of an artifact in turn requires a method or structure. This is typically provided through testing techniques that are themselves based on other models of scientific argument.
Empiricism
The Western empirical tradition can be seen as an attempt to avoid the undirected interpretation of artifacts. It has produced the dominant model of research since the seventeenth century, and it can be summarized in the following stages:
Hypothesis generation
This explicitly identifies the ideas that will be tested by the research.
Identification of the method
This explicitly identifies the techniques that will be used to establish or refute the hypothesis. This is critical because it must be possible for colleagues to review and criticize the adequacy of the chosen methods; the ability to repeat an experiment is a key feature of sound empirical research.
Compilation of results
This presents and compiles the results collected by following the method. An important concept here is statistical significance: the probability that the observed results arose by chance rather than from a real effect.
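As a worked illustration of significance (the win counts are invented for the example), a simple sign test computes how likely the observed outcome would be if the null hypothesis of "no real difference" were true:

```python
from math import comb

# Suppose system A beats system B on 14 of 16 benchmark runs. Under the
# null hypothesis each run is a fair coin flip, so the one-sided p-value
# is the probability of 14 or more wins out of 16 by chance alone.
n, k = 16, 14
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(f"one-sided p-value = {p_value:.4f}")  # ~0.0021: unlikely to be chance
```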
Conclusion
Finally, conclusions are drawn that support or reject the hypothesis. If the results fail to support a hypothesis, it is important to remember that this may reflect a weakness in the method. Conversely, satisfactory results may rest on incorrect assumptions. It is therefore vital that full details of the method be made available for peer review.
Mathematical proof in Computer Science
Dissatisfaction with empirical testing techniques has led many in the computer science research community to investigate other means of structuring arguments in support of their conclusions. Mathematical proof offers one such means: a formal system supplies explicit rules of inference, and these rules can then be applied to determine whether or not a conclusion is a valid inference from some initial claims about a program or piece of hardware.
The field of mathematical reasoning is a research area in its own right. Nevertheless, it is possible to identify two different approaches to the use of formal proof as a research technique in computer science:
The verification argument
This attempts to establish that some desirable property holds for a given system. The problem here is that a person's failure to construct a proof does not mean the conclusion is invalid; they have simply failed to prove it, and someone else may yet construct the necessary mathematical argument.
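As a hedged illustration of the shape such an argument takes (the program fragment and property are invented for the example), a Hoare triple asserts that if a precondition holds before a statement runs, a postcondition holds afterwards:

$$\{\, x \ge 0 \,\} \quad y := x + 1 \quad \{\, y > 0 \,\}$$

The assignment axiom discharges this claim: substituting $x + 1$ for $y$ in the postcondition gives $x + 1 > 0$, which follows from the precondition $x \ge 0$. Failing to find such a derivation for a harder property, however, proves nothing either way.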
The refutation argument
Instead of attempting to prove the correctness of an argument, this approach attempts to refute it. Typically, this is done by constructing a description of the system's intended behavior. Model-checking tools then automatically explore the state space of the proposed application, searching for a situation in which the desired conclusion does not hold.
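The following Python sketch is a toy, hypothetical model checker, included only to show the shape of a refutation argument: it exhaustively explores the reachable states of a deliberately broken two-process protocol and returns a counterexample that refutes mutual exclusion. Real model checkers are far more sophisticated.

```python
from collections import deque

def successors(state):
    # state = (pc0, pc1), where pc = 0 means idle and 1 means critical.
    # Each process may enter or leave its critical section with no locking
    # at all, so the protocol is deliberately broken.
    for p in (0, 1):
        pcs = list(state)
        pcs[p] = 1 - pcs[p]
        yield tuple(pcs)

def refute(initial, violates):
    # Breadth-first exploration of every reachable state.
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if violates(state):
            return state            # counterexample found: property refuted
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None                     # property holds in all reachable states

# Mutual exclusion is violated if both processes are critical at once.
print(refute((0, 0), lambda s: s == (1, 1)))   # -> (1, 1)
```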
Disadvantages of Mathematical Proof Techniques
The appeal of mathematical proof techniques is very strong. They provide a coherent framework for analyzing research questions in computer science. They also make explicit the criteria for valid inference, as well as the environmental conditions that restrict the scope and applicability of the reasoning. However, several issues limit the usefulness of this approach as a general research tool.
The first is that great care must be taken in interpreting the results of a mathematical proof. Formal methods are no more than another system of argument, and mistakes are to be expected. Problems arise because errors can be very difficult to detect, given the complex nature of the mathematics that is often involved. Recall that a central feature of the empirical approach is that open peer examination can be used to check that a method is correct.
The second problem with formal reasoning is that its scope is limited. Interactive and time-critical systems pose particular problems for the application of mathematics. These issues are being addressed, but many remain unresolved.
The third problem concerns the cost of applying formal techniques. The necessary skills take a long time to acquire, and even relatively simple proofs can take several months for medium- to large-scale applications.
Finally, it can be argued that there is no adequate debate about the failures of formal methods. Recall that failure to confirm a hypothesis is a valuable result for empirical techniques. Exaggerated claims have been made on behalf of formal methods, usually not by the researchers themselves, and many of these claims have since been falsified. Yet errors in the application of mathematical reasoning tend to be treated as a source of embarrassment rather than as a learning opportunity for colleagues.
Hermeneutics
Formal proof techniques rest on the development of a mathematical model of the artifact being created. This raises important questions about the relationship between the model and the reality it is intended to represent. For example, if a model ignores some critical aspect of a program's environment, then the program may be proved safe yet still fail when deployed. The distance between mathematical models and reality is popularly known as the formality gap. Hermeneutics offers an alternative that addresses this problem. Hermeneutic research methods were pioneered in the field of sociology. The term itself means:
“adjective relating to interpretation, especially of Scripture or literary texts”. (The Concise Oxford Dictionary).
In practice, these approaches require researchers to observe the operation and use of an artifact within its intended working environment. The basic premise is that abstract models are no substitute for the actual application in use. Similarly, the results of controlled experiments do not provide generic findings that can accurately predict performance outside those controlled environments.
The Hawthorne Effect in Computer Science
In particular, the Hawthorne effect suggests that people, and indeed systems, behave very differently when placed in an experimental setting. Repair and maintenance activities in the field, for instance, differ greatly from those observed for equipment studied in a laboratory environment.
Individuals react differently when they know they are being observed. Hermeneutic research is therefore based on interpreting signs and observations in the working context, without explicitly questioning people about the performance of their systems. Hermeneutic techniques urge researchers out into the workplace. Taken to an extreme, the performance of an algorithm could only be assessed in field trials, with real data sets, on existing architectures, under "real" load from other applications.
This emphasis on analyzing a final implementation closely resembles proof by demonstration. The key difference is that the researcher approaches the working context with an open mind, without any established hypothesis to test or refute (Burrows and Needham, 1987). This poses problems for directed research, because users may not use programs in the manner intended. For example, it can be difficult to show that one search engine is faster than another if users continually abandon their queries after one or two items are returned, or if they use the search engines only once or twice in their working day.
Conclusions and a way forward…
Computer science is often said to be an immature discipline. Yet vast resources have been poured into the subject over a relatively short period, and this has produced astonishing advances in both hardware and software engineering.
In pursuit of technological objectives, researchers have borrowed models of argument and discourse from disciplines as varied as philosophy, sociology, and the natural sciences. This lack of a single agreed research framework reflects the strength and vitality of computer science. An optimist might argue that the field has learned a great deal from the introduction of hermeneutics into requirements analysis. Similarly, it has benefited from the introduction of mathematical models of argument for specifying and verifying complex systems.
References
B.W. Boehm, T.E. Gray and T. Seewaldt, Prototyping vs. Specification: A Multi-Project Experiment, Proceedings of the Seventh International Conference on Software Engineering, 473-484, IEEE Computer Society Press, Washington, May 1984.
M. Burrows, M. Abadi and R. Needham, A Logic of Authentication, ACM Transactions on Computer Systems, 8(1):18-36, 1990.
I.J. Hayes and C.B. Jones, Specifications Are Not (Necessarily) Executable, Software Engineering Journal, 4(6):330-338, 1989.
C.W. Johnson, Literate Specification, Software Engineering Journal, 225-237, September 1996.