## SUMMARY

The paper gives an overview of measures for test effectiveness and compares them in three dimensions: oracle decay, computation efficiency, and tools. It shows the drawbacks of techniques such as code coverage (not closely related to the effectiveness of a test suite), mutation score (high computational effort), and the lack of tool support. One main aspect is missing from the paper: what is a good unit test?

## VERDICT

[X] Accept
[ ] Weak Accept
[ ] Weak Reject
[ ] Reject

## CONTENT OF THE PAPER

[X] Clear objectives and a very cohesive paper
[ ] The paper objectives and overall cohesion are satisfactory
[ ] The paper does not hang together so well (fragmented)
[ ] The content of the paper is difficult to follow

## PRESENTATION OF THE PAPER

[ ] Excellently written and organized
[X] Well written and organized
[ ] Unclear in many places
[ ] Poor English and disorganized

## REVIEWER FAMILIARITY WITH SUBJECT MATTER

[ ] High
[X] Medium
[ ] Low

## PAPER EVALUATION

=== Technical Quality

The paper has a clear topic and focus. Its content is carefully researched and includes current research in the field. It gives a good overview of the state of the art and of the issues with the measures used in software testing today. However, it does not discuss the human aspect of writing good unit tests and does not make a clear statement about what a good unit test is. Important characteristics of good unit tests (e.g. independence, repeatability, understandability, consistency, helpfulness in finding bugs) are described in the book "The Art of Unit Testing" by Roy Osherove (chapter 1.2, "Properties of a good unit test"); a brief illustrative sketch is included at the end of this review. The paper suggests that future work is needed, but it does not specify which area would be the most important to focus on, nor does it speculate about possible solutions to the problem. Moreover, it does not directly answer the question posed in the title: Can we trust our tests? This could be discussed in the conclusion.

=== Logical Structure

The paper makes excellent use of sections and subsections. Section 2 ("Research Overview") and Section 3 ("Test Effectiveness") are each divided into four subsections; Section 4 ("Comparison") is divided into three subsections. This clear structuring and the relatively short texts in the subsections help the reader stay oriented and focused on the topic at hand.

The paper should include more than just two keywords. Possible options are "test effectiveness measures", "code coverage", "mutation score", "checked coverage" or "testing techniques".

=== Presentation

The paper makes use of one figure and two lists. A figure giving an overview of the discussed effectiveness measures (code coverage, checked coverage, mutation score and fault probability) could be useful at the beginning of Section 3 ("Test Effectiveness"). Figure 1 on page 6 could use some more explanation, e.g. which bars indicate that "statement coverage is sometimes very insensitive to missing assertions". A sentence explaining that the reader should focus on comparing "Statement Coverage" and "Checked Coverage" would support a quicker understanding.

=== Style

The paper follows the guidelines of the llncs LaTeX template. There are a few spelling mistakes: in Section 3.2, "small syntactic chances" should probably say "small syntactic changes"; reference [6] has a misplaced white space; in Section 1, "testings" should probably say "testing has". The title should capitalize "We" and "Our". The content and formatting of the conclusion could be improved by adding a few lines answering the title question.
A possible statement could be: "Only after finding an efficient and reliable measure for the quality of our tests can we actually trust our tests."

=== References

The paper is carefully researched and makes use of over 50 references. The publishing years of the referenced papers span a broad range: most were written after the year 2000, and the oldest was published in 1977. The paper mentions tools such as Agitator, JCrasher, EclEmma and PIT, and it references a paper listing over 30 automatic test generation tools (reference 30). This helps establish a connection to practice. The URLs do not have an "accessed" date; this should be added.
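
=== Addendum: illustrative sketch

To make the Technical Quality comments above more concrete, the following is a minimal sketch of what a unit test exhibiting the characteristics mentioned (independence, repeatability, understandability, an explicit assertion) might look like. It assumes JUnit 5; the StringCalculator class is an invented example and does not appear in the reviewed paper. The comment on the assertion also illustrates the point discussed around Figure 1: statement coverage alone does not reveal a missing assertion.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical class under test; invented for illustration only.
class StringCalculator {
    int add(int a, int b) {
        return a + b;
    }
}

class StringCalculatorTest {

    // Independent and repeatable: the test creates its own fixture and uses
    // no shared state, files, or network, so it yields the same result on
    // every run and in any execution order.
    @Test
    void add_returnsSumOfTwoNumbers() {
        StringCalculator calculator = new StringCalculator();

        int sum = calculator.add(2, 3);

        // The explicit assertion ties the covered code to an expected outcome;
        // without it, the test would still achieve full statement coverage
        // while checking nothing.
        assertEquals(5, sum, "2 + 3 should equal 5");
    }
}
```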