## SUMMARY

The paper gives an overview of measures for test effectiveness and compares them in three dimensions: oracle decay, computation efficiency, and tools. It shows the drawbacks of techniques such as code coverage (not closely related to the effectiveness of a test suite), mutation score (high computational effort), and the lack of tool support. One main aspect is missing from the paper: what is a good unit test?

## VERDICT

[X] Accept
[ ] Weak Accept
[ ] Weak Reject
[ ] Reject

## CONTENT OF THE PAPER

[X] Clear objectives and a very cohesive paper
[ ] The paper objectives and overall cohesion are satisfactory
[ ] The paper does not hang together so well (fragmented)
[ ] The content of the paper is difficult to follow

## PRESENTATION OF THE PAPER

[ ] Excellently written and organized
[X] Well written and organized
[ ] Unclear in many places
[ ] Poor English and disorganized

## REVIEWER FAMILIARITY WITH SUBJECT MATTER

[ ] High
[X] Medium
[ ] Low

## PAPER EVALUATION

=== Technical Quality

The paper has a clear topic and focus. Its content is carefully researched and includes current research in the field. It gives a good overview of the state of the art and of the issues with the measures used in software testing today. However, it does not discuss the human aspect of writing good unit tests and does not make a clear statement about what a good unit test is. Important characteristics of good unit tests (e.g. independence, repeatability, understandability, consistency, helpfulness in finding bugs) are described in the book "The Art of Unit Testing" by Roy Osherove (chapter 1.2, "Properties of a good unit test"); a brief illustrative sketch is included at the end of this review. The paper suggests that future work is needed, but it does not specify which area would be the most important to focus on, nor does it speculate about possible solutions to the problem. Moreover, it does not directly answer the question posed in the title: Can we trust our tests? This could be discussed in the conclusion.

=== Logical Structure

The paper makes excellent use of sections and subsections. Section 2 ("Research Overview") and Section 3 ("Test Effectiveness") are each divided into four subsections; Section 4 ("Comparison") is divided into three subsections. This clear structuring and the relatively short texts in the subsections help the reader stay oriented and focused on the topic at hand.

The paper should include more than just two keywords. Possible options are "test effectiveness measures", "code coverage", "mutation score", "checked coverage" or "testing techniques".

=== Presentation

The paper makes use of one figure and two lists. A figure giving an overview of the discussed effectiveness measures (code coverage, checked coverage, mutation score and fault probability) could be useful at the beginning of Section 3 ("Test Effectiveness"). Figure 1 on page 6 could use some more explanation, e.g. which bars indicate that "statement coverage is sometimes very insensitive to missing assertions". A sentence explaining that the reader should focus on comparing "Statement Coverage" and "Checked Coverage" would support a quicker understanding.

=== Style

The paper follows the guidelines of the llncs LaTeX template. There are a few spelling mistakes: in Section 3.2, "small syntactic chances" should probably say "small syntactic changes"; reference [6] has a misplaced white space; in Section 1, "testings" should probably say "testing has". The title should capitalize "We" and "Our". The content and formatting of the conclusion could be improved by adding a few lines answering the title question.
A possible statement could be: "Only after finding an efficient and reliable measure for the quality of our tests can we actually trust our tests."

=== References

The paper is carefully researched and makes use of over 50 references. The publishing years of the referenced papers span a broad range: most were written after the year 2000, and the oldest was published in 1977. The paper mentions tools such as Agitator, JCrasher, EclEmma and PIT, and it references a paper listing over 30 automatic test generation tools (reference 30). This helps establish a connection to practice. The URLs do not have an "accessed" date; this should be added.
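
=== Addendum: illustrative sketch

To make the Technical Quality comments above more concrete, the following is a minimal sketch of what a unit test exhibiting the characteristics mentioned (independence, repeatability, understandability, an explicit assertion) might look like. It assumes JUnit 5; the StringCalculator class is an invented example and does not appear in the reviewed paper. The comment on the assertion also illustrates the point discussed around Figure 1: statement coverage alone does not reveal a missing assertion.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical class under test; invented for illustration only.
class StringCalculator {
    int add(int a, int b) {
        return a + b;
    }
}

class StringCalculatorTest {

    // Independent and repeatable: the test creates its own fixture and uses
    // no shared state, files, or network, so it yields the same result on
    // every run and in any execution order.
    @Test
    void add_returnsSumOfTwoNumbers() {
        StringCalculator calculator = new StringCalculator();

        int sum = calculator.add(2, 3);

        // The explicit assertion ties the covered code to an expected outcome;
        // without it, the test would still achieve full statement coverage
        // while checking nothing.
        assertEquals(5, sum, "2 + 3 should equal 5");
    }
}
```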