## SUMMARY

This paper addresses the gap between testing research and industry. It presents various testing approaches, including studies and tools, and discusses their limitations with respect to practical use along a technical and a non-technical dimension. More generally, the importance of testing is motivated and testing terminology and techniques are introduced. The paper concludes that contemporary testing approaches are not production-ready due to lacking tool support, but also conjectures that combining present approaches into a mature tool could yield a promising tool for industrial use.

## VERDICT

[X] Accept
[ ] Weak Accept
[ ] Weak Reject
[ ] Reject

## CONTENT OF THE PAPER

[ ] Clear objectives and a very cohesive paper
[ ] The paper's objectives and overall cohesion are satisfactory
[X] The paper does not hang together so well (fragmented)
[ ] The content of the paper is difficult to follow

## PRESENTATION OF THE PAPER

[ ] Excellently written and organized
[X] Well written and organized
[ ] Unclear in many places
[ ] Poor English and disorganized

## REVIEWER FAMILIARITY WITH SUBJECT MATTER

[X] High
[ ] Medium
[ ] Low

## PAPER EVALUATION

+ Covers relevant literature in software testing. Recent important studies and tools in the field were considered and contrasted with some more historic works.
+ I like the consideration of non-technical aspects, such as education, raised in the discussion section. I also think that this is a very relevant topic and thus agree that it is one of the major reasons for the lack of quality software testing. In addition, the short paper "Software Testing Research and Software Engineering Education" (2010) by Thomas J. Ostrand might be an interesting read, focusing on testing education and testing research with respect to achievements that actually did (not) influence industry.
+ The part describing the positive aspects of destructive testing (beginning of Section 3) highlights practical use of an approach that is often considered naive in research. One might complement this discussion with random testing at this point.
* "According to [6] this way of testing finds from 60 to 90 percent defects and even reduces introducing bugs in future work." (p3) I would be a bit more cautious and at least contrast these findings with the results of some newer studies. Finding up to 90% of 'the' defects (reported defects?) appears fairly optimistic to me. Considering more modern code reviews would probably be more practical than classical inspections (or at least a useful addition to them).
* Embedded items such as figures or tables could be used to support the currently text-only paper.
* Section 3 (Techniques, studies and tools): try to give a brief overview of what topics the reader can expect in this section (i.e., how it is structured).
* You might want to look into Section "5.5 The Test Gap" in "Software Testing Techniques: Technology Maturation and Research Strategy" (2001) by Lu Luo. It briefly describes some discrepancies between testing research and practice.
* An interesting aspect for discussion could be what practical/industrial approaches (e.g., TDD) have achieved towards efficient testing compared to research approaches.
- Section 3 (Techniques, studies and tools) attempts to address a broad range of topics under a single header. Subsections might help to improve the structure here.
- Some sections, especially Section 2 (Background), appear rather fragmented to me. They are not so well connected, and it is unclear why certain parts are presented in this context.
  * Example: software inspections, referred to as "static testing", appear a bit foreign to me in that context and are never mentioned later on.
- Some parts of the discussion are unnecessarily redundant with what was explained before:
  * THEO was described in Section "3 Techniques, studies and tools" and then briefly explained again in Section "4.1 Discussion of tools": "THEO improves the cost of test executions by [...]"
- The conclusion states that a combination of intelligent techniques could lead to a promising tool even for industrial purposes. I missed the part in the paper that discusses how such a combination of approaches could be beneficial.
- It is a bit pointless to criticize test suite generation tools for not including a yet unpublished technique (ICSE 2016): "Although there is techniques for summarizing and automatically commenting test cases [22], these techniques have not been included in test suite generation tools."
- Try to reduce the number of direct (i.e., 1:1) citations. Describe things in your own words instead and provide appropriate context for understanding:
  * For example, DART was able to "find a way to crash 65% of the oSIP functions within 1,000 attempts for each function" [10]. => The meaning of oSIP functions is undefined and thus unclear to the reader.
- Use consistent error terminology (e.g., error vs. fault vs. failure; see the software engineering literature).
- Be consistent in language style (US-EN vs. GB-EN):
  * "summarize" (US)
  * "behaviour" (GB)
- Minor language style:
  * Try to avoid the ambiguous "like": a) to introduce examples (e.g., "such as"), b) to describe similarities (e.g., "similar to").
  * Many sentences would read better when formulated in the active instead of the passive voice.

## CONCLUSION

The gap between research and industry in software testing is not such an easy topic to discuss in a paper, because the large body of scientific work cannot be easily compared to somewhat loosely defined industrial practices. The paper focuses on research approaches and leads a reasonable discussion of why they are not suited and what is still missing for successful industrial application. The high-level structure makes sense but could definitely be improved at the subsection and paragraph level. Furthermore, several linguistic issues should be resolved for the final version.

## CHRONOLOGICAL COMMENTS

* p1: "Under the pressure of an often low time and cost budget, software companies have to stop testing a certain point."
  * "low time and cost budget" reads a bit hard
  * stop testing *at* a certain point
* p1: "statistical criterion (e.g. code coverage)"
  * It is not really a statistical criterion (e.g., https://www.boomer.org/c/p3/c19/c1904.html); it is rather a *quality target*.
* p1: "Sometimes however, testing is stopped"
  * Needs a comma after "sometimes" if you really want to use the two together.
* p1: "[25] estimated costs of 59.5 billion dollars"
  * References should not be treated as main subjects, especially when starting a sentence. Restructuring the sentence or using "1st author et al. [25]" are alternatives. This is sometimes more a matter of taste (still used by some authors) but considered bad linguistic style.
* p1: "by helping developers write test cases more quickly"
  * to write test cases
* p2: "and robustness requirements[,] such as a banking software or a medical tool[,] will require"
* p2: "There is ways to reduce"
  * There are
* p2: "by maintaining a set quality level."
  * a given/predefined quality level
* p2: "The paper's goal is to"
  * I would prefer: "This paper aims to"
* p2: "This section is going to give a brief overview of the history"
  * Prefer the present simple: "This section gives a brief ..." (it does so in general and it already exists); the same applies to the next sentence.
* p3: "More about modern testing techniques can be found below."
  * "below" is very vague => clarify what it refers to
* p3: "Here an example for black-box testing: A device"
  * No newline after `:`
* p3: "White-box testing, on the other hand,"
  * This usually implies that "on the one hand" was used before, to clarify what it refers to.
* p3: Regarding code coverage: "Here again, exhaustive testing is not feasible due to the large number of possible combinations of paths through a program."
  * This depends on the coverage criterion (line coverage is often feasible, path coverage is not); see the short sketch at the end of this review.
* p3: "Today, common measures"
  * "Nowadays" is a bit more general than the too narrow "today".
* p4: "This section will present related work and explain their main findings."
  * their => "related work" is uncountable
* p4: "the user can define rules with a RegEx like language"
  * regex-like
* p4: "cost for missing a defect depending on the stage of development."
  * on this stage
* p4: "THEOb's cost based selection strategy"
  * cost-based
* p4: Try to avoid mentioning the authors excessively:
  * "According to the authors"
  * "The authors' simulation"
  * Focus on the content instead. Once cited, the reference to the authors should be clear.
* p5: "DART will cover all paths of a method"
  * Clarify that it *attempts* to cover all paths.
* p5: "also revealed at that point undiscovered bugs"
  * also revealed previously undiscovered
* p5: "The author suggest that"
  * The authors suggest => or better, avoid "the authors" anyway
* p5: 'seeded defects' [7]
  * Two words usually do not require a direct quotation.
* p5: "In average, it reached 71% per class [8]."
  * on average
* p5: "automatic test suite generation tool"
  * automated (automatic != automated)
* p5: "helps developers write code."
  * helps developers to write
* p5: "effectiveness of a test suite [12] (see below)."
  * Clarify "below".
* p5: "measure for the test's suites ability to"
  * test suites' ability
* p5: "by the following mutation operators:"
  * This enumeration somewhat separates the text here. I would not use such a strong separation.
* p5: "higher correlation between mutation score and fault detection than between code coverage and mutant score."
  * Be consistent: mutation score vs. mutant score.
* p6: "how testing is handled nowadays One is that the most"
  * Missing `.`
* p6: "and the missing tools"
  * rather: and missing tools
* p6: Mutation testing is a rather large area of research. Citing the origin of mutation testing here would be more appropriate than some studies that use mutation testing:
  * R. G. Hamlet: Testing programs with the aid of a compiler. IEEE Transactions on Software Engineering SE-3(4), 279-290 (July 1977)
  * R. A. DeMillo, R. J. Lipton, F. G. Sayward: Hints on test data selection: Help for the practicing programmer. Computer (4), 34-41 (1978)
* p6: Why is another citation format used here? (Lehman et al. 1997), (Duran et al. 1984)
* p6: "There will also be a discussion of the limitations of these techniques. outlook on the possible future topics and applications."
  * Something is wrong and not correctly connected here.
* p6: "Research shown that software testing has a"
  * research has shown
* p6: "there is gap between software testing in research and its application"
  * there is a
* p7: "Very few companies use automatic testing methods"
  * automated
* p7: "Even human developers sometimes struggle to think of all the possible edge cases."
  * "even" => isn't that rather an argument in favor of automated approaches, because machines may find 'perfect' solutions given an appropriate algorithm?
* p7: "First, " and "Second,"
  * Firstly, and Secondly,
* p7: "quality of a test suite []."
  * Reference missing (probably [12]?)
* p7: "on github"
  * GitHub
* p7: "multi-threaded application."
  * applications
* p7: "Without a good support and proof-able quality,"
  * provable? => Is a proof realistic at all?
* p7: "make the transition into the industry"
  * into industry
* p7: "EvoSuite seems to help reduce the testing time"
  * help to reduce
* p7: "Beta-version"
  * capitalization
  * Its version number (1.0.3) does not indicate a "beta version" => clarify that "beta" probably rather refers to an immature status.
* p8: "It can completely automatically detect"
  * sounds cumbersome
* p8: "there is no need for the tester to write and code."
  * "any" instead of "and"?
* p8: "Its main strength is detecting standard errors"
  * common errors?
* p8: "it can handle very well unit testing in"
  * it can handle [object] very well
* p8: "as long as there is doubts"
  * there are
* p9: "(for example through a user's SIGINT)"
  * I wonder whether a user can send such an interrupt when the tool automates the execution.
* p9: "How this problem is solved in THEO, the authors of the paper do not tell."
  * This is a bit of a hard context switch from Harness to THEO.
* p9: "50% or more of the human effort"
  * Sentences should not start with numerals: a) avoid, or b) write "Fifty percent".
* p9: "is actually doing 2daka:campos:fraser:dorn:weimer and 2rojas,fraser,arcuri."
  * Fix citation.
* p9: "Although there is techniques"
  * there are techniques
* p10: "Furthermore, it has been shown that automatically generated test cases do not lead to a higher defect-detection rate [9]"
  * A single-sentence paragraph does not make much sense to me.
  * `.` missing
* p10: "it is more attractive producing something new"
  * to produce
* p10: "gain more significance"
  * "significance" should be reserved for its statistical meaning in scientific writing.
* p10: "The study was extended by [24]."
  * Unclear which study, especially because this is at the beginning of a new paragraph.
* p10: "assisting software testers, there still have to be trained"
  * they
* p10: "In this paper, we explained"
  * "we" sounds a bit strange => oftentimes, "this paper" is used as the main subject: "This paper explains ..."
* p10: "On one hand there" and "On the other other hand"
  * On one hand, there
  * Be consistent with "the".
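
As referenced in the p3 comment on coverage criteria above, here is a minimal, hypothetical Java sketch (not taken from the reviewed paper) illustrating why the feasibility of "exhaustive" coverage depends on the chosen criterion: two inputs already achieve full statement/branch coverage of this method, whereas full path coverage needs 2^3 = 8 inputs, and the number of paths doubles with every additional independent condition.

```java
public class CoverageExample {

    // Three independent conditions: statement/branch coverage is satisfied by
    // two inputs (all conditions true, all conditions false), but there are
    // 2 * 2 * 2 = 8 distinct execution paths, one per combination of outcomes.
    static int classify(int a, int b, int c) {
        int score = 0;
        if (a > 0) { score += 1; }
        if (b > 0) { score += 2; }
        if (c > 0) { score += 4; }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(classify(1, 1, 1)); // takes all "true" branches -> 7
        System.out.println(classify(0, 0, 0)); // takes all "false" branches -> 0
        // Full path coverage would additionally require the six mixed inputs,
        // e.g. (1, 0, 0), (0, 1, 0), ..., and grows as 2^n with n conditions.
    }
}
```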