Here's my review of the ChatDev paper; let me know what you think!

https://arxiv.org/abs/2307.07924

Qian, C., Cong, X., Yang, C., Chen, W., Su, Y., Xu, J., ... & Sun, M. (2023). Communicative agents for software development. arXiv preprint arXiv:2307.07924.

This paper presents a new approach to software development in which many calls to LLMs in different roles (CEO, CTO, programmer, reviewer, etc.) build an entire software project. The novelty of the paper seems to be the specific roles assigned to the LLMs and the flow of calls between them to design, write, review, and test code. I also liked that there were artistic agents that produced assets for the software (such as player icons and button icons). My biggest issues with the paper are that it doesn’t formally define “thought instruction” or provide clear enough examples of it, and that the generated software projects were small, at only a few hundred lines of code each. Experimentally, I’m also not sure how well the approach generalizes, because it is not clear whether the evaluation dataset was included in the training data of GPT-3.5.

Questions I had about this paper:

1. Are we sure the dataset for instruction-following (CAMEL [23]) is not in the training set of the LLM? If it is, these results may not generalize well to new software projects.

Comments:

1. GPT-3.5 was used instead of GPT-4, so the results may improve when using GPT-4.

2. One of the key contributions seems to be the “thought instruction” mechanism, but there is no clear example of exactly what it is. On pages 6 and 7 the paper says “thought instruction includes swapping roles to inquire about which methods are not yet implemented and then switching back to provide the programmer with more precise instructions to follow”. Is that all “thought instruction” is? I recommend a more formal or complete description of it in the paper (I sketch my own reading of the mechanism after these comments).

3. The lines of source code per project were quite small, with the largest project containing only 359 generated lines.

4. The paper makes a big deal of how cheap its approach is ($0.2967 per project), but given that the projects are so tiny, this is not necessarily cheaper than human developers for more realistic software systems. If the cost does not scale roughly linearly with lines of code (for example, because of longer contexts and more rounds of review and testing), the advantage could shrink quickly.

5. “Fortunately, with the thought instruction mechanism proposed in this paper, such bugs can often be easily resolved by importing the required class or method.” - this sentence is vague and doesn’t fully explain how “thought instruction” solves these types of bugs.

6. The discussion section describes this approach to software development as “training-free”, but the underlying LLM still has to be trained, so that characterization isn’t entirely fair.

7. I appreciated the examples in the appendix; however, I wish the paper were clearer about exactly what “thought instruction” is, for example by showing the same task with and without it.
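For what it’s worth, here is a minimal sketch of what I understand “thought instruction” to be, based only on the description on pages 6 and 7: a role swap to enumerate what is still unimplemented, then a switch back to issue precise instructions. The `chat()` helper, the prompt wording, and the function name are my own placeholders, not anything taken from the paper.

```python
# Hypothetical sketch of "thought instruction" as I read it from pages 6-7.
# `chat()` stands in for any single chat-completion call (e.g., to GPT-3.5);
# the prompts below are my guesses, not the paper's actual prompts.

def chat(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for one LLM call with a role-setting system prompt."""
    raise NotImplementedError  # swap in whatever LLM client you use


def thought_instruction_review(code: str) -> str:
    # Step 1: role swap -- the instructor temporarily takes the programmer's
    # seat and is asked to enumerate which methods are still unimplemented.
    unimplemented = chat(
        system_prompt="You are the programmer who wrote this code.",
        user_prompt=(
            "List any methods in the following code that are not yet "
            f"implemented or are only stubbed out:\n\n{code}"
        ),
    )

    # Step 2: switch back -- the instructor uses that list to give the
    # programmer precise, concrete instructions instead of a vague
    # "please fix the code" request.
    instructions = chat(
        system_prompt="You are a reviewer instructing a programmer.",
        user_prompt=(
            f"These methods are unimplemented:\n{unimplemented}\n"
            "Give the programmer step-by-step instructions to implement each one."
        ),
    )
    return instructions
```

If this reading is wrong, that only reinforces comment 2: the mechanism needs a more explicit definition (and an example like the above) in the paper itself.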
