Leveraging machines to derive domain models from user stories

Maxim Bragilovski, Ashley T. van Can, Fabiano Dalpiaz, Arnon Sturm

Research output: Contribution to journal › Article › peer-review

Abstract

Domain models play a crucial role in software development, as they provide a means for communication among stakeholders, for eliciting requirements, and for representing the information structure behind a database schema or for model-driven development. However, creating such models is a tedious activity, and automated support may assist in obtaining an initial domain model that can later be enriched by human analysts. In this paper, we compare the effectiveness of various approaches for deriving domain models from a given set of user stories. We contrast human derivation (by both experts and novices) with machine derivation; for the latter, we compare (i) the Visual Narrator, an existing rule-based NLP approach; (ii) a machine learning classifier built on features that we engineered; and (iii) a generative AI approach that we constructed via prompt engineering with multiple configurations. Based on a benchmark dataset comprising nine collections of user stories and their corresponding domain models, the evaluation shows that while no approach matches human performance, large language models (LLMs) are not statistically outperformed by human experts in deriving classes. Additionally, a tuned version of the machine learning approach achieves results close to human performance in deriving associations. To better understand the results, we qualitatively analyze them and identify differences in the types of false positives as well as other factors that affect performance.

Original language: American English
Journal: Requirements Engineering
State: Accepted/In press - 1 Jan 2025

Keywords

  • Domain models
  • Large language models
  • Machine learning
  • Model derivation
  • Requirements engineering
  • User stories

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
