Dynamic treatment regimes operationalize precision medicine as a sequence of decision rules, one per stage of clinical intervention, that map up-to-date patient information to a recommended intervention. An optimal treatment regime maximizes the mean utility when applied to the population of interest. Methods for estimating an optimal treatment regime assume the data to be fully observed, which rarely occurs in practice. A common approach is to first use multiple imputation and then pool the estimators across imputed datasets. However, this approach requires estimating the joint distribution of patient trajectories, which can be high-dimensional, especially when there are multiple stages of intervention. We examine the application of inverse probability weighted estimating equations as an alternative to multiple imputation in the context of monotonic missingness. This approach applies to a broad class of estimators of an optimal treatment regime including both Q-learning and a generalization of outcome weighted learning. We establish consistency under mild regularity conditions and demonstrate its advantages in finite samples using a series of simulation experiments and an application to a schizophrenia study.
ASJC Scopus subject areas