We consider a network of controlled sensors that monitor the unknown health state of a patient. We assume that the health state process is a Markov chain with a transition matrix that is unknown to the controller. At each time step, the controller chooses a subset of sensors to activate, which incurs an energy (i.e., battery) cost. Activating more sensors improves the estimation of the unknown state, which introduces an energy-accuracy tradeoff. Our goal is to minimize the combined energy and state misclassification costs over time. Activating sensors now also provides measurements that can be used to learn the model, improving future decisions. Therefore, the learning aspect is intertwined with the energy-accuracy tradeoff. While Reinforcement Learning (RL) is often used when the model is unknown, it cannot be directly applied in health monitoring since the controller does not know the (health) state. Therefore, the monitoring problem is a partially observable Markov decision process (POMDP) in which the cost feedback is also only partially available, since the misclassification cost is unknown. To overcome this difficulty, we propose a monitoring algorithm that combines RL for POMDPs with online estimation of the expected misclassification cost based on a Hidden Markov Model (HMM). We show empirically that our algorithm achieves performance comparable to that of a monitoring system that assumes a known transition matrix and quantizes the belief state. It also outperforms the model-based approach in which the estimated transition matrix is used for value iteration. Thus, our algorithm can be useful in designing energy-efficient and personalized health monitoring systems.
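To illustrate the partial-observability aspect described above, the following is a minimal sketch of a standard HMM belief (Bayesian filter) update, the kind of computation underlying belief-state tracking in a POMDP. All numbers and the function name are hypothetical, chosen only for illustration; the paper's actual algorithm additionally learns the transition matrix online and combines the belief with RL.

```python
import numpy as np

def belief_update(belief, A, obs_likelihood):
    """One step of the HMM forward (Bayesian) filter.

    belief:          current distribution over health states, shape (n,)
    A:               state transition matrix, A[i, j] = P(next=j | current=i)
    obs_likelihood:  P(observation | state) for the new measurement, shape (n,)
    Returns the posterior distribution over states after the observation.
    """
    predicted = belief @ A                 # predict: propagate through the Markov chain
    unnorm = predicted * obs_likelihood    # correct: weight by observation likelihood
    return unnorm / unnorm.sum()           # normalize to a valid distribution

# Hypothetical example: two health states ("healthy", "ill")
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
belief = np.array([0.5, 0.5])              # uniform prior over states
obs_likelihood = np.array([0.7, 0.2])      # sensor reading favors "healthy"
posterior = belief_update(belief, A, obs_likelihood)
```

Activating more sensors corresponds to a sharper `obs_likelihood`, which concentrates the posterior and reduces the expected misclassification cost, at the price of higher energy use.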
All Science Journal Classification (ASJC) codes
- Control and Optimization
- Control and Systems Engineering