Abstract
We study the problem of estimating the density $f(\mathbf{x})$ of a random vector $\mathbf{X}$ in $\mathbb{R}^{d}$. For a spanning tree $T$ defined on the vertex set $\{1,\dots,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_{T}$. From i.i.d. data we identify an optimal tree $T^{*}$ and efficiently construct a tree density estimate $f_{n}$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to\infty} \int |f_{n}(\mathbf{x})-f_{T^{*}}(\mathbf{x})|\,d\mathbf{x}=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb{E}\left\{\int |f_{n}(\mathbf{x})-f_{T^{*}}(\mathbf{x})|\,d\mathbf{x}\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.
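The tree-identification step can be illustrated with a minimal sketch. It assumes the classical Chow-Liu reduction, under which minimizing the Kullback-Leibler divergence over spanning trees amounts to finding a maximum-weight spanning tree whose edge weights are pairwise mutual informations, and it uses Kruskal's algorithm (listed among the keywords) for that search. The function name `max_spanning_tree` and the weight matrix `W` are hypothetical; how the paper actually estimates the weights from data is not stated in the abstract and is not reproduced here.

```python
from typing import List, Sequence, Tuple


def max_spanning_tree(weights: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Kruskal's algorithm for a maximum-weight spanning tree.

    weights[i][j] is a symmetric edge weight; here it is assumed to be an
    estimate of the mutual information between coordinates i and j.
    """
    d = len(weights)
    parent = list(range(d))  # union-find forest over the d vertices

    def find(u: int) -> int:
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # All edges (i < j), heaviest first.
    edges = sorted(
        ((weights[i][j], i, j) for i in range(d) for j in range(i + 1, d)),
        key=lambda e: e[0],
        reverse=True,
    )

    tree: List[Tuple[int, int]] = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it creates no cycle
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == d - 1:
                break
    return tree


# Illustrative (made-up) weights for d = 4 coordinates.
W = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.8, 0.3],
    [0.1, 0.8, 0.0, 0.7],
    [0.2, 0.3, 0.7, 0.0],
]
tree_edges = max_spanning_tree(W)
```

With these illustrative weights the selected edges form the chain 0-1-2-3. Sorting the $d(d-1)/2$ candidate edges dominates the cost, so the search stays cheap even for moderately large $d$.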
Original language | American English |
---|---|
Pages (from-to) | 1168-1176 |
Number of pages | 9 |
Journal | IEEE Transactions on Information Theory |
Volume | 69 |
Issue number | 2 |
State | Published - 1 Feb 2023 |
Keywords
- Density estimation
- Kruskal's algorithm
- consistency
- rate of convergence
- tree identification
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences