Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality

Nicolas Gast; Bruno Gaujal; Chen Yan

doi:10.1287/moor.2022.0101

Article Dans Une Revue Mathematics of Operations Research Année : 2023

Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality

(1) , (1) , (1, 2)

1
2

Nicolas Gast

Fonction : Auteur
PersonId : 1247
IdHAL : nicolas-gast
ORCID : 0000-0001-6884-8698
IdRef : 233247874

Performance analysis and optimization of LARge Infrastructures and Systems

Bruno Gaujal

Fonction : Auteur
PersonId : 11644
IdHAL : bruno-gaujal
ORCID : 0000-0001-9081-8401
IdRef : 074658441

Performance analysis and optimization of LARge Infrastructures and Systems

Chen Yan

Fonction : Auteur
PersonId : 1102255

Performance analysis and optimization of LARge Infrastructures and Systems

Biostatistique et Processus Spatiaux

Résumé

We provide a framework to analyse control policies for the restless Markovian bandit model, under both finite and infinite time horizon. We show that when the population of arms goes to infinity, the value of the optimal control policy converges to the solution of a linear program (LP). We provide necessary and sufficient conditions for a generic control policy to be: i) asymptotically optimal; ii) asymptotically optimal with square root convergence rate; iii) asymptotically optimal with exponential rate. We then construct the LP-index policy that is asymptotically optimal with square root convergence rate on all models, and with exponential rate if the model is non-degenerate in finite horizon, and satisfies a uniform global attractor property in infinite horizon. We next define the LP-update policy, which is essentially a repeated LP-index policy that solves a new linear program at each decision epoch. We provide numerical experiments to compare the efficiency of LP-based policies. We compare the performance of the LP-index policy and the LP-update policy with other heuristics. Our result demonstrates that the LP-update policy outperforms the LP-index policy in general, and can have a significant advantage when the transition matrices are wrongly estimated.

Mots clés

restless bandits linear programming Markov decision process

Domaines

Optimisation et contrôle [math.OC] Probabilités [math.PR]

Fichier principal

LP_finite_infinite.pdf (806.54 Ko)

Origine	Fichiers produits par l'(les) auteur(s)

Nicolas Gast : Connectez-vous pour contacter le contributeur

https://inria.hal.science/hal-03262307

Soumis le : jeudi 21 décembre 2023-22:10:11

Dernière modification le : mercredi 18 décembre 2024-03:37:38

Dates et versions

hal-03262307 , version 1 (18-06-2021)

hal-03262307 , version 2 (11-05-2022)

hal-03262307 , version 3 (13-09-2022)

hal-03262307 , version 4 (21-12-2023)

Licence

Paternité

Identifiants

HAL Id : hal-03262307 , version 4
ARXIV : 2106.10067
DOI : 10.1287/moor.2022.0101

Citer

Nicolas Gast, Bruno Gaujal, Chen Yan. Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality. Mathematics of Operations Research, 2023, pp.1-29. ⟨10.1287/moor.2022.0101⟩. ⟨hal-03262307v4⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

UGA CNRS INRIA LIG LIG_SRCPR INRIA2 TDS-MACS LIG-SRCPR-POLARIS INRAE ANR LIG_SIDCH MATHNUM RESEAU-EAU INRAEPACA

381 Consultations

530 Téléchargements

Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality

Résumé

Mots clés

Domaines

Dates et versions

Licence

Identifiants

Citer

Exporter

Collections

Altmetric

Partager