JEDI Watchdogs
JEDI Watchdogs are dogs. Each watchdog has specific periodic mission to execute.
Watchdogs run independently of JEDI agents and of other watchdogs; i.e. they can run in parallel without blocking one another.
Watchdog is plugin-based so one can write new watchdogs to extend features of JEDI.
JEDI Configuration
General watchdog configuration is in jedi configuration
Specific configuration is mentioned in the section of each watchdog below.
Data Locality Updater
module path: pandajedi.jedidog.AtlasDataLocalityUpdaterWatchDog
class name: AtlasDataLocalityUpdaterWatchDog
configuration example:
[watchdog]
modConfig = (...),atlas:managed:pandajedi.jedidog.AtlasDataLocalityUpdaterWatchDog:AtlasDataLocalityUpdaterWatchDog:DataLocalityUpdater,(...)
procConfig = (...);atlas:managed:1:DataLocalityUpdater:43200;(...)
Here we ask JEDI to run 1 process of Data Locality Updater with a period of 43200 seconds (12 hours).
requirement: JEDI_Dataset_Locality
table exists in PanDA DB.
Description
Data Locality Updater queries Rucio periodically about the RSEs which store input datasets of all active production tasks.
The results are stored in JEDI_Dataset_Locality
table.
This allows other JEDI components to find the available RSEs (and hence PQs) for a task from the DB, which is more efficient than to query Rucio for the task on the fly.
GDPconfig Parameters
(None)
Task Withholder
module path: pandajedi.jedidog.AtlasTaskWithholderWatchDog
class name: AtlasTaskWithholderWatchDog
configuration example:
[watchdog]
modConfig = (...),atlas:managed:pandajedi.jedidog.AtlasTaskWithholderWatchDog:AtlasTaskWithholderWatchDog:TaskWithholder,(...)
procConfig = (...);atlas:managed:1:TaskWithholder:1800;(...)
Here we ask JEDI to run 1 process of Task Withholder with a period of 1800 seconds (30 minutes).
requirement: Data Locality Updater watchdog is running
Description
A tasks can become temporarily hopeless to run when all sites containing the required inputs are too busy or offline.
Task Withholder finds out these apparent hopeless tasks and withhold them by setting them to be in pending status, so that JEDI brokerage does not need to broker for these tasks temporarily and thus increase performance.
The tasks set to be pending will be released as other pending tasks for other reasons will.
GDPconfig Parameters
jedi.task_withholder.LIMIT_IOINTENSITY_managed
: For prodSourceLabel=managed, only tasks with ioIntensity higher than this value will be considered to be withheldjedi.task_withholder.LIMIT_PRIORITY_managed
: For prodSourceLabel=managed, only tasks with currentPriority lower than this value will be considered to be withheld
Queue Filler
module path: pandajedi.jedidog.AtlasQueueFillerWatchDog
class name: AtlasQueueFillerWatchDog
configuration example:
[watchdog]
modConfig = (...),atlas:managed:pandajedi.jedidog.AtlasQueueFillerWatchDog:AtlasQueueFillerWatchDog:QueueFiller,(...)
procConfig = (...);atlas:managed:1:QueueFiller:600;(...)
Here we ask JEDI to run 1 process of Queue Filler with a period of 600 seconds (10 minutes).
requirement: Data Locality Updater watchdog is running
Description
Queue Filler finds out empty sites and pre-assigns some proper production tasks to these sites to fill them with jobs.
Here a proper production tasks satisfies: * The certain site to be filled has the input datasets of the task * The tasks satisfies the constraints about the site (defined on CRIC); e.g. memory * The task still has enough ready but un-processed files, so that it will have enough jobs to fill the queue
When a site is no longer empty, the tasks pre-assign to the site by Queue Filler will be released.
GDPconfig Parameters
jedi.queue_filler.MAX_PREASSIGNED_TASKS_managed
: Max number of production tasks to pre-assign to an empty queue per resource typejedi.queue_filler.MIN_FILES_READY_managed
: Min number of ready files a task which can be pre-assigned should havejedi.queue_filler.MIN_FILES_REMAINING_managed
: Min number of remaining files a task which can be pre-assigned should have