Administrator Guide
Here is a quick tutorial to set up a minimal PanDA system.
0. Hardware Requirements
It is recommended to install JEDI and the PanDA server on separate virtual machines (VMs), but it is possible to install them on a single VM for small-scale testing. A minimal PanDA system consists of three VMs: the first for JEDI and the PanDA server, the second for Harvester, and the third for the PanDA monitor. The following table shows the minimum hardware configuration.
Component | Cores | RAM (GB) | Disk (GB) |
---|---|---|---|
JEDI + PanDA server | 4 | 8 | 100 |
Harvester | 4 | 8 | 100 |
BigPandaMon | 8 | 16 | 70 |
1. Database Setup
The database is the backbone of the PanDA server and JEDI, so it needs to be set up before you start installing those components. You should go through the Database page.
2. PanDA Server Setup
The next step is to install the PanDA server on a VM following the PanDA server installation guide.
You need to decide the userid and group under which the PanDA server runs before editing configuration files.
Make sure that the userid and group are consistent in panda_server.cfg and panda_server-httpd.conf, and that the permissions of the log directories are set accordingly.
It is a good idea to tune the number of processes in httpd.conf based on your VM's configuration, e.g.,
StartServers 4
MinSpareServers 4
ServerLimit 64
MaxSpareServers 64
MaxClients 64
MaxRequestsPerChild 2000
WSGIDaemonProcess pandasrv_daemon processes=4 threads=1 home=/home/iddssv1 inactivity-timeout=600
Then add a new virtual organization following this section.
Make sure that the organization is added to PanDA IAM.
We use the wlcg organization in this tutorial.
You also need to configure the firewall on the VM to allow access to ports 25080 and 25443 from outside.
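Once the firewall rule is in place, it can be useful to confirm from another machine that both ports accept TCP connections. The following is a minimal sketch using only the Python standard library; the helper names `port_is_open` and `check_panda_ports` are illustrative assumptions and are not part of PanDA.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_panda_ports(host, ports=(25080, 25443)):
    """Map each port to whether it is reachable on the given host."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `check_panda_ports` with the host name of your PanDA server returns a dictionary keyed by port number. A `False` value only indicates that the TCP connection failed; it does not distinguish a firewall block from a server that is not running.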
3. JEDI Setup
Once the PanDA server is ready, you can install JEDI on the same VM following the JEDI installation guide.
You need to use the name of the virtual organization when configuring plugins in panda_jedi.cfg.
For testing purposes it is enough to use generic plugins as shown below:
[ddm]
modConfig = wlcg:1:pandajedi.jediddm.GenDDMClient:GenDDMClient
[confeeder]
procConfig = wlcg:any:1
[taskrefine]
modConfig = wlcg:any:pandajedi.jedirefine.GenTaskRefiner:GenTaskRefiner
procConfig = ::1
[jobbroker]
modConfig = wlcg:any:pandajedi.jedibrokerage.GenJobBroker:GenJobBroker
[jobthrottle]
modConfig = wlcg:any:pandajedi.jedithrottle.GenJobThrottler:GenJobThrottler
[jobgen]
procConfig = wlcg:any:1:
[postprocessor]
modConfig = wlcg:any:pandajedi.jedipprocess.GenPostProcessor:GenPostProcessor
procConfig = ::1
[watchdog]
modConfig = wlcg:any:pandajedi.jedidog.GenWatchDog:GenWatchDog
procConfig = wlcg:any:1
[taskbroker]
modConfig = wlcg:any:pandajedi.jedibrokerage.GenTaskBroker:GenTaskBroker
procConfig = wlcg:any:1
[tcommando]
procConfig = ::1
[tasksetup]
modConfig = wlcg:any:pandajedi.jedisetup.GenTaskSetupper:GenTaskSetupper
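A typo in one of these entries is easy to miss. As a rough sanity check, the sketch below reads a JEDI-style configuration with Python's `configparser` and reports any `modConfig` value that does not have four non-empty colon-separated fields or does not start with the expected organization name. The four-field layout is inferred from the examples above rather than from a JEDI specification, and the helper name `check_mod_configs` is hypothetical.

```python
import configparser


def check_mod_configs(cfg_text, vo):
    """Return (section, value) pairs whose modConfig looks malformed.

    Assumption: each modConfig value has four colon-separated fields and
    the first field is the virtual organization name, as in the examples.
    """
    # Disable interpolation so that '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    problems = []
    for section in parser.sections():
        if not parser.has_option(section, "modConfig"):
            continue  # sections such as [jobgen] only define procConfig
        value = parser.get(section, "modConfig")
        fields = value.split(":")
        if len(fields) != 4 or fields[0] != vo or not all(fields):
            problems.append((section, value))
    return problems


# The second entry is deliberately wrong: different VO and a missing field.
sample = """
[taskrefine]
modConfig = wlcg:any:pandajedi.jedirefine.GenTaskRefiner:GenTaskRefiner
procConfig = ::1
[jobbroker]
modConfig = atlas:any:pandajedi.jedibrokerage.GenJobBroker
"""
print(check_mod_configs(sample, "wlcg"))
```

The check is purely syntactic. It does not verify that the named module and class can actually be imported.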
5. Testing JEDI and the PanDA server
At this stage, you can submit a test task to the PanDA server and let JEDI generate jobs. Before testing, start the PanDA server and JEDI.
/sbin/service httpd-pandasrv start
/sbin/service panda-jedi start
Then set up panda-client as explained in the panda-client setup guide. After sourcing panda_setup.sh, you need to set PANDA_URL_SSL and PANDA_URL to point to your PanDA server, e.g.,
export PANDA_URL_SSL=https://pandaserver-doma.cern.ch:25443/server/panda
export PANDA_URL=http://pandaserver-doma.cern.ch:25080/server/panda
in addition to the parameters described in the client setup for OIDC-based auth, e.g.,
export PANDA_AUTH=oidc
export PANDA_AUTH_VO=wlcg
export PANDA_VERIFY_HOST=off
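Before submitting anything, it may help to confirm that all five variables are set in the current shell. The following is a minimal sketch that assumes nothing beyond the variable names listed above; `missing_panda_env` is a hypothetical helper and not part of panda-client.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = (
    "PANDA_URL_SSL",
    "PANDA_URL",
    "PANDA_AUTH",
    "PANDA_AUTH_VO",
    "PANDA_VERIFY_HOST",
)


def missing_panda_env(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_panda_env()
    if missing:
        print("Missing variables: " + ", ".join(missing))
    else:
        print("All panda-client variables are set.")
```

Run the script in the same shell session in which the variables were exported, since child processes only inherit exported variables.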
An example of a test task is available at this link.
wget https://raw.githubusercontent.com/PanDAWMS/panda-jedi/master/pandajedi/jeditest/addNonAtlasTask.py
In this script
taskParamMap['vo'] = 'wlcg'
taskParamMap['prodSourceLabel'] = 'test'
taskParamMap['site'] = 'TEST_SITE'
the values need to be changed to the organization, activity, and computing resource names registered in the previous step. Then run
python addNonAtlasTask.py
You will see a jediTaskID if successful.
The task is passed to JEDI through the PanDA server, and goes through the TaskRefiner, ContentsFeeder, and JobGenerator agents in JEDI. Each agent should write log messages to logdir/panda-AgentName.log, like
2021-02-24 07:34:13,694 panda.log.TaskRefiner: DEBUG < jediTaskID=24326915 > start
Once jobs are submitted, there should be messages like
2021-02-24 07:34:52,905 panda.log.JobGenerator: INFO <jediTaskID=24326915 datasetID=359212908> submit njobs=1 jobs
in logdir/panda-JobGenerator.log. There should also be many messages in logdir/panda-JediDBProxy.log about database interactions.
Jobs are passed to the PanDA server. If you see something like
2021-02-24 07:34:29,399 panda.log.DBProxy: DEBUG activateJob : 4981974846
in logdir/panda-DBProxy.log, this means that the job successfully went through the PanDA server components and is ready to be picked up by the pilot.
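When following a single task through these logs, a small filter can save time. The sketch below extracts the `jediTaskID` from agent log lines and the job ID from `activateJob` lines using regular expressions. The patterns are derived only from the sample lines shown above, so they may need adjusting for other message formats, and the function names are illustrative.

```python
import re

# Matches "jediTaskID=<digits>" anywhere in a line.
TASK_ID_RE = re.compile(r"jediTaskID=(\d+)")
# Matches "activateJob : <digits>" as seen in panda-DBProxy.log.
ACTIVATE_RE = re.compile(r"activateJob\s*:\s*(\d+)")


def task_id_of(line):
    """Return the jediTaskID found in a log line as an int, or None."""
    match = TASK_ID_RE.search(line)
    return int(match.group(1)) if match else None


def activated_job_of(line):
    """Return the job ID from an activateJob log line as an int, or None."""
    match = ACTIVATE_RE.search(line)
    return int(match.group(1)) if match else None


# Sample lines copied from the log excerpts above.
sample_lines = [
    "2021-02-24 07:34:13,694 panda.log.TaskRefiner: DEBUG < jediTaskID=24326915 > start",
    "2021-02-24 07:34:52,905 panda.log.JobGenerator: INFO <jediTaskID=24326915 datasetID=359212908> submit njobs=1 jobs",
    "2021-02-24 07:34:29,399 panda.log.DBProxy: DEBUG activateJob : 4981974846",
]
for line in sample_lines:
    print(task_id_of(line), activated_job_of(line))
```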
6. Harvester Setup
In this tutorial we use HTCondor as the submission backend, so first you need to install HTCondor on the VM where Harvester will be installed. The HTCondor documentation will help.
Then refer to the Harvester installation guide to install Harvester on the same VM. For small-scale tests it is enough to use the sqlite3 database backend.
Note that harvester_id in panda_harvester.cfg can be an arbitrary unique string, but it needs to be registered in the database of JEDI and the PanDA server (i.e., not the Harvester database):
INSERT INTO DOMA_PANDA.HARVESTER_INSTANCES (HARVESTER_ID,DESCRIPTION) VALUES('your_harvester_id','some description');
6.1. Queue Configuration
In this tutorial, queues are specified in a local JSON file, so panda_harvester.cfg has
[qconf]
configFile = panda_queueconfig.json
queueList =
    ALL
panda_queueconfig.json could be something like a config example, where TEST_SITE, the computing resource defined in the previous step, is set to "online".
{
    "TEST_SITE": {
        "queueStatus": "online",
        "prodSourceLabel": "test",
        "templateQueueName": "production.pull",
        "maxWorkers": 1,
        "nQueueLimitWorkerMin": 1,
        "nQueueLimitWorkerMax": 2,
        "submitter": {
            "templateFile": "/opt/panda/misc/grid_submit_pilot.sdf"
        }
    }
}
where templateFile is a template used to generate sdf files, like an sdf template example. Each sdf file has
executable = /opt/panda/misc/runpilot2-wrapper.sh
arguments = "-s {computingSite} -r {computingSite} -q {pandaQueueName} -j {prodSourceLabel} -i {pilotType} \
-t -w generic --pilot-user generic --url https://pandaserver-doma.cern.ch -d --harvester-submit-mode PULL \
--allow-same-user=False --job-type={jobType} {pilotResourceTypeOption} {pilotUrlOption}"
to launch the pilot on a worker node. runpilot2-wrapper.sh is available in the pilot-wrapper repository.
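The curly-brace placeholders in the template are filled in when an sdf file is generated. To illustrate how such a template expands, the sketch below applies Python's `str.format` to a shortened version of the `arguments` line. It only illustrates the substitution and is not Harvester's own code; the server URL and the values passed in are made-up examples.

```python
# Shortened template; the real one has more options and placeholders.
# The URL is a placeholder and must be replaced by your PanDA server URL.
ARGUMENTS_TEMPLATE = (
    "-s {computingSite} -r {computingSite} -q {pandaQueueName} "
    "-j {prodSourceLabel} --url https://panda.example.org"
)


def render_arguments(template, **values):
    """Fill every {placeholder}; raises KeyError if a value is missing."""
    return template.format(**values)


rendered = render_arguments(
    ARGUMENTS_TEMPLATE,
    computingSite="TEST_SITE",
    pandaQueueName="TEST_SITE",
    prodSourceLabel="test",
)
print(rendered)
```

HTCondor macros such as `$(Cluster)` contain no curly braces, so this kind of substitution leaves them untouched for HTCondor to expand at submission time.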
You need to put a template file and the pilot wrapper on the VM, and edit the template file and panda_queueconfig.json accordingly. Note that the --url argument must be set to the URL of your PanDA server so that the pilot talks to your PanDA server.
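Since panda_queueconfig.json is edited by hand, it can be worth validating it before starting Harvester. The sketch below parses the file content and checks a few of the keys used in the example queue above. It is not an exhaustive schema and not part of Harvester; the helper name `check_queue_config` and the choice of checked keys are assumptions.

```python
import json

# Keys taken from the example queue above; not a complete schema.
EXPECTED_KEYS = ("queueStatus", "prodSourceLabel")


def check_queue_config(text, queue_name):
    """Return a list of problems found; an empty list means none were found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # Typical causes are trailing commas or unbalanced braces.
        return ["invalid JSON: {}".format(exc)]
    queue = config.get(queue_name) if isinstance(config, dict) else None
    if not isinstance(queue, dict):
        return ["queue '{}' is not defined".format(queue_name)]
    problems = [
        "missing key '{}'".format(key) for key in EXPECTED_KEYS if key not in queue
    ]
    if queue.get("queueStatus") != "online":
        problems.append("queueStatus is not 'online'")
    submitter = queue.get("submitter")
    if not isinstance(submitter, dict) or "templateFile" not in submitter:
        problems.append("submitter.templateFile is not set")
    return problems
```

A typical use is to read the file with `open(path).read()` and pass the result together with the queue name, for example `TEST_SITE`. Note that Python's `json` module rejects trailing commas, so a file that passes this check is strict JSON.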
6.2. Testing Harvester
Now you can start Harvester to submit the pilot and see if the pilot properly communicates with the PanDA server.
etc/rc.d/init.d/panda_harvester start
Harvester logs are available in the directory specified in panda_common.cfg. It is good to check panda_harvester_stdout.log, panda_harvester_stderr.log, and panda-submitter.log.
Once the pilot is sent out through HTCondor, there should be log files in the directory specified in the sdf template file.
log = {logDir}/{logSubdir}/grid.$(Cluster).$(Process).log
output = {logDir}/{logSubdir}/grid.$(Cluster).$(Process).out
error = {logDir}/{logSubdir}/grid.$(Cluster).$(Process).err
where {logDir} is specified in panda_queueconfig.json and {logSubdir} is automatically defined by Harvester based on the timestamp.
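To see at a glance which pilots wrote to standard error, the sketch below walks a log directory and lists the non-empty `grid.*.err` files. The file-name pattern follows the sdf lines above; the directory passed in is whatever {logDir} is set to in your queue configuration, and the helper name `nonempty_err_files` is hypothetical.

```python
from pathlib import Path


def nonempty_err_files(log_dir):
    """Return sorted paths of grid.*.err files under log_dir that contain data."""
    root = Path(log_dir)
    # rglob descends into the timestamp-based subdirectories.
    return sorted(
        path
        for path in root.rglob("grid.*.err")
        if path.is_file() and path.stat().st_size > 0
    )
```

An empty result does not by itself prove that the pilots ran successfully, so the corresponding `.out` and `.log` files are still worth inspecting.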
If communication between the pilot and the PanDA server is successful, there will be messages in the PanDA server's log files such as panda_server_access_log, panda-JobDispatcher.log, and panda-DBProxy.log.