1. Overview¶
1.1. What is ABIN LAUNCHER?¶
The Ab Initio Input Builder and Job Launcher, named ABIN LAUNCHER for short, is a script that creates input files for ab initio programs, and then submits (or “launches”) the corresponding calculations on a job scheduler. It can operate with one or more molecules (through their respective geometry files) and one or more configurations (the parameters for the ab initio program). ABIN LAUNCHER is the first main script of CHAINS and is executed twice through it:
The first time, it is used to build the input files associated with the ORCA program, which performs the geometry optimization of the molecule.
The second time, it is used to build the input files associated with the the Q-CHEM program, which performs the calculation of the different properties of our molecule.
In both cases, after having created the files, ABIN LAUNCHER then launches the corresponding job on the cluster.
However, ABIN LAUNCHER is completely autonomous and does not depend on CHAINS. It does not need any files outside the ones present in its own directory. It can be extracted and used to launch independent calculations, and can be very easily adapted to deal with other ab initio programs. As such, this specific part of the documentation only explains how ABIN LAUNCHER works in and of itself, and its integration into CHAINS is explained elsewhere.
1.2. Directory structure¶
The ABIN LAUNCHER directory has and must keep the following structure:
abin_launcher/
├── abin_launcher.py
├── geom_scan.py
├── scaling_fcts.py
├── renderer.py
├── abin_errors.py
├── clusters.yml
├── mendeleev.yml
└── templates/
As for what each file does, everything will be explained in more details in the other sections of this documentation, but here is a short summary:
abin_launcher.pyis the main script itself, the one that needs to be executed.geom_scan.pyis the library of functions that define how to scan the geometry files, i.e. how to read and interpret them.scaling_fcts.pyis the library of functions that define how to determine the job size, by calculating what we defined as the “scale index”.renderer.pyis the library of functions that define how to render the Jinja templates, i.e. how to create the input files and the job script.abin_errors.pycontains all the classes and functions defining how to handle errors.clusters.ymlis the YAML file containing all the information specific to the different clusters, called the clusters configuration file.mendeleev.ymlis a YAML version of Mendeleev’s periodic table procured by AlexGustafsson’s molecular-data Github repository.templatesis the directory containing all the Jinja templates that will be used byABIN LAUNCHER.
In CHAINS’ case, the templates directory structure is:
└── templates/
├── orca_job.sh.jinja
├── orca.inp.jinja
├── qchem_job.sh.jinja
└── qchem.in.jinja
where
orca.inp.jinjaandqchem.in.jinjaare the Jinja templates corresponding to the input file for ORCA and Q-CHEM, respectively.orca_job.sh.jinjaandqchem_job.sh.jinjaare the Jinja templates corresponding to the job script for ORCA and Q-CHEM, respectively. They contain all the commands that will be executed through the job scheduler to perform the calculation.
Note
There are three supplementary files: benchmark.py, cron_benchmark.sh and, in the templates directory, benchmark.jinja. Those files are not part of ABIN LAUNCHER itself but they introduce a benchmarking tool.
1.3. Command line arguments¶
usage: abin_launcher.py -m MOL_INP [MOL_INP ...] -cf CONFIG [CONFIG ...] -cl
CLUSTER_NAME -p PROFILE -o OUT_DIR [-h] [-ow] [-d]
[--max_mol MAX_MOL] [--max_cf MAX_CF] [-km] [-kc]
1.3.1. Required arguments¶
- -m, --mol_inp
Path to either a geometry file or a directory containing multiple geometry files.
- -cf, --config
Path to either a YAML configuration file or a directory containing multiple YAML configuration files, extension must be .yml or .yaml.
- -cl, --cluster_name
Name of the cluster where this script is running, as defined in the YAML clusters configuration file.
- -p, --profile
Name of the profile you wish to run jobs with, as defined in the YAML clusters configuration file.
- -o, --out_dir
Path to the directory where you want to create the subdirectories for each job.
1.3.2. Optional arguments¶
- -ow, --overwrite
If a job subdirectory for a geometry-configuration combination already exists, remove it before creating a new one.
Default: False
- -d, --dry_run
Do not launch the jobs, just create the files and directories.
Default: False
- --max_mol
Maximum number of geometry files that will be succesfully processed, limiting the number of jobs that will then be launched.
- --max_cf
Maximum number of configuration files that will be succesfully processed, limiting the number of jobs that will then be launched.
- -km, --keep_mol
Do not archive the geometry files after they have been processed and leave them where they are.
Default: False
- -kc, --keep_cf
Do not archive the configuration files after they have been processed and leave them where they are.
Default: False
1.4. How does it work?¶
The executable part of ABIN LAUNCHER is the main script, abin_launcher.py. This is the one that must be called in the command line. The overall procedure follows three main steps: scanning, scaling and rendering, followed by the small submitting step. Each of the three main steps will be more thoroughly explained in a dedicated section of this documentation. As such, this subsection will only focus on the global procedure.
Important
An important file that will be often referenced throughout this documentation is the YAML clusters configuration file (clusters.yml). Rather than presenting it in its entirety at the beginning, the relevant bits of information will be introduced in the different sections, but you can have a full overview of that file in its specific documentation.
1.4.1. Input files¶
There are two main input files for ABIN LAUNCHER:
-m / --mol_inp, the geometry files.
The files presenting the nature and the structure of your molecules. They contain the type and number of the constituting atoms and their respective coordinates.
-cf / --config, the configuration files.
The YAML files containing the parameters specific to your calculations and your programs (job type, basis set, etc.). Those files must have the .yml or .yaml extension.
In both cases, you can either indicate a specific file in the command line, or point towards a directory where there are multiple of those files. If you specify multiple input files, ABIN LAUNCHER will process each geometry-configuration combination. For example, if you have 5 geometry files and 3 configuration files, you will end up with 15 launched jobs on your cluster.
By default, every input file that has been successfully processed by ABIN LAUNCHER will be archived in a launched directory created in the same directory as the input files. This has been designed this way so that you can repeatedly use the same directory as “source” for those input files without repeating jobs. If you want to turn off this behavior, you can use the -km / --keep_mol and/or -kc / --keep_cf optional arguments to keep the geometry files and/or the configuration files, respectively. Note that if a problem occurs when processing a particular geometry-configuration combination, both the geometry file and the configuration file will be kept in place.
1.4.2. Other arguments¶
There are three other required arguments for executing ABIN LAUNCHER:
-cl / --cluster_name, the name of the cluster you are running on.
This value must be the same as the one given in the clusters configuration file, so that
ABIN LAUNCHERknows what you are referring to. (This is case-sensitive!)
Tip
This argument does not need to be the same name as the actual name of your machine. It is just a label used by ABIN LAUNCHER to know which information to get from its clusters configuration file.
-p / --profile, the name of the profile you want to run jobs with.
The profile is a label used by
ABIN LAUNCHERto know which information to get from its different files. It specifies what type of scaling and rendering will be performed, which defines which program will be run and how. As will be shown throughout this documentation, the profiles are defined in the clusters configuration file and the name given in the command line must match one of the names given in the file. (This is case-sensitive!).-o / --out_dir, the output directory
This is the jobs root directory, where each job subdirectory will be created. Those subdirectories are the ones where the files will be created and from where the jobs will be submitted to the job scheduler.
There are also a number of optional arguments that can be used to adapt to each specific situation. Their description in the command line arguments subsection should be self-explanatory.
1.4.3. First step: Scanning¶
ABIN LAUNCHER begins by scanning the geometry file, looking for the chemical formula and the atomic coordinates of the molecule.
For more details on how this scan is performed, consult the Scanning the geometry file specific documentation.
Caution
At this time, only the XYZ format is supported for geometry files. However, new formats can be added if the need arises.
1.4.4. Second step: Scaling¶
Based on the information received from the geometry file, ABIN LAUNCHER attributes a value, called the scale index, to the molecule. This value is then used to evaluate the job scale for that molecule and specify the calculation requirements accordingly (walltime, number of CPUs, memory, etc.).
For more details on how this scaling process is performed, consult the Job scaling specific documentation.
1.4.5. Third step: Rendering¶
Finally, based on user-defined Jinja templates, ABIN LAUNCHER creates the input files and the job script associated with our calculation. The content of those files is based on the information from the geometry file and the configuration file.
For more details on how this whole rendering process is performed, consult the Rendering the templates specific documentation.
1.4.6. The end step: Submitting¶
Now that everything has been prepared for the job, ABIN LAUNCHER submits it to the job scheduler. The exact command that will be executed is:
$ <submit_command> <delay_command> <job script>
where
<submit_command>is the command which submits jobs to your job scheduler. In SLURM’s case, it is thesbatchcommand. This must be indicated in the clusters configuration file:mycluster: submit_command: <submit_command>
where
myclusteris the name of your cluster (the same that was given as the -cl / --cluster_name command line argument).<delay_command>is an optional command that can delay the submission of a particular job, which can prove useful if you want to prioritize certain job sizes (consult the Job scaling specific documentation for details). In SLURM’s case, this is covered by the--beginargument.<job script>is the name of the file that will be created through the rendering process. It contains the commands needed by the job scheduler to run the calculation on the cluster.
For example, if we want to run an ORCA calculation on a SLURM cluster, but delay the submission of this job by 60 seconds, the command executed by ABIN LAUNCHER might look like:
$ sbatch --begin=now+60 orca_job.sh
Once the job has been submitted, ABIN LAUNCHER will proceed to the next configuration file with the same geometry. Once all the configuration files have been treated, it will proceed to the next geometry and treat again all the configuration files for that geometry. At the end of the execution, barring any problems, a job will have been launched for each geometry-configuration combination.
1.4.7. Output directory structure¶
If we have for example 2 geometry files and 2 configuration files, once the execution of ABIN LAUNCHER has ended, the structure of the output directory (given as the -o / --out_dir command line argument) might look like:
out_dir/
└── geometry1_config1/
├── geometry1.xyz
├── config1.yml
├── geometry1_config1.log
├── job_script.sh
└── input_file
└── geometry1_config2/
├── geometry1.xyz
├── config2.yml
├── geometry1_config2.log
├── job_script.sh
└── input_file
└── geometry2_config1/
├── geometry2.xyz
├── config1.yml
├── geometry2_config1.log
├── job_script.sh
└── input_file
└── geometry2_config2/
├── geometry2.xyz
├── config2.yml
├── geometry2_config2.log
├── job_script.sh
└── input_file
where
geometryX_configXis the job subdirectory created byABIN LAUNCHER, and from which the job will be submitted to the job scheduler.geometryX.xyzandconfigX.ymlare copies of the geometry file and the configuration file, respectively.job_script.shandinput_fileare the files created by the rendering process.geometryX_configX.logis an output file containing the details of the treatment of this geometry-configuration combination byABIN LAUNCHER(the computed scale index, the used job scale, etc.)