3. Job scaling
3.1. What is job scaling and why does it matter?
“Job scaling” here refers to automatically matching the computing resource requirements (time limit, number of CPUs, memory, etc.) to the size of the job. Without job scaling, you might either waste resources or not allocate enough of them. In the former case, you needlessly consume your fairshare, and your subsequent jobs might wait a long time before starting. In the latter case, your job might simply crash.
3.2. How does it work?
This process first assigns to the molecule a value that reflects the job size and complexity: the scale index. ABIN LAUNCHER computes this value by calling a function defined in the scaling_fcts.py file, called a scaling function. The scale index is then compared to the job scales, each of which is a set of computing resource parameters associated with a range of scale index values.
For example, let’s say we want to run a geometry optimization on the Si36Ge11H60 molecule. The scale index has been computed as 916 and the job scales are defined as follows:
job_scales:
  -
    label: tiny
    scale_limit: 50
    time: 0-00:10:00
    cores: 4
    mem_per_cpu: 500 # in MB
  -
    label: small
    scale_limit: 500
    time: 1-00:00:00
    cores: 8
    mem_per_cpu: 500 # in MB
  -
    label: medium
    scale_limit: 1000
    time: 2-00:00:00
    cores: 8
    mem_per_cpu: 2000 # in MB
  -
    label: big
    scale_limit: 1500
    time: 3-00:00:00
    cores: 16
    mem_per_cpu: 4000 # in MB
The scale_limit key defines the upper limit of the scale index for that job scale. Our molecule is therefore too big for the tiny and small scales, whose upper limits are 50 and 500 respectively, but its scale index of 916 falls below the upper limit of the medium scale (1000). The job will thus use the resources defined in the medium scale: a time limit of 2 days, 8 CPUs and 2000 MB of memory per CPU.
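The selection step described above can be sketched in Python as follows. This is a minimal sketch, not ABIN LAUNCHER's actual code: it assumes the job scales have already been loaded from YAML into a list of dictionaries, and that a scale index exactly equal to a scale_limit still belongs to that scale (the documentation does not say whether the limit is inclusive). The helper name select_job_scale is hypothetical.

```python
def select_job_scale(scale_index: float, job_scales: list) -> dict:
    """Return the smallest job scale whose scale_limit is not exceeded (hypothetical helper)."""
    # Sort by upper limit so that the smallest sufficient scale is found first
    for scale in sorted(job_scales, key=lambda s: s["scale_limit"]):
        # Assumption: the upper limit is inclusive
        if scale_index <= scale["scale_limit"]:
            return scale
    raise ValueError(f"Scale index {scale_index} exceeds every scale_limit")


job_scales = [
    {"label": "tiny", "scale_limit": 50, "time": "0-00:10:00", "cores": 4, "mem_per_cpu": 500},
    {"label": "small", "scale_limit": 500, "time": "1-00:00:00", "cores": 8, "mem_per_cpu": 500},
    {"label": "medium", "scale_limit": 1000, "time": "2-00:00:00", "cores": 8, "mem_per_cpu": 2000},
    {"label": "big", "scale_limit": 1500, "time": "3-00:00:00", "cores": 16, "mem_per_cpu": 4000},
]

chosen = select_job_scale(916, job_scales)
print(chosen["label"], chosen["time"], chosen["cores"], chosen["mem_per_cpu"])
# medium 2-00:00:00 8 2000
```

A molecule whose scale index exceeds the largest scale_limit gets no resources at all in this sketch, which is why it raises an error instead of silently falling back to the biggest scale.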
The key part of this process obviously lies in the quality of the job scale definitions: the finer they are, the better the scaling will be. Since this depends heavily on the program you want to run and on the cluster it will run on, you will need to do extensive testing on your side. If your cluster uses SLURM as a job scheduler, you might want to take a look at the Benchmarking tool section.
3.3. Scaling functions
3.3.1. General definition
All the scaling functions are defined in the scaling_fcts.py file and must obey the following restrictions in order to be callable by ABIN LAUNCHER:
- They only take two dictionaries as arguments: the content of the mendeleev.yml file and the file_data variable, as built by the scanning function.
- They return an integer or a float, which acts as the scale index.
- If a problem arises when computing the scale index, they raise an AbinError exception with a proper error message (see how to handle errors for more details).
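As an illustration of these restrictions, here is a hypothetical scaling function that counts only the non-hydrogen atoms. It is not part of scaling_fcts.py; the AbinError class below is a local stand-in for ABIN LAUNCHER's own exception, and file_data["chemical_formula"] is assumed to map element symbols to atom counts (e.g. {"C": 3, "H": 8}).

```python
class AbinError(Exception):
    """Local stand-in for ABIN LAUNCHER's AbinError (assumption)."""


def nb_heavy_atoms(mendeleev: dict, file_data: dict) -> int:
    """Hypothetical scaling function: number of non-hydrogen atoms in the molecule."""
    # mendeleev is unused here, but it must be accepted to satisfy the calling convention
    formula = file_data.get("chemical_formula")
    if not formula:
        raise AbinError("No chemical formula found in file_data.")
    # Sum the atom counts of every element except hydrogen
    return sum(count for symbol, count in formula.items() if symbol != "H")


print(nb_heavy_atoms({}, {"chemical_formula": {"C": 3, "H": 8}}))  # 3
```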
3.3.2. Choosing a scaling function
The scaling function that will be called by ABIN LAUNCHER is the one associated with the scaling_function YAML key defined in the clusters configuration file:
mycluster:
  profiles:
    myprofile1:
      scaling_function: name-of-scaling-function
    myprofile2:
      scaling_function: name-of-scaling-function
where mycluster corresponds to the name of your cluster (given as a command line argument) while myprofile1 and myprofile2 are the names of the profiles you want to use (such as orca or qchem). This way, a different scaling function can be assigned to each profile.
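One way to picture how the scaling_function key is resolved is sketched below. It assumes the clusters configuration file has already been parsed into nested dictionaries and uses a plain name-to-function registry; ABIN LAUNCHER itself may resolve the name differently, so treat resolve_scaling_function, SCALING_FUNCTIONS and the placeholder count_atom_types as hypothetical.

```python
def count_atom_types(mendeleev: dict, file_data: dict) -> int:
    """Placeholder scaling function: number of distinct atom types (hypothetical)."""
    return len(file_data["chemical_formula"])


# Hypothetical registry mapping YAML names to scaling functions
SCALING_FUNCTIONS = {"count_atom_types": count_atom_types}


def resolve_scaling_function(clusters_cfg: dict, cluster: str, profile: str):
    """Return the scaling function assigned to the given cluster and profile."""
    name = clusters_cfg[cluster]["profiles"][profile]["scaling_function"]
    if name not in SCALING_FUNCTIONS:
        raise ValueError(f"Unknown scaling function '{name}' for profile '{profile}'")
    return SCALING_FUNCTIONS[name]


# Parsed equivalent of the YAML above, with a concrete function name filled in
clusters_cfg = {
    "mycluster": {
        "profiles": {
            "myprofile1": {"scaling_function": "count_atom_types"},
            "myprofile2": {"scaling_function": "count_atom_types"},
        }
    }
}

scaling_fct = resolve_scaling_function(clusters_cfg, "mycluster", "myprofile1")
print(scaling_fct({}, {"chemical_formula": {"C": 3, "H": 8}}))  # 2
```

Because the lookup is keyed by both cluster and profile, two profiles on the same cluster can point to different scaling functions.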
3.3.3. Total number of electrons
- scaling_fcts.total_nb_elec(mendeleev: dict, file_data: dict)
Calculates the total number of electrons in a molecule.
- Parameters:
mendeleev (dict) – Content of AlexGustafsson’s Mendeleev Table YAML file, which can be found at https://github.com/AlexGustafsson/molecular-data.
file_data (dict) – The extracted information of the geometry file.
- Returns:
total_elec – Total number of electrons in the molecule.
- Return type:
int
- Raises:
AbinError – If there is no atomic number defined in mendeleev for one of the constituting atoms of the molecule. (This exception is raised in the get_nb_elec_for_element subfunction.)
This function starts by defining a sub-function called get_nb_elec_for_element, which uses Mendeleev’s Periodic Table (mendeleev.yml) to get the atomic number of a given atom type, i.e. its number of electrons as a neutral atom. The main function then gets the different atom types and their respective amounts from the chemical_formula key in file_data. For each atom type, it multiplies the associated number of electrons (obtained through the sub-function) by the number of atoms of that type in the molecule. Finally, it sums up the values obtained for all atom types and returns the result as the scale index.
Example for the C3H8 molecule:
---------------------------------------------------------------------
Atom type    Atomic number    Number of atoms    Number of electrons
---------------------------------------------------------------------
H            1                8                  8
C            6                3                  18
---------------------------------------------------------------------
Total                         11                 26
---------------------------------------------------------------------
Scale index: 26
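The procedure above can be sketched as follows. This is a minimal sketch rather than the code of scaling_fcts.py: AbinError is a local stand-in, and the mendeleev content is assumed to be a list of element entries carrying "symbol" and "number" keys, which may not match the exact layout of your mendeleev.yml, so adapt the lookup if needed. The periodic table excerpt below is only for illustration.

```python
class AbinError(Exception):
    """Local stand-in for ABIN LAUNCHER's AbinError (assumption)."""


def get_nb_elec_for_element(symbol: str, mendeleev: list) -> int:
    """Return the atomic number of an element, i.e. its electron count as a neutral atom."""
    for element in mendeleev:
        if element.get("symbol") == symbol:
            number = element.get("number")
            if number is None:
                raise AbinError(f"No atomic number defined for {symbol} in mendeleev.")
            return number
    raise AbinError(f"Element {symbol} not found in mendeleev.")


def total_nb_elec(mendeleev: list, file_data: dict) -> int:
    """Sum over atom types of (atomic number) x (number of atoms of that type)."""
    total_elec = 0
    for symbol, nb_atoms in file_data["chemical_formula"].items():
        total_elec += get_nb_elec_for_element(symbol, mendeleev) * nb_atoms
    return total_elec


# Minimal excerpt of the periodic table, for illustration only (assumed layout)
mendeleev = [
    {"symbol": "H", "number": 1},
    {"symbol": "C", "number": 6},
    {"symbol": "Si", "number": 14},
    {"symbol": "Ge", "number": 32},
]
print(total_nb_elec(mendeleev, {"chemical_formula": {"C": 3, "H": 8}}))  # 26
print(total_nb_elec(mendeleev, {"chemical_formula": {"Si": 36, "Ge": 11, "H": 60}}))  # 916
```

For the Si36Ge11H60 molecule used in the job scale example, this gives 36 × 14 + 11 × 32 + 60 × 1 = 916, the scale index quoted there.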
3.3.4. Total number of atoms
- scaling_fcts.total_nb_atoms(mendeleev: dict, file_data: dict)
Returns the total number of atoms in a molecule.
- Parameters:
mendeleev (dict) – Content of AlexGustafsson’s Mendeleev Table YAML file, which can be found at https://github.com/AlexGustafsson/molecular-data. Unused in this particular function.
file_data (dict) – The extracted information of the geometry file.
- Returns:
total_atoms – Total number of atoms in the molecule.
- Return type:
int
This function simply sums up the atom counts stored under the chemical_formula key of file_data. It does not make use of mendeleev.yml, but the mendeleev argument is still required to satisfy the calling restrictions of the scaling functions (see General definition).
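A minimal sketch of this behaviour, assuming chemical_formula maps element symbols to atom counts; it mirrors the description above rather than reproducing the source of scaling_fcts.py.

```python
def total_nb_atoms(mendeleev: dict, file_data: dict) -> int:
    """Return the total number of atoms; mendeleev is accepted but unused."""
    # chemical_formula is assumed to look like {"C": 3, "H": 8}
    return sum(file_data["chemical_formula"].values())


print(total_nb_atoms({}, {"chemical_formula": {"C": 3, "H": 8}}))  # 11
print(total_nb_atoms({}, {"chemical_formula": {"Si": 36, "Ge": 11, "H": 60}}))  # 107
```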
3.4. Job scales
The job scales must be defined as follows, under the job_scales key of each profile in the clusters configuration file:
myclusterA:
  profiles:
    myprofile1:
      scaling_function: name-of-scaling-function
      job_scales:
        -
          label: scale1
          scale_limit: value
          time: value
          cores: value
          mem_per_cpu: value
        -
          label: scale2
          scale_limit: value
          time: value
          cores: value
          mem_per_cpu: value
          partition_name: value # This is optional
          delay_command: value # This is optional
        -
          ...
    myprofile2:
      scaling_function: name-of-scaling-function
      job_scales:
        -
          label: scale1
          scale_limit: value
          time: value
          cores: value
          mem_per_cpu: value
        -
          label: scale2
          scale_limit: value
          time: value
          cores: value
          mem_per_cpu: value
          partition_name: value # This is optional
          delay_command: value # This is optional
        -
          ...
myclusterB:
  profiles:
    myprofile1:
      scaling_function: name-of-scaling-function
      job_scales:
        -
          label: scale1
          scale_limit: value
          time: value
          cores: value
          mem_per_cpu: value
          partition_name: value # This is optional
        -
          label: scale2
          scale_limit: value
          time: value
          cores: value
          mem_per_cpu: value
          delay_command: value # This is optional
        -
          ...
    myprofile2:
      scaling_function: name-of-scaling-function
      job_scales:
        -
          label: scale1
          scale_limit: value
          time: value
          cores: value
          mem_per_cpu: value
          partition_name: value # This is optional
          delay_command: value # This is optional
        -
          label: scale2
          scale_limit: value
          time: value
          cores: value
          mem_per_cpu: value
        -
          ...
where:
- myclusterA and myclusterB are the names of your clusters (given as a command line argument). This way, different job scales can be assigned to each cluster.
- myprofile1 and myprofile2 are the names of the profiles you want to use (such as orca or qchem, also given as a command line argument). This way, different job scales can be assigned to each profile.
- label, scale_limit, time, cores and mem_per_cpu are all mandatory keys, specifying the resource requirements of the jobs.
- partition_name is an optional key containing the name of the cluster partition on which the job will run.
- delay_command is an optional key that lets you delay the start of the jobs. For example, by delaying the bigger jobs, you can prioritize the launch of small calculations. On SLURM, this is handled by the --begin argument of the sbatch command (see the sbatch documentation).
You can have as many job scales as you want, and they don’t need to be defined in ascending order of scale index limits. ABIN LAUNCHER will automatically sort them before starting to scan the geometry files. Just remember to adjust the scale_limit of your job scales if you change your scaling function. Otherwise, those numbers won’t make sense.
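The automatic sorting mentioned above, together with a check of the mandatory keys, could look like the following. This is a hypothetical helper (check_and_sort_job_scales), assuming the job_scales list has already been loaded from YAML into Python dictionaries; rejecting duplicate labels is a design choice of this sketch, not a documented rule of ABIN LAUNCHER.

```python
MANDATORY_KEYS = ("label", "scale_limit", "time", "cores", "mem_per_cpu")


def check_and_sort_job_scales(job_scales: list) -> list:
    """Validate job scales and return them sorted by ascending scale_limit."""
    seen_labels = set()
    for scale in job_scales:
        missing = [key for key in MANDATORY_KEYS if key not in scale]
        if missing:
            raise ValueError(f"Job scale {scale.get('label', '?')} lacks: {', '.join(missing)}")
        if scale["label"] in seen_labels:
            raise ValueError(f"Duplicate job scale label: {scale['label']}")
        seen_labels.add(scale["label"])
    # The definition order in the YAML file does not matter
    return sorted(job_scales, key=lambda scale: scale["scale_limit"])


unordered = [
    {"label": "big", "scale_limit": 1500, "time": "3-00:00:00", "cores": 16, "mem_per_cpu": 4000},
    {"label": "tiny", "scale_limit": 50, "time": "0-00:10:00", "cores": 4, "mem_per_cpu": 500},
]
print([scale["label"] for scale in check_and_sort_job_scales(unordered)])  # ['tiny', 'big']
```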