7. Benchmarking tool

7.1. Why is it useful?

As mentioned in the Job scaling section, one of the key parts of the scaling process is the definition of your job scales. A sufficient number of finely tuned job scales lets you avoid wasting resources, which reduces your fairshare usage of the cluster and makes your jobs spend less time waiting in the queue. Since this depends heavily on the program you want to run and on the cluster it runs on, you will need to do extensive testing yourself.

With that said, if your cluster uses SLURM as a job scheduler, we might be able to offer some help in that regard through a benchmarking tool. This tool lets you evaluate the efficiency of your resource requirements by calculating how much of the allocated resources the job actually used.

7.2. Important SLURM commands

7.2.1. The sacct and squeue commands

The two key SLURM commands we use here are sacct and squeue. The sacct command fetches job accounting data directly from the SLURM database. Depending on your cluster configuration, however, reaching the SLURM database may require Internet access, which the compute nodes might not have. This command is therefore only used from the login nodes, once the job has finished. The squeue command, on the other hand, gets its information locally, so we can use it inside the job script, while the job is still running.

All the data about our jobs is obtained through those two commands. If you want to customize the benchmarking procedure, you should familiarize yourself with them through SLURM’s official documentation (sacct docs and squeue docs).
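As a rough illustration of how accounting data can be fetched from a login node, the sketch below builds a sacct command line and parses its output. The helper names (build_sacct_command, parse_sacct_output, fetch_job_accounting) and the selected fields are assumptions made for this example and are not taken from benchmark.py. The --jobs, --format, --noheader and --parsable2 options are standard sacct options.

```python
import subprocess

# Fields requested from sacct (an illustrative selection, adjust as needed)
FIELDS = ["JobID", "Elapsed", "TotalCPU", "MaxRSS"]


def build_sacct_command(job_id: str) -> list:
    """Build the sacct command line for one job (hypothetical helper)."""
    return [
        "sacct",
        "--jobs=" + job_id,
        "--format=" + ",".join(FIELDS),
        "--noheader",    # do not print the header line
        "--parsable2",   # fields separated by '|', no trailing delimiter
    ]


def parse_sacct_output(text: str) -> list:
    """Turn the '|'-delimited sacct output into one dictionary per line."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append(dict(zip(FIELDS, line.split("|"))))
    return records


def fetch_job_accounting(job_id: str) -> list:
    """Run sacct; this only works where the SLURM database is reachable."""
    result = subprocess.run(
        build_sacct_command(job_id),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_sacct_output(result.stdout)
```

Keep in mind that sacct usually prints one line for the job itself and one line per job step, and that some fields (such as MaxRSS) may only be filled in on the step lines, so the caller has to pick the relevant line for each field.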

7.2.2. The seff command: a possible alternative

An alternative option is the seff command (see source code and example). While it does almost exactly what we want, it has some drawbacks:

  • It requires SLURM 15.08 or a more recent version, which might cause problems with some older machines.

  • It does not give reliable statistics while the job is running, so a cron task or similar* is needed to automatically check the resource usage after the job has ended.

  • It computes the CPU and memory efficiencies, but nothing is done about the time efficiency.

Other than that, it still is a good option so feel free to use it if it satisfies your needs!

* If your cluster administrator allows it, you can also use the strigger command to trigger a script that executes the seff command at the end of the job. Consult the documentation for details.

7.3. How does it work?

This process requires three files:

  • A Jinja template, named benchmark.jinja, placed in the templates directory of ABIN LAUNCHER. It is an extension of the job script template.

  • A Python script, named benchmark.py, which must be placed in ABIN LAUNCHER’s directory.

  • A Shell script, named cron_benchmark.sh, which must also be placed in ABIN LAUNCHER’s directory. It will be executed through a cron task and make the link between the first two files.

7.3.1. The role of the Jinja template

At the end of the job script, some additional commands, provided by the benchmark.jinja template, will fetch the following information:

  • The name of the profile and the name of the cluster

  • The chosen job scale and its associated resource requirements

  • The job ID and name

  • The scaling function and the computed scale index

  • The number of nodes and the nodes list

  • Four dates: the submit date, the eligible date, the start date and the end date (which is the current date at which the job ends since those instructions are executed at the end of the job script)

Some of that information is provided directly by ABIN LAUNCHER, while the rest is obtained through the squeue command. The Jinja template then stores that information by either creating or updating a temporary CSV file.

Warning

The job ID and name are the crucial parts of this file. No matter how you want to customize it, you must keep that information in the temporary CSV file. Otherwise, the Python script won’t be able to fetch its data.
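The "create or update" behaviour described above can be sketched in Python as follows. In the actual tool this step is carried out by the instructions of benchmark.jinja at the end of the job script, so this is only an illustrative translation: the function name append_csv_line is hypothetical, and the semicolon delimiter is chosen to match the sample temporary file shown in the sample run.

```python
import csv
import os


def append_csv_line(csv_path: str, header: list, values: list) -> None:
    """Append one line to a semicolon-separated CSV file.

    The header is written first if the file does not exist yet, and the
    parent directory is created if needed.
    """
    is_new_file = not os.path.exists(csv_path)
    os.makedirs(os.path.dirname(csv_path) or ".", exist_ok=True)

    with open(csv_path, "a", newline="") as csv_file:
        writer = csv.writer(csv_file, delimiter=";")
        if is_new_file:
            writer.writerow(header)
        writer.writerow(values)
```

Whatever columns you add or remove, the job ID and job name must remain among the written values.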

7.3.2. The role of the crontab script

Meanwhile, on the cluster, the cron_benchmark.sh Shell script will be periodically executed through a cron task to check the existence of that temporary CSV file. If the file exists, the script will archive it then execute the benchmark.py Python script to scan and process its content.

7.3.3. The role of the Python script

The benchmark.py Python script will read the content of the temporary CSV file, then fetch the following data using the sacct command:

  • The “reserved”, or queued, time (time between the eligible date and the start date)

  • The elapsed time (duration of the job)

  • The time efficiency (percentage of used time vs required time)

  • The maximum used memory (in MB)

  • The memory efficiency (percentage of used memory vs required memory)

  • The total time used by all of the CPUs

  • The “wall CPU”, which is the total amount of time CPUs could have used (derived from the duration of the job and the number of CPUs)

  • The CPU efficiency (percentage of total time used by the CPUs vs “wall CPU”)

Then the script will store that information by either creating or updating a final CSV file. That file is a repeat of what was already present in the temporary file, enriched by the new data provided by the sacct command.
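The arithmetic behind these efficiencies can be sketched as follows. This is not the code of benchmark.py: the function names are hypothetical, and truncating the total CPU time to whole seconds is an assumption, made because it reproduces the values shown in the sample run at the end of this section.

```python
def slurm_time_to_seconds(value: str) -> float:
    """Convert a SLURM duration such as '0-00:10:00', '00:00:14' or '00:08.479' to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(part) for part in value.split(":")]
    while len(parts) < 3:          # 'MM:SS.mmm' has no hours field
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def compute_efficiencies(elapsed: str, walltime: str, total_cpu: str,
                         max_rss_mb: float, cores: int, mem_per_cpu_mb: float) -> dict:
    """Compute the time, memory and CPU efficiencies of one job (as fractions)."""
    elapsed_s = slurm_time_to_seconds(elapsed)
    walltime_s = slurm_time_to_seconds(walltime)
    # Assumption: the total CPU time is truncated to whole seconds
    total_cpu_s = int(slurm_time_to_seconds(total_cpu))
    # "Wall CPU": the time the CPUs could have used, i.e. duration x number of CPUs
    wall_cpu_s = elapsed_s * cores
    return {
        "time_efficiency": round(elapsed_s / walltime_s, 4),
        "ram_efficiency": round(max_rss_mb / (cores * mem_per_cpu_mb), 4),
        "wall_cpu_seconds": wall_cpu_s,
        "cpu_efficiency": round(total_cpu_s / wall_cpu_s, 4),
    }


# Values of the first job of the sample run (4 cores, 500 MB per CPU, 10 min walltime)
print(compute_efficiencies("00:00:14", "0-00:10:00", "00:08.479", 1, 4, 500))
# {'time_efficiency': 0.0233, 'ram_efficiency': 0.0005, 'wall_cpu_seconds': 56.0, 'cpu_efficiency': 0.1429}
```

Note that a job with an elapsed time of zero would cause a division by zero here; a real implementation needs to handle that case.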

The only thing left to do is to copy that final CSV file to your local computer and open it in your favorite spreadsheet application (such as Microsoft Excel).

7.4. Usage and configuration

Before starting to edit the files, you need to decide two important values:

  • benchmark_path, which is the location of your benchmark directory, i.e. the directory where all the output files of this benchmarking tool will be stored.

  • prefix, which is the prefix that will be common to all the files created by this tool.

7.4.1. Prepare the Jinja template

First of all, make sure the benchmark.jinja template is present in the templates directory of ABIN LAUNCHER. Then add the following line at the end of your job script template (which should be in the same directory):

{% include "benchmark.jinja" %}

Since that template requires some specific variables, add the following code to your rendering function after having defined your script_render_vars dictionary for your job script, but before calling the jinja_render function for that file:

script_render_vars.update({
   "benchmark_path" : <value>,
   "prefix": <value>,
   "profile" : job_specs['profile'],
   "cluster_name" : job_specs['cluster_name'],
   "jobscale_label" : job_specs['scale_label'],
   "job_walltime" : job_specs['walltime'],
   "job_mem_per_cpu" : job_specs['mem_per_cpu'], # in MB
   "scaling_function" : job_specs['scaling_fct'],
   "scale_index" : job_specs['scale_index']
})

where benchmark_path is the path towards your benchmark directory and prefix the desired prefix for your benchmarking output files.
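Some of these keys (for example profile, job_walltime or job_mem_per_cpu) may already be defined in your script_render_vars dictionary. This is harmless, since the update method of a Python dictionary adds the missing keys and overwrites the existing ones. The short sketch below illustrates this with placeholder values; the content of job_specs shown here is made up for the example.

```python
# Placeholder values standing in for the real job_specs dictionary (illustration only)
job_specs = {"profile": "sample_orca", "walltime": "0-00:10:00", "scale_label": "tiny"}

# Variables already defined for the job script template
script_render_vars = {
    "profile": job_specs["profile"],
    "job_walltime": job_specs["walltime"],
}

# update() adds the new keys and overwrites the existing ones (here with identical values)
script_render_vars.update({
    "prefix": "sample_orca",
    "profile": job_specs["profile"],
    "job_walltime": job_specs["walltime"],
    "jobscale_label": job_specs["scale_label"],
})

print(sorted(script_render_vars))
# ['job_walltime', 'jobscale_label', 'prefix', 'profile']
```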

Now, at the end of your job, a new temporary CSV file will be created in your benchmark directory, named <prefix>_tmp.csv. If the file already exists, a new line will simply be added to it.

7.4.2. Configure the cron task

Use the crontab -e command in your terminal to edit your cron tasks and add the following line:

*/15 * * * * bash -l -c "/path/to/cron_benchmark.sh <prefix> <benchmark_path>" >> <benchmark_path>/<prefix>_crontab.log 2>&1

where

  • */15 * * * * defines how often this command is executed (here, every 15 minutes). Feel free to adjust this value.

  • /path/to/cron_benchmark.sh is the path towards the crontab script.

  • <prefix>_crontab.log is a log file that will contain the output of the crontab script. This is just a suggested name; feel free to change it.
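To make the meaning of the */15 minute field concrete, the small sketch below lists the minutes of each hour at which such a task is triggered. It is only an illustration handling the two simplest forms of the field, not a cron parser, and the function name is hypothetical.

```python
def expand_minute_field(field: str) -> list:
    """Expand a cron minute field of the form '*' or '*/N' into the matching minutes.

    Only these two forms are handled; real cron syntax also supports lists and ranges.
    """
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 60, step))
    raise ValueError("unsupported minute field: " + field)


print(expand_minute_field("*/15"))  # [0, 15, 30, 45]
```

With this schedule, a temporary CSV file therefore waits at most 15 minutes before being processed.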

Don’t forget to also make the cron_benchmark.sh script executable (chmod u+x)!

When executed, the cron_benchmark.sh script checks whether a file named <prefix>_tmp.csv (the temporary CSV file) exists in your benchmark directory. If it does, the script moves it into an archive subdirectory, renames it with the current date, and then executes benchmark.py on that file.
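The archiving step can be sketched in Python as follows. The real cron_benchmark.sh is a shell script, so this is only an illustrative translation: the function name is hypothetical, and the timestamp format is inferred from the archived file name shown in the sample run (sample_orca_tmp_20201109_171504.csv).

```python
import os
import shutil
from datetime import datetime


def archive_tmp_file(benchmark_path: str, prefix: str, now: datetime = None):
    """Move <prefix>_tmp.csv into the archive subdirectory, adding a timestamp to its name.

    Returns the path of the archived file, or None if there is nothing to process.
    """
    tmp_file = os.path.join(benchmark_path, prefix + "_tmp.csv")
    if not os.path.isfile(tmp_file):
        return None

    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    archive_dir = os.path.join(benchmark_path, "archive")
    os.makedirs(archive_dir, exist_ok=True)

    archived_file = os.path.join(archive_dir, "{}_tmp_{}.csv".format(prefix, timestamp))
    shutil.move(tmp_file, archived_file)
    return archived_file
```

Moving the file before processing it means that jobs finishing in the meantime start a fresh temporary file instead of appending to one that is being read.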

Tip

If it is not loaded by default in your user profile configuration, remember to load your Python distribution so that the crontab script can execute benchmark.py. This can for example be done by adding instructions at the beginning of the crontab script, or by executing the loading command in the cron task, right before executing the crontab script.

7.4.3. The Python script

usage: benchmark.py -t TMP -f FINAL -p PROB [-h]

7.4.3.1. Required arguments

-t, --tmp

Path towards the temporary CSV file that contains the lines you want to enrich.

-f, --final

Path towards the final CSV file that will contain the enriched lines.

-p, --prob

Path towards a separate CSV file that will contain the problematic lines.
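For readers who want to build a similar command line interface, the sketch below reproduces these three required arguments with Python's argparse module. It is an assumption about how such an interface could be declared, not the actual source of benchmark.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three required arguments described above (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="benchmark.py")
    parser.add_argument("-t", "--tmp", required=True,
                        help="temporary CSV file containing the lines to enrich")
    parser.add_argument("-f", "--final", required=True,
                        help="final CSV file that will contain the enriched lines")
    parser.add_argument("-p", "--prob", required=True,
                        help="separate CSV file that will contain the problematic lines")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration purposes
args = build_parser().parse_args(
    ["-t", "sample_orca_tmp.csv", "-f", "sample_orca_final.csv", "-p", "sample_orca_prob.csv"]
)
print(args.tmp, args.final, args.prob)
# sample_orca_tmp.csv sample_orca_final.csv sample_orca_prob.csv
```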

7.4.3.2. How to use it?

You don’t need to edit the Python script: it is automatically executed by the crontab script with the following arguments:

  • <benchmark_path>/<prefix>_tmp.csv for the -t, --tmp argument

  • <benchmark_path>/<prefix>_final.csv for the -f, --final argument

  • <benchmark_path>/<prefix>_prob.csv for the -p, --prob argument

This will either create or update the final CSV file, named <prefix>_final.csv and placed inside the benchmark directory. The log file of this Python execution, named <prefix>_<current_date>.log, can be found inside a bench_logs subdirectory created by the crontab script.

7.4.3.3. How to deal with problematic lines?

If some lines of the temporary CSV file cause any kind of problem, they are stored in a problematic CSV file, named <prefix>_prob.csv and placed inside the benchmark directory. After consulting the log file and diagnosing the problem, you might wish to rerun the Python script on this file. In this case, you can either:

  • manually execute benchmark.py using the path towards the problematic CSV file for the -t, --tmp argument (and a different path for the -p, --prob one)

  • rename <prefix>_prob.csv into <prefix>_tmp.csv*, then manually execute cron_benchmark.sh (with <prefix> and <benchmark_path> as command line arguments, as in the cron task).

* Be careful not to overwrite a real temporary CSV file by doing so.
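The general pattern of setting problematic lines aside, rather than aborting the whole run, can be sketched as follows. How benchmark.py actually detects a problematic line is not described here, so the function names and the toy_enrich check below are purely hypothetical.

```python
def process_lines(lines: list, enrich) -> tuple:
    """Apply `enrich` to every line; lines that raise an exception are set aside.

    Returns the list of enriched lines and a list of (line, reason) tuples.
    """
    processed, problematic = [], []
    for line in lines:
        try:
            processed.append(enrich(line))
        except Exception as error:  # broad on purpose: any failure marks the line as problematic
            problematic.append((line, str(error)))
    return processed, problematic


def toy_enrich(line: dict) -> dict:
    """Stand-in for the real enrichment step (hypothetical check and value)."""
    if not line["Job ID"].isdigit():
        raise ValueError("invalid job ID")
    return dict(line, Elapsed="00:00:14")


lines = [{"Job ID": "69232077"}, {"Job ID": "not_a_number"}]
processed, problematic = process_lines(lines, toy_enrich)
print(len(processed), len(problematic))  # 1 1
```

The lines collected in the second list are the ones that would end up in <prefix>_prob.csv, ready to be reprocessed once the cause has been fixed.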

7.4.4. Dealing with multiple profiles

Unfortunately, this tool only works with one profile at a time. If you have multiple profiles you want to use with ABIN LAUNCHER (or even similar profiles on multiple clusters), you will have to configure the benchmarking tool for each of those profiles.

This implies that you need to:

  • Add the {% include "benchmark.jinja" %} line to all job script templates.

  • Add the Jinja variables definition (script_render_vars.update) to all the rendering functions.

  • Add the cron task command to the crontab of every cluster where the rendering functions will be executed.

Multiple profiles can share the same cron task if you don’t mind their lines being in the same CSV file. To prepare for this eventuality, the first column of the CSV file is the name of the profile, to help differentiate them.
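If several profiles do share the same CSV file, their lines can be separated again with a few lines of Python. The sketch below is only an illustration: it assumes a semicolon-separated file like the temporary CSV file of the sample run, and the second profile name and its job ID are made up for the example.

```python
import csv
import io
from collections import defaultdict


def group_by_profile(csv_file) -> dict:
    """Group the lines of a benchmarking CSV file by the value of their 'Profile' column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file, delimiter=";"):
        groups[row["Profile"]].append(row)
    return dict(groups)


# In-memory stand-in for a shared CSV file (reduced to three columns)
sample = io.StringIO(
    "Profile;Cluster;Job ID\n"
    "sample_orca;lemaitre3;69232077\n"
    "other_profile;lemaitre3;12345678\n"
    "sample_orca;lemaitre3;69232076\n"
)
groups = group_by_profile(sample)
print({profile: len(rows) for profile, rows in groups.items()})
# {'sample_orca': 2, 'other_profile': 1}
```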

7.5. Sample run

Let’s end this section with a sample run of the benchmarking tool. We will use our example from the previous section, with the three molecules (or geometry files) and the two configuration files.

Tip

Every file presented in this subsection can be downloaded here.

7.5.1. Preparation

Our benchmark_path will be /home/users/n/i/niacobel/abin_docs_sample/benchmark and our prefix will be sample_orca.

This is our starting directory structure:

abin_docs_sample/
   └── abin_launcher/
         ├── benchmark.py
         ├── cron_benchmark.sh
         ├── abin_launcher.py
         ├── geom_scan.py
         ├── scaling_fcts.py
         ├── renderer.py
         ├── abin_errors.py
         ├── clusters.yml
         ├── mendeleev.yml
         └── templates/
               ├── benchmark.jinja
               ├── sample_orca.inp.jinja
               └── sample_orca_job.sh.jinja
   └── molecules/
         ├── ch4.xyz
         ├── c2h6.xyz
         └── c3h8.xyz
   └── configs/
         ├── svp.yml
         └── tzvp.yml
   └── orca_jobs/
         └── currently empty

Note that our benchmark directory does not exist yet. This is not a problem, since the instructions contained in the Jinja template create it if needed.

7.5.1.1. The Jinja template

At the end of sample_orca_job.sh.jinja, we add the following line:

{% include "benchmark.jinja" %}

and in the sample_orca_render function of renderer.py, we add

script_render_vars.update({
   "benchmark_path" : "/home/users/n/i/niacobel/abin_docs_sample/benchmark",
   "prefix": "sample_orca",
   "profile" : job_specs['profile'],
   "cluster_name" : job_specs['cluster_name'],
   "jobscale_label" : job_specs['scale_label'],
   "job_walltime" : job_specs['walltime'],
   "job_mem_per_cpu" : job_specs['mem_per_cpu'], # in MB
   "scaling_function" : job_specs['scaling_fct'],
   "scale_index" : job_specs['scale_index']
})

If you are not sure where exactly to add this portion of code, the complete source code of renderer.py is shown below.

renderer.py
################################################################################################################################################
##                                                                The Renderer                                                                ##
##                                                                                                                                            ##
##                                       This script contains the rendering functions for ABIN LAUNCHER,                                      ##
##                                consult the documentation at https://chains-ulb.readthedocs.io/ for details                                 ##
################################################################################################################################################

import os

from jinja2 import Environment, FileSystemLoader

import abin_errors


def jinja_render(templates_dir:str, template_file:str, render_vars:dict):
    """Renders a file based on its Jinja template.

    Parameters
    ----------
    templates_dir : str
        The path towards the directory where the Jinja template is located.
    template_file : str
        The name of the Jinja template file.
    render_vars : dict
        Dictionary containing the definitions of all the variables present in the Jinja template.

    Returns
    -------
    output_text : str
        Content of the rendered file.
    """
   
    file_loader = FileSystemLoader(templates_dir)
    env = Environment(loader=file_loader)
    template = env.get_template(template_file)
    output_text = template.render(render_vars)
    
    return output_text

# =================================================================== #
# =================================================================== #
#                         Rendering functions                         #
# =================================================================== #
# =================================================================== #

def sample_orca_render(mendeleev:dict, clusters_cfg:dict, config:dict, file_data:dict, job_specs:dict, misc:dict):
    """Renders the job script and the input file associated with the ORCA program.
    
    Parameters
    ----------
    mendeleev : dict
        Content of AlexGustafsson's Mendeleev Table YAML file (found at https://github.com/AlexGustafsson/molecular-data).
        Unused in this function.
    clusters_cfg : dict
        Content of the YAML clusters configuration file.
    config : dict
        Content of the YAML configuration file.
    file_data : dict
        Information extracted by the scanning function from the geometry file.
    job_specs : dict
        Contains all information related to the job.
    misc : dict
        Contains all the additional variables that did not pertain to the other arguments.
    
    Returns
    -------
    rendered_content : dict
        Dictionary containing the text of all the rendered files in the form of <filename>: <rendered_content>.
    rendered_script : str
        Name of the rendered job script, necessary to launch the job.
    
    Notes
    -----
    Pay a particular attention to the render_vars dictionaries, they contain all the definitions of the variables appearing in your Jinja templates.
    """
    
    # Define the names of the templates
    
    template_input = "sample_orca.inp.jinja"
    template_script = "sample_orca_job.sh.jinja"
    
    # Define the names of the rendered files
    
    rendered_input = misc['mol_name'] + ".inp"
    rendered_script = "orca_job.sh"
    
    # Initialize the dictionary that will be returned by the function
    
    rendered_content = {}
    
    # Render the template for the input file

    print("{:<80}".format("\nRendering the jinja template for the orca input file ..."), end="")
    
    input_render_vars = {
      "method" : config['method'],
      "basis_set" : config['basis_set'],
      "job_type" : config['job_type'],
      "charge" : config['charge'],
      "multiplicity" : config['multiplicity'],
      "coordinates" : file_data['atomic_coordinates']
    }
    
    rendered_content[rendered_input] = jinja_render(misc['templates_dir'], template_input, input_render_vars)
    
    print('%12s' % "[ DONE ]")

    # Render the template for the job script

    print("{:<80}".format("\nRendering the jinja template for the orca job script ..."), end="")
    
    script_render_vars = {
      "mol_name" : misc['mol_name'],
      "config_name" : misc['config_name'],
      "user_email" : config['user_email'],
      "mail_type" : config['mail_type'],
      "job_walltime" : job_specs['walltime'],
      "job_cores" : job_specs['cores'],
      "job_mem_per_cpu" : job_specs['mem_per_cpu'],
      "partition" : job_specs['partition'],
      "set_env" : clusters_cfg[job_specs['cluster_name']]['profiles'][job_specs['profile']]['set_env'],
      "command" : clusters_cfg[job_specs['cluster_name']]['profiles'][job_specs['profile']]['command'],
      "profile" : job_specs['profile']
    }

    # Add variables specific to the benchmarking template
   
    script_render_vars.update({
      "benchmark_path" : "/home/users/n/i/niacobel/abin_docs_sample/benchmark",
      "prefix": "sample_orca",
      "profile" : job_specs['profile'],
      "cluster_name" : job_specs['cluster_name'],
      "jobscale_label" : job_specs['scale_label'],
      "job_walltime" : job_specs['walltime'],
      "job_mem_per_cpu" : job_specs['mem_per_cpu'], # in MB
      "scaling_function" : job_specs['scaling_fct'],
      "scale_index" : job_specs['scale_index']
    })
    
    rendered_content[rendered_script] = jinja_render(misc['templates_dir'], template_script, script_render_vars)
    
    print('%12s' % "[ DONE ]")

    # Return the content of the rendered files and the name of the rendered job script
    
    return rendered_content, rendered_script

7.5.1.2. The cron task and the crontab script

Now we execute the crontab -e command in our terminal to edit our cron tasks and add the following line:

*/15 * * * * bash -l -c "/home/users/n/i/niacobel/abin_docs_sample/abin_launcher/cron_benchmark.sh sample_orca /home/users/n/i/niacobel/abin_docs_sample/benchmark" >> /home/users/n/i/niacobel/abin_docs_sample/benchmark/sample_orca_crontab.log 2>&1

Finally, we edit the beginning of our cron_benchmark.sh file to load our Python distribution:

cron_benchmark.sh, lines 7-15
####################################
#       Script configuration       #
####################################

# Load your Python distribution

module --force purge
module load releases/2018b
module load Python/3.6.6-foss-2018b

and we make sure it is executable by entering the following command in our terminal:

$ chmod u+x /home/users/n/i/niacobel/abin_docs_sample/abin_launcher/cron_benchmark.sh

Now our benchmarking tool is ready to run!

7.5.2. Execution

We just run ABIN LAUNCHER as normal, by executing the main script (from abin_docs_sample):

$ python abin_launcher/abin_launcher.py -m molecules/ -cf configs/ -p sample_orca -o orca_jobs/ -cl lemaitre3

We obtain the same results as before, with the six launched jobs.

As soon as each job finishes, the temporary CSV file will either be created or updated with a new line. After the six jobs have finished, this is what the raw file looks like:

sample_orca_tmp.csv
Profile;Cluster;Jobscale;Partition;Cores;MB/CPU;Walltime;Job ID;Job Name;Scaling Function;Scale Index;Submit Date;Eligible Date;Start Date;End Date;Nodes;Nodes List
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232077;sample_orca_ch4_svp;total_nb_elec;10;2020-11-09 17:05:07;2020-11-09 17:05:07;2020-11-09 17:05:07;2020-11-09 17:05:21;1;lm3-w061
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232076;sample_orca_ch4_tzvp;total_nb_elec;10;2020-11-09 17:05:07;2020-11-09 17:05:07;2020-11-09 17:05:07;2020-11-09 17:05:24;1;lm3-w064
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232081;sample_orca_c2h6_svp;total_nb_elec;18;2020-11-09 17:05:09;2020-11-09 17:05:09;2020-11-09 17:05:10;2020-11-09 17:05:46;1;lm3-w063
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232079;sample_orca_c3h8_svp;total_nb_elec;26;2020-11-09 17:05:08;2020-11-09 17:05:08;2020-11-09 17:05:10;2020-11-09 17:05:52;1;lm3-w062
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232080;sample_orca_c2h6_tzvp;total_nb_elec;18;2020-11-09 17:05:08;2020-11-09 17:05:08;2020-11-09 17:05:10;2020-11-09 17:06:12;1;lm3-w062
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232078;sample_orca_c3h8_tzvp;total_nb_elec;26;2020-11-09 17:05:08;2020-11-09 17:05:08;2020-11-09 17:05:10;2020-11-09 17:07:37;1;lm3-w061

and in a more human-readable fashion:

Profile     | Cluster   | Jobscale | Partition | Cores | MB/CPU | Walltime   | Job ID   | Job Name              | Scaling Function | Scale Index | Submit Date         | Eligible Date       | Start Date          | End Date            | Nodes | Nodes List
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232077 | sample_orca_ch4_svp   | total_nb_elec    | 10          | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:21 | 1     | lm3-w061
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232076 | sample_orca_ch4_tzvp  | total_nb_elec    | 10          | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:24 | 1     | lm3-w064
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232081 | sample_orca_c2h6_svp  | total_nb_elec    | 18          | 2020-11-09 17:05:09 | 2020-11-09 17:05:09 | 2020-11-09 17:05:10 | 2020-11-09 17:05:46 | 1     | lm3-w063
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232079 | sample_orca_c3h8_svp  | total_nb_elec    | 26          | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:05:52 | 1     | lm3-w062
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232080 | sample_orca_c2h6_tzvp | total_nb_elec    | 18          | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:06:12 | 1     | lm3-w062
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232078 | sample_orca_c3h8_tzvp | total_nb_elec    | 26          | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:07:37 | 1     | lm3-w061

As you can see, a different line has been written for each of our jobs, containing the different data that have been collected so far.
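These dates already allow a quick sanity check of the collected data. As a minimal sketch (the function name is hypothetical), the duration of a job can be recomputed from its start and end dates with Python's datetime module:

```python
from datetime import datetime

# Date format used in the temporary CSV file
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Return the number of seconds between two dates written in DATE_FORMAT."""
    delta = datetime.strptime(end, DATE_FORMAT) - datetime.strptime(start, DATE_FORMAT)
    return int(delta.total_seconds())


# Start and end dates of the first and last jobs of the file above
print(seconds_between("2020-11-09 17:05:07", "2020-11-09 17:05:21"))  # 14
print(seconds_between("2020-11-09 17:05:10", "2020-11-09 17:07:37"))  # 147
```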

After at most 15 minutes, the crontab task executes the crontab script, which archives the temporary CSV file and then runs the Python script on that file to either create or update the final CSV file. The directory structure now looks like:

abin_docs_sample/
   └── abin_launcher/
         └── no changes
   └── benchmark/
         ├── sample_orca_crontab.log
         ├── sample_orca_final.csv
         └── archive/
               └── sample_orca_tmp_20201109_171504.csv
         └── bench_logs/
               └── sample_orca_20201109_171504.log
   └── molecules/
         └── launched/
   └── configs/
         └── launched/
   └── orca_jobs/
         └── ch4_svp/
         └── ch4_tzvp/
         └── c2h6_svp/
         └── c2h6_tzvp/
         └── c3h8_svp/
         └── c3h8_tzvp/

where our final CSV file, sample_orca_final.csv, contains:

Profile     | Cluster   | Jobscale | Partition | Cores | MB/CPU | Walltime   | Job ID   | Job Name              | Scaling Function | Scale Index | Submit Date         | Eligible Date       | Start Date          | End Date            | Nodes | Nodes List | Reserved | Elapsed  | Time Efficiency | Max RSS (MB) | RAM Efficiency | Total CPU | Wall CPU | CPU Efficiency
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232077 | sample_orca_ch4_svp   | total_nb_elec    | 10          | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:21 | 1     | lm3-w061   | 00:00:00 | 00:00:14 | 0.0233          | 1            | 0.0005         | 00:08.479 | 00:00:56 | 0.1429
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232076 | sample_orca_ch4_tzvp  | total_nb_elec    | 10          | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:24 | 1     | lm3-w064   | 00:00:00 | 00:00:17 | 0.0283          | 1            | 0.0005         | 00:11.338 | 00:01:08 | 0.1618
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232081 | sample_orca_c2h6_svp  | total_nb_elec    | 18          | 2020-11-09 17:05:09 | 2020-11-09 17:05:09 | 2020-11-09 17:05:10 | 2020-11-09 17:05:46 | 1     | lm3-w063   | 00:00:01 | 00:00:36 | 0.06            | 235          | 0.1175         | 00:22.476 | 00:02:24 | 0.1528
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232079 | sample_orca_c3h8_svp  | total_nb_elec    | 26          | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:05:52 | 1     | lm3-w062   | 00:00:02 | 00:00:42 | 0.07            | 79           | 0.0395         | 00:36.502 | 00:02:48 | 0.2143
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232080 | sample_orca_c2h6_tzvp | total_nb_elec    | 18          | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:06:12 | 1     | lm3-w062   | 00:00:02 | 00:01:02 | 0.1033          | 243          | 0.1215         | 00:55.619 | 00:04:08 | 0.2218
sample_orca | lemaitre3 | tiny     | batch     | 4     | 500    | 0-00:10:00 | 69232078 | sample_orca_c3h8_tzvp | total_nb_elec    | 26          | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:07:37 | 1     | lm3-w061   | 00:00:02 | 00:02:27 | 0.245           | 234          | 0.117          | 02:21.569 | 00:09:48 | 0.2398

Once loaded into Microsoft Excel, we get a clear view of all the important data about our jobs:

[Figure: Excel view of the final CSV file sample_orca_final.csv]

Here we can see that our job scale definition does not look that good with all those low efficiency percentages. However, those were really small jobs made purely for illustration purposes and they do not have to be taken too seriously. For real, bigger jobs though, you should aim for higher efficiencies.

Note

The temporary CSV file has been archived into the archive directory, as sample_orca_tmp_20201109_171504.csv. This copy is kept purely as a backup of the data. Once you’ve made sure those jobs have been correctly benchmarked, you can remove the copy.

7.5.3. Content of the log files

As you can see in the directory structure above, the benchmarking tool creates two log files:

  • The log file from the crontab script, which contains a line for each time a temporary CSV file has been processed. At this point, its content is simply:

    sample_orca_crontab.log
    2020-11-09 17:15:09	INFO - Processed new lines in /home/users/n/i/niacobel/abin_docs_sample/benchmark/sample_orca_tmp.csv
    
  • The log file from the Python script, sample_orca_20201109_171504.log, placed inside the bench_logs directory. Its content is shown below.

    sample_orca_20201109_171504.log
    ********************************************************************************
    
         EXECUTION OF THE BENCHMARKING SCRIPT FOR SLURM CLUSTERS JOBS BEGINS NOW     
    
    ********************************************************************************
    
    
    *****************************
         0. Preparation step     
    *****************************
    
    Scanning tmp file /home/users/n/i/niacobel/abin_docs_sample/benchmark/archive/orca_lemaitre3_tmp_20201109_171504.csv ... 
        Detected CSV dialect in tmp file: excel
        Detected CSV header in tmp file : ['Profile', 'Cluster', 'Jobscale', 'Partition', 'Cores', 'MB/CPU', 'Walltime', 'Job ID', 'Job Name', 'Scaling Function', 'Scale Index', 'Submit Date', 'Eligible Date', 'Start Date', 'End Date', 'Nodes', 'Nodes List']
    
    
    ************************************
         1. Get benchmarking values     
    ************************************
    
    Processing lines ...
    
    ------------------------------------------------------------
                Job Name: orca_ch4_svp
                  Job ID: 69232077
    ------------------------------------------------------------
                Reserved: 00:00:00
                 Elapsed: 00:00:14
                Walltime: 00:10:00
         Time Efficiency: 2%
    ------------------------------------------------------------
                  MaxRSS: 1 MB
               Total MEM: 2000 MB (500 MB for each of 4 CPUs)
          RAM Efficiency: 0%
    ------------------------------------------------------------
                TotalCPU: 00:08.479
                Wall CPU: 00:00:56
          CPU Efficiency: 14%
    ------------------------------------------------------------
    
    
    ------------------------------------------------------------
                Job Name: orca_ch4_tzvp
                  Job ID: 69232076
    ------------------------------------------------------------
                Reserved: 00:00:00
                 Elapsed: 00:00:17
                Walltime: 00:10:00
         Time Efficiency: 3%
    ------------------------------------------------------------
                  MaxRSS: 1 MB
               Total MEM: 2000 MB (500 MB for each of 4 CPUs)
          RAM Efficiency: 0%
    ------------------------------------------------------------
                TotalCPU: 00:11.338
                Wall CPU: 00:01:08
          CPU Efficiency: 16%
    ------------------------------------------------------------
    
    
    ------------------------------------------------------------
                Job Name: orca_c2h6_svp
                  Job ID: 69232081
    ------------------------------------------------------------
                Reserved: 00:00:01
                 Elapsed: 00:00:36
                Walltime: 00:10:00
         Time Efficiency: 6%
    ------------------------------------------------------------
                  MaxRSS: 235 MB
               Total MEM: 2000 MB (500 MB for each of 4 CPUs)
          RAM Efficiency: 12%
    ------------------------------------------------------------
                TotalCPU: 00:22.476
                Wall CPU: 00:02:24
          CPU Efficiency: 15%
    ------------------------------------------------------------
    
    
    ------------------------------------------------------------
                Job Name: orca_c3h8_svp
                  Job ID: 69232079
    ------------------------------------------------------------
                Reserved: 00:00:02
                 Elapsed: 00:00:42
                Walltime: 00:10:00
         Time Efficiency: 7%
    ------------------------------------------------------------
                  MaxRSS: 79 MB
               Total MEM: 2000 MB (500 MB for each of 4 CPUs)
          RAM Efficiency: 4%
    ------------------------------------------------------------
                TotalCPU: 00:36.502
                Wall CPU: 00:02:48
          CPU Efficiency: 21%
    ------------------------------------------------------------
    
    
    ------------------------------------------------------------
                Job Name: orca_c2h6_tzvp
                  Job ID: 69232080
    ------------------------------------------------------------
                Reserved: 00:00:02
                 Elapsed: 00:01:02
                Walltime: 00:10:00
         Time Efficiency: 10%
    ------------------------------------------------------------
                  MaxRSS: 243 MB
               Total MEM: 2000 MB (500 MB for each of 4 CPUs)
          RAM Efficiency: 12%
    ------------------------------------------------------------
                TotalCPU: 00:55.619
                Wall CPU: 00:04:08
          CPU Efficiency: 22%
    ------------------------------------------------------------
    
    
    ------------------------------------------------------------
                Job Name: orca_c3h8_tzvp
                  Job ID: 69232078
    ------------------------------------------------------------
                Reserved: 00:00:02
                 Elapsed: 00:02:27
                Walltime: 00:10:00
         Time Efficiency: 24%
    ------------------------------------------------------------
                  MaxRSS: 234 MB
               Total MEM: 2000 MB (500 MB for each of 4 CPUs)
          RAM Efficiency: 12%
    ------------------------------------------------------------
                TotalCPU: 02:21.569
                Wall CPU: 00:09:48
          CPU Efficiency: 24%
    ------------------------------------------------------------
    
    
    End of processing
    
    
    ******************************************************
         2. Writing new information to final CSV file     
    ******************************************************
    
    Used dialect in the final CSV file: excel
    Header used in final CSV file: ['Profile', 'Cluster', 'Jobscale', 'Partition', 'Cores', 'MB/CPU', 'Walltime', 'Job ID', 'Job Name', 'Scaling Function', 'Scale Index', 'Submit Date', 'Eligible Date', 'Start Date', 'End Date', 'Nodes', 'Nodes List', 'Reserved', 'Elapsed', 'Time Efficiency', 'Max RSS (MB)', 'RAM Efficiency', 'Total CPU', 'Wall CPU', 'CPU Efficiency']
    
    Writing newly processed lines to the final file /home/users/n/i/niacobel/abin_docs_sample/benchmark/orca_lemaitre3_final.csv ...      [DONE]
    
    ********************************************************************************
    
                                    END OF EXECUTION                                
    
    ********************************************************************************