7. Benchmarking tool¶
7.1. Why is it useful?¶
As mentioned in the Job scaling section, one of the key parts of the scaling process is the definition of your job scales. A sufficient number of finely tuned job scales lets you avoid wasting resources, which reduces your fairshare usage of the cluster and, in turn, the time your jobs spend waiting in the queue. Since this depends heavily on the program you want to run and on the cluster on which it will run, you will need to do extensive testing of your own.
With that said, if your cluster uses SLURM as a job scheduler, we might be able to offer some help in that regard through a benchmarking tool. This tool lets you evaluate the efficiency of your resource requirements by calculating how much of the allocated resources the job actually used.
7.2. Important SLURM commands¶
7.2.1. The sacct and squeue commands¶
The two key SLURM commands we use here are sacct and squeue. The sacct command fetches job accounting data directly from the SLURM database. Depending on your cluster configuration, however, reaching the SLURM database may require Internet access, which the computing nodes might not have. This command will therefore only be used from the login nodes, once the job has finished. The squeue command, on the other hand, gets its information locally, which means we can use it for the information obtained through the job script while the job is still running.
All the data about our jobs will be obtained through those two commands. If you want to customize the benchmarking procedure, you should familiarize yourself with them through SLURM’s official documentation (sacct docs and squeue docs).
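As an illustration of how sacct can be queried from a script, here is a minimal Python sketch. It is not the code of benchmark.py: the function names and the choice of fields are assumptions made for this example, and field names can vary between SLURM versions, so check them against the sacct documentation of your cluster. The sample text parsed at the end is made up to show the pipe-separated layout; it is not real accounting output.

```python
import subprocess

# Fields requested from sacct. This selection is an assumption made for the
# example; check the names against the sacct version on your cluster.
SACCT_FIELDS = ["JobID", "Elapsed", "Timelimit", "TotalCPU", "MaxRSS"]


def build_sacct_command(job_id: str) -> list:
    """Build a sacct call that prints pipe-separated values without a header."""
    return [
        "sacct",
        "-j", job_id,
        "--format=" + ",".join(SACCT_FIELDS),
        "--parsable2",  # fields separated by "|", no trailing delimiter
        "--noheader",
    ]


def parse_sacct_output(text: str) -> list:
    """Turn pipe-separated sacct output into one dictionary per line."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(dict(zip(SACCT_FIELDS, line.split("|"))))
    return rows


def fetch_job_accounting(job_id: str) -> list:
    """Run sacct (from a login node, once the job has finished) and parse it."""
    result = subprocess.run(
        build_sacct_command(job_id),
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_output(result.stdout)


# Made-up text in sacct's "--parsable2" layout (one line per job step)
sample_text = "12345|00:00:14|00:10:00|00:08.479|\n12345.batch|00:00:14||00:08.479|1024K\n"
sample_rows = parse_sacct_output(sample_text)
```

Separating the command construction from the parsing keeps the parsing testable on machines where sacct is not installed.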
7.2.2. The seff command: a possible alternative¶
An alternative option is the seff command (see source code and example). While it does almost exactly what we want, it has some drawbacks:
It requires SLURM 15.08 or a more recent version, which might cause problems with some older machines.
It does not give reliable statistics while the job is running, so a cron task or similar* will be needed to automatically check the resource usage after the job has ended.
It computes the CPU and memory efficiencies, but nothing is done about the time efficiency.
Other than that, it is still a good option, so feel free to use it if it satisfies your needs!
* If your cluster administrator allows it, you can also use the strigger command to trigger a script that executes the seff command at the end of the job. Consult the documentation for details.
7.3. How does it work?¶
This process requires three files:
A Jinja template, named benchmark.jinja, placed in the templates directory of ABIN LAUNCHER. It is an extension of the job script template.
A Python script, named benchmark.py, which must be placed in ABIN LAUNCHER’s directory.
A Shell script, named cron_benchmark.sh, which must also be placed in ABIN LAUNCHER’s directory. It will be executed through a cron task and make the link between the first two files.
7.3.1. The role of the Jinja template¶
At the end of the job script, some additional commands, provided by the benchmark.jinja template, will fetch the following information:
The name of the profile and the name of the cluster
The chosen job scale and its associated resources requirements
The job ID and name
The scaling function and the computed scale index
The number of nodes and the nodes list
Four dates: the submit date, the eligible date, the start date and the end date (which is the current date at which the job ends since those instructions are executed at the end of the job script)
Some of that information is provided directly by ABIN LAUNCHER while the others are obtained through the squeue command. The Jinja template will then store that information by either creating or updating a temporary CSV file.
Warning
The job ID and name are the crucial parts of this file. No matter how you want to customize it, you must keep that information in the temporary CSV file. Otherwise, the Python script won’t be able to fetch its data.
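The create-or-update behaviour described above can be sketched in Python as follows. This is only an illustration of the logic: the actual template adds its commands to the job script itself, and the function name append_job_line as well as the shortened header are assumptions made for this example. The semicolon delimiter matches the raw file shown in the sample run.

```python
import csv
import os
import tempfile

# Shortened header, for illustration only; the real file has more columns
HEADER = ["Profile", "Cluster", "Job ID", "Job Name"]


def append_job_line(csv_path: str, row: list) -> None:
    """Append one job line to the temporary CSV file, creating it if needed."""
    new_file = not os.path.exists(csv_path)
    # Create the benchmark directory on first use
    os.makedirs(os.path.dirname(csv_path) or ".", exist_ok=True)
    with open(csv_path, "a", newline="") as csv_file:
        writer = csv.writer(csv_file, delimiter=";")
        if new_file:
            writer.writerow(HEADER)  # the header is written only once
        writer.writerow(row)


# Usage example in a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "benchmark", "demo_tmp.csv")
    append_job_line(demo_path, ["sample_orca", "lemaitre3", "69232077", "sample_orca_ch4_svp"])
    append_job_line(demo_path, ["sample_orca", "lemaitre3", "69232076", "sample_orca_ch4_tzvp"])
    with open(demo_path, newline="") as csv_file:
        demo_lines = csv_file.read().splitlines()
```

Whatever columns you add or remove, the job ID and job name must stay in each line, as stated in the warning above.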
7.3.2. The role of the crontab script¶
Meanwhile, on the cluster, the cron_benchmark.sh Shell script will be periodically executed through a cron task to check the existence of that temporary CSV file. If the file exists, the script will archive it then execute the benchmark.py Python script to scan and process its content.
7.3.3. The role of the Python script¶
The benchmark.py Python script will read the content of the temporary CSV file, then fetch the following data using the sacct command:
The “reserved”, or queued, time (time between the eligible date and the start date)
The elapsed time (duration of the job)
The time efficiency (percentage of used time vs required time)
The maximum used memory (in MB)
The memory efficiency (percentage of used memory vs required memory)
The total time used by all of the CPUs
The “wall CPU”, which is the total amount of time CPUs could have used (derived from the duration of the job and the number of CPUs)
The CPU efficiency (percentage of total time used by the CPUs vs “wall CPU”)
Then the script will store that information by either creating or updating a final CSV file. That file is a repeat of what was already present in the temporary file, enriched by the new data provided by the sacct command.
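To make the three efficiencies concrete, here is a short Python sketch that recomputes them for one job of the sample run shown later in this section (sample_orca_c3h8_tzvp). It is not the code of benchmark.py: the function names are assumptions made for this example, and dropping the fractional seconds of the total CPU time is also an assumption, chosen because it reproduces the values of the sample run.

```python
def slurm_time_to_seconds(value: str) -> int:
    """Convert a SLURM duration ([D-]HH:MM:SS or MM:SS.mmm) to whole seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    # Fractional seconds are dropped (assumption, see the text above)
    fields = [int(float(field)) for field in value.split(":")]
    while len(fields) < 3:
        fields.insert(0, 0)  # pad missing hours (and minutes) with zeros
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def efficiencies(elapsed, walltime, total_cpu, max_rss_mb, mem_per_cpu_mb, cores):
    """Return the time, memory and CPU efficiencies as fractions between 0 and 1."""
    elapsed_s = slurm_time_to_seconds(elapsed)
    wall_cpu_s = elapsed_s * cores  # time the CPUs could have used
    return {
        "time": round(elapsed_s / slurm_time_to_seconds(walltime), 4),
        "memory": round(max_rss_mb / (mem_per_cpu_mb * cores), 4),
        "cpu": round(slurm_time_to_seconds(total_cpu) / wall_cpu_s, 4),
    }


# Values taken from the sample run (job sample_orca_c3h8_tzvp)
result = efficiencies(
    elapsed="00:02:27", walltime="0-00:10:00", total_cpu="02:21.569",
    max_rss_mb=234, mem_per_cpu_mb=500, cores=4,
)
print(result)
```

With 4 cores and 147 seconds of elapsed time, the “wall CPU” is 588 seconds, and the requested memory is 4 × 500 MB = 2000 MB.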
The only thing left to do is to copy that final CSV file to your local computer and open it in your favorite spreadsheet program (such as Microsoft Excel)!
7.4. Usage and configuration¶
Before starting to edit the files, you need to decide two important values:
benchmark_path, which is the location of your benchmark directory, i.e. the directory where all the output files of this benchmarking tool will be stored.
prefix, which is the prefix that will be common to all the files created by this tool.
7.4.1. Prepare the Jinja template¶
First of all, make sure the benchmark.jinja template is present in the templates directory of ABIN LAUNCHER. Then add the following line at the end of your job script template (which should be in the same directory):
{% include "benchmark.jinja" %}
Since that template requires some specific variables, add the following code to your rendering function after having defined your script_render_vars dictionary for your job script, but before calling the jinja_render function for that file:
script_render_vars.update({
    "benchmark_path" : <value>,
    "prefix": <value>,
    "profile" : job_specs['profile'],
    "cluster_name" : job_specs['cluster_name'],
    "jobscale_label" : job_specs['scale_label'],
    "job_walltime" : job_specs['walltime'],
    "job_mem_per_cpu" : job_specs['mem_per_cpu'], # in MB
    "scaling_function" : job_specs['scaling_fct'],
    "scale_index" : job_specs['scale_index']
})
where benchmark_path is the path towards your benchmark directory and prefix the desired prefix for your benchmarking output files.
Now, at the end of your job, a new temporary CSV file will be created in your benchmark directory, named <prefix>_tmp.csv. If the file already exists, a new line will simply be added to it.
7.4.2. Configure the cron task¶
Use the crontab -e command in your terminal to edit your cron tasks and add the following line:
*/15 * * * * bash -l -c "/path/to/cron_benchmark.sh <prefix> <benchmark_path>" >> <benchmark_path>/<prefix>_crontab.log 2>&1
where
*/15 * * * * defines the frequency of execution of this command (every 15 minutes). Feel free to adjust this value.
/path/to/cron_benchmark.sh is the path towards the crontab script.
<prefix>_crontab.log is a log file that will contain the output of the execution of this crontab script. This is just a suggested name, feel free to change it however you like.
Don’t forget to also make the cron_benchmark.sh script executable (chmod u+x)!
When executed, the cron_benchmark.sh script checks whether a file named <prefix>_tmp.csv (the temporary CSV file) exists in your benchmark directory. If it does, the script moves it into an archive subdirectory, renames it with the current date, and then executes benchmark.py on that file.
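The archiving step can be pictured with the following Python sketch. The real work is done by the cron_benchmark.sh Shell script, so this is only an illustration of the logic; the function name archive_tmp_file is an assumption, and the date format is inferred from the archived file name that appears in the sample run.

```python
import os
import shutil
import tempfile
from datetime import datetime


def archive_tmp_file(benchmark_path: str, prefix: str, now: datetime = None):
    """Move <prefix>_tmp.csv into archive/ under a dated name.

    Returns the path of the archived file, or None if there was nothing to do.
    """
    tmp_file = os.path.join(benchmark_path, prefix + "_tmp.csv")
    if not os.path.isfile(tmp_file):
        return None  # no new job lines since the last run
    now = now or datetime.now()
    archive_dir = os.path.join(benchmark_path, "archive")
    os.makedirs(archive_dir, exist_ok=True)
    archived_name = "{}_tmp_{}.csv".format(prefix, now.strftime("%Y%m%d_%H%M%S"))
    archived_file = os.path.join(archive_dir, archived_name)
    shutil.move(tmp_file, archived_file)
    return archived_file


# Usage example in a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    nothing_yet = archive_tmp_file(workdir, "sample_orca")
    open(os.path.join(workdir, "sample_orca_tmp.csv"), "w").close()
    archived = archive_tmp_file(workdir, "sample_orca", datetime(2020, 11, 9, 17, 15, 4))
    archived_name = os.path.basename(archived)
    archived_exists = os.path.isfile(archived)
    tmp_still_there = os.path.exists(os.path.join(workdir, "sample_orca_tmp.csv"))
```

Moving the file before processing it means that jobs ending in the meantime start a fresh temporary CSV file instead of appending to one that is being read.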
Tip
If it is not loaded by default in your user profile configuration, remember to load your Python distribution so that the crontab script can execute benchmark.py. This can for example be done by adding instructions at the beginning of the crontab script, or by executing the loading command in the cron task, right before executing the crontab script.
7.4.3. The Python script¶
usage: benchmark.py -t TMP -f FINAL -p PROB [-h]
7.4.3.1. Required arguments¶
- -t, --tmp
Path towards the temporary CSV file that contains the lines you want to enrich.
- -f, --final
Path towards the final CSV file that will contain the enriched lines.
- -p, --prob
Path towards a separate CSV file that will contain the problematic lines.
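For readers who want to see how such a command line can be declared, here is a minimal argparse sketch that accepts the three required arguments listed above. It is an assumption about how a parser of this kind could look, not a copy of the one in benchmark.py; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three required arguments described in this section."""
    parser = argparse.ArgumentParser(prog="benchmark.py")
    required = parser.add_argument_group("Required arguments")
    required.add_argument("-t", "--tmp", required=True,
                          help="Path towards the temporary CSV file.")
    required.add_argument("-f", "--final", required=True,
                          help="Path towards the final CSV file.")
    required.add_argument("-p", "--prob", required=True,
                          help="Path towards the CSV file for problematic lines.")
    return parser


# Usage example with file names following the <prefix>_*.csv convention
args = build_parser().parse_args(
    ["-t", "sample_orca_tmp.csv", "-f", "sample_orca_final.csv", "-p", "sample_orca_prob.csv"]
)
print(args.tmp, args.final, args.prob)
```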
7.4.3.2. How to use it?¶
You don’t need to edit the Python script; it will be automatically executed by the crontab script with the following arguments:
<benchmark_path>/<prefix>_tmp.csv for the -t, --tmp argument
<benchmark_path>/<prefix>_final.csv for the -f, --final argument
<benchmark_path>/<prefix>_prob.csv for the -p, --prob argument
This will either create or update the final CSV file, named <prefix>_final.csv and placed inside the benchmark directory. The log file of this Python execution, named <prefix>_<current_date>.log, can be found inside a bench_logs subdirectory created by the crontab script.
7.4.3.3. How to deal with problematic lines?¶
If some lines of the temporary CSV file cause any kind of problem, they will be stored in a problematic CSV file, named <prefix>_prob.csv and placed inside the benchmark directory. After having consulted the log file and diagnosed the problem, you might wish to rerun the Python script on this file. In this case, you can either:
manually execute benchmark.py using the path towards the problematic CSV file for the -t, --tmp argument (and something else for the -p, --prob one)
rename <prefix>_prob.csv into <prefix>_tmp.csv*, then manually execute cron_benchmark.sh (with <prefix> as a command line argument).
* Be careful to not erase a real temporary CSV file by doing so.
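The idea of setting problematic lines aside, so that they can be reprocessed later, can be sketched as follows. This is a simplified illustration and not the code of benchmark.py: split_rows and demo_enrich are hypothetical names, and demo_enrich merely stands in for the sacct lookup.

```python
import csv
import io


def split_rows(tmp_stream, enrich):
    """Return (enriched_rows, problematic_rows) for the lines read from tmp_stream."""
    enriched, problematic = [], []
    for row in csv.DictReader(tmp_stream, delimiter=";"):
        try:
            enriched.append(enrich(row))
        except Exception:
            # The line is kept unchanged so that it can be processed again later
            problematic.append(row)
    return enriched, problematic


def demo_enrich(row):
    """Hypothetical stand-in for the sacct lookup: requires a numeric job ID."""
    int(row["Job ID"])
    return dict(row, Elapsed="00:00:14")


# Usage example: the second line has no job ID and ends up in the problematic list
sample = io.StringIO(
    "Profile;Job ID;Job Name\n"
    "sample_orca;69232077;sample_orca_ch4_svp\n"
    "sample_orca;;broken_line\n"
)
good_rows, bad_rows = split_rows(sample, demo_enrich)
```

Because the problematic lines keep the same columns as the temporary CSV file, they can be fed back to the script without any conversion.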
7.4.4. Dealing with multiple profiles¶
Unfortunately, this tool only works with one profile at a time. If you have multiple profiles you want to use with ABIN LAUNCHER (or even similar profiles on multiple clusters), you will have to configure the benchmarking tool for each of those profiles.
This implies that you need to:
Add the {% include "benchmark.jinja" %} line to all job script templates.
Add the Jinja variables definition (script_render_vars.update) to all the rendering functions.
Add the cron task command to the crontab of every cluster where the rendering functions will be executed.
Multiple profiles can share the same cron task if you don’t mind their lines being in the same CSV file. To prepare for this eventuality, the first column of the CSV file is the name of the profile, to help differentiate them.
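If several profiles do share one CSV file, the first column makes it easy to separate them again afterwards. A minimal Python sketch, assuming the semicolon-delimited layout shown in the sample run (the function name rows_by_profile and the second profile name, other_profile, are made up for this example):

```python
import csv
import io
from collections import defaultdict


def rows_by_profile(stream) -> dict:
    """Group the lines of a benchmarking CSV file by their Profile column."""
    groups = defaultdict(list)
    for row in csv.DictReader(stream, delimiter=";"):
        groups[row["Profile"]].append(row)
    return dict(groups)


# Usage example with a shortened header and a made-up second profile
shared = io.StringIO(
    "Profile;Cluster;Job ID\n"
    "sample_orca;lemaitre3;69232077\n"
    "other_profile;lemaitre3;69232090\n"
    "sample_orca;lemaitre3;69232076\n"
)
groups = rows_by_profile(shared)
print({profile: len(rows) for profile, rows in groups.items()})
```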
7.5. Sample run¶
Let’s end this section with a sample run of the benchmarking tool. We will use our example from the previous section, with the three molecules (or geometry files) and the two configuration files.
Tip
Every file presented in this subsection can be downloaded here.
7.5.1. Preparation¶
Our benchmark_path will be /home/users/n/i/niacobel/abin_docs_sample/benchmark and our prefix will be sample_orca.
This is our starting directory structure:
abin_docs_sample/
└── abin_launcher/
    ├── benchmark.py
    ├── cron_benchmark.sh
    ├── abin_launcher.py
    ├── geom_scan.py
    ├── scaling_fcts.py
    ├── renderer.py
    ├── abin_errors.py
    ├── clusters.yml
    ├── mendeleev.yml
    └── templates/
        ├── benchmark.jinja
        ├── sample_orca.inp.jinja
        └── sample_orca_job.sh.jinja
└── molecules/
    ├── ch4.xyz
    ├── c2h6.xyz
    └── c3h8.xyz
└── configs/
    ├── svp.yml
    └── tzvp.yml
└── orca_jobs/
    └── currently empty
Note that our benchmark directory does not exist yet. This is not a problem, since the instructions contained in the Jinja template will create it if needed.
7.5.1.1. The Jinja template¶
At the end of sample_orca_job.sh.jinja, we add the following line:
{% include "benchmark.jinja" %}
and in the sample_orca_render function of renderer.py, we add
script_render_vars.update({
    "benchmark_path" : "/home/users/n/i/niacobel/abin_docs_sample/benchmark",
    "prefix": "sample_orca",
    "profile" : job_specs['profile'],
    "cluster_name" : job_specs['cluster_name'],
    "jobscale_label" : job_specs['scale_label'],
    "job_walltime" : job_specs['walltime'],
    "job_mem_per_cpu" : job_specs['mem_per_cpu'], # in MB
    "scaling_function" : job_specs['scaling_fct'],
    "scale_index" : job_specs['scale_index']
})
If you are not sure where exactly you need to add this portion of code, refer to the complete source code of renderer.py given below.
################################################################################################################################################
## The Renderer ##
## ##
## This script contains the rendering functions for ABIN LAUNCHER, ##
## consult the documentation at https://chains-ulb.readthedocs.io/ for details ##
################################################################################################################################################

import os

from jinja2 import Environment, FileSystemLoader

import abin_errors


def jinja_render(templates_dir:str, template_file:str, render_vars:dict):
    """Renders a file based on its Jinja template.

    Parameters
    ----------
    templates_dir : str
        The path towards the directory where the Jinja template is located.
    template_file : str
        The name of the Jinja template file.
    render_vars : dict
        Dictionary containing the definitions of all the variables present in the Jinja template.

    Returns
    -------
    output_text : str
        Content of the rendered file.
    """

    file_loader = FileSystemLoader(templates_dir)
    env = Environment(loader=file_loader)
    template = env.get_template(template_file)
    output_text = template.render(render_vars)

    return output_text

# =================================================================== #
# =================================================================== #
# Rendering functions #
# =================================================================== #
# =================================================================== #

def sample_orca_render(mendeleev:dict, clusters_cfg:dict, config:dict, file_data:dict, job_specs:dict, misc:dict):
    """Renders the job script and the input file associated with the ORCA program.

    Parameters
    ----------
    mendeleev : dict
        Content of AlexGustafsson's Mendeleev Table YAML file (found at https://github.com/AlexGustafsson/molecular-data).
        Unused in this function.
    clusters_cfg : dict
        Content of the YAML clusters configuration file.
    config : dict
        Content of the YAML configuration file.
    file_data : dict
        Information extracted by the scanning function from the geometry file.
    job_specs : dict
        Contains all information related to the job.
    misc : dict
        Contains all the additional variables that did not pertain to the other arguments.

    Returns
    -------
    rendered_content : dict
        Dictionary containing the text of all the rendered files in the form of <filename>: <rendered_content>.
    rendered_script : str
        Name of the rendered job script, necessary to launch the job.

    Notes
    -----
    Pay a particular attention to the render_vars dictionaries, they contain all the definitions of the variables appearing in your Jinja templates.
    """

    # Define the names of the templates
    template_input = "sample_orca.inp.jinja"
    template_script = "sample_orca_job.sh.jinja"

    # Define the names of the rendered files
    rendered_input = misc['mol_name'] + ".inp"
    rendered_script = "orca_job.sh"

    # Initialize the dictionary that will be returned by the function
    rendered_content = {}

    # Render the template for the input file
    print("{:<80}".format("\nRendering the jinja template for the orca input file ..."), end="")

    input_render_vars = {
        "method" : config['method'],
        "basis_set" : config['basis_set'],
        "job_type" : config['job_type'],
        "charge" : config['charge'],
        "multiplicity" : config['multiplicity'],
        "coordinates" : file_data['atomic_coordinates']
    }

    rendered_content[rendered_input] = jinja_render(misc['templates_dir'], template_input, input_render_vars)

    print('%12s' % "[ DONE ]")

    # Render the template for the job script
    print("{:<80}".format("\nRendering the jinja template for the orca job script ..."), end="")

    script_render_vars = {
        "mol_name" : misc['mol_name'],
        "config_name" : misc['config_name'],
        "user_email" : config['user_email'],
        "mail_type" : config['mail_type'],
        "job_walltime" : job_specs['walltime'],
        "job_cores" : job_specs['cores'],
        "job_mem_per_cpu" : job_specs['mem_per_cpu'],
        "partition" : job_specs['partition'],
        "set_env" : clusters_cfg[job_specs['cluster_name']]['profiles'][job_specs['profile']]['set_env'],
        "command" : clusters_cfg[job_specs['cluster_name']]['profiles'][job_specs['profile']]['command'],
        "profile" : job_specs['profile']
    }

    # Add variables specific to the benchmarking template
    script_render_vars.update({
        "benchmark_path" : "/home/users/n/i/niacobel/abin_docs_sample/benchmark",
        "prefix": "sample_orca",
        "profile" : job_specs['profile'],
        "cluster_name" : job_specs['cluster_name'],
        "jobscale_label" : job_specs['scale_label'],
        "job_walltime" : job_specs['walltime'],
        "job_mem_per_cpu" : job_specs['mem_per_cpu'], # in MB
        "scaling_function" : job_specs['scaling_fct'],
        "scale_index" : job_specs['scale_index']
    })

    rendered_content[rendered_script] = jinja_render(misc['templates_dir'], template_script, script_render_vars)

    print('%12s' % "[ DONE ]")

    # Return the content of the rendered files and the name of the rendered job script
    return rendered_content, rendered_script
7.5.1.2. The cron task and the crontab script¶
Now we execute the crontab -e command in our terminal to edit our cron tasks and add the following line:
*/15 * * * * bash -l -c "/home/users/n/i/niacobel/abin_docs_sample/abin_launcher/cron_benchmark.sh sample_orca /home/users/n/i/niacobel/abin_docs_sample/benchmark" >> /home/users/n/i/niacobel/abin_docs_sample/benchmark/sample_orca_crontab.log 2>&1
Finally, we edit the beginning of our cron_benchmark.sh file to load our Python distribution:
####################################
# Script configuration #
####################################
# Load your Python distribution
module --force purge
module load releases/2018b
module load Python/3.6.6-foss-2018b
and we make sure it is executable by entering the following command in our terminal:
$ chmod u+x /home/users/n/i/niacobel/abin_docs_sample/abin_launcher/cron_benchmark.sh
Now our benchmarking tool is ready to run!
7.5.2. Execution¶
We just run ABIN LAUNCHER as normal, by executing the main script (from abin_docs_sample):
$ python abin_launcher/abin_launcher.py -m molecules/ -cf configs/ -p sample_orca -o orca_jobs/ -cl lemaitre3
We obtain the same results as before, with the six launched jobs.
As soon as each job finishes, the temporary CSV file will either be created or updated with a new line. After the six jobs have finished, this is what the raw file looks like:
Profile;Cluster;Jobscale;Partition;Cores;MB/CPU;Walltime;Job ID;Job Name;Scaling Function;Scale Index;Submit Date;Eligible Date;Start Date;End Date;Nodes;Nodes List
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232077;sample_orca_ch4_svp;total_nb_elec;10;2020-11-09 17:05:07;2020-11-09 17:05:07;2020-11-09 17:05:07;2020-11-09 17:05:21;1;lm3-w061
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232076;sample_orca_ch4_tzvp;total_nb_elec;10;2020-11-09 17:05:07;2020-11-09 17:05:07;2020-11-09 17:05:07;2020-11-09 17:05:24;1;lm3-w064
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232081;sample_orca_c2h6_svp;total_nb_elec;18;2020-11-09 17:05:09;2020-11-09 17:05:09;2020-11-09 17:05:10;2020-11-09 17:05:46;1;lm3-w063
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232079;sample_orca_c3h8_svp;total_nb_elec;26;2020-11-09 17:05:08;2020-11-09 17:05:08;2020-11-09 17:05:10;2020-11-09 17:05:52;1;lm3-w062
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232080;sample_orca_c2h6_tzvp;total_nb_elec;18;2020-11-09 17:05:08;2020-11-09 17:05:08;2020-11-09 17:05:10;2020-11-09 17:06:12;1;lm3-w062
sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232078;sample_orca_c3h8_tzvp;total_nb_elec;26;2020-11-09 17:05:08;2020-11-09 17:05:08;2020-11-09 17:05:10;2020-11-09 17:07:37;1;lm3-w061
and in a more human-readable fashion:
Profile | Cluster | Jobscale | Partition | Cores | MB/CPU | Walltime | Job ID | Job Name | Scaling Function | Scale Index | Submit Date | Eligible Date | Start Date | End Date | Nodes | Nodes List |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232077 | sample_orca_ch4_svp | total_nb_elec | 10 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:21 | 1 | lm3-w061 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232076 | sample_orca_ch4_tzvp | total_nb_elec | 10 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:24 | 1 | lm3-w064 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232081 | sample_orca_c2h6_svp | total_nb_elec | 18 | 2020-11-09 17:05:09 | 2020-11-09 17:05:09 | 2020-11-09 17:05:10 | 2020-11-09 17:05:46 | 1 | lm3-w063 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232079 | sample_orca_c3h8_svp | total_nb_elec | 26 | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:05:52 | 1 | lm3-w062 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232080 | sample_orca_c2h6_tzvp | total_nb_elec | 18 | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:06:12 | 1 | lm3-w062 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232078 | sample_orca_c3h8_tzvp | total_nb_elec | 26 | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:07:37 | 1 | lm3-w061 |
As you can see, a separate line has been written for each of our jobs, containing the data collected so far.
After at most 15 minutes, the cron task executes the crontab script, which archives the temporary CSV file and then runs the Python script on that file to either create or update the final CSV file. The directory structure now looks like:
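The four dates stored in each line are already enough to recover how long a job waited and how long it ran. The short Python sketch below does so for the last line of the raw file above; the function name is an assumption made for this example, and the tool itself obtains these durations through sacct rather than by subtracting dates.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

# Header and last line of the raw temporary CSV file shown above
RAW = (
    "Profile;Cluster;Jobscale;Partition;Cores;MB/CPU;Walltime;Job ID;Job Name;"
    "Scaling Function;Scale Index;Submit Date;Eligible Date;Start Date;End Date;Nodes;Nodes List\n"
    "sample_orca;lemaitre3;tiny;batch;4;500;0-00:10:00;69232078;sample_orca_c3h8_tzvp;"
    "total_nb_elec;26;2020-11-09 17:05:08;2020-11-09 17:05:08;2020-11-09 17:05:10;"
    "2020-11-09 17:07:37;1;lm3-w061\n"
)


def waiting_and_running_seconds(row: dict) -> tuple:
    """Return (seconds between eligible and start, seconds between start and end)."""
    eligible = datetime.strptime(row["Eligible Date"], DATE_FORMAT)
    start = datetime.strptime(row["Start Date"], DATE_FORMAT)
    end = datetime.strptime(row["End Date"], DATE_FORMAT)
    return int((start - eligible).total_seconds()), int((end - start).total_seconds())


rows = list(csv.DictReader(io.StringIO(RAW), delimiter=";"))
waited, ran = waiting_and_running_seconds(rows[0])
print(waited, ran)
```

For this job, the result is 2 seconds of waiting and 147 seconds of execution.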
abin_docs_sample/
└── abin_launcher/
    └── no changes
└── benchmark/
    ├── sample_orca_crontab.log
    ├── sample_orca_final.csv
    └── archive/
        └── sample_orca_tmp_20201109_171504.csv
    └── bench_logs/
        └── sample_orca_20201109_171504.log
└── molecules/
    └── launched/
└── configs/
    └── launched/
└── orca_jobs/
    └── ch4_svp/
    └── ch4_tzvp/
    └── c2h6_svp/
    └── c2h6_tzvp/
    └── c3h8_svp/
    └── c3h8_tzvp/
where our final CSV file, sample_orca_final.csv, contains:
Profile | Cluster | Jobscale | Partition | Cores | MB/CPU | Walltime | Job ID | Job Name | Scaling Function | Scale Index | Submit Date | Eligible Date | Start Date | End Date | Nodes | Nodes List | Reserved | Elapsed | Time Efficiency | Max RSS (MB) | RAM Efficiency | Total CPU | Wall CPU | CPU Efficiency |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232077 | sample_orca_ch4_svp | total_nb_elec | 10 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:21 | 1 | lm3-w061 | 00:00:00 | 00:00:14 | 0.0233 | 1 | 0.0005 | 00:08.479 | 00:00:56 | 0.1429 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232076 | sample_orca_ch4_tzvp | total_nb_elec | 10 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:07 | 2020-11-09 17:05:24 | 1 | lm3-w064 | 00:00:00 | 00:00:17 | 0.0283 | 1 | 0.0005 | 00:11.338 | 00:01:08 | 0.1618 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232081 | sample_orca_c2h6_svp | total_nb_elec | 18 | 2020-11-09 17:05:09 | 2020-11-09 17:05:09 | 2020-11-09 17:05:10 | 2020-11-09 17:05:46 | 1 | lm3-w063 | 00:00:01 | 00:00:36 | 0.06 | 235 | 0.1175 | 00:22.476 | 00:02:24 | 0.1528 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232079 | sample_orca_c3h8_svp | total_nb_elec | 26 | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:05:52 | 1 | lm3-w062 | 00:00:02 | 00:00:42 | 0.07 | 79 | 0.0395 | 00:36.502 | 00:02:48 | 0.2143 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232080 | sample_orca_c2h6_tzvp | total_nb_elec | 18 | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:06:12 | 1 | lm3-w062 | 00:00:02 | 00:01:02 | 0.1033 | 243 | 0.1215 | 00:55.619 | 00:04:08 | 0.2218 |
sample_orca | lemaitre3 | tiny | batch | 4 | 500 | 0-00:10:00 | 69232078 | sample_orca_c3h8_tzvp | total_nb_elec | 26 | 2020-11-09 17:05:08 | 2020-11-09 17:05:08 | 2020-11-09 17:05:10 | 2020-11-09 17:07:37 | 1 | lm3-w061 | 00:00:02 | 00:02:27 | 0.245 | 234 | 0.117 | 02:21.569 | 00:09:48 | 0.2398 |
Once loaded into Microsoft Excel, this file gives us a clear view of all the important data about our jobs:
Figure: Excel view of the final CSV file sample_orca_final.csv
Here we can see that our job scale definition does not look that good, given all those low efficiencies. However, these were very small jobs made purely for illustration purposes, so they should not be taken too seriously. For real, bigger jobs, though, you should aim for higher efficiencies.
Note
The temporary CSV file has been archived into the archive directory, as sample_orca_tmp_20201109_171504.csv. This copy is kept purely as a backup of the data. Once you’ve made sure those jobs have been correctly benchmarked, you can remove the copy.
7.5.3. Content of the log files¶
As you can see in the directory structure above, the benchmarking tool creates two log files:
The log file from the crontab script, sample_orca_crontab.log, which contains a line for each time a temporary CSV file has been processed. At this point, its content is simply:
2020-11-09 17:15:09 INFO - Processed new lines in /home/users/n/i/niacobel/abin_docs_sample/benchmark/sample_orca_tmp.csv
The log file from the Python script, sample_orca_20201109_171504.log, placed inside the bench_logs directory. Its content is:
********************************************************************************
EXECUTION OF THE BENCHMARKING SCRIPT FOR SLURM CLUSTERS JOBS BEGINS NOW
********************************************************************************
*****************************
0. Preparation step
*****************************
Scanning tmp file /home/users/n/i/niacobel/abin_docs_sample/benchmark/archive/orca_lemaitre3_tmp_20201109_171504.csv ...
Detected CSV dialect in tmp file: excel
Detected CSV header in tmp file : ['Profile', 'Cluster', 'Jobscale', 'Partition', 'Cores', 'MB/CPU', 'Walltime', 'Job ID', 'Job Name', 'Scaling Function', 'Scale Index', 'Submit Date', 'Eligible Date', 'Start Date', 'End Date', 'Nodes', 'Nodes List']
************************************
1. Get benchmarking values
************************************
Processing lines ...
------------------------------------------------------------
Job Name: orca_ch4_svp
Job ID: 69232077
------------------------------------------------------------
Reserved: 00:00:00
Elapsed: 00:00:14
Walltime: 00:10:00
Time Efficiency: 2%
------------------------------------------------------------
MaxRSS: 1 MB
Total MEM: 2000 MB (500 MB for each of 4 CPUs)
RAM Efficiency: 0%
------------------------------------------------------------
TotalCPU: 00:08.479
Wall CPU: 00:00:56
CPU Efficiency: 14%
------------------------------------------------------------
------------------------------------------------------------
Job Name: orca_ch4_tzvp
Job ID: 69232076
------------------------------------------------------------
Reserved: 00:00:00
Elapsed: 00:00:17
Walltime: 00:10:00
Time Efficiency: 3%
------------------------------------------------------------
MaxRSS: 1 MB
Total MEM: 2000 MB (500 MB for each of 4 CPUs)
RAM Efficiency: 0%
------------------------------------------------------------
TotalCPU: 00:11.338
Wall CPU: 00:01:08
CPU Efficiency: 16%
------------------------------------------------------------
------------------------------------------------------------
Job Name: orca_c2h6_svp
Job ID: 69232081
------------------------------------------------------------
Reserved: 00:00:01
Elapsed: 00:00:36
Walltime: 00:10:00
Time Efficiency: 6%
------------------------------------------------------------
MaxRSS: 235 MB
Total MEM: 2000 MB (500 MB for each of 4 CPUs)
RAM Efficiency: 12%
------------------------------------------------------------
TotalCPU: 00:22.476
Wall CPU: 00:02:24
CPU Efficiency: 15%
------------------------------------------------------------
------------------------------------------------------------
Job Name: orca_c3h8_svp
Job ID: 69232079
------------------------------------------------------------
Reserved: 00:00:02
Elapsed: 00:00:42
Walltime: 00:10:00
Time Efficiency: 7%
------------------------------------------------------------
MaxRSS: 79 MB
Total MEM: 2000 MB (500 MB for each of 4 CPUs)
RAM Efficiency: 4%
------------------------------------------------------------
TotalCPU: 00:36.502
Wall CPU: 00:02:48
CPU Efficiency: 21%
------------------------------------------------------------
------------------------------------------------------------
Job Name: orca_c2h6_tzvp
Job ID: 69232080
------------------------------------------------------------
Reserved: 00:00:02
Elapsed: 00:01:02
Walltime: 00:10:00
Time Efficiency: 10%
------------------------------------------------------------
MaxRSS: 243 MB
Total MEM: 2000 MB (500 MB for each of 4 CPUs)
RAM Efficiency: 12%
------------------------------------------------------------
TotalCPU: 00:55.619
Wall CPU: 00:04:08
CPU Efficiency: 22%
------------------------------------------------------------
------------------------------------------------------------
Job Name: orca_c3h8_tzvp
Job ID: 69232078
------------------------------------------------------------
Reserved: 00:00:02
Elapsed: 00:02:27
Walltime: 00:10:00
Time Efficiency: 24%
------------------------------------------------------------
MaxRSS: 234 MB
Total MEM: 2000 MB (500 MB for each of 4 CPUs)
RAM Efficiency: 12%
------------------------------------------------------------
TotalCPU: 02:21.569
Wall CPU: 00:09:48
CPU Efficiency: 24%
------------------------------------------------------------
End of processing
******************************************************
2. Writing new information to final CSV file
******************************************************
Used dialect in the final CSV file: excel
Header used in final CSV file: ['Profile', 'Cluster', 'Jobscale', 'Partition', 'Cores', 'MB/CPU', 'Walltime', 'Job ID', 'Job Name', 'Scaling Function', 'Scale Index', 'Submit Date', 'Eligible Date', 'Start Date', 'End Date', 'Nodes', 'Nodes List', 'Reserved', 'Elapsed', 'Time Efficiency', 'Max RSS (MB)', 'RAM Efficiency', 'Total CPU', 'Wall CPU', 'CPU Efficiency']
Writing newly processed lines to the final file /home/users/n/i/niacobel/abin_docs_sample/benchmark/orca_lemaitre3_final.csv ... [DONE]
********************************************************************************
END OF EXECUTION
********************************************************************************