<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.uthpalaherath.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.uthpalaherath.com/" rel="alternate" type="text/html" /><updated>2026-02-27T04:12:15+00:00</updated><id>https://www.uthpalaherath.com/feed.xml</id><title type="html">Uthpala Herath</title><subtitle>Uthpala Herath&apos;s digital corner on the web.</subtitle><author><name>Uthpala Herath</name></author><entry><title type="html">Running LLMs on a HPC cluster</title><link href="https://www.uthpalaherath.com/Running-LLMs-on-a-HPC-cluster/" rel="alternate" type="text/html" title="Running LLMs on a HPC cluster" /><published>2025-12-17T00:00:00+00:00</published><updated>2025-12-17T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Running-LLMs-on-a-HPC-cluster</id><content type="html" xml:base="https://www.uthpalaherath.com/Running-LLMs-on-a-HPC-cluster/"><![CDATA[<p class="notice--primary"><em>This is a guide on setting up an Ollama server to run LLM models on a HPC cluster with GPUs. The Ollama server is hosted on a GPU node, while the inferencing can be done from any other node on the cluster.</em></p>

<p><img src="/assets/media/2025-12-17-Running-LLMs-on-a-HPC-cluster/2025-12-17-Running-LLMs-on-a-HPC-cluster-20251217211525532.jpg" alt="2025-12-17-Running-LLMs-on-a-HPC-cluster-20251217211525532" /></p>

<p>Large Language Models (LLMs) have become ever so popular, especially with broader access to computational resources such as HPC clusters and GPUs along with tools like <a href="https://ollama.com">Ollama</a> that allow users to access open-source LLM models. In this guide, I will walk you through how to set up an host Ollama server on a GPU node on a HPC cluster which can be used for inference from other nodes. This post was inspired by an <a href="https://medium.com/@afifaniks/running-ollama-with-apptainer-a-step-by-step-guide-to-local-llms-with-gpu-support-on-hpc-3fe98c8af2c8">article</a> by <a href="https://medium.com/@afifaniks/about">Afif</a> so kudos to him for laying down the ground work.</p>

<p>The setup discussed here was tested on Duke University’s <a href="https://oit-rc.pages.oit.duke.edu/rcsupportdocs/">DCC</a> cluster and the <a href="https://ncshare.org">NCShare</a> cluster, both consisting of NVIDIA <a href="https://www.nvidia.com/en-us/data-center/h200/">H200</a> GPU nodes (DCC has a variety of other GPU models as well, but we will only talk about H200s since they are the most powerful).</p>

<h1 id="initial-setup">Initial setup</h1>

<p>1. Usually, on HPC clusters, you won’t have root privileges to install applications, so we are going to first build a simple <a href="https://apptainer.org">Apptainer</a> container with Ollama in it. If you already have <code class="language-plaintext highlighter-rouge">ollama</code>, you may skip this setup. Create the <code class="language-plaintext highlighter-rouge">ollama.def</code> file shown below in some working directory on your cluster. For the purpose of this guide, my working directory where I keep all the files and models mentioned is <code class="language-plaintext highlighter-rouge">/work/ukh/ollama</code>.</p>

<p><code class="language-plaintext highlighter-rouge">ollama.def:</code></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Bootstrap: docker
From: ollama/ollama:latest
</code></pre></div></div>

<p>Then build the container with,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">APPTAINER_CACHEDIR</span><span class="o">=</span>/work/<span class="k">${</span><span class="nv">USER</span><span class="k">}</span>/tmp
<span class="nb">export </span><span class="nv">APPTAINER_TMPDIR</span><span class="o">=</span>/work/<span class="k">${</span><span class="nv">USER</span><span class="k">}</span>/tmp

apptainer build ollama.sif ollama.def
</code></pre></div></div>

<p>If successful, the Ollama container, <code class="language-plaintext highlighter-rouge">ollama.sif</code> will be built in the same directory. The reason we changed the Apptainer tmp and cache directories was because the default <code class="language-plaintext highlighter-rouge">/tmp</code> sometimes fills up leading to build failures.</p>

<p>You may download an <code class="language-plaintext highlighter-rouge">ollama</code> binary through <a href="https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html">conda</a> with (I assume you have created some conda environment for the <code class="language-plaintext highlighter-rouge">ollama</code> procedure) ,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>conda <span class="nb">install</span> <span class="nt">-c</span> conda-forge ollama
</code></pre></div></div>

<p>however, I noticed that this does not utilize the GPUs when used as the server like the original <code class="language-plaintext highlighter-rouge">ollama</code> application does, but it can be used as the client for inference. So install it anyways.</p>

<p>2. We will also use the <code class="language-plaintext highlighter-rouge">ollama</code> API to run some Python code, so go ahead and install the Python libraries,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>ollama
</code></pre></div></div>

<h1 id="starting-ollama-server-on-the-gpu-node">Starting Ollama server on the GPU node</h1>

<p>We will host the Ollama server on a H200 GPU node and run inference from other nodes on the cluster.</p>

<p>1. Create a bash script, <code class="language-plaintext highlighter-rouge">ollama_server_apptainer.sh</code> and modify it to suit your environment,</p>

<p><code class="language-plaintext highlighter-rouge">ollama_server_apptainer.sh:</code></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="c"># Configuration</span>
<span class="nv">CONTAINER_IMAGE</span><span class="o">=</span><span class="s2">"/work/ukh/ollama/ollama.sif"</span>
<span class="nv">INSTANCE_NAME</span><span class="o">=</span><span class="s2">"ollama-</span><span class="nv">$USER</span><span class="s2">"</span>
<span class="nv">MODEL_PATH</span><span class="o">=</span><span class="s2">"/work/ukh/ollama/models"</span>
<span class="nv">PORT</span><span class="o">=</span>11434

<span class="c"># Unset variables to avoid conflicts</span>
<span class="nb">unset </span>ROCR_VISIBLE_DEVICES

<span class="c"># Start Apptainer instance with GPU and writable tempfs</span>
apptainer instance start <span class="se">\</span>
  <span class="nt">--nv</span> <span class="se">\</span>
  <span class="nt">--writable-tmpfs</span> <span class="se">\</span>
  <span class="nt">--bind</span> <span class="s2">"</span><span class="nv">$MODEL_PATH</span><span class="s2">"</span> <span class="se">\</span>
  <span class="s2">"</span><span class="nv">$CONTAINER_IMAGE</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$INSTANCE_NAME</span><span class="s2">"</span>

<span class="c"># Start Ollama serve inside the container in the background</span>
apptainer <span class="nb">exec</span> <span class="se">\</span>
  <span class="nt">--env</span> <span class="nv">OLLAMA_MODELS</span><span class="o">=</span><span class="s2">"</span><span class="nv">$MODEL_PATH</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--env</span> <span class="nv">OLLAMA_HOST</span><span class="o">=</span><span class="s2">"0.0.0.0:</span><span class="nv">$PORT</span><span class="s2">"</span> <span class="se">\</span>
  instance://<span class="nv">$INSTANCE_NAME</span> <span class="se">\</span>
  ollama serve &amp;

<span class="nb">echo</span> <span class="s2">"🦙 Ollama is now serving at http://</span><span class="si">$(</span><span class="nb">hostname</span> <span class="nt">-f</span><span class="si">)</span><span class="s2">:</span><span class="nv">$PORT</span><span class="s2">"</span>
</code></pre></div></div>

<p>This will use the Apptainer container, <code class="language-plaintext highlighter-rouge">ollama.sif</code>, we built earlier to host a server in the background that looks for models stored in <code class="language-plaintext highlighter-rouge">/work/ukh/ollama/models</code>. It is important to host on <code class="language-plaintext highlighter-rouge">0.0.0.0</code> so that the server listens on all available network interfaces and not just <code class="language-plaintext highlighter-rouge">localhost</code>.  Request an interactive session or submit a <a href="https://slurm.schedmd.com">SLURM</a> job script to request a GPU node. I will request an interactive session on a node with 1 H200 GPU, 300 GB of RAM, for 2 hours with,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>srun <span class="nt">-p</span> h200ea <span class="nt">-A</span> h200ea <span class="nt">--gres</span><span class="o">=</span>gpu:h200:1 <span class="nt">--mem</span><span class="o">=</span>300G <span class="nt">-t</span> 2:00:00 <span class="nt">--pty</span> bash <span class="nt">-i</span>
</code></pre></div></div>

<p>Once you are in the GPU node, start the Ollama server by running the bash script, <code class="language-plaintext highlighter-rouge">ollama_server_apptainer.sh</code>,</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">chmod</span> +x ollama_server_apptainer.sh
./ollama_server_apptainer.sh
</code></pre></div></div>

<p>You will see something like the following,</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>ai<span class="o">)</span> ukh at dcc-h200-gpu-06 <span class="k">in</span> /work/ukh/ollama
<span class="nv">$ </span>./ollama_server_apptainer.sh 
INFO:    Instance stats will not be available - requires cgroups v2 with systemd as manager.
INFO:    instance started successfully
🦙 Ollama is now serving at http://dcc-h200-gpu-06.rc.duke.edu:11434
</code></pre></div></div>

<p>It is important to note the host: <code class="language-plaintext highlighter-rouge">http://dcc-h200-gpu-06.rc.duke.edu</code> and the port: <code class="language-plaintext highlighter-rouge">11434</code> the server is broadcasting on as we will need this information for the inference. If you already have an <code class="language-plaintext highlighter-rouge">ollama</code> setup in your environment that doesn’t require using Apptainer, you could use the following script, <code class="language-plaintext highlighter-rouge">ollama_server.sh</code>, in place of <code class="language-plaintext highlighter-rouge">ollama_server_apptainer.sh</code>.</p>

<p><code class="language-plaintext highlighter-rouge">ollama_server.sh:</code></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="c"># Configuration</span>
<span class="nv">MODEL_PATH</span><span class="o">=</span><span class="s2">"/work/ukh/ollama/models"</span>
<span class="nv">PORT</span><span class="o">=</span>11434

<span class="c"># Unset variables to avoid conflicts</span>
<span class="nb">unset </span>ROCR_VISIBLE_DEVICES

<span class="c"># Model path</span>
<span class="nb">export </span><span class="nv">OLLAMA_MODELS</span><span class="o">=</span><span class="s2">"</span><span class="nv">$MODEL_PATH</span><span class="s2">"</span>

<span class="c"># Bind to all interfaces on that node, on port 11434</span>
<span class="nb">export </span><span class="nv">OLLAMA_HOST</span><span class="o">=</span><span class="s2">"0.0.0.0:</span><span class="nv">$PORT</span><span class="s2">"</span>

<span class="c"># Start Ollama server in the background</span>
ollama serve &amp;

<span class="nb">echo</span> <span class="s2">"🦙 Ollama is now serving at http://</span><span class="si">$(</span><span class="nb">hostname</span> <span class="nt">-f</span><span class="si">)</span><span class="s2">:</span><span class="nv">$PORT</span><span class="s2">"</span>
</code></pre></div></div>

<p>Now that we have the Ollama server hosted, let’s see how we can use that to run inference using LLM models.</p>

<h1 id="running-inference-sessions">Running inference sessions</h1>

<p>We could run the inference session either by using the <code class="language-plaintext highlighter-rouge">ollama</code> application directly, or through the Python API. We will discuss both methods below.</p>

<h2 id="1-using-the-ollama-application">1. Using the Ollama application</h2>

<p>The inferencing can be done either through the <code class="language-plaintext highlighter-rouge">ollama</code> application from the Apptainer container or the <code class="language-plaintext highlighter-rouge">ollama</code> binary from the <code class="language-plaintext highlighter-rouge">conda</code> setup.
If you are using the <code class="language-plaintext highlighter-rouge">ollama</code> instance installed form <code class="language-plaintext highlighter-rouge">conda</code>, from any node on the cluster, you can run,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">OLLAMA_HOST</span><span class="o">=</span><span class="s2">"http://dcc-h200-gpu-06.rc.duke.edu:11434"</span>
<span class="nb">export </span><span class="nv">OLLAMA_MODELS</span><span class="o">=</span>/work/ukh/ollama/models
ollama run llama4:scout
</code></pre></div></div>

<p>Here we download and run the LLM model, <a href="https://ollama.com/library/llama4:scout">llama4:scout</a>, which has 109 billion parameters. With the <code class="language-plaintext highlighter-rouge">ollama</code> application, it starts a chat session once the download is complete. Chat away,  my friend …</p>

<p>You could use the <code class="language-plaintext highlighter-rouge">ollama</code> instance from the Apptainer container as well to achieve this with something like the following,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">OLLAMA_HOST</span><span class="o">=</span><span class="s2">"http://dcc-h200-gpu-06.rc.duke.edu:11434"</span>
<span class="nb">export </span><span class="nv">OLLAMA_MODELS</span><span class="o">=</span>/work/ukh/ollama/models
apptainer <span class="nb">exec</span> <span class="nt">--env</span> <span class="nv">OLLAMA_HOST</span><span class="o">=</span><span class="s2">"</span><span class="nv">$OLLAMA_HOST</span><span class="s2">"</span>,OLAMA_MODELS<span class="o">=</span><span class="s2">"</span><span class="nv">$OLLAMA_MODELS</span><span class="s2">"</span> ollama.sif ollama run llama4:scout
</code></pre></div></div>

<h2 id="2-using-the-python-api">2. Using the Python API</h2>

<p>This is more suited if you want to incorporate Python code in some workflow. For a Python script with a fixed prompt you can use the following script,</p>

<p><code class="language-plaintext highlighter-rouge">ollama_client.py:</code></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python
</span>
<span class="kn">from</span> <span class="nn">ollama</span> <span class="kn">import</span> <span class="n">Client</span>

<span class="n">HOST</span> <span class="o">=</span> <span class="s">"http://dcc-h200-gpu-06.rc.duke.edu:11434"</span>
<span class="n">MODEL</span> <span class="o">=</span> <span class="s">"llama4:scout"</span>
<span class="n">PROMPT</span> <span class="o">=</span> <span class="s">"Write a Python code that calculates the Fibonacci sequence up to 15."</span>

<span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">(</span><span class="n">host</span><span class="o">=</span><span class="n">HOST</span><span class="p">)</span>

<span class="c1"># Check if model already exists
</span><span class="n">resp</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="nb">list</span><span class="p">()</span>
<span class="n">models</span> <span class="o">=</span> <span class="n">resp</span><span class="p">.</span><span class="n">models</span>
<span class="n">model_names</span> <span class="o">=</span> <span class="p">[</span><span class="n">m</span><span class="p">.</span><span class="n">model</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">models</span><span class="p">]</span>

<span class="k">if</span> <span class="n">MODEL</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">model_names</span><span class="p">:</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Pulling model '</span><span class="si">{</span><span class="n">MODEL</span><span class="si">}</span><span class="s">'..."</span><span class="p">)</span>
        <span class="n">client</span><span class="p">.</span><span class="n">pull</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">MODEL</span><span class="p">)</span>
    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Could not pull model: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

<span class="c1"># Stream output token by token
</span><span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">client</span><span class="p">.</span><span class="n">chat</span><span class="p">(</span>
    <span class="n">model</span><span class="o">=</span><span class="n">MODEL</span><span class="p">,</span>
    <span class="n">messages</span><span class="o">=</span><span class="p">[{</span><span class="s">"role"</span><span class="p">:</span> <span class="s">"user"</span><span class="p">,</span> <span class="s">"content"</span><span class="p">:</span> <span class="n">PROMPT</span><span class="p">}],</span>
    <span class="n">stream</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">chunk</span><span class="p">.</span><span class="n">message</span><span class="p">.</span><span class="n">content</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">""</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>

<p>Run it with,</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ./ollama_client.py
</code></pre></div></div>

<p>If you want to use the Python API as a chat client, you may use the following code instead,</p>

<p><code class="language-plaintext highlighter-rouge">ollama_client_chat.py:</code></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python                                                             
</span>                                         
<span class="kn">from</span> <span class="nn">ollama</span> <span class="kn">import</span> <span class="n">Client</span>       
                                         
<span class="n">HOST</span> <span class="o">=</span> <span class="s">"http://dcc-h200-gpu-06.rc.duke.edu:11434"</span>                                             
<span class="n">MODEL</span> <span class="o">=</span> <span class="s">"llama4:scout"</span>                 
                                                                                  
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>                                                                       
    <span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">(</span><span class="n">host</span><span class="o">=</span><span class="n">HOST</span><span class="p">)</span>           
                                                                                  
    <span class="c1"># Check if model already exists                                               
</span>    <span class="n">resp</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="nb">list</span><span class="p">()</span>                                                          
    <span class="n">models</span> <span class="o">=</span> <span class="n">resp</span><span class="p">.</span><span class="n">models</span>     
    <span class="n">model_names</span> <span class="o">=</span> <span class="p">[</span><span class="n">m</span><span class="p">.</span><span class="n">model</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">models</span><span class="p">]</span>                           
                                         
    <span class="k">if</span> <span class="n">MODEL</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">model_names</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>                                                                      
            <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Pulling model '</span><span class="si">{</span><span class="n">MODEL</span><span class="si">}</span><span class="s">'..."</span><span class="p">)</span>     
            <span class="n">client</span><span class="p">.</span><span class="n">pull</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">MODEL</span><span class="p">)</span>                                              
        <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>           
            <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Could not pull model: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
                                         
    <span class="c1"># Start the conversation with an optional system prompt
</span>    <span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>                                                                  
        <span class="p">{</span>                                                                         
            <span class="s">"role"</span><span class="p">:</span> <span class="s">"system"</span><span class="p">,</span>
            <span class="s">"content"</span><span class="p">:</span> <span class="s">"You are a helpful assistant for an HPC user."</span><span class="p">,</span>
        <span class="p">}</span>                                                                         
    <span class="p">]</span>                                                                             
                                                                                  
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Connected to </span><span class="si">{</span><span class="n">HOST</span><span class="si">}</span><span class="s"> using model </span><span class="si">{</span><span class="n">MODEL</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Type 'exit' or 'quit' to leave.</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
    
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>                                                                   
        <span class="k">try</span><span class="p">:</span>                                                                      
            <span class="n">user</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s">"You: "</span><span class="p">).</span><span class="n">strip</span><span class="p">()</span> 
        <span class="k">except</span> <span class="p">(</span><span class="nb">EOFError</span><span class="p">,</span> <span class="nb">KeyboardInterrupt</span><span class="p">):</span>
            <span class="k">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">Bye!"</span><span class="p">)</span>
            <span class="k">break</span>

        <span class="k">if</span> <span class="ow">not</span> <span class="n">user</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="k">if</span> <span class="n">user</span><span class="p">.</span><span class="n">lower</span><span class="p">()</span> <span class="ow">in</span> <span class="p">{</span><span class="s">"exit"</span><span class="p">,</span> <span class="s">"quit"</span><span class="p">}:</span>
            <span class="k">print</span><span class="p">(</span><span class="s">"Bye!"</span><span class="p">)</span>
            <span class="k">break</span>

        <span class="c1"># Add user message to the conversation
</span>        <span class="n">messages</span><span class="p">.</span><span class="n">append</span><span class="p">({</span><span class="s">"role"</span><span class="p">:</span> <span class="s">"user"</span><span class="p">,</span> <span class="s">"content"</span><span class="p">:</span> <span class="n">user</span><span class="p">})</span>

        <span class="k">print</span><span class="p">(</span><span class="s">"Model:"</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">" "</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
        <span class="n">assistant_text</span> <span class="o">=</span> <span class="s">""</span>

        <span class="c1"># Stream the response token by token
</span>        <span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">client</span><span class="p">.</span><span class="n">chat</span><span class="p">(</span>
            <span class="n">model</span><span class="o">=</span><span class="n">MODEL</span><span class="p">,</span>
            <span class="n">messages</span><span class="o">=</span><span class="n">messages</span><span class="p">,</span>
            <span class="n">stream</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="p">):</span>
            <span class="k">if</span> <span class="n">chunk</span><span class="p">.</span><span class="n">message</span><span class="p">.</span><span class="n">content</span><span class="p">:</span>
                <span class="k">print</span><span class="p">(</span><span class="n">chunk</span><span class="p">.</span><span class="n">message</span><span class="p">.</span><span class="n">content</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">""</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
                <span class="n">assistant_text</span> <span class="o">+=</span> <span class="n">chunk</span><span class="p">.</span><span class="n">message</span><span class="p">.</span><span class="n">content</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>

        <span class="c1"># Add assistant reply to history so the model remembers context
</span>        <span class="n">messages</span><span class="p">.</span><span class="n">append</span><span class="p">({</span><span class="s">"role"</span><span class="p">:</span> <span class="s">"assistant"</span><span class="p">,</span> <span class="s">"content"</span><span class="p">:</span> <span class="n">assistant_text</span><span class="p">})</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div></div>

<p>Feel free to play around with different LLM models you can find online, for example, at https://github.com/ollama/ollama.</p>

<p>Once you are done with your LLM session, stop the server we started on the GPU node with,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apptainer instance stop ollama-<span class="nv">$USER</span>
</code></pre></div></div>]]></content><author><name>Uthpala Herath</name></author><category term="ai" /><category term="hpc" /><category term="gpu" /><summary type="html"><![CDATA[This is a guide on setting up an Ollama server to run LLM models on a HPC cluster with GPUs. The Ollama server is hosted on a GPU node, while the inferencing can be done from any other node on the cluster.]]></summary></entry><entry><title type="html">Our latest Nature publication unveiling a new mechanism that explains superfluorescence at high temperatures</title><link href="https://www.uthpalaherath.com/Our-latest-Nature-publication-unveiling-a-new-mechanism-explaining-superfluorescence-at-high-temperatures/" rel="alternate" type="text/html" title="Our latest Nature publication unveiling a new mechanism that explains superfluorescence at high temperatures" /><published>2025-05-30T00:00:00+00:00</published><updated>2025-05-30T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Our-latest-Nature-publication-unveiling-a-new-mechanism-explaining-superfluorescence-at-high-temperatures</id><content type="html" xml:base="https://www.uthpalaherath.com/Our-latest-Nature-publication-unveiling-a-new-mechanism-explaining-superfluorescence-at-high-temperatures/"><![CDATA[<p class="notice--primary"><em>Publication announcement: our latest publication in Nature highlights our collaborative research that led to the discovery of a new mechanism explaining superfluorescence occurring at high temperatures.</em></p>

<p><img src="/assets/media/2025-05-30-Our-latest-Nature-publication-unveiling-a-new-mechanism-explaining-superfluorescence-at-high-temperatures/2025-05-30-Our-latest-Nature-publication-unveiling-a-new-mechanism-explaining-superfluorescence-at-high-temperatures-20250530210324672.png" alt="2025-05-30-Our-latest-Nature-publication-unveiling-a-new-mechanism-explaining-superfluorescence-at-high-temperatures-20250530210324672" /><em>Perovskites show superfluorescence-quantum coherence from solitons-emerging at high temperatures. Image: Ella Maru Studios</em></p>

<p>Our latest paper was just published in <a href="https://www.nature.com/articles/s41586-025-09030-x">Nature</a>, marking a major milestone for an international collaboration led by North Carolina State University and including colleagues at Duke University, Boston University, and Institut Polytechnique de Paris. The article titled, <em>“Unconventional solitonic high-temperature superfluorescence from perovskites”</em>, reveals how a self-organized exotic soliton state allows superfluorescence to survive well above cryogenic temperatures, opening realistic design routes for quantum materials that can operate in everyday environments.</p>

<p>Macroscopic exotic quantum phenomena such as superconductivity, superfluidity and superradiance usually collapse once thermal noise sets in. In our work we show that, once a critical density of polarons (quasiparticles formed by charge carriers coupled to the lattice) is excited, they self-organize into solitons: large, coherent groupings that effectively shield the system from thermal disturbances. This soliton-mediated damping provides a clear, quantitative mechanism for sustaining long-range quantum coherence at high temperatures including room temperature. These insights point directly toward new classes of materials for quantum computing, secure communications, and photonic devices that can operate without cryogenic cooling.</p>

<p>I was privileged with the opportunity to develop the computational algorithms in <a href="https://fhi-aims.org/">FHI-aims </a> and <a href="https://wordpress.elsi-interchange.org/">ELSI</a> that helped validate the experimental observations of the solitonic mechanism in this work.</p>

<p>Congratulations to a truly multidisciplinary team-<strong>Kenan Gundogdu, Melike Biliroglu, Mustafa Türe, Antonia Ghita, Myratgeldi Kotyrov, Xixi Qin, Dovletgeldi Seyitliyev, Natchanun Phonthiptokun, Malek Abdelsamei, Jingshan Chai, Rui Su, Anna Swan, Vasily Temnov, Volker Blum,</strong> and <strong>Franky So</strong> for an inspiring collaboration.</p>

<p><strong>Read more:</strong><br />
Nature article: <a href="https://rdcu.be/eoeUQ">https://rdcu.be/eoeUQ</a><br />
News release: <a href="https://shorturl.at/C39yA">https://shorturl.at/C39yA</a></p>

<p>Thanks for reading. Please share any questions or feedback in the comments below, or feel free to reach out to me directly via email.</p>]]></content><author><name>Uthpala Herath</name></author><category term="condensed-matter-physics" /><category term="materials-science" /><category term="computational-physics" /><category term="nature" /><category term="superfluorescence" /><summary type="html"><![CDATA[Publication announcement: our latest publication in Nature highlights our collaborative research that led to the discovery of a new mechanism explaining superfluorescence occurring at high temperatures.]]></summary></entry><entry><title type="html">Accelerating materials research with Duke Compute Cluster’s (DCC) NVIDIA Tesla P100 GPUs</title><link href="https://www.uthpalaherath.com/Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs/" rel="alternate" type="text/html" title="Accelerating materials research with Duke Compute Cluster’s (DCC) NVIDIA Tesla P100 GPUs" /><published>2025-05-04T00:00:00+00:00</published><updated>2025-05-04T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Accelerating-materials-research-with-Duke-Compute-Cluster&apos;s-NVIDIA-Tesla-P100-GPUs</id><content type="html" xml:base="https://www.uthpalaherath.com/Accelerating-materials-research-with-Duke-Compute-Cluster&apos;s-NVIDIA-Tesla-P100-GPUs/"><![CDATA[<p class="notice--primary"><em>Leveraging DCC’s NVIDIA Tesla P100 GPUs, in this post I demonstrate how GPU acceleration within FHI‑aims and ELPA can speed up all-electron DFT calculations by a factor of ~2x on systems exceeding 3,000 atoms and 50,000 basis functions.</em></p>

<p>Firstly, May the 4th be with you!</p>

<p>As mentioned in an earlier <a href="https://uthpalaherath.com/Advanced-resource-monitoring-on-HPC-clusters/">post</a>, this semester I had the privilege of setting up and optimizing the Density Functional Theory (DFT)<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">1</a></sup> code <a href="https://fhi-aims.org">FHI-aims</a><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup> on the <a href="https://dcc.duke.edu">Duke Compute Cluster (DCC)</a> for students in the <a href="https://graduateschool.bulletins.duke.edu/courses/0261981">ME511</a> (Computational Materials Science) course as well as for lab members of the <a href="https://aims.pratt.duke.edu">AIMS Group</a> to perform materials simulations for research and education. As part of this endeavor, I explored how utilizing GPUs can accelerate electronic structure calculations. I have documented my efforts to configure this setup <a href="https://github.com/uthpalaherath/fhiaims-dcc">here</a>.</p>

<h1 id="gpu-implementation">GPU implementation</h1>

<p>Kohn-Sham DFT (KS-DFT) is considered to be the primary tool for computational materials research in a wide range of areas in science and engineering. FHI-aims is an all-electron electronic structure code based on numeric atom-centered orbitals that solves the Kohn-Sham eigenvalue problem,</p>

\[\hat{h}^{\mathrm{KS}}\left|\psi_l\right\rangle=\epsilon_l\left|\psi_l\right\rangle\]

<p>where, $\ket{\psi_l}$ are effective single-particle orbitals (KS orbitals) with the Hamiltonian $\hat{h}^{\mathrm{KS}}$ in its scalar-relativistic form,</p>

\[\hat{h}_{KS}=\hat{t}_s+\hat{v}_{ext}+\hat{v}_H[n]+\hat{v}_{xc}[n].\]

<p>Here, $\hat{t}_s$ is the kinetic‑energy operator, $\hat{v}_{ext}$ is the external potential, $\hat{v}_H[n]$ is the Hartree potential of the electrons, and $\hat{v}_{xc}[n]$ is the exchange-correlation potential. The density $n$ that minimizes the total energy is obtained through a self‑consistent field (SCF) cycle.</p>

<p>Recent improvements in the code have made it possible to utilize GPUs to accelerate segments of this process. Readers are referred to <em>W.P. Huhn, B. Lange, V.W.-z. Yu et al./Comput. Phys. Commun. 254 (2020) 107314</em><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">3</a></sup>  and <em>V.W.-z. Yu, J. Moussa, P. Kůs et al./Comput. Phys. Commun. 262 (2021) 107808</em><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">4</a></sup> for a deeper dive into the implementation.</p>

<p>GPU support in FHI-aims is focused on two computational hotspots that dominate a semi-local DFT self-consistent field (SCF) cycle.</p>

<h2 id="1-real-space-operations">1. Real-space operations</h2>

<p>These calculations include Hamiltonian and overlap matrix integration, electron density update, forces and stress tensor calculation which scale linearly as $O(N)$ with system size, N. A <em>real-space domain decomposition</em> (RSDD) algorithm splits the integration grid into compact <em>“batches”</em> so that the Hamiltonian can be reformulated as,</p>

\[h_{i j}^{u c}=\sum_v h_{i j}^{u c}\left[B_v\right]\]

<p>where,</p>

\[h_{i j}^{u c}\left[B_\nu\right]=\sum_{\boldsymbol{r} \in B_\nu} w(\boldsymbol{r}) \varphi_i^*(\boldsymbol{r}) \hat{h}_{K S} \varphi_j(\boldsymbol{r})\]

<p>is the contribution of batch $B_\nu$ to the real-space Hamiltonian matrix element $h_{i j}^{u c}$. Each batch becomes a small dense matrix that is offloaded to the GPU as cuBLAS function calls. A large number of such batches stream through in parallel while keeping the CPU-GPU communication minimal by holding batches on-device until calculations are finished. An analogous method is used for the density matrix $n_{ij}[B_\nu]$ and force calculations.</p>

<h2 id="2-kohn-sham-eigensolver-elpa">2. Kohn-Sham eigensolver (ELPA)</h2>

<p>This segment scales as $O(N^3)$ and is considered to be the workhorse of DFT. This step focuses on solving the generalized eigenproblem in its matrix form,</p>

\[H C=S C \Sigma\]

<p>where, H is the Hamiltonian,  S is the overlap, C is the eigen vector, and $\Sigma$ is the eigenvalue matrix of the system. FHI-aims delegates this step to the <a href="https://elpa.mpcdf.mpg.de">ELPA</a><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">5</a></sup> eigensolver through the <a href="https://wordpress.elsi-interchange.org">ELSI</a> interface, whose ELPA2 <em>two-stage tridiagonlization</em> replaces local BLAS calls with their cuBLAS equivalent and adds a CUDA kernel for the computationally complex Householder back-transformation step to an eigenvector $\boldsymbol{x}$,</p>

\[\left(\boldsymbol{I}-\tau \boldsymbol{v} \boldsymbol{v}^*\right) \boldsymbol{x}=\boldsymbol{x}-\tau \boldsymbol{v}\left(\boldsymbol{v}^* \boldsymbol{x}\right) .\]

<p>This process is depicted in Fig. 1.</p>

<p><img src="/assets/media/2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs/2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs-20250504233246496.png" alt="2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs-20250504233246496" /><em>Figure 1: The computational steps of the two-stage tridiagonlization algorithm implemented in ELPA. Ref. [V.W.-z. Yu, J. Moussa, P. Kůs et al.]</em></p>

<p>Similar to the GPU offloaded real-space operations discussed earlier, data stay resident on the GPU between stages to minimize communication latency.</p>

<h1 id="benchmark-systems-and-platform">Benchmark systems and platform</h1>

<p>To demonstrate the GPU acceleration in the code, I ran benchmark tests on the following two materials systems,</p>

<p>(1). 3,000 atom Cu$_2$BaSnS$_4$ semiconductor (CBTS) supercell with 80,250 basis functions <br />
(2). 3,376 atom Graphene-covered SiC surface model with 51,576 basis functions</p>

<p>Their crystal structures are shown in Fig. 2.</p>

<p><img src="/assets/media/2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs/2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs-20250504215917066.png" alt="2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs-20250504215917066" /><em>Figure 2: The crystal structures of Cu$_2$BaSnS$_4$ (left) and Graphene-covered SiC (right).</em></p>

<p>These benchmarks were performed on DCC’s <code class="language-plaintext highlighter-rouge">courses-gpu</code> nodes. Each of these nodes consist of 44 $\times$ Intel(R) Xeon(R) CPU E5-2699 v4 physical cores (88 hyperthreads) operating at a nominal frequency of 2.2 GHz and 479 GB of RAM. However, one core is reserved for the virtualization hypervisor (VMWare ESXi) and another for the guest OS, making 42 physical cores (84 hyperthreads) available for calculations. The nodes are networked through Mellanox ConnectX-4 Infiniband connections. Additionally, each node is equipped with 2 $\times$ NVIDIA Tesla P100 GPUs with 16 GB RAM.</p>

<p><code class="language-plaintext highlighter-rouge">nvidia-smi</code> provides the following information on a typical <code class="language-plaintext highlighter-rouge">courses-gpu</code> node.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(py3) ukh at dcc-courses-gpu-07 in ~
$ nvidia-smi
Fri Apr 25 22:29:34 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P100-PCIE-16GB           On  |   00000000:13:00.0 Off |                    0 |
| N/A   21C    P0             24W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla P100-PCIE-16GB           On  |   00000000:1B:00.0 Off |                    0 |
| N/A   21C    P0             24W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
</code></pre></div></div>

<p>These calculations were performed with FHI-aims v250403 built with Intel oneAPI LLVM Fortran, C, C++ compilers v2025.0.4, Intel MPI v2021.14, Intel MKL v2025.0, and CUDA v12.4. The default ELPA eigensolver (v2020.05.001) that comes bundled with FHI-aims was used as the eigensolver.</p>

<h1 id="benchmark-results">Benchmark results</h1>

<p>For the 3,000 atom Cu$_2$BaSnS$_4$ structure with 80,250 basis functions, the calculation run on two nodes halted with the following error message, due to insufficient GPU memory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/hpc/home/ukh/local/FHIaims/FHIaims-intel-gpu/external_libraries/elsi_interface/external/ELPA/ELPA-2020.05.001/src/cudaFunctions.cu:90
Error in cudaMalloc: out of memory
</code></pre></div></div>

<p>Since a single node only provides 16 GB of GPU RAM per device, increasing the node count resolved this issue. The results of the benchmark tests are shown in Fig. 3.</p>

<p><img src="/assets/media/2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs/2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs-20250504124227321.png" alt="2025-04-25-Accelerating-materials-research-with-Duke-Compute-Cluster's-NVIDIA-Tesla-P100-GPUs-20250504124227321" /><em>Figure 3: The benchmark results for CBTS (blue) and SiC-covered Graphene (orange) calculated with FHI-aims comparing timing for a single SCF iteration with and without GPU acceleration. Solid lines represent CPU only runs and dotted lines represent runs with GPU acceleration enabled. The dotted black line represents ideal scaling.</em></p>

<p>As an example, the break down of several key timing steps for a single SCF iteration for the 252 core/12 GPU CBTS run is given in Table 1.</p>

<p><strong>Table 1: Timing of key stages in a single SCF iteration with and without GPU acceleration for the 252 core/12 GPU CBTS run</strong></p>

<table>
  <thead>
    <tr>
      <th>Step</th>
      <th>CPU (s)</th>
      <th>CPU+GPU (s)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Real-space Hamiltonian integration</td>
      <td>31.07</td>
      <td>26.75</td>
    </tr>
    <tr>
      <td>Charge density update</td>
      <td>106.85</td>
      <td>54.767</td>
    </tr>
    <tr>
      <td>Kohn-Sham equation solution (eigensolver)</td>
      <td>1331.15</td>
      <td>596.42</td>
    </tr>
  </tbody>
</table>

<p>Table 1 shows the different calculation steps that were accelerated with GPUs in comparison to their CPU-only counterparts. For this case, the most significant speedup (~2.2x) is due to the eigensolver, which is expected since this is essentially the workhorse of DFT.</p>

<p>Looking at the outcomes in Fig. 3, we observe an average <strong>~2×</strong> acceleration for CBTS and <strong>~1.7×</strong> for SiC‑Graphene, measured per SCF iteration at equal core/GPU counts. This is already a significant improvement in speedup considering these large system sizes and may be improved further by,</p>

<ul>
  <li>
    <p>Building FHI-aims with a more recent version of the ELPA eigensolver that features more modern and efficient GPU optimizations.</p>
  </li>
  <li>
    <p>Experimenting with different Scalapack matrix block sizes by adjusting the <code class="language-plaintext highlighter-rouge">scalapack_block_size &lt;block_size&gt;</code> parameter and changing the <code class="language-plaintext highlighter-rouge">points_in_batch &lt;value&gt;</code> value that targets the number of integration points per batch offloaded to the GPU.</p>
  </li>
  <li>
    <p>Enabling the NVIDIA <a href="https://docs.nvidia.com/deploy/mps/index.html">Multi-Process Service (MPS)</a> allows work from different MPI processes to be executed concurrently on the GPU, so that memory transfer and computation requests from different MPI processes are automatically overlapped whenever possible. This typically increases the overall GPU utilization.</p>
  </li>
</ul>

<p>In summary, the NVIDIA Tesla P100 GPUs on DCC do a great job accelerating calculations for materials simulations, especially for large scale systems. I gratefully acknowledge the Duke Research Computing Team for providing access to DCC and for the insightful discussions that helped shape this work. Feel free to leave any feedback in the comments and please reach out if you need any help setting up your calculations with GPU acceleration.</p>

<h1 id="references">References</h1>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:2" role="doc-endnote">
      <p>P. Hohenberg and W. Kohn, Inhomogeneous Electron Gas, Phys. Rev. <strong>136</strong>, B864 (1964). DOI: <a href="https://doi.org/10.1103/PhysRev.136.B864">https://doi.org/10.1103/PhysRev.136.B864</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1" role="doc-endnote">
      <p>V. Blum, R. Gehrke, F. Hanke, P. Havu, V. Havu, X. Ren, K. Reuter, and M. Scheffler, Ab initio molecular simulations with numeric atom-centered orbitals, Computer Physics Communications 180, 2175 (2009). DOI: <a href="https://doi.org/10.1016/j.cpc.2009.06.022">https://doi.org/10.1016/j.cpc.2009.06.022</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>W. P. Huhn, B. Lange, V. W. Yu, M. Yoon, and V. Blum, GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions, Computer Physics Communications <strong>254</strong>, 107314 (2020). DOI: <a href="https://doi.org/10.1016/j.cpc.2020.107314">https://doi.org/10.1016/j.cpc.2020.107314</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>V. W. Yu, J. Moussa, P. Kůs, A. Marek, P. Messmer, M. Yoon, H. Lederer, and V. Blum, GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems, Computer Physics Communications <strong>262</strong>, 107808 (2021). DOI: <a href="https://doi.org/10.1016/j.cpc.2020.107808">https://doi.org/10.1016/j.cpc.2020.107808</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Hans-Joachim Bungartz et. al.: ELPA: A Parallel Solver for the Generalized Eigenvalue Problem, Advances in Parallel Computing, Vol. 36 Parallel Computing: Technology Trends, DOI: 10.3233/APC200095 Online version: <a href="https://ebooks.iospress.nl/doi/10.3233/APC200095">https://ebooks.iospress.nl/doi/10.3233/APC200095</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Uthpala Herath</name></author><category term="gpu" /><category term="hpc" /><category term="fhiaims" /><category term="condensed-matter-physics" /><category term="density-functional-theory" /><category term="materials-science" /><summary type="html"><![CDATA[Leveraging DCC’s NVIDIA Tesla P100 GPUs, in this post I demonstrate how GPU acceleration within FHI‑aims and ELPA can speed up all-electron DFT calculations by a factor of ~2x on systems exceeding 3,000 atoms and 50,000 basis functions.]]></summary></entry><entry><title type="html">Error estimation in band structures</title><link href="https://www.uthpalaherath.com/Error-estimation-in-band-structures/" rel="alternate" type="text/html" title="Error estimation in band structures" /><published>2025-03-03T00:00:00+00:00</published><updated>2025-03-03T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Error-estimation-in-band-structures</id><content type="html" xml:base="https://www.uthpalaherath.com/Error-estimation-in-band-structures/"><![CDATA[<p class="notice--primary"><em>Accurate error estimation is highly beneficial for quantifying differences in band structures of materials. This post explores utilizing the root-mean-square error (RMSE) for comparing band structures calculated with the FHI-aims code. This method can help us quickly gauge the agreement between two band structures and guide more informed decisions in materials simulations.</em></p>

<p>The band structure of a material describes the range of energy levels that electrons are allowed to occupy. It encodes crucial information about a material’s electrical conductivity and forms the basis for predicting various electronic and optical properties including band gap values, excitations, and conduction mechanisms providing deeper insights in semiconductor and photovoltaics applications<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">1</a></sup>.</p>

<p>While calculating band structures is standard practice in materials science helping us visually gauge the similarities and differences between electronic structure methods, it is often useful to quantify these comparisons beyond observables such as band gaps. For example, quantifying the difference between band structures calculated with the GW approximation<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">2</a></sup> vs. HSE06 hybrid functionals<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">3</a></sup>, different basis set tiers such as intermediate vs. tight, or identifying the effects of spin-orbit coupling (SOC) onto a band structure.</p>

<p>For this purpose, I extended the <code class="language-plaintext highlighter-rouge">aimsplot_compare.py</code> tool within <code class="language-plaintext highlighter-rouge">FHI-aims/utilities</code> to use the root-mean-square error (RMSE) which measures the average difference between two statistical datasets- in our case, two band structures. Mathematically, it is the standard deviation of the residuals which represent the distance between the regression line and the data points<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>. In simple terms, RMSE can be formulated as shown below.</p>

\[RMSE=\sqrt{\sum_{i=1}^n \frac{\left(\hat{y}_i-y_i\right)^2}{n}}\]

<p>where, $\hat{y}_i$ is the predicted value, $y_i$ is the observed value, and $n$ is the number of data points. Let’s see how we can use this to create a model to compare two band structures.</p>

<h1 id="rmse-algorithm-for-comparing-band-structures">RMSE algorithm for comparing band structures</h1>

<p>The discussion here should be applicable to any first-principles electronic structure code (e.g., VASP, Abinit, Siesta, Quantum Espresso, Elk), however, the code I present in this post is targeted for FHI-aims<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">5</a></sup>.</p>

<p>To compute an RMSE for band structures, we need,</p>

<p>1. <strong>Corresponding k-points</strong>: Make sure the two band structures in question have the <strong>same</strong> k-path segmentation, number of k-points per segment, and same order of k-path segments.</p>

<p>2. <strong>Same reference</strong>: Typically, we want both sets of energies aligned to the same reference, often the Fermi level or some valence-band maximum (VBM). This code shifts band energies set by a user-supplied offset if needed.</p>

<p>3. <strong>Common energy window</strong>: It is important to find a one-to-one mapping of bands between the band structures. In some cases a deep core state band with -1000 eV may be matched erroneously with a band that is just -20 eV leading to a high RMSE value. This is remedied by selecting a common set of bands for comparison within a selected energy window. If the user does not provide an energy window, the code selects a default range.</p>

<p>Provided these conditions, the code then filters out all the bands that lie entirely outside the chosen energy window. Then, it matches the columns up to the minimum number of bands present in both datasets within that window. Next, it computes the squared difference at each k-point for those matched bands. Finally, it takes the mean and the square root, providing the root-mean-square (RMSE) value. It also outputs a plot of the band structure comparison.</p>

<p>Mathematically, for a given energy window,</p>

\[RMSE=\sqrt{\frac{1}{N} \sum_{k=1}^{N_k} \sum_{i=1}^{n_{bands}}\left(E_2(k, i)-E_1(k, i)\right)^2}\]

<p>where, $N = N_k \times n_{bands}$ is summed over all band segments.</p>

<h1 id="code-and-usage">Code and usage</h1>

<p>The code snippet below demonstrates the main functionality of the process, where we read two sets of band information, align and filter them by an energy window, and then compute the RMSE. It currently does not support colinear spin calculations, but I intend to incorporate that in the future. The code also accounts for the degeneracy of bands when SOC is enabled.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">##############################
# 6) Compute RMSE if nPlots=2
##############################
</span><span class="k">def</span> <span class="nf">filter_bands_by_energy_range</span><span class="p">(</span><span class="n">E</span><span class="p">,</span> <span class="n">y_min</span><span class="p">,</span> <span class="n">y_max</span><span class="p">):</span>
    <span class="c1"># E shape: (nK, nB)
</span>    <span class="n">keep</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">nK</span><span class="p">,</span> <span class="n">nB</span> <span class="o">=</span> <span class="n">E</span><span class="p">.</span><span class="n">shape</span>
    <span class="k">for</span> <span class="n">b</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">nB</span><span class="p">):</span>
        <span class="n">arr</span> <span class="o">=</span> <span class="n">E</span><span class="p">[:,</span> <span class="n">b</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">arr</span><span class="p">.</span><span class="nb">max</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">y_min</span> <span class="ow">or</span> <span class="n">arr</span><span class="p">.</span><span class="nb">min</span><span class="p">()</span> <span class="o">&gt;</span> <span class="n">y_max</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="n">keep</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">keep</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">E</span><span class="p">[:,</span> <span class="n">keep</span><span class="p">]</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">nK</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span>


<span class="k">if</span> <span class="n">nPlots</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
    <span class="n">sum_sq</span> <span class="o">=</span> <span class="mf">0.0</span>
    <span class="n">nvals</span> <span class="o">=</span> <span class="mi">0</span>

    <span class="c1"># Do a naive pass over band_segments[0], spin=1 =&gt; match with #1 in second structure
</span>    <span class="c1"># ignoring spin=2
</span>    <span class="k">for</span> <span class="n">seg_index</span><span class="p">,</span> <span class="n">segdata</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">band_segments</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">start</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
        <span class="p">(</span><span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">,</span> <span class="n">length</span><span class="p">,</span> <span class="n">npoint</span><span class="p">,</span> <span class="n">sname</span><span class="p">,</span> <span class="n">ename</span><span class="p">)</span> <span class="o">=</span> <span class="n">segdata</span>
        <span class="n">keyA</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">seg_index</span><span class="p">)</span>
        <span class="n">keyB</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">seg_index</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">keyA</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">band_data</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">or</span> <span class="n">keyB</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">band_data</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
            <span class="k">continue</span>
        <span class="n">E1</span> <span class="o">=</span> <span class="n">band_data</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="n">keyA</span><span class="p">][</span><span class="s">"energies"</span><span class="p">]</span>
        <span class="n">E2</span> <span class="o">=</span> <span class="n">band_data</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="n">keyB</span><span class="p">][</span><span class="s">"energies"</span><span class="p">]</span>

        <span class="c1"># If SOC is used, keep only first eigenvalue
</span>        <span class="k">if</span> <span class="n">PLOT_SOC</span><span class="p">[</span><span class="mi">0</span><span class="p">]:</span>
            <span class="n">E1</span> <span class="o">=</span> <span class="n">E1</span><span class="p">[:,</span> <span class="p">::</span><span class="mi">2</span><span class="p">]</span>  <span class="c1"># '::2' take every 2nd column
</span>        <span class="k">if</span> <span class="n">PLOT_SOC</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
            <span class="n">E2</span> <span class="o">=</span> <span class="n">E2</span><span class="p">[:,</span> <span class="p">::</span><span class="mi">2</span><span class="p">]</span>

        <span class="k">if</span> <span class="n">E1</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">!=</span> <span class="n">E2</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]:</span>
            <span class="c1"># mismatch # k-points
</span>            <span class="k">continue</span>
        <span class="n">F1</span> <span class="o">=</span> <span class="n">filter_bands_by_energy_range</span><span class="p">(</span><span class="n">E1</span><span class="p">,</span> <span class="n">ylim_lower</span><span class="p">,</span> <span class="n">ylim_upper</span><span class="p">)</span>
        <span class="n">F2</span> <span class="o">=</span> <span class="n">filter_bands_by_energy_range</span><span class="p">(</span><span class="n">E2</span><span class="p">,</span> <span class="n">ylim_lower</span><span class="p">,</span> <span class="n">ylim_upper</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">F1</span><span class="p">.</span><span class="n">size</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">or</span> <span class="n">F2</span><span class="p">.</span><span class="n">size</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="n">nb1</span> <span class="o">=</span> <span class="n">F1</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
        <span class="n">nb2</span> <span class="o">=</span> <span class="n">F2</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
        <span class="n">ncomm</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">nb1</span><span class="p">,</span> <span class="n">nb2</span><span class="p">)</span>
        <span class="n">d</span> <span class="o">=</span> <span class="n">F2</span><span class="p">[:,</span> <span class="p">:</span><span class="n">ncomm</span><span class="p">]</span> <span class="o">-</span> <span class="n">F1</span><span class="p">[:,</span> <span class="p">:</span><span class="n">ncomm</span><span class="p">]</span>
        <span class="n">sum_sq</span> <span class="o">+=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">d</span> <span class="o">*</span> <span class="n">d</span><span class="p">)</span>
        <span class="n">nvals</span> <span class="o">+=</span> <span class="n">d</span><span class="p">.</span><span class="n">size</span>

    <span class="k">if</span> <span class="n">nvals</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">rmse_val</span> <span class="o">=</span> <span class="n">math</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">sum_sq</span> <span class="o">/</span> <span class="n">nvals</span><span class="p">)</span>
        <span class="n">rmse_str</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"RMSE=</span><span class="si">{</span><span class="n">rmse_val</span><span class="si">:</span><span class="p">.</span><span class="mi">3</span><span class="n">f</span><span class="si">}</span><span class="s"> eV"</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"RMSE in range [</span><span class="si">{</span><span class="n">ylim_lower</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">ylim_upper</span><span class="si">}</span><span class="s">] eV = </span><span class="si">{</span><span class="n">rmse_val</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> eV."</span><span class="p">)</span>

        <span class="c1"># Create an invisible line2D instance:
</span>        <span class="n">rmse_handle</span> <span class="o">=</span> <span class="n">mlines</span><span class="p">.</span><span class="n">Line2D</span><span class="p">(</span>
            <span class="p">[],</span> <span class="p">[],</span> <span class="n">color</span><span class="o">=</span><span class="s">"none"</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s">""</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">"none"</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">rmse_str</span>
        <span class="p">)</span>

        <span class="c1"># Grab the existing legend handles/labels from the main figure:
</span>        <span class="n">handles</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">ax_bands</span><span class="p">.</span><span class="n">get_legend_handles_labels</span><span class="p">()</span>

        <span class="c1"># Append the RMSE entry
</span>        <span class="n">handles</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">rmse_handle</span><span class="p">)</span>
        <span class="n">labels</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">rmse_str</span><span class="p">)</span>

        <span class="c1"># Re‐draw the legend with this extra line
</span>        <span class="n">ax_bands</span><span class="p">.</span><span class="n">legend</span><span class="p">(</span><span class="n">handles</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span>

    <span class="k">else</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"No overlapping band data =&gt; no RMSE to compute."</span><span class="p">)</span>
</code></pre></div></div>

<p>The full code can be obtained from: <a href="https://github.com/uthpalaherath/MatSciScripts/blob/master/bands_compare.py">https://github.com/uthpalaherath/MatSciScripts/blob/master/bands_compare.py</a></p>

<p>Props to the original developer of the <code class="language-plaintext highlighter-rouge">FHIaims/utilities/aimsplot_compare.py</code> script as this implementation is built on top of that.</p>
<h2 id="usage">Usage</h2>

<p>Assuming you have two directories (e.g., AlAs_ZB_HSE06 and AlAs_ZB_GW) with band structure outputs, run the <code class="language-plaintext highlighter-rouge">bands_compare.py</code> script as follows.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bands_compare.py 2 /Users/uthpala/AlAs_ZB_HSE06 "HSE06+SOC" 0.0 /Users/uthpala/AlAs_ZB_GW "GW" 0.0 -20 20 
</code></pre></div></div>

<p>This follows the format,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bands_compare.py N_PLOTS DIRECTORY TITLE ENERGY_OFFSET ... [yMin yMax] --diffplot
</code></pre></div></div>

<ul>
  <li><code class="language-plaintext highlighter-rouge">N_PLOTS</code>:  number of band structures to plot (1 to 7). RMSE is calculated only for <code class="language-plaintext highlighter-rouge">N_PLOTS=2</code>.</li>
  <li>For each band structure, supply:
          <code class="language-plaintext highlighter-rouge">[DIRECTORY TITLE ENERGY_OFFSET]</code></li>
  <li>Optionally specify the energy window <code class="language-plaintext highlighter-rouge">[yMin, yMax]</code> for the plot</li>
  <li>Optional flag <code class="language-plaintext highlighter-rouge">--diffplot</code> to plot the residual</li>
</ul>

<h1 id="examples">Examples</h1>

<p>Here, I share some examples that make use of the RMSE calculations.</p>

<p>These calculations were run with <code class="language-plaintext highlighter-rouge">FHI-aims</code> using an <code class="language-plaintext highlighter-rouge">intermediate</code> basis set. The SCF calculations were performed with a 10 $\times$ 10 $\times$ 10 k-grid while the band structures were calculated with 21 points between each high-symmetry point for both cases. The k-grid for the SCF calculations may be different, but the code requires the band structure k-points to be the same.</p>

<p><strong>1. AlAs_ZB GW approximation vs. HSE06 hybrid functional <em>with</em> spin-orbit coupling (SOC)</strong></p>

<p>Fig. 1 below shows a comparison of the band structure of the compound AlAs in its Zincblende (ZB) phase calculated with the GW approximation and the hybrid functional HSE06 with spin-orbit coupling (SOC) enabled.
The differences between the band structures are quantified with a RMSE value of 0.412 eV.</p>

<p><img src="/assets/media/2025-03-01-Error-estimation-in-band-structures/2025-03-01-Error-estimation-in-band-structures-20250302162329280.png" alt="2025-03-01-Error-estimation-in-band-structures-20250302162329280" /><em>Figure 1: The comparison of band structures of AlAs_ZB calculated with GWA and HSE06+SOC. The difference is quantified with a RMSE value of 0.412 eV.</em></p>

<p>Optionally, if the flag  <code class="language-plaintext highlighter-rouge">--diffplot</code> is passed to <code class="language-plaintext highlighter-rouge">bands_compare.py</code>, the residuals of each matched band for each k-point is plotted as shown in Fig. 2. Each band is assigned a unique color.</p>

<p><img src="/assets/media/2025-03-01-Error-estimation-in-band-structures/2025-03-01-Error-estimation-in-band-structures-20250303152842818.png" alt="2025-03-01-Error-estimation-in-band-structures-20250303152842818" /><em>Figure 2: The residual plot for comparing AlAs_ZB band structures between GWA and HSE06+SOC.</em></p>

<p><strong>2. AlAs_ZB GW approximation vs. HSE06 hybrid functional <em>without</em> spin-orbit coupling (SOC)</strong></p>

<p>In Fig. 3, we notice that disabling the spin-orbit coupling in the HSE06 calculation provides a band structure closer to that of the GW approximation. This is mostly emphasized near the HOMO at the $\Gamma$ point where the HSE bands show a slight upward shift.</p>

<p><img src="/assets/media/2025-03-01-Error-estimation-in-band-structures/2025-03-01-Error-estimation-in-band-structures-20250303134801333.png" alt="2025-03-01-Error-estimation-in-band-structures-20250303134801333" /><em>Figure 3: The comparison of band structures of AlAs_ZB calculated with GWA and HSE06. The difference is quantified with a RMSE value of 0.392 eV.</em></p>

<p><strong>3. GaN_ZB with and without auxiliary basis functions</strong></p>

<p>Fig. 4 showcases the differences in the GW band structures of GaN in its Zincblende (ZB) phase calculated with and without auxiliary basis functions. Auxiliary basis functions are additional basis functions to represent the Coulomb operator and help improve the band structure accuracy, especially for materials with higher densities such as MgO (Rocksalt), GaN (Wurtzite) and GaN (Zincblende). As we note in Fig. 4, including an additional auxiliary function to represent a $4f$ orbital to the basis resolves the erroneous band structure of conduction bands obtained from the default calculation. The differences between these two are quantified with a high RMSE value of 1.785 eV.</p>

<p><img src="/assets/media/2025-03-01-Error-estimation-in-band-structures/2025-03-01-Error-estimation-in-band-structures-20250303140106072.png" alt="2025-03-01-Error-estimation-in-band-structures-20250303140106072" /><em>Figure 4: The comparison of GW band structures of GaN_ZB calculated with and without 4f0 auxiliary functions. The difference is quantified with a RMSE value of 1.785 eV.</em></p>

<p>I hope this post helps illustrate the process of computing and interpreting the RMSE for band structure comparisons. Numerical error estimations, alongside visual plots, can provide additional clarity when deciding which settings and parameters provide the best desired outcome for your requirements. Let me know in the comments below if you have any questions or wish to provide any feedback.</p>
<h1 id="references">References</h1>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.forbes.com/sites/chadorzel/2016/03/30/why-do-solids-have-band-gaps/">https://www.forbes.com/sites/chadorzel/2016/03/30/why-do-solids-have-band-gaps/</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>L. Hedin, New method for calculating the one-particle green’s function with application to the electron-gas problem, Phys. Rev.139, A796 (1965). <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>S. Kokott, F. Merz, Y. Yao, C. Carbogno, M. Rossi, V. Havu, M. Rampp, M. Scheffler, and V. Blum, Efficient all-electron hybrid density functionals for atomistic simulations beyond 10 000 atoms, The Journal of Chemical Physics (2024). <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://statisticsbyjim.com/regression/root-mean-square-error-rmse/">https://statisticsbyjim.com/regression/root-mean-square-error-rmse/</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://fhi-aims.org">https://fhi-aims.org</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Uthpala Herath</name></author><category term="condensed-matter-physics" /><category term="density-functional-theory" /><category term="fhiaims" /><category term="python" /><category term="physics" /><summary type="html"><![CDATA[Accurate error estimation is highly beneficial for quantifying differences in band structures of materials. This post explores utilizing the root-mean-square error (RMSE) for comparing band structures calculated with the FHI-aims code. This method can help us quickly gauge the agreement between two band structures and guide more informed decisions in materials simulations.]]></summary></entry><entry><title type="html">Advanced resource monitoring on HPC clusters</title><link href="https://www.uthpalaherath.com/Advanced-resource-monitoring-on-HPC-clusters/" rel="alternate" type="text/html" title="Advanced resource monitoring on HPC clusters" /><published>2025-02-05T00:00:00+00:00</published><updated>2025-02-05T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Advanced-resource-monitoring-on-HPC-clusters</id><content type="html" xml:base="https://www.uthpalaherath.com/Advanced-resource-monitoring-on-HPC-clusters/"><![CDATA[<p class="notice--primary"><em>Efficient resource monitoring is key to maximizing HPC performance. In this post, I explore techniques for advanced resource monitoring on HPC clusters and provide a script that utilizes seff, sstat, and sacct to track CPU and memory usage effectively.</em></p>

<p><img src="/assets/media/2025-02-05-Advanced-resource-monitoring-on-HPC-clusters/2025-02-05-Advanced-resource-monitoring-on-HPC-clusters-20250206132157590.png" alt="2025-02-05-Advanced-resource-monitoring-on-HPC-clusters-20250206132157590" /></p>

<p>Happy 2025, everyone! Although it might feel a little late for New Year’s greetings, this is my first post of the year, so I hope you all have a wonderful 2025.</p>

<p>This semester, as a volunteer instructor for the course ME511 (Computational Materials Science)<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">1</a></sup> at Duke University, I had the opportunity to access Duke’s Research Computing Cluster, DCC<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup>,  to setup and test the Density Functional Theory (DFT) code, FHI-aims<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup> for electronic structure calculations and molecular dynamics simulations. Learning through examples and exercises using this code, students will get to explore modern computational techniques for the prediction of materials properties encompassing multi-scale domains.</p>

<p>I used three tiers of test cases to benchmark the FHI-aims code on the DCC cluster.</p>

<ol>
  <li>Small: MD simulation of a 220 atom Ac-Ala19-LysH+ molecule</li>
  <li>Medium: SCF calculation of a 320 atom periodic Paracetamol structure</li>
  <li>Large: SCF calculation of a 1648 atom periodic Graphene on SiC(0001) structure</li>
</ol>

<p>For the larger systems, I needed to carefully gauge and optimize memory allocation to ensure the calculations would complete successfully. To simplify this task, I teamed up with my buddy, ChatGPT to write a script that wraps the SLURM utilities <code class="language-plaintext highlighter-rouge">seff</code><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, <code class="language-plaintext highlighter-rouge">sstat</code><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, and <code class="language-plaintext highlighter-rouge">sacct</code><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> providing a straightforward summary of resource usage for HPC jobs. <code class="language-plaintext highlighter-rouge">sstat</code> provides resource statistics for currently running jobs while <code class="language-plaintext highlighter-rouge">sacct</code> is used for completed jobs. <code class="language-plaintext highlighter-rouge">seff</code> provides information for both ongoing and completed jobs, but is more useful for the latter.  The following sections explain how to use the script and interpret its output.</p>
<h1 id="script">Script</h1>

<p>The resource monitoring script, <code class="language-plaintext highlighter-rouge">jobstats.sh</code> is provided below.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/usr/bin/env bash</span>

<span class="c"># A script containing functions to parse Slurm job statistics.</span>
<span class="c"># Provides a useful summary of seff, sstat, and sacct.</span>
<span class="c">#</span>
<span class="c"># Usage: jobstats.sh &lt;job_id&gt; [n_steps_for_sacct (optional)]</span>
<span class="c">#</span>
<span class="c"># Authors: Uthpala Herath and ChatGPT</span>

<span class="c"># A function to parse Slurm strings like "7721840K", "5.73M", "186.30M", etc.</span>
<span class="c"># into a byte count. Also handles decimal numbers plus an optional K/M/G suffix.</span>
to_bytes<span class="o">()</span> <span class="o">{</span>
    <span class="nb">local </span><span class="nv">val</span><span class="o">=</span><span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
    <span class="c"># If empty or not recognized, return 0.</span>
    <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$val</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="nb">echo </span>0 <span class="o">&amp;&amp;</span> <span class="k">return</span>

    <span class="c"># Regex: optionally a decimal number, then optionally K/M/G</span>
    <span class="c"># Examples that match:</span>
    <span class="c">#   "12345" -&gt; no suffix, assume raw bytes</span>
    <span class="c">#   "123.45M"</span>
    <span class="c">#   "186.30M"</span>
    <span class="c">#   "11521316K"</span>
    <span class="c">#   "5.73M"</span>
    <span class="k">if</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$val</span><span class="s2">"</span> <span class="o">=</span>~ ^<span class="o">([</span>0-9]+<span class="o">(</span><span class="se">\.</span><span class="o">[</span>0-9]+<span class="o">)</span>?<span class="o">)([</span>KMG]<span class="o">)</span>?<span class="nv">$ </span><span class="o">]]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">local </span><span class="nv">num</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">BASH_REMATCH</span><span class="p">[1]</span><span class="k">}</span><span class="s2">"</span>   <span class="c"># e.g. "186.30"</span>
        <span class="nb">local </span><span class="nv">unit</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">BASH_REMATCH</span><span class="p">[3]</span><span class="k">}</span><span class="s2">"</span>  <span class="c"># e.g. "M"</span>
        <span class="nb">local factor</span><span class="o">=</span>1

        <span class="k">case</span> <span class="s2">"</span><span class="nv">$unit</span><span class="s2">"</span> <span class="k">in
            </span>K<span class="p">)</span> <span class="nb">factor</span><span class="o">=</span><span class="k">$((</span><span class="m">1024</span><span class="k">))</span> <span class="p">;;</span>
            M<span class="p">)</span> <span class="nb">factor</span><span class="o">=</span><span class="k">$((</span><span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="k">))</span> <span class="p">;;</span>
            G<span class="p">)</span> <span class="nb">factor</span><span class="o">=</span><span class="k">$((</span><span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="k">))</span> <span class="p">;;</span>
            <span class="k">*</span><span class="p">)</span> <span class="nb">factor</span><span class="o">=</span>1 <span class="p">;;</span>  <span class="c"># no suffix =&gt; bytes</span>
        <span class="k">esac</span>

        <span class="c"># We must do a floating-point multiply: num * factor.</span>
        <span class="c"># Use awk to produce an integer result (rounding).</span>
        <span class="nb">awk</span> <span class="nt">-v</span> <span class="nv">n</span><span class="o">=</span><span class="s2">"</span><span class="nv">$num</span><span class="s2">"</span> <span class="nt">-v</span> <span class="nv">f</span><span class="o">=</span><span class="s2">"</span><span class="nv">$factor</span><span class="s2">"</span> <span class="s1">'BEGIN {
            bytes = n * f
            # round to nearest integer
            printf "%.0f", bytes
        }'</span>
    <span class="k">else</span>
        <span class="c"># If purely integer digits, assume raw bytes</span>
        <span class="k">if</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$val</span><span class="s2">"</span> <span class="o">=</span>~ ^[0-9]+<span class="nv">$ </span><span class="o">]]</span><span class="p">;</span> <span class="k">then
            </span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$val</span><span class="s2">"</span>
        <span class="k">else</span>
            <span class="c"># Unrecognized format</span>
            <span class="nb">echo </span>0
        <span class="k">fi
    fi</span>
<span class="o">}</span>

<span class="c"># Convert bytes to human-readable MB or GB (switch at 1GB).</span>
to_mb_or_gb<span class="o">()</span> <span class="o">{</span>
    <span class="nb">local </span><span class="nv">bytes</span><span class="o">=</span><span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
    <span class="nb">local </span><span class="nv">oneGB</span><span class="o">=</span><span class="k">$((</span><span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="k">))</span>
    <span class="k">if</span> <span class="o">((</span> bytes &lt; oneGB <span class="o">))</span><span class="p">;</span> <span class="k">then</span>
        <span class="c"># Show in MB</span>
        <span class="nb">awk</span> <span class="nt">-v</span> <span class="nv">b</span><span class="o">=</span><span class="s2">"</span><span class="nv">$bytes</span><span class="s2">"</span> <span class="s1">'BEGIN { printf "%.2fMB", b/(1024*1024) }'</span>
    <span class="k">else</span>
        <span class="c"># Show in GB</span>
        <span class="nb">awk</span> <span class="nt">-v</span> <span class="nv">b</span><span class="o">=</span><span class="s2">"</span><span class="nv">$bytes</span><span class="s2">"</span> <span class="s1">'BEGIN { printf "%.2fGB", b/(1024*1024*1024) }'</span>
    <span class="k">fi</span>
<span class="o">}</span>

<span class="c"># The main function:</span>
<span class="c">#    - Parse "Nodes: X" from seff output</span>
<span class="c">#    - seff summary</span>
<span class="c">#    - sstat summary (JobID,JobName,MaxRSS,MaxDiskWrite) in MB/GB, plus "Total MaxRSS"</span>
<span class="c">#    - sacct summary (JobID,JobName,MaxRSS,MaxDiskWrite) in MB/GB, plus "Total MaxRSS"</span>
jobstats<span class="o">()</span> <span class="o">{</span>
    <span class="nb">local </span><span class="nv">jobid</span><span class="o">=</span><span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
    <span class="nb">local </span><span class="nv">maxSteps</span><span class="o">=</span><span class="s2">"</span><span class="nv">$2</span><span class="s2">"</span>   <span class="c"># optional second argument</span>

    <span class="k">if</span> <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$jobid</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"Usage: jobstats.sh &lt;job_id&gt; [&lt;n_steps&gt;]"</span>
        <span class="k">return </span>1
    <span class="k">fi</span>

    <span class="c"># (A) seff output</span>
    <span class="nb">local </span>seff_output
    <span class="nv">seff_output</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>seff <span class="s2">"</span><span class="nv">$jobid</span><span class="s2">"</span> 2&gt;&amp;1<span class="si">)</span><span class="s2">"</span>
    <span class="nb">echo</span> <span class="s2">"=== [1/3] seff summary ==="</span>
    <span class="nb">echo</span> <span class="s2">"</span><span class="nv">$seff_output</span><span class="s2">"</span>
    <span class="nb">echo</span>

    <span class="c"># Parse "Nodes: X" from seff, if present</span>
    <span class="nb">local </span>numNodes
    <span class="nv">numNodes</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$seff_output</span><span class="s2">"</span> | <span class="nb">sed</span> <span class="nt">-n</span> <span class="s1">'s/^Nodes:\s*\([0-9]\+\).*/\1/p'</span><span class="si">)</span><span class="s2">"</span>
    <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$numNodes</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="nv">numNodes</span><span class="o">=</span>1

    <span class="c"># (B) sstat summary</span>
    <span class="nb">echo</span> <span class="s2">"=== [2/3] sstat summary (LIVE) ==="</span>
    <span class="nb">local </span>sstat_out
    <span class="nv">sstat_out</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>
        sstat <span class="nt">--noheader</span> <span class="nt">--format</span><span class="o">=</span>JobID%30,MaxRSS,MaxDiskWrite <span class="se">\</span>
              <span class="nt">-j</span> <span class="s2">"</span><span class="k">${</span><span class="nv">jobid</span><span class="k">}</span><span class="s2">"</span> 2&gt;/dev/null
    <span class="si">)</span><span class="s2">"</span>

    <span class="k">if</span> <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$sstat_out</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"Job </span><span class="k">${</span><span class="nv">jobid</span><span class="k">}</span><span class="s2"> is not running. Check sacct summary."</span>
    <span class="k">else
        </span><span class="nb">echo</span> <span class="s2">"JobID step      | MaxRSS/node | Total MaxRSS | MaxDiskWrite"</span>
        <span class="nb">echo</span> <span class="s2">"----------------|------------ | ------------ | ------------"</span>
        <span class="k">while </span><span class="nv">IFS</span><span class="o">=</span> <span class="nb">read</span> <span class="nt">-r</span> line<span class="p">;</span> <span class="k">do</span>
            <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$line</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="k">continue
            </span><span class="nv">IFS</span><span class="o">=</span><span class="s1">' '</span> <span class="nb">read</span> <span class="nt">-r</span> stepJobID rawRss rawDisk <span class="o">&lt;&lt;&lt;</span> <span class="s2">"</span><span class="nv">$line</span><span class="s2">"</span>
            <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$stepJobID</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="k">continue

            </span><span class="nb">local </span>rssBytes diskBytes
            <span class="nv">rssBytes</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_bytes <span class="s2">"</span><span class="nv">$rawRss</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>
            <span class="nv">diskBytes</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_bytes <span class="s2">"</span><span class="nv">$rawDisk</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>

            <span class="nb">local </span>rssHuman diskHuman
            <span class="nv">rssHuman</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_mb_or_gb <span class="s2">"</span><span class="nv">$rssBytes</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>
            <span class="nv">diskHuman</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_mb_or_gb <span class="s2">"</span><span class="nv">$diskBytes</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>

            <span class="nb">local </span><span class="nv">totalRSSBytes</span><span class="o">=</span><span class="k">$((</span> rssBytes <span class="o">*</span> numNodes <span class="k">))</span>
            <span class="nb">local </span>totalRSSHuman
            <span class="nv">totalRSSHuman</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_mb_or_gb <span class="s2">"</span><span class="nv">$totalRSSBytes</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>

            <span class="nb">printf</span> <span class="s2">"%-15s | %-11s | %-12s | %s</span><span class="se">\n</span><span class="s2">"</span> <span class="se">\</span>
                <span class="s2">"</span><span class="nv">$stepJobID</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$rssHuman</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$totalRSSHuman</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$diskHuman</span><span class="s2">"</span>
        <span class="k">done</span> <span class="o">&lt;&lt;&lt;</span> <span class="s2">"</span><span class="nv">$sstat_out</span><span class="s2">"</span>
    <span class="k">fi
    </span><span class="nb">echo</span>

    <span class="c"># (C) sacct summary</span>
    <span class="nb">echo</span> <span class="s2">"=== [3/3] sacct summary (WARNING: Inactive until job step completion!) ==="</span>
    <span class="nb">local </span>sacct_out
    <span class="nv">sacct_out</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>
        sacct <span class="nt">--noheader</span> <span class="se">\</span>
              <span class="nt">--format</span><span class="o">=</span>JobID%30,JobName%25,MaxRSS,MaxDiskWrite <span class="se">\</span>
              <span class="nt">-j</span> <span class="s2">"</span><span class="k">${</span><span class="nv">jobid</span><span class="k">}</span><span class="s2">"</span> 2&gt;/dev/null
    <span class="si">)</span><span class="s2">"</span>

    <span class="k">if</span> <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$sacct_out</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"No sacct info found."</span>
    <span class="k">else</span>
        <span class="c"># Convert sacct_out into an array so we can optionally truncate</span>
        <span class="nb">mapfile</span> <span class="nt">-t</span> sacct_lines <span class="o">&lt;&lt;&lt;</span> <span class="s2">"</span><span class="nv">$sacct_out</span><span class="s2">"</span>

        <span class="c"># If maxSteps is given and numeric, we tail only last N lines</span>
        <span class="k">if</span> <span class="o">[[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$maxSteps</span><span class="s2">"</span> <span class="o">&amp;&amp;</span> <span class="s2">"</span><span class="nv">$maxSteps</span><span class="s2">"</span> <span class="o">=</span>~ ^[0-9]+<span class="nv">$ </span><span class="o">]]</span><span class="p">;</span> <span class="k">then
            if</span> <span class="o">((</span> maxSteps &lt; <span class="k">${#</span><span class="nv">sacct_lines</span><span class="p">[@]</span><span class="k">}</span> <span class="o">))</span><span class="p">;</span> <span class="k">then</span>
                <span class="c"># Truncate to last N lines</span>
                <span class="nv">sacct_lines</span><span class="o">=(</span><span class="s2">"</span><span class="k">${</span><span class="nv">sacct_lines</span><span class="p">[@]</span>:<span class="p"> -</span><span class="nv">$maxSteps</span><span class="k">}</span><span class="s2">"</span><span class="o">)</span>
            <span class="k">fi
        fi

        </span><span class="nb">echo</span> <span class="s2">"JobID step      | JobName                    | MaxRSS/node | Total MaxRSS | MaxDiskWrite"</span>
        <span class="nb">echo</span> <span class="s2">"----------------|----------------------------|-------------|--------------|-------------"</span>
        <span class="k">for </span>line <span class="k">in</span> <span class="s2">"</span><span class="k">${</span><span class="nv">sacct_lines</span><span class="p">[@]</span><span class="k">}</span><span class="s2">"</span><span class="p">;</span> <span class="k">do</span>
            <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$line</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="k">continue
            </span><span class="nv">IFS</span><span class="o">=</span><span class="s1">' '</span> <span class="nb">read</span> <span class="nt">-r</span> cJobID cJobName cMaxRSS cMaxDiskWrite <span class="o">&lt;&lt;&lt;</span> <span class="s2">"</span><span class="nv">$line</span><span class="s2">"</span>
            <span class="o">[[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$cJobID</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> <span class="k">continue

            </span><span class="nb">local </span>rssBytes diskBytes
            <span class="nv">rssBytes</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_bytes <span class="s2">"</span><span class="nv">$cMaxRSS</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>
            <span class="nv">diskBytes</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_bytes <span class="s2">"</span><span class="nv">$cMaxDiskWrite</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>

            <span class="nb">local </span>rssHuman diskHuman
            <span class="nv">rssHuman</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_mb_or_gb <span class="s2">"</span><span class="nv">$rssBytes</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>
            <span class="nv">diskHuman</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_mb_or_gb <span class="s2">"</span><span class="nv">$diskBytes</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>

            <span class="nb">local </span><span class="nv">totalRSSBytes</span><span class="o">=</span><span class="k">$((</span> rssBytes <span class="o">*</span> numNodes <span class="k">))</span>
            <span class="nb">local </span>totalRSSHuman
            <span class="nv">totalRSSHuman</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>to_mb_or_gb <span class="s2">"</span><span class="nv">$totalRSSBytes</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>

            <span class="nb">printf</span> <span class="s2">"%-15s | %-26s | %-11s | %-12s | %s</span><span class="se">\n</span><span class="s2">"</span> <span class="se">\</span>
                <span class="s2">"</span><span class="nv">$cJobID</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$cJobName</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$rssHuman</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$totalRSSHuman</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$diskHuman</span><span class="s2">"</span>
        <span class="k">done
    fi</span>
<span class="o">}</span>

<span class="c"># Call the function with any arguments passed to the script</span>
jobstats <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
</code></pre></div></div>

<h1 id="usage">Usage</h1>

<p>Place this script in a directory which is included in your <code class="language-plaintext highlighter-rouge">$PATH</code> variable to access it globally (e.g., to make  <code class="language-plaintext highlighter-rouge">/hpc/home/ukh/dotfiles</code> directory globally accessible, add the line <code class="language-plaintext highlighter-rouge">export PATH="/hpc/home/ukh/dotfiles/:$PATH"</code>  to your <code class="language-plaintext highlighter-rouge">~/.bashrc</code> and source it). You may also need to make it executable by typing the following command in the terminal.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chmod +x jobstats.sh
</code></pre></div></div>

<p>Now we’re all set to use the script to monitor jobs. Say you’ve submitted a SLURM script for your calculation with <code class="language-plaintext highlighter-rouge">sbatch</code> and checked its status with <code class="language-plaintext highlighter-rouge">squeue</code>. You should get something like below.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>JOBID     PARTITION NAME             USER  ST     TIME  NODES   NODELIST(REASON)
25623378  courses   paracetamol-5N   ukh   R      0:00  5       dcc-courses-[15,31-34]
</code></pre></div></div>

<p>This job is running on 5 nodes with 84 cores each (with hyper-threading) on the <code class="language-plaintext highlighter-rouge">dcc-courses</code> partition. I requested 5 GB per cpu with <code class="language-plaintext highlighter-rouge">#SBATCH --mem-per-cpu=5G</code>, i.e. 2100 GB across all 5 nodes. The quantity <code class="language-plaintext highlighter-rouge">JOBID</code> is passed as an argument to <code class="language-plaintext highlighter-rouge">jobstats.sh</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jobstats.sh &lt;JOBID&gt;
</code></pre></div></div>

<p>This gives the output shown below.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ jobstats.sh 25623378
=== [1/3] seff summary ===
Job ID: 25623378
Cluster: dcc
User/Group: ukh/dukeusers
State: RUNNING
Nodes: 5
Cores per node: 84
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 2-08:07:00 core-walltime
Job Wall-clock time: 00:08:01
Memory Utilized: 0.00 MB
Memory Efficiency: 0.00% of 2.05 TB (5.00 GB/core)
WARNING: Efficiency statistics can only be obtained after the job has ended as seff tool is based on the accounting database data.

=== [2/3] sstat summary (LIVE) ===
JobID step      | MaxRSS/node | Total MaxRSS | MaxDiskWrite
----------------|------------ | ------------ | ------------
25623378.0      | 269.17GB    | 1345.87GB    | 175.70MB

=== [3/3] sacct summary (WARNING: Inactive until job step completion!) ===
JobID step      | JobName                    | MaxRSS/node | Total MaxRSS | MaxDiskWrite
----------------|----------------------------|-------------|--------------|-------------
25623378        | paracetamol-5N             | 0.00MB      | 0.00MB       | 0.00MB
25623378.batch  | batch                      | 0.00MB      | 0.00MB       | 0.00MB
25623378.extern | extern                     | 0.00MB      | 0.00MB       | 0.00MB
25623378.0      | hydra_bstrap_proxy         | 0.00MB      | 0.00MB       | 0.00MB
</code></pre></div></div>

<ul>
  <li><code class="language-plaintext highlighter-rouge">seff</code> shows partial statistics of the running job, including the number of nodes, cores per node, elapsed time, and requested memory.</li>
  <li><code class="language-plaintext highlighter-rouge">sacct</code> is inactive for a running job but lists different job steps.</li>
  <li><code class="language-plaintext highlighter-rouge">sstat</code> is most useful here, displaying <em>current</em> memory (<code class="language-plaintext highlighter-rouge">MaxRSS/node</code> and <code class="language-plaintext highlighter-rouge">Total MaxRSS</code>) and cumulative I/O usage (<code class="language-plaintext highlighter-rouge">MaxDiskWrite</code>).</li>
</ul>

<p><code class="language-plaintext highlighter-rouge">MaxRSS</code> (Maximum Resident Set Size) gives the maximum memory used for the job at this instant. Both the per-node (MaxRSS/node) and total (Total MaxRSS) values are displayed here. <code class="language-plaintext highlighter-rouge">MaxDiskWrite</code> gives the maximum amount of data written to disk at this instant for that job step. The script can be used with the <code class="language-plaintext highlighter-rouge">watch</code> utility in Unix to monitor real-time resource usage with,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>watch -n &lt;interval_in_seconds&gt; jobstats.sh &lt;JOBID&gt;
</code></pre></div></div>

<p>If <code class="language-plaintext highlighter-rouge">Total MaxRSS</code> exceeds the amount of memory requested for the job (2.05 TB in this case), then the calculation will be terminated with an “Out of Memory” error:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>slurmstepd: error: Detected 1 oom_kill event in StepId=25639552.0. Some of the step tasks have been OOM Killed.
srun: error: dcc-courses-28: task 0: Out Of Memory
</code></pre></div></div>

<p>Once the job finishes, the script will also show final statistics under <code class="language-plaintext highlighter-rouge">seff</code> and <code class="language-plaintext highlighter-rouge">sacct</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ jobstats.sh 25623378
=== [1/3] seff summary ===
Job ID: 25623378
Cluster: dcc
User/Group: ukh/dukeusers
State: COMPLETED (exit code 0)
Nodes: 5
Cores per node: 84
CPU Utilized: 1-03:31:25
CPU Efficiency: 46.44% of 2-11:16:00 core-walltime
Job Wall-clock time: 00:08:28
Memory Utilized: 1.31 TB
Memory Efficiency: 64.02% of 2.05 TB (5.00 GB/core)
The task which had the largest memory consumption differs by 100.12% from the average task max memory consumption

=== [2/3] sstat summary (LIVE) ===
Job 25623378 is not running. Check sacct summary.

=== [3/3] sacct summary (WARNING: Inactive until job step completion!) ===
JobID step      | JobName                    | MaxRSS/node | Total MaxRSS | MaxDiskWrite
----------------|----------------------------|-------------|--------------|-------------
25623378        | paracetamol-5N             | 0.00MB      | 0.00MB       | 0.00MB
25623378.batch  | batch                      | 89.84MB     | 449.20MB     | 1.38MB
25623378.extern | extern                     | 0.25MB      | 1.25MB       | 0.00MB
25623378.0      | hydra_bstrap_proxy         | 269.17GB    | 1345.87GB    | 175.70MB
</code></pre></div></div>

<p>“Memory Utilized” reflects the total memory used for the job which is equivalent to “Total MaxRSS” at the completion of the job. If the job terminates with an insufficient memory problem, request more memory in your SLURM submission script. For optimal resource usage, request sufficient memory while ensuring high “Memory Efficiency” ($\frac{Memory~Utilized}{Memory~Requested}\times100\%$) to make the best of memory resources.</p>

<p>The CPU Efficiency is given by,</p>

\[\text {CPU Efficiency}=\frac{\text {CPU Utilized time in seconds}}{(\text {Total no. of CPU cores }) \times(\text {Wall-clock time in seconds})} \times 100 \%\]

<p>The CPU Utilized time is the <em>actual</em> CPU time used by all the processes across all cores. Wall-clock time is the <em>real</em> clock time the job spent in the “RUNNING” state. The CPU Efficiency is a measure of <em>actual</em> CPU usage vs. total <em>possible</em> CPU usage. This job shows a 46.44% CPU Efficiency, which may be improved by changing the total number of cores used. However, it very much depends on how well your code is written to efficiently utilize resources.</p>

<p class="notice--warning"><strong>WARNING!</strong><br />SLURM’s resource usage statistics are best-effort approximations obtained via periodic sampling and kernel cgroup counters, so they may not capture transient peaks precisely and should be regarded as estimates rather than exact measurements.</p>

<p>The JobID step comprises of different stages of the job process, namely,</p>
<ul>
  <li>25623378 - The top-level parent job record</li>
  <li>25623378.batch - The “batch” step that typically runs your job script</li>
  <li>25623378.extern - The “extern” step SLURM uses internally for things like resource allocation</li>
  <li>25623378.0 - Another job step typically invoked by <code class="language-plaintext highlighter-rouge">srun</code> (e.g., a MPI task)</li>
</ul>

<p>Depending on the number of compute steps or sub-processes that may be called within a calculation(s), the JobID step could contain a large number of steps. For such instances, you may truncate the output by passing an argument to <code class="language-plaintext highlighter-rouge">jobstats.sh</code> with the number of trailing lines to filter the most recent steps with,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jobstats.sh &lt;JobID&gt; &lt;number_of_trailing_lines&gt;
</code></pre></div></div>

<p>For example, <code class="language-plaintext highlighter-rouge">jobstats.sh 1839519 10</code> shows only the last 10 lines of <code class="language-plaintext highlighter-rouge">sacct</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>=== [1/3] seff summary ===
Job ID: 1839519
Cluster: thornyflat
User/Group: ukh0001/its-rc-thorny
State: RUNNING
Nodes: 2
Cores per node: 40
CPU Utilized: 852-16:05:48
CPU Efficiency: 44.23% of 1927-16:30:40 core-walltime
Job Wall-clock time: 24-02:18:23
Memory Utilized: 164.15 GB (estimated maximum)
Memory Efficiency: 0.00% of 16.00 B (8.00 B/node)
WARNING: Efficiency statistics may be misleading for RUNNING jobs.

=== [2/3] sstat summary (LIVE) ===
JobID step      | MaxRSS/node | Total MaxRSS | MaxDiskWrite
----------------|------------ | ------------ | ------------
1839519.658     | 3.74GB      | 7.49GB       | 913.36MB

=== [3/3] sacct summary (WARNING: Inactive until job step completion!) ===
JobID step      | JobName                    | MaxRSS/node | Total MaxRSS | MaxDiskWrite
----------------|----------------------------|-------------|--------------|-------------
1839519.649     | wannier90.x                | 561.31MB    | 1.10GB       | 137.15MB
1839519.650     | XHF0.py                    | 108.22MB    | 216.44MB     | 435.06MB
1839519.651     | dmft.x                     | 434.59MB    | 869.19MB     | 36.82MB
1839519.652     | ctqmc                      | 225.77MB    | 451.54MB     | 0.01MB
1839519.653     | ctqmc                      | 218.11MB    | 436.21MB     | 2.51MB
1839519.654     | ctqmc                      | 220.24MB    | 440.48MB     | 2.51MB
1839519.655     | ctqmc                      | 229.46MB    | 458.93MB     | 2.53MB
1839519.656     | ctqmc                      | 230.46MB    | 460.91MB     | 2.54MB
1839519.657     | ctqmc                      | 220.06MB    | 440.12MB     | 2.54MB
1839519.658     | vaspDMFT                   | 0.00MB      | 0.00MB       | 0.00MB
</code></pre></div></div>

<p>This tells us that <code class="language-plaintext highlighter-rouge">vaspDMFT</code> is the currently active step in the job process.</p>

<p>I hope this script helps you monitor and optimize your HPC jobs more effectively. Let me know in the comments if it works for you.</p>
<h1 id="references">References</h1>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://graduateschool.bulletins.duke.edu/courses/0261981">https://graduateschool.bulletins.duke.edu/courses/0261981</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://oit-rc.pages.oit.duke.edu/rcsupportdocs">https://oit-rc.pages.oit.duke.edu/rcsupportdocs</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://fhi-aims.org">https://fhi-aims.org</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://support.schedmd.com/show_bug.cgi?id=1611">https://support.schedmd.com/show_bug.cgi?id=1611</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://support.schedmd.com/show_bug.cgi?id=1611">https://slurm.schedmd.com/sstat.html</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://slurm.schedmd.com/sacct.html">https://slurm.schedmd.com/sacct.html</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Uthpala Herath</name></author><category term="hpc" /><category term="bash" /><category term="linux" /><summary type="html"><![CDATA[Efficient resource monitoring is key to maximizing HPC performance. In this post, I explore techniques for advanced resource monitoring on HPC clusters and provide a script that utilizes seff, sstat, and sacct to track CPU and memory usage effectively.]]></summary></entry><entry><title type="html">Carved a Pumpkin? Seed the Possibilities!</title><link href="https://www.uthpalaherath.com/Carved-a-Pumpkin-Seed-the-Possibilities/" rel="alternate" type="text/html" title="Carved a Pumpkin? Seed the Possibilities!" /><published>2024-11-01T00:00:00+00:00</published><updated>2024-11-01T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Carved-a-Pumpkin-Seed-the-Possibilities</id><content type="html" xml:base="https://www.uthpalaherath.com/Carved-a-Pumpkin-Seed-the-Possibilities/"><![CDATA[<p class="notice--primary"><em>This is a quick and simple recipe to roast pumpkin seeds.</em></p>

<p>Pumpkin carving season just ended so you may be wondering what to do with those leftover pumpkin seeds. Not only are they delicious, but they’re also packed with nutrients such as magnesium, zinc, and healthy fats. So don’t toss them away. Let me show you what I ended up doing with them.</p>

<h1 id="ingredients">Ingredients</h1>

<ul>
  <li>Pumpkin seeds</li>
  <li>Olive oil</li>
  <li>Ground rosemary</li>
  <li>Salt and pepper</li>
  <li>Garlic powder</li>
  <li>Cayenne pepper powder (or any chili powder depending on your spice tolerance)</li>
</ul>

<h1 id="steps">Steps</h1>

<p>1. <strong>Clean and dry the seeds:</strong> Place the seeds in a colander and rinse under cold water to remove the pumpkin pulp. Pat them dry with a paper towel. This helps them crisp up in the oven.</p>

<p><img src="/assets/media/2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities/2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities-20241103130417771.jpg" alt="2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities-20241103130417771" /></p>

<p>2. <strong>Season the seeds:</strong> On a baking tray, toss the seeds with with olive oil, ground rosemary, salt and pepper, garlic powder and Cayenne pepper powder. You may replace the Cayenne pepper powder with a different type of chili powder depending on your spice tolerance.</p>

<p><img src="/assets/media/2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities/2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities-20241103130427326.jpg" alt="2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities-20241103130427326" /></p>

<p>3. <strong>Roast:</strong> Preheat your oven to 325° F (165° C). Spread the seasoned seeds in a single layer on a baking sheet lined with parchment paper or aluminum foil. Roast for 20-25 minutes, until they’re golden brown.</p>

<p><img src="/assets/media/2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities/2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities-20241103130433344.jpg" alt="2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities-20241103130433344" /></p>

<p>4. <strong>Cool and Enjoy:</strong> Let the seeds cool. This will ensure they will crisp up a little more.</p>

<p>Finally, sit in front of your favorite Fall movie and enjoy!</p>

<p><img src="/assets/media/2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities/2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities-20241103130440057.jpg" alt="2024-11-01-Carved-a-Pumpkin-Seed-the-Possibilities-20241103130440057" /></p>

<p>If you try this recipe, let me know how it goes in the comments below. 
Cheers and happy snacking, friends!</p>]]></content><author><name>Uthpala Herath</name></author><category term="recipes" /><category term="cooking" /><category term="hobbies" /><summary type="html"><![CDATA[This is a quick and simple recipe to roast pumpkin seeds.]]></summary></entry><entry><title type="html">Embedding a video gallery in Jekyll websites</title><link href="https://www.uthpalaherath.com/Embedding-a-video-gallery-in-Jekyll-websites/" rel="alternate" type="text/html" title="Embedding a video gallery in Jekyll websites" /><published>2024-09-22T00:00:00+00:00</published><updated>2024-09-22T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Embedding-a-video-gallery-in-Jekyll-websites</id><content type="html" xml:base="https://www.uthpalaherath.com/Embedding-a-video-gallery-in-Jekyll-websites/"><![CDATA[<p class="notice--primary"><em>A simple tutorial on embedding a video gallery in a Minimal Mistakes themed Jekyll website.</em></p>

<p><img src="/assets/media/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites-20240922105246495.png" alt="2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites-20240922105246495" /></p>

<p>The <a href="https://mmistakes.github.io/minimal-mistakes/">Minimal Mistakes</a> theme <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> for Jekyll websites by Michael Rose offers a great way to embed images in a <a href="https://mmistakes.github.io/minimal-mistakes/post%20formats/post-gallery/">gallery layout</a>. I wanted to do the same for videos. In this post I will show you how I achieved that.</p>

<p>First, we create a gallery layout using a custom CSS by adding the following snippet in <code class="language-plaintext highlighter-rouge">assets/css/main.scss</code>.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.video-gallery</span> <span class="p">{</span>
  <span class="nl">display</span><span class="p">:</span> <span class="n">flex</span><span class="p">;</span>
  <span class="nl">flex-wrap</span><span class="p">:</span> <span class="n">wrap</span><span class="p">;</span>
  <span class="py">gap</span><span class="p">:</span> <span class="m">20px</span><span class="p">;</span> <span class="c">/* Space between the videos */</span>
  <span class="nl">justify-content</span><span class="p">:</span> <span class="n">space-between</span><span class="p">;</span>
<span class="p">}</span>

<span class="nc">.video-item</span> <span class="p">{</span>
  <span class="nl">flex</span><span class="p">:</span> <span class="m">1</span> <span class="m">1</span> <span class="n">calc</span><span class="p">(</span><span class="m">33.33%</span> <span class="n">-</span> <span class="m">20px</span><span class="p">);</span> <span class="c">/* 33.33% width for 3 items per row */</span>
  <span class="nl">margin-bottom</span><span class="p">:</span> <span class="m">20px</span><span class="p">;</span> <span class="c">/* Space between rows */</span>
<span class="p">}</span>

<span class="nc">.video-item</span> <span class="nt">video</span> <span class="p">{</span>
  <span class="nl">width</span><span class="p">:</span> <span class="m">100%</span><span class="p">;</span> <span class="c">/* Make videos responsive */</span>
  <span class="nl">height</span><span class="p">:</span> <span class="nb">auto</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Here, we use <code class="language-plaintext highlighter-rouge">flex-wrap:wrap;</code> to wrap items onto the next line. <code class="language-plaintext highlighter-rouge">flex: 1 1 calc(33.33% - 20px);</code> ensures that each video takes up one-third of the available space (minus the gap) such that we limit 3 videos per row. You can modify this according to your needs.</p>

<p>Next, in your markdown file you can embed the video gallery using a HTML structure as follows.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"video-gallery"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"video-item"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;video</span> <span class="na">width=</span><span class="s">"560"</span> <span class="na">height=</span><span class="s">"315"</span> <span class="na">controls</span><span class="nt">&gt;</span>
      <span class="nt">&lt;source</span> <span class="na">src=</span><span class="s">"/assets/videos/video1.mp4"</span> <span class="na">type=</span><span class="s">"video/mp4"</span><span class="nt">&gt;</span>
      Your browser does not support the video tag.
    <span class="nt">&lt;/video&gt;</span>
  <span class="nt">&lt;/div&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"video-item"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;video</span> <span class="na">width=</span><span class="s">"560"</span> <span class="na">height=</span><span class="s">"315"</span> <span class="na">controls</span><span class="nt">&gt;</span>
      <span class="nt">&lt;source</span> <span class="na">src=</span><span class="s">"/assets/videos/video2.mp4"</span> <span class="na">type=</span><span class="s">"video/mp4"</span><span class="nt">&gt;</span>
      Your browser does not support the video tag.
    <span class="nt">&lt;/video&gt;</span>
  <span class="nt">&lt;/div&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"video-item"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;video</span> <span class="na">width=</span><span class="s">"560"</span> <span class="na">height=</span><span class="s">"315"</span> <span class="na">controls</span><span class="nt">&gt;</span>
      <span class="nt">&lt;source</span> <span class="na">src=</span><span class="s">"/assets/videos/video3.mp4"</span> <span class="na">type=</span><span class="s">"video/mp4"</span><span class="nt">&gt;</span>
      Your browser does not support the video tag.
    <span class="nt">&lt;/video&gt;</span>
  <span class="nt">&lt;/div&gt;</span>
  <span class="c">&lt;!-- Add more video items as needed --&gt;</span>
<span class="nt">&lt;/div&gt;</span>
</code></pre></div></div>

<p>Optionally, you may also want to set,</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>toc: false
classes: wide
</code></pre></div></div>

<p>in your YAML front matter to ensure all the available space is being used for the gallery layout.</p>

<p>On some browsers (e.g.- Chrome), <code class="language-plaintext highlighter-rouge">.mov</code> files aren’t rendered correctly so we have to convert them to <code class="language-plaintext highlighter-rouge">.mp4</code> with the <code class="language-plaintext highlighter-rouge">H.264</code> codec. I used the <a href="https://www.ffmpeg.org">ffmpeg</a> <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> utility to do the conversion as follows,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ffmpeg <span class="nt">-i</span> video1.mov <span class="nt">-vcodec</span> h264 <span class="nt">-acodec</span> aac video1.mp4
</code></pre></div></div>

<p>To showcase the gallery layout view, I am sharing some videos I captured from a Breaking Benjamin, Staind, Daughtry and Lakeview concert <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> I attended in Raleigh the other day. Enjoy some of my favorite bands I’ve been listening to since high school!</p>

<div class="video-gallery">
  <div class="video-item">
    <video width="560" height="315" controls="">
      <source src="/assets/media/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites-20240922114125601.mp4" type="video/mp4" />
      Your browser does not support the video tag.
    </video>
  </div>
  <div class="video-item">
    <video width="560" height="315" controls="">
      <source src="/assets/media/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites-20240922114135400.mp4" type="video/mp4" />
      Your browser does not support the video tag.
    </video>
  </div>
  <div class="video-item">
    <video width="560" height="315" controls="">
      <source src="/assets/media/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites-20240922114143925.mp4" type="video/mp4" />
      Your browser does not support the video tag.
    </video>
  </div>
  <div class="video-item">
    <video width="560" height="315" controls="">
      <source src="/assets/media/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites-20240922114151896.mp4" type="video/mp4" />
      Your browser does not support the video tag.
    </video>
  </div>
  <div class="video-item">
    <video width="560" height="315" controls="">
      <source src="/assets/media/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites-20240922114203117.mp4" type="video/mp4" />
      Your browser does not support the video tag.
    </video>
  </div>
  <div class="video-item">
    <video width="560" height="315" controls="">
      <source src="/assets/media/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites/2024-09-22-Embedding-a-video-gallery-in-Jekyll-websites-20240922114211171.mp4" type="video/mp4" />
      Your browser does not support the video tag.
    </video>
  </div>
</div>

<h1 id="references">References</h1>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://mmistakes.github.io/minimal-mistakes/">https://mmistakes.github.io/minimal-mistakes/</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.ffmpeg.org">https://www.ffmpeg.org</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://www.livenationentertainment.com/2024/03/rock-icons-staind-and-breaking-benjamin-announce-co-headline-tour-with-special-guests-daughtry-and-lakeview/">https://www.livenationentertainment.com/2024/03/rock-icons-staind-and-breaking-benjamin-announce-co-headline-tour-with-special-guests-daughtry-and-lakeview/</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Uthpala Herath</name></author><category term="tutorials" /><category term="jekyll" /><category term="minimal-mistakes" /><category term="html" /><summary type="html"><![CDATA[A simple tutorial on embedding a video gallery in a Minimal Mistakes themed Jekyll website.]]></summary></entry><entry><title type="html">Atomate2 workflows for FHI-aims</title><link href="https://www.uthpalaherath.com/Atomate2-workflows-for-FHI-aims/" rel="alternate" type="text/html" title="Atomate2 workflows for FHI-aims" /><published>2024-09-20T00:00:00+00:00</published><updated>2024-09-20T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Atomate2-workflows-for-FHI-aims</id><content type="html" xml:base="https://www.uthpalaherath.com/Atomate2-workflows-for-FHI-aims/"><![CDATA[<p class="notice--primary"><em>This brief tutorial provides an introduction to using Atomate2 to set up DFT workflows with the FHI-aims code.</em></p>

<p width="50%"><img src="/assets/media/2024-09-20-Atomate2-workflows-for-FHI-aims/2024-09-20-Atomate2-workflows-for-FHI-aims-20240920121324117.png" alt="2024-09-20-Atomate2-workflows-for-FHI-aims-20240920121324117" /></p>

<p>With the increasing availability of High Performance Computing (HPC) and High Throughput Computing (HTC), the use of efficient tools to perform complex Density Functional Theory (DFT) calculations is critical for advancing materials design. Atomate2 <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> is such a tool that can create workflows for various DFT codes. This is a brief tutorial on running FHI-aims <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> workflows with Atomate2. It guides a user to setup a conda environment, MongoDB <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>, Pymatgen <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, jobflow_remote <sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> and Atomate2 to run a relaxation calculation for a Si structure using a <code class="language-plaintext highlighter-rouge">light</code>  basis set on a 7x7x7 k-grid. The first section discusses running a calculation on a local computer followed by instructions on launching a calculation on a remote server and storing the output data in a MongoDB database which can be  post-processed with Python.</p>

<h1 id="creating-a-conda-environment">Creating a conda environment</h1>

<p>Create conda environment  <code class="language-plaintext highlighter-rouge">atomate2</code> (or any arbitrary name) and activate it.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>conda create <span class="nt">-n</span> atomate2 <span class="nv">python</span><span class="o">=</span>3.10
conda activate atomate2
</code></pre></div></div>

<p>Install the following packages.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install atomate2 ase pymatgen jobflow_remote
</code></pre></div></div>

<h1 id="installing-mongodb">Installing MongoDB</h1>

<p>Atomate2 uses MongoDB to store the data output from the calculations. This is stored in a <code class="language-plaintext highlighter-rouge">.json</code>-like format which makes it easier to access through Python for further post-processing. To install MongoDB through Homebrew, run the following commands on a terminal.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew tap mongodb/brew
brew update
brew <span class="nb">install </span>mongodb-community
</code></pre></div></div>

<p>Start the MongoDB server with,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew services start mongodb-community
</code></pre></div></div>

<p>You can check if the service is running with <code class="language-plaintext highlighter-rouge">brew services list</code>. It should display something like <code class="language-plaintext highlighter-rouge">mongodb-community started uthpala ~/Library/LaunchAgents/homebrew.mxcl.mongodb-community</code>. For installing on other platforms see <a href="https://www.mongodb.com/docs/manual/administration/install-community/">this</a> page.
If you would like to see your databases in a GUI, consider installing <a href="https://www.mongodb.com/cloud/atlas/register">MongoDB Atlas</a>.</p>

<h1 id="configuring-atomate2">Configuring Atomate2</h1>

<p>Next, we have to configure Atomate2 to use FHI-aims. This is done through two <code class="language-plaintext highlighter-rouge">yaml</code> files. I stored them in <code class="language-plaintext highlighter-rouge">/Users/uthpala/atomate-workflows/config</code>.</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">atomate2.yaml:</code>
    <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="na">AIMS_CMD</span><span class="pi">:</span> <span class="s">mpirun aims.x &gt; aims.out</span>
 <span class="na">AIMS_ZIP_FILES</span><span class="pi">:</span> <span class="s">atomate</span>
</code></pre></div>    </div>

    <p>Here, <code class="language-plaintext highlighter-rouge">aims.x</code> is the FHI-aims binary. Consider adding the location of <code class="language-plaintext highlighter-rouge">aims.x</code> to the <code class="language-plaintext highlighter-rouge">$PATH</code> environmental variable to provide global access to it.</p>
  </li>
  <li><code class="language-plaintext highlighter-rouge">jobflow.yaml:</code>
    <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="na">JOB_STORE</span><span class="pi">:</span>
   <span class="na">docs_store</span><span class="pi">:</span>
     <span class="na">type</span><span class="pi">:</span> <span class="s">MongoStore</span>
     <span class="na">database</span><span class="pi">:</span> <span class="s">atomate2</span>
     <span class="na">host</span><span class="pi">:</span> <span class="s">localhost</span>
     <span class="na">port</span><span class="pi">:</span> <span class="m">27017</span>
     <span class="na">collection_name</span><span class="pi">:</span> <span class="s">outputs</span>
   <span class="na">additional_stores</span><span class="pi">:</span>
     <span class="na">data</span><span class="pi">:</span>
       <span class="na">type</span><span class="pi">:</span> <span class="s">GridFSStore</span>
       <span class="na">database</span><span class="pi">:</span> <span class="s">atomate2</span>
       <span class="na">host</span><span class="pi">:</span> <span class="s">localhost</span>
       <span class="na">port</span><span class="pi">:</span> <span class="m">27017</span>
       <span class="na">collection_name</span><span class="pi">:</span> <span class="s">outputs_blobs</span>
</code></pre></div>    </div>
  </li>
</ol>

<p>The locations of these files are then added to the <code class="language-plaintext highlighter-rouge">~/.bashrc</code> file (<code class="language-plaintext highlighter-rouge">~/.zshrc</code> on a Mac) as,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">ATOMATE2_CONFIG_FILE</span><span class="o">=</span><span class="s2">"/Users/uthpala/atomate-workflows/config/atomate2.yaml"</span>
<span class="nb">export </span><span class="nv">JOBFLOW_CONFIG_FILE</span><span class="o">=</span><span class="s2">"/Users/uthpala/atomate-workflows/config/jobflow.yaml"</span>
</code></pre></div></div>

<p>The database <code class="language-plaintext highlighter-rouge">atomate2</code> along with the collections <code class="language-plaintext highlighter-rouge">outputs</code> and <code class="language-plaintext highlighter-rouge">outputs_blobs</code> need to be created. To do that, log in to the MongoDB server with the command <code class="language-plaintext highlighter-rouge">mongosh</code> and run the following.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use atomate2<span class="p">;</span>
db.createCollection<span class="o">(</span><span class="s2">"outputs"</span><span class="o">)</span><span class="p">;</span>
db.createCollection<span class="o">(</span><span class="s2">"outputs_blobs"</span><span class="o">)</span><span class="p">;</span>
</code></pre></div></div>

<p>Additionally, as the FHI-aims species defaults are parsed through Pymatgen, the location has to be added to <code class="language-plaintext highlighter-rouge">~/.config/.pmgrc.yaml</code> as follows,</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">AIMS_SPECIES_DIR</span><span class="pi">:</span> <span class="s2">"</span><span class="s">/Users/uthpala/apps/FHIaims/FHIaims/species_defaults/"</span>
</code></pre></div></div>

<h1 id="running-locally">Running locally</h1>

<p>The following script is used to run a relaxation calculation locally.</p>

<p><code class="language-plaintext highlighter-rouge">si_relax.py:</code></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python
</span>
<span class="kn">from</span> <span class="nn">pymatgen.core</span> <span class="kn">import</span> <span class="n">Structure</span><span class="p">,</span> <span class="n">Molecule</span><span class="p">,</span> <span class="n">Lattice</span>
<span class="kn">from</span> <span class="nn">pymatgen.io.aims.sets.core</span> <span class="kn">import</span> <span class="n">RelaxSetGenerator</span>
<span class="kn">from</span> <span class="nn">atomate2.aims.jobs.core</span> <span class="kn">import</span> <span class="n">RelaxMaker</span>
<span class="kn">from</span> <span class="nn">jobflow</span> <span class="kn">import</span> <span class="n">run_locally</span>

<span class="n">a</span> <span class="o">=</span> <span class="mf">2.715</span>
<span class="n">lattice</span> <span class="o">=</span> <span class="n">Lattice</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">a</span><span class="p">](</span><span class="mf">0.0</span><span class="p">,</span><span class="o">%</span><span class="mi">20</span><span class="n">a</span><span class="p">,</span><span class="o">%</span><span class="mi">20</span><span class="n">a</span><span class="p">))</span>
<span class="n">si</span> <span class="o">=</span> <span class="n">Structure</span><span class="p">(</span>
    <span class="n">lattice</span><span class="o">=</span><span class="n">lattice</span><span class="p">,</span>
    <span class="n">species</span><span class="o">=</span><span class="p">[</span><span class="s">"Si"</span><span class="p">,</span> <span class="s">"Si"</span><span class="p">],</span>
    <span class="n">coords</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">](</span><span class="mi">0</span><span class="p">,</span><span class="o">%</span><span class="mi">200</span><span class="p">,</span><span class="o">%</span><span class="mi">200</span><span class="p">),</span>
<span class="p">)</span>

<span class="c1"># Create relax job
</span><span class="n">relax_job</span> <span class="o">=</span> <span class="n">RelaxMaker</span><span class="p">(</span>
    <span class="n">input_set_generator</span><span class="o">=</span><span class="n">RelaxSetGenerator</span><span class="p">(</span>
        <span class="n">user_params</span><span class="o">=</span><span class="p">{</span><span class="s">"species_dir"</span><span class="p">:</span> <span class="s">"light"</span><span class="p">,</span> <span class="s">"k_grid"</span><span class="p">:</span> <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">7</span><span class="p">]}</span>
    <span class="p">)</span>
<span class="p">).</span><span class="n">make</span><span class="p">(</span><span class="n">si</span><span class="p">)</span>

<span class="c1"># Run relax job locally
</span><span class="n">j_id</span> <span class="o">=</span> <span class="n">run_locally</span><span class="p">(</span><span class="n">relax_job</span><span class="p">)</span>
</code></pre></div></div>

<p>In your terminal, run:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python si_relax.py
</code></pre></div></div>

<p>Once the calculation is complete, the output information is parsed from <code class="language-plaintext highlighter-rouge">aims.out</code> and stored in the MongoDB database <code class="language-plaintext highlighter-rouge">atomate2</code> under the collection <code class="language-plaintext highlighter-rouge">outputs</code>.</p>

<h1 id="running-remotely">Running remotely</h1>

<p>To run Atomate2 in a remote cluster, repeat the previous steps of creating a conda environment and installing the Python libraries on that cluster. Note that this is only possible for clusters you have password-less access to, which can be achieved by using <code class="language-plaintext highlighter-rouge">ssh-keys</code>. The calculation request is initiated in your local computer and is sent to the remote cluster through the <code class="language-plaintext highlighter-rouge">jobflow_remote</code> library which is then actually run on that cluster based on the input parameters you provided.</p>

<p>Add the following jobflow_remote configuration file to <code class="language-plaintext highlighter-rouge">~/.jfremote/timewarp.yaml</code> on your <strong>local</strong> computer. Replace <code class="language-plaintext highlighter-rouge">timewarp.yaml</code> with the name of your <strong>remote</strong> server. <code class="language-plaintext highlighter-rouge">pre_run</code> invokes the commands run prior to running the calculation on the <strong>remote</strong> server. Make sure you are activating an <code class="language-plaintext highlighter-rouge">automate2</code> conda environment. <code class="language-plaintext highlighter-rouge">work_dir</code> is the location the calculations are run on the <strong>remote</strong> server. Modify this according to your needs.</p>

<p><code class="language-plaintext highlighter-rouge">timewarp.yaml:</code></p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">timewarp</span>
<span class="na">log_level</span><span class="pi">:</span> <span class="s">debug</span>
<span class="na">workers</span><span class="pi">:</span>
  <span class="na">timewarp_worker</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">remote</span>
    <span class="na">interactive_login</span><span class="pi">:</span> <span class="no">false</span>
    <span class="na">scheduler_type</span><span class="pi">:</span> <span class="s">slurm</span>
    <span class="na">work_dir</span><span class="pi">:</span> <span class="s">/home/ukh/atomate2</span>
    <span class="na">pre_run</span><span class="pi">:</span> <span class="pi">|</span>
      <span class="s">source ~/.bashrc</span>
      <span class="s">intel</span>
      <span class="s">conda activate atomate2</span>
    <span class="na">timeout_execute</span><span class="pi">:</span> <span class="m">60</span>
    <span class="na">host</span><span class="pi">:</span> <span class="s">timewarp-02.egr.duke.edu</span>
    <span class="na">user</span><span class="pi">:</span> <span class="s">ukh</span>
<span class="na">queue</span><span class="pi">:</span>
  <span class="na">store</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">MongoStore</span>
    <span class="na">host</span><span class="pi">:</span> <span class="s">localhost</span>
    <span class="na">database</span><span class="pi">:</span> <span class="s">timewarp</span>
    <span class="na">collection_name</span><span class="pi">:</span> <span class="s">queue</span>
<span class="na">exec_config</span><span class="pi">:</span> <span class="pi">{}</span>
<span class="na">jobstore</span><span class="pi">:</span>
  <span class="na">docs_store</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">MongoStore</span>
    <span class="na">database</span><span class="pi">:</span> <span class="s">timewarp</span>
    <span class="na">host</span><span class="pi">:</span> <span class="s">localhost</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">27017</span>
    <span class="na">collection_name</span><span class="pi">:</span> <span class="s">outputs</span>
  <span class="na">additional_stores</span><span class="pi">:</span>
    <span class="na">data</span><span class="pi">:</span>
      <span class="na">type</span><span class="pi">:</span> <span class="s">GridFSStore</span>
      <span class="na">database</span><span class="pi">:</span> <span class="s">timewarp</span>
      <span class="na">host</span><span class="pi">:</span> <span class="s">localhost</span>
      <span class="na">port</span><span class="pi">:</span> <span class="m">27017</span>
      <span class="na">collection_name</span><span class="pi">:</span> <span class="s">outputs_blobs</span>
</code></pre></div></div>

<p>Additionally, the Pymatgen configuration file has to be added to the <strong>remote</strong> server to point to the location of the FHI-aims <code class="language-plaintext highlighter-rouge">species_defaults</code> folder on the <strong>remote</strong> server. i.e. add the equivalent of the following to your <code class="language-plaintext highlighter-rouge">~/.config/.pmgrc.yaml</code>  on the <strong>remote</strong> server.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">AIMS_SPECIES_DIR</span><span class="pi">:</span> <span class="s2">"</span><span class="s">/home/ukh/local/FHIaims/species_defaults/"</span>
</code></pre></div></div>

<p>For the remote server calculations, we store the data in the MongoDB database <code class="language-plaintext highlighter-rouge">timewarp</code> with the collections <code class="language-plaintext highlighter-rouge">queue</code>, <code class="language-plaintext highlighter-rouge">outputs</code> and <code class="language-plaintext highlighter-rouge">outputs_blobs</code>. Create these through <code class="language-plaintext highlighter-rouge">mongosh</code> with,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use timewarp<span class="p">;</span>
db.createCollection<span class="o">(</span><span class="s2">"queue"</span><span class="o">)</span><span class="p">;</span>
db.createCollection<span class="o">(</span><span class="s2">"outputs"</span><span class="o">)</span><span class="p">;</span>
db.createCollection<span class="o">(</span><span class="s2">"outputs_blobs"</span><span class="o">)</span><span class="p">;</span>
</code></pre></div></div>

<p>On the <strong>remote</strong> server, create the file <code class="language-plaintext highlighter-rouge">~/.config/atomate2/atomate2.yaml</code> with the following content.</p>

<p><code class="language-plaintext highlighter-rouge">atomate2.yaml:</code></p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">AIMS_CMD</span><span class="pi">:</span> <span class="s">srun aims.x &gt; aims.out</span>
<span class="na">AIMS_ZIP_FILES</span><span class="pi">:</span> <span class="s">atomate</span>
</code></pre></div></div>

<p>Then add the following line to the <code class="language-plaintext highlighter-rouge">~/.bashrc</code> on the <strong>remote</strong> server,</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">ATOMATE2_CONFIG_FILE</span><span class="o">=</span><span class="s2">"/home/ukh/.config/atomate2/atomate2.yaml"</span>
</code></pre></div></div>

<p>We have to then start the jf-runner on the <strong>local</strong> computer with the following commands.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jf runner start
jf admin reset
</code></pre></div></div>

<p>Any time you change the contents of <code class="language-plaintext highlighter-rouge">~/.jfremote/timewarp.yaml</code>, you will have to restart the runner after first stopping it with <code class="language-plaintext highlighter-rouge">jf runner stop</code>.</p>

<h2 id="running-the-calculation">Running the calculation</h2>

<p>The python code, <code class="language-plaintext highlighter-rouge">si_relax_remote.py</code> is used to run a FHI-aims relaxation calculation on the remote server, requested from the local computer.</p>

<p><code class="language-plaintext highlighter-rouge">si_relax_remote.py:</code></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env python
</span>
<span class="kn">from</span> <span class="nn">pymatgen.core</span> <span class="kn">import</span> <span class="n">Structure</span><span class="p">,</span> <span class="n">Molecule</span><span class="p">,</span> <span class="n">Lattice</span>
<span class="kn">from</span> <span class="nn">pymatgen.io.aims.sets.core</span> <span class="kn">import</span> <span class="n">RelaxSetGenerator</span>
<span class="kn">from</span> <span class="nn">atomate2.aims.jobs.core</span> <span class="kn">import</span> <span class="n">RelaxMaker</span>
<span class="kn">from</span> <span class="nn">jobflow_remote</span> <span class="kn">import</span> <span class="n">submit_flow</span>

<span class="n">a</span> <span class="o">=</span> <span class="mf">2.715</span>
<span class="n">lattice</span> <span class="o">=</span> <span class="n">Lattice</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">a</span><span class="p">](</span><span class="mf">0.0</span><span class="p">,</span><span class="o">%</span><span class="mi">20</span><span class="n">a</span><span class="p">,</span><span class="o">%</span><span class="mi">20</span><span class="n">a</span><span class="p">))</span>
<span class="n">si</span> <span class="o">=</span> <span class="n">Structure</span><span class="p">(</span>
    <span class="n">lattice</span><span class="o">=</span><span class="n">lattice</span><span class="p">,</span>
    <span class="n">species</span><span class="o">=</span><span class="p">[</span><span class="s">"Si"</span><span class="p">,</span> <span class="s">"Si"</span><span class="p">],</span>
    <span class="n">coords</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">](</span><span class="mi">0</span><span class="p">,</span><span class="o">%</span><span class="mi">200</span><span class="p">,</span><span class="o">%</span><span class="mi">200</span><span class="p">),</span>
<span class="p">)</span>

<span class="c1"># Create relax job
</span><span class="n">relax_job</span> <span class="o">=</span> <span class="n">RelaxMaker</span><span class="p">(</span>
    <span class="n">input_set_generator</span><span class="o">=</span><span class="n">RelaxSetGenerator</span><span class="p">(</span>
        <span class="n">user_params</span><span class="o">=</span><span class="p">{</span><span class="s">"species_dir"</span><span class="p">:</span> <span class="s">"light"</span><span class="p">,</span> <span class="s">"k_grid"</span><span class="p">:</span> <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">7</span><span class="p">]}</span>
    <span class="p">)</span>
<span class="p">).</span><span class="n">make</span><span class="p">(</span><span class="n">si</span><span class="p">)</span>

<span class="n">resource</span> <span class="o">=</span> <span class="p">{</span><span class="s">"nodes"</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="s">"ntasks_per_node"</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="s">"partition"</span><span class="p">:</span> <span class="s">"small"</span><span class="p">}</span>

<span class="c1"># Run relax job remotely
</span><span class="n">j_id</span> <span class="o">=</span> <span class="n">submit_flow</span><span class="p">(</span><span class="n">relax_job</span><span class="p">,</span> <span class="n">project</span><span class="o">=</span><span class="s">"timewarp"</span><span class="p">,</span> <span class="n">resources</span><span class="o">=</span><span class="n">resource</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">j_id</span><span class="p">)</span>
</code></pre></div></div>

<p>On your <strong>local</strong> computer run,</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python si_relax_remote.py 
</code></pre></div></div>

<p>You can monitor the job status with the command <code class="language-plaintext highlighter-rouge">jf job list</code> from your <strong>local</strong> computer. You may submit as many calculations as you want since the job scheduler on the remote cluster will take care of queuing jobs and executing them. Once the calculation is complete, the output data can be accessed in the MongoDB database <code class="language-plaintext highlighter-rouge">timewarp</code>. Each job is saved with a unique identifier, <code class="language-plaintext highlighter-rouge">uuid</code> which looks something like `
<code class="language-plaintext highlighter-rouge">uuid:"71411da5-078c-4f88-8362-9da3ae3263ea"</code>.</p>

<h1 id="importing-mongodb-data-in-python">Importing MongoDB data in Python</h1>

<p>The calculation data is saved in the database <code class="language-plaintext highlighter-rouge">timewarp</code> in the collection <code class="language-plaintext highlighter-rouge">outputs</code>. Each run is a document within the collection and can be referenced by its <code class="language-plaintext highlighter-rouge">uuid</code> using Python. The following script displays the structural information and bandgap of the material.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">jobflow_remote</span> <span class="kn">import</span> <span class="n">get_jobstore</span>

<span class="n">js</span> <span class="o">=</span> <span class="n">get_jobstore</span><span class="p">()</span>
<span class="n">js</span><span class="p">.</span><span class="n">connect</span><span class="p">()</span>

<span class="n">data</span> <span class="o">=</span> <span class="n">js</span><span class="p">.</span><span class="n">get_output</span><span class="p">(</span><span class="s">"71411da5-078c-4f88-8362-9da3ae3263ea"</span><span class="p">)</span>

<span class="c1"># output structural information
</span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s">"structure"</span><span class="p">])</span>

<span class="c1"># output bandgap 
</span><span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s">"output"</span><span class="p">][</span><span class="s">"bandgap"</span><span class="p">])</span>
</code></pre></div></div>

<h1 id="next-steps">Next steps</h1>

<p>What we just did was a simple structural relaxation calculation. However, the beauty of Atomate2 is the ability to chain calculations to do workflows. For instance, a relaxation followed by a bandstructure calculation.
Please refer to the Atomate2 <a href="https://materialsproject.github.io/atomate2/">documentation</a> for guidance on how to do this and much more.</p>

<h1 id="references">References</h1>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://github.com/materialsproject/atomate2">https://github.com/materialsproject/atomate2</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://fhi-aims.org">https://fhi-aims.org</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://www.mongodb.com">https://www.mongodb.com</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://pymatgen.org">https://pymatgen.org</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://matgenix.github.io/jobflow-remote/">https://matgenix.github.io/jobflow-remote/</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Uthpala Herath</name></author><category term="fhiaims" /><category term="density-functional-theory" /><category term="python" /><category term="condensed-matter-physics" /><category term="materials-science" /><summary type="html"><![CDATA[This brief tutorial provides an introduction to using Atomate2 to set up DFT workflows with the FHI-aims code.]]></summary></entry><entry><title type="html">Cron-the Unix assistant you didn’t know you had</title><link href="https://www.uthpalaherath.com/Cron-the-unix-assistant-you-didn't-know-you-had/" rel="alternate" type="text/html" title="Cron-the Unix assistant you didn’t know you had" /><published>2024-09-15T00:00:00+00:00</published><updated>2024-09-15T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Cron-the-unix-assistant-you-didn&apos;t-know-you-had</id><content type="html" xml:base="https://www.uthpalaherath.com/Cron-the-unix-assistant-you-didn&apos;t-know-you-had/"><![CDATA[<p class="notice--primary"><em>A short tutorial on automating recurring tasks on Unix systems with Cron.</em></p>

<p><img src="/assets/media/2024-09-15-Cron-the-unix-assistant-you-didn't-know-you-had/2024-09-15-Cron-the-unix-assistant-you-didn't-know-you-had-20240915122634371.png" alt="2024-09-15-Cron-the-unix-assistant-you-didn't-know-you-had-20240915122634371" width="50%" /></p>

<p>Do you have recurring tasks that thus far you had to run on your Unix (Linux/ Mac) system manually? This could be anything from making scheduled backups of a folder to syncing the contents of a file with a remote server. Fortunately, for us Unix users, there’s a built-in service that allows you to automate all of this. Enter Cron.</p>

<h1 id="backing-up-with-cron">Backing up with Cron</h1>

<p>Say we have a folder, <code class="language-plaintext highlighter-rouge">/home/ukh/MatD3</code> that needs backing up weekly. First we write a shell script to compress that directory and save it as a tar file with the date in the directory <code class="language-plaintext highlighter-rouge">/home/ukh/MatD3_backups/</code>. We will also check  for backups older than 90 days and delete them.</p>

<p><code class="language-plaintext highlighter-rouge">backup-weekly.sh</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">tar</span> <span class="nt">-zcf</span> /home/ukh/MatD3_backups/MatD3-<span class="si">$(</span><span class="nb">date</span> +%m%d%Y<span class="si">)</span>.tar.gz <span class="nt">-C</span> /home/ukh MatD3
find /home/ukh/MatD3_backups/<span class="k">*</span> <span class="nt">-mtime</span> +90 <span class="nt">-delete</span>
</code></pre></div></div>

<p>You can run this script and test if it works. If it does, then we can move on to the step of automating the task with Cron.</p>

<h2 id="adding-a-cron-job">Adding a Cron job</h2>

<p>To add a task to Cron, you can open the <code class="language-plaintext highlighter-rouge">crontab</code> file with the command: <code class="language-plaintext highlighter-rouge">crontab -e</code> and it will open with your default editor.
Add the following line to <code class="language-plaintext highlighter-rouge">crontab</code> to run the <code class="language-plaintext highlighter-rouge">backup-weekly.sh</code> script every week.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>30 0 <span class="k">*</span> <span class="k">*</span> 1 sh /home/ukh/cronscripts/backup-weekly.sh
</code></pre></div></div>

<p>Remember to provide the absolute path of the <code class="language-plaintext highlighter-rouge">backup-weekly.sh</code> script. The syntax, <code class="language-plaintext highlighter-rouge">30 0 * * 1</code> tells Cron to run this task every Monday at 12:30 AM. The Cron syntax follows the format, <code class="language-plaintext highlighter-rouge">[minute (m)|hour (h)|day of month (dom)|month (mon)|day of week (dow)]</code>. The * means “for any”. For a verbose interpretation of the format, check out <a href="https://crontab.guru">this</a> website.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<h1 id="backing-up-to-a-remote-server">Backing up to a remote server</h1>

<p>It’s always a good idea to backup your data externally in case something happens to your local system. <code class="language-plaintext highlighter-rouge">rsync</code> is a great utility that allows syncing a directory with another location. Unlike <code class="language-plaintext highlighter-rouge">scp</code> which performs a plain linear copy, locally, or over a network, <code class="language-plaintext highlighter-rouge">rsync</code> employs a special delta transfer algorithm and a few optimizations to make the operation a lot faster.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>

<p>Let’s say we want to sync our local directory <code class="language-plaintext highlighter-rouge">/home/ukh/MatD3_backups</code> with a similar location in a remote server, <code class="language-plaintext highlighter-rouge">timewarp-02.egr.duke.edu</code>. For this to be automated with Cron, it is important that you have your <code class="language-plaintext highlighter-rouge">ssh-keys</code> set for a password-less connection with the remote server. Read more on how to do that <a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys-2">here</a>.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>

<p>First let’s test the <code class="language-plaintext highlighter-rouge">rsync</code> command,</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rsync <span class="nt">-a</span> <span class="nt">--delete</span> /home/ukh/MatD3_backups/ ukh@timewarp-02.egr.duke.edu:/home/ukh/MatD3_backups
</code></pre></div></div>

<p>If you get a <code class="language-plaintext highlighter-rouge">protocol version mismatch -- is your shell clean?</code> error, this is because your remote server outputs something to the terminal, probably due to output statements in the remote server’s <code class="language-plaintext highlighter-rouge">.bashrc</code> or another startup script. To check this, run the following command and remove whatever is outputting to the terminal. <code class="language-plaintext highlighter-rouge">out.dat</code> should be a zero-length file.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rsh ukh@timewarp-02.egr.duke.edu /bin/true <span class="o">&gt;</span> out.dat
</code></pre></div></div>

<p>Once, you have that working we can automate the task with Cron. Add the following to your <code class="language-plaintext highlighter-rouge">crontab</code> which you can open with <code class="language-plaintext highlighter-rouge">crontab -e</code>.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 2 <span class="k">*</span> <span class="k">*</span> <span class="k">*</span> rsync <span class="nt">-a</span> <span class="nt">--delete</span> /home/ukh/MatD3_backups/ ukh@timewarp-02.egr.duke.edu:/home/ukh/MatD3_backups
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">0 2 * * *</code> means run this task at 02:00 AM everyday. <code class="language-plaintext highlighter-rouge">--delete</code> will ensure that the expired backups will be deleted on the remote server as well. Note the forward slash in the local directory and the lack of one in the remote directory. This is to avoid making an additional directory <code class="language-plaintext highlighter-rouge">MatD3_backups</code> within the remote directory.</p>

<p>If everything works, then you’ll have a system that automatically makes scheduled backups and syncs them with a remote server. 
Happy Cron-ing!</p>
<h1 id="references">References</h1>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://crontab.guru/">https://crontab.guru/</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://stackoverflow.com/questions/20244585/how-does-scp-differ-from-rsync">https://stackoverflow.com/questions/20244585/how-does-scp-differ-from-rsync</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys-2">https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys-2</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Uthpala Herath</name></author><category term="linux" /><category term="tutorials" /><summary type="html"><![CDATA[A short tutorial on automating recurring tasks on Unix systems with Cron.]]></summary></entry><entry><title type="html">Hi again and Resolve Mathjax Latex rendering issues on Jekyll Markdown website</title><link href="https://www.uthpalaherath.com/Resolve-Mathjax-Latex-Rendering-on-Jekyll-Markdown-Website/" rel="alternate" type="text/html" title="Hi again and Resolve Mathjax Latex rendering issues on Jekyll Markdown website" /><published>2024-08-18T00:00:00+00:00</published><updated>2024-08-18T00:00:00+00:00</updated><id>https://www.uthpalaherath.com/Resolve-Mathjax-Latex-Rendering-on-Jekyll-Markdown-Website</id><content type="html" xml:base="https://www.uthpalaherath.com/Resolve-Mathjax-Latex-Rendering-on-Jekyll-Markdown-Website/"><![CDATA[<p class="notice--primary"><em>This post serves as a short catch-up and a quick fix to a Latex rendering issue I recently noticed on my web posts.</em></p>

<p>Hey folks! Long time no see! Hope you’re all doing awesome!</p>

<p>After about a 2 year hiatus, I thought I would try and get back into blogging on a more consistent basis. I’ve been postdoc-ing at Duke since the Summer of 2022 and I’d say it’s been quite an eventful journey since I  defended my Ph.D. dissertation and moved to Durham, NC from the Wild and Wonderful West Virginia. Durham has treated me quite nicely. I’ve met a bunch of wonderful people and there’s a lot to do around here. I went on a backpacking trip in the Appalachian Mountains of Western North Carolina,  explored the beautiful beach towns of Outer Banks, and bought a new car (Crosstrek FTW!).</p>

<p>Anyways, the actual reason that motivated me to write this post was noticing the Latex rendering on my website was broken. After doing a bit of digging<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, the reason seemed to be an update to <a href="https://www.mathjax.org">MathJax</a> v3. The fix was quite simple. Adding the following snippet to <code class="language-plaintext highlighter-rouge">_includes/script.html</code> did the job.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"https://polyfill.io/v3/polyfill.min.js?features=es6"</span><span class="nt">&gt;&lt;/script&gt;</span>
<span class="nt">&lt;script </span><span class="na">id=</span><span class="s">"MathJax-script"</span> <span class="na">async</span> <span class="na">src=</span><span class="s">"https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
<span class="nt">&lt;script&gt;</span>
 <span class="nx">MathJax</span> <span class="o">=</span> <span class="p">{</span>
  <span class="na">tex</span><span class="p">:</span> <span class="p">{</span>
    <span class="na">inlineMath</span><span class="p">:</span> <span class="p">[[</span><span class="dl">'</span><span class="s1">$</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">$</span><span class="dl">'</span><span class="p">],</span> <span class="p">[</span><span class="dl">'</span><span class="se">\\</span><span class="s1">(</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="se">\\</span><span class="s1">)</span><span class="dl">'</span><span class="p">]],</span>
    <span class="na">displayMath</span><span class="p">:</span> <span class="p">[</span> <span class="p">[</span><span class="dl">'</span><span class="s1">$$</span><span class="dl">'</span><span class="p">,</span><span class="dl">'</span><span class="s1">$$</span><span class="dl">'</span><span class="p">],</span> <span class="p">[</span><span class="dl">"</span><span class="se">\\</span><span class="s2">[</span><span class="dl">"</span><span class="p">,</span><span class="dl">"</span><span class="se">\\</span><span class="s2">]</span><span class="dl">"</span><span class="p">]</span> <span class="p">],</span>
    <span class="na">processEscapes</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
  <span class="p">}</span>
<span class="p">}</span>
<span class="nt">&lt;/script&gt;</span>
</code></pre></div></div>

<p>This also ensures that in-line math enclosed within single <code class="language-plaintext highlighter-rouge">'$LaTex$'</code> renders too.</p>

<p>That’s it for now. Hope to see you all soon!</p>

<h1 id="references">References</h1>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://www.cross-validated.com/How-to-render-math-on-Minimal-Mistakes/">https://www.cross-validated.com/How-to-render-math-on-Minimal-Mistakes/</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://tex.stackexchange.com/questions/27633/mathjax-inline-mode-not-rendering">https://tex.stackexchange.com/questions/27633/mathjax-inline-mode-not-rendering</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Uthpala Herath</name></author><category term="markdown" /><category term="tutorials" /><category term="latex" /><summary type="html"><![CDATA[This post serves as a short catch-up and a quick fix to a Latex rendering issue I recently noticed on my web posts.]]></summary></entry></feed>