3. Automated high-throughput wannierisation

In the following tutorial you will learn how to perform automated high-throughput Wannierisation using a dedicated AiiDA workchain.

The protocol for automating the construction of Wannier functions is discussed in the following article:

  • Valerio Vitale, Giovanni Pizzi, Antimo Marrazzo, Jonathan Yates, Nicola Marzari, Arash Mostofi, Automated high-throughput wannierisation, accepted in npj Computational Materials (2020); https://arxiv.org/abs/1909.00433

whose data is available on the Materials Cloud Archive as:

  • Valerio Vitale, Giovanni Pizzi, Antimo Marrazzo, Jonathan R. Yates, Nicola Marzari, Arash A. Mostofi, Automated high-throughput Wannierisation, Materials Cloud Archive (2019), doi: 10.24435/materialscloud:2019.0044/v2.

The protocol leverages the SCDM method introduced in:

The initial workflow was written by Antimo Marrazzo (EPFL) and Giovanni Pizzi (EPFL) (and is available, in a Virtual Machine, on the Materials Cloud entry mentioned above). It was later substantially improved and upgraded to AiiDA v1.x by Junfeng Qiao (EFPL). The SCDM implementation in Quantum ESPRESSO was done by Valerio Vitale (Imperial College London and University of Cambridge).

Note

Launch while you read! The workflow should take a few minutes to run on the virtual machine (5-10 minutes). So, we suggest that you launch it now (following the suggestions below) in the background, while you read through the tutorial.

3.1. Preparation

You can download the launch script for the workflow here: launch_auto-wannier_workflow.py. The script accepts one argument, which is the location of the crystal structure file (in XSF format) for which you want to run the Wannierisation. As an example, you can pick any one of the simple crystal structures from this list below:

Once you downloaded both the launcher script and one of the XSF files, first adapt the launcher script content, as usual, filling in the names of the codes that you want to use.

Then, launch the script with the following command:

verdi run launch_auto-wannier_workflow.py CsH.xsf

(in the following, we will use CsH as an example); you can replace CsH.xsf with any other structure of the list above, e.g. PtS2.xsf, or Br2Ti.xsf, …

3.2. Introduction

The use of maximally-localised Wannier functions (MLWFs) in high-throughput (HT) workflows has been hindered by the fact that generating MLWFs automatically and robustly without any user intervention and for arbitrary materials is, in general, very challenging.

The procedure to obtain MLWFs requires to specify a number of parameters that depends on the specific system under study, such as

  • the type of projections to build the \(A_{mn}({\mathbf{k}})\) matrix

and, in the case of entangled bands, many other parameters like

  • the number of Wannier functions,

  • the frozen energy window,

  • the outer energy window, …

The SCDM method allows to circumvent the need to specify initial projections, by introducting an algorithm that builds the \(A_{mn}({\mathbf{k}})\) matrix by optimally selecting columns of the density matrix \(P_{\mathbf{k}}\)

\[P_{\mathbf{k}}(\mathbf{r},\mathbf{r'})=\sum_{n=1}^{J}f(\epsilon_{n\mathbf{k}})\psi_{n{\mathbf{k}}}(\mathbf{r})\psi^*_{n{\mathbf{k}}}(\mathbf{r'})\]

where \(f(\epsilon_{n\mathbf{k}})\) is an occupation function. The SCDM algorithm is based on a QR factorization with column pivoting (QRCP) and it is currently implemented in the pw2wannier90.x code of Quantum Espresso.

It is worth to stress that the occupation function does not necessarily correspond to a physical smearing, but it is used as a “window function” that restricts the manifold to the energy region of interest. For example, the isolated-band case can be recovered by setting \(f(\epsilon_{n\mathbf{k}})=1\) for energy values \(\epsilon_{n\mathbf{k}}\) within the energy range of the isolated bands, and zero elsewhere.

Another typical choice for the occupation function is the so-called complementary error function (erfc)

\[f(\epsilon)=\frac{1}{2}\mathrm{erfc}\left(\frac{\epsilon - \mu}{\sigma}\right)\]

and it is used to to deal with a manifold of bands that are entangled only in the upper energy region (e.g. for metals, or for the conduction band of insulators). Another possible choice for the occupation function could be a Gaussian, for instance to extract bands in a specific energy range from a fully entangled manifold.

While the SCDM method allows to avoid the disentanglement procedure, and so the need to specify the frozen and outer window, it does not provide a recipe to set the smearing function parameters \(\mu\) and \(\sigma\). In addition, the number of Wannier functions remains to be set and a sensible value typically requires some chemical consideration, such as counting the number of atomic orbitals of a given orbital character (e.g. \(s\), \(p\), \(d\), \(sp^3\), …).

The Wannier90BandsWorkChain (distributed in the aiida-wannier90-workflows package) is an AiiDA workchain that implements a protocol that deals with the choice of the number of Wannier functions and sets the parameters \(\mu\) and \(\sigma\) defining the smearing function. For a full explanation of the protocol we refer to the article Automated high-throughput wannierisation, while here we just outline the main features.

The workflow starts by running a DFT calculation (with Quantum ESPRESSO) using automated workchains distributed within the aiida-quantumespresso plugin, that take care of automation for the calculations performed with Quantum ESPRESSO (QE). This is followed by a calculation of the projectability, that is the projection of the Kohn-Sham states onto localised atomic orbitals:

\[p_{n\mathbf{k}} = \sum_{I,l,m}|\langle \psi_{n\mathbf{k}} | \phi_{Ilm}^{\mathbf{k}}\rangle|^2,\]

where the \(\phi_{Ilm}(\mathbf{k})\) are the pseudo-atomic orbitals (PAO) employed in the generation of the pseudopotentials, \(I\) is an index running over the atoms in the cell and \(lm\) define the usual angular momentum quantum numbers.

The workflow is designed for the specific use case where we are interested in wannierising the occupied bands (plus, optionally, some unoccupied or partially occupied bands) in insulators and in metals.

The number of Wannier functions is automatically set equal to the number of PAOs defined in the pseudopotentials (and the pseudopotentials are automatically taken from the SSSP efficiency library). The projectabilities on these PAO orbitals are then computed and used to set the optimal smearing function (erfc) parameters \(\mu\) and \(\sigma\), as explained in the paper Automated high-throughput wannierisation. After the calculation of the projectabilities, the workflow proceeds with the usual Wannierisation step: first it computes the overlap and projection matrices using pw2wannier90, and then it runs the Wannier90 code.

Here we summarise the main steps of the Wannier90BandsWorkChain:

  • SCF (QuantumESPRESSO pw.x)

  • NSCF (QuantumESPRESSO pw.x)

  • Projectability (QuantumESPRESSO projwfc.x)

  • Wannier90 pre-processing (wannier90.x -pp)

  • Overlap matrices \(M_{mn}\), initial projections with SCDM \(A_{mn}\) (QuantumESPRESSO pw2wannier90.x)

  • Wannierisation (wannier90.x)

The output of the workflow includes several nodes, among which the projectabilities and interpolated band structure, that we are going to inspect after the run.

3.3. Running the workflow

If you have not launch the script yet, please do it now!

Here we focus on how to run the Wannier90BandsWorkChain, the AiiDA workchain that implements the automation workflow to obtain MLWFs.

Important note: For this tutorial, in order to keep the total simulation time down to ~5-10 minutes on a typical laptop, we run the workflow using the testing peotocol, where all the wavefunction cutoffs are halved to speed up the calculations (with respect to the converged and tested cutoffs suggested by the SSSP pseudopotential library). For production please use the theos-ht-1.0 protocol (or any other protocol with converged cutoffs).

We remind you that, to get a list of all the AiiDA processes (including calculations and workchains) that are running and their status you can use:

verdi process list

or alternatively

verdi process list -p1 -a

to see all workflows and calculations, even completed, that have been launched in the past 1 day.

Since you have already run the workflow by now, run the commands above to retrieve the corresponding PKs.

Here is the script that you have run:

#!/usr/bin/env runaiida
import sys
from aiida.orm import StructureData, Bool, Code, Dict
from aiida.engine import submit
from ase.io import read as aseread
from aiida_wannier90_workflows.workflows import Wannier90BandsWorkChain

# Codenames for pw.x, pw2wannier90.x, projwfc.x and wannier90.x
# Please modify these according to your machine
pw_code = Code.get_from_string("<CODE LABEL>")  # e.g. 'qe-6.5-pw@localhost'
pw2wan_code = Code.get_from_string("<CODE LABEL>")  # e.g. 'qe-6.5-pw2wannier90@localhost'
projwfc_code = Code.get_from_string("<CODE LABEL>")  # e.g. 'qe-6.5-projwfc@localhost'
wan_code = Code.get_from_string("<CODE LABEL>")  # e.g. 'wannier90-3.1.0-wannier@localhost'

# The 1st commandline argument specifies the structure to be calculated
xsf_file = sys.argv[1]  # e.g. 'CsH.xsf'

# Read xsf file and convert into a stored StructureData
structure = StructureData(ase=aseread(xsf_file))

# Prepare the builder to launch the workchain
builder = Wannier90BandsWorkChain.get_builder()
builder.structure = structure
builder.code = {
    'pw': pw_code,
    'pw2wannier90': pw2wan_code,
    'projwfc': projwfc_code,
    'wannier90': wan_code
}
# For this tutorial, we are using the 'testing' protocol,
# with all cutoffs halved, to speed up the simulations
builder.protocol = Dict(dict={'name': 'testing'})
# Flags to control the workchain behaviour
builder.controls = {
    # If True, compute only valence bands (NB: use for insulators only!)
    'only_valence': Bool(False),
    # If True, perform maximal-localisation (MLWF) procedure, i.e., minimise the spread Omega
    'do_mlwf': Bool(True),
}

# Submits the workchain
workchain = submit(builder)

print('launched WorkChain<{}> for structure {}'.format(workchain.pk, structure.get_formula()))
print("Use `verdi process list` or `verdi process show {}` to check the progress".format(workchain.pk))

Inspect the script in detail now, and make sure that you understand all its parts. The comments should guide you through it. As you will notice, the amount of information to provide to the workflow is really minimal: just the codes to use, the crystal structure, and some flags to control the behavior (which protocol to use, if you want to compute only valence bands, and if you want to run the MLWF step).

3.4. Analyzing the outputs of the workflow

Now we analyse the reports and outputs of the workflow using the command line.

Both while the Wannier90BandsWorkChain is running, and after its completion, you can monitor its progress by looking at its “report”, using the command

verdi process report <PK>

where <PK> corresponds to the workchain PK. You will see a list of log with messages produced by the workchain, including the PKs of all its sub-workchains and of the calculations launched by the Wannier90BandsWorkChain. The report will be similar to the following:

2020-03-19 07:58:27 [88  | REPORT]: [495|Wannier90BandsWorkChain|setup_protocol]: running the workchain with the "testing" protocol
2020-03-19 07:58:27 [89  | REPORT]: [495|Wannier90BandsWorkChain|setup]: workchain controls found in inputs: valence + conduction bands
2020-03-19 07:58:27 [90  | REPORT]: [495|Wannier90BandsWorkChain|run_seekpath]: running seekpath to get primitive structure for: CsH
2020-03-19 07:58:28 [91  | REPORT]: [495|Wannier90BandsWorkChain|setup_parameters]: number of machines 1 auto-set according to number of atoms
2020-03-19 07:58:30 [92  | REPORT]: [495|Wannier90BandsWorkChain|run_wannier_workchain]: launching Wannier90WorkChain<514>
2020-03-19 07:58:31 [93  | REPORT]:   [514|Wannier90WorkChain|run_scf]: scf step - launching PwBaseWorkChain<517> in scf mode
2020-03-19 07:58:33 [94  | REPORT]:     [517|PwBaseWorkChain|run_calculation]: launching PwCalculation<523> iteration #1
2020-03-19 07:58:46 [95  | REPORT]:     [517|PwBaseWorkChain|inspect_calculation]: PwCalculation<523> completed successfully
2020-03-19 07:58:46 [96  | REPORT]:     [517|PwBaseWorkChain|results]: work chain completed after 1 iterations
2020-03-19 07:58:46 [97  | REPORT]:     [517|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2020-03-19 07:58:47 [98  | REPORT]:   [514|Wannier90WorkChain|inspect_scf]: scf PwBaseWorkChain successfully finished
2020-03-19 07:58:47 [99  | REPORT]:   [514|Wannier90WorkChain|run_nscf]: nscf number of bands set as 24
2020-03-19 07:58:48 [100 | REPORT]:   [514|Wannier90WorkChain|run_nscf]: nscf step - launching PwBaseWorkChain<537> in nscf mode
2020-03-19 07:58:49 [101 | REPORT]:     [537|PwBaseWorkChain|run_calculation]: launching PwCalculation<540> iteration #1
2020-03-19 08:01:16 [102 | REPORT]:     [537|PwBaseWorkChain|inspect_calculation]: PwCalculation<540> completed successfully
2020-03-19 08:01:16 [103 | REPORT]:     [537|PwBaseWorkChain|results]: work chain completed after 1 iterations
2020-03-19 08:01:16 [104 | REPORT]:     [537|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2020-03-19 08:01:17 [105 | REPORT]:   [514|Wannier90WorkChain|inspect_nscf]: nscf PwBaseWorkChain successfully finished
2020-03-19 08:01:17 [106 | REPORT]:   [514|Wannier90WorkChain|should_do_projwfc]: SCDM mu & sigma are auto-set using projectability
2020-03-19 08:01:18 [107 | REPORT]:   [514|Wannier90WorkChain|run_projwfc]: projwfc step - launching ProjwfcCalculation<546>
2020-03-19 08:01:27 [108 | REPORT]:   [514|Wannier90WorkChain|inspect_projwfc]: projwfc ProjwfcCalculation successfully finished
2020-03-19 08:01:28 [109 | REPORT]:   [514|Wannier90WorkChain|run_wannier90_pp]: number of Wannier functions extracted from projections: 14
2020-03-19 08:01:29 [110 | REPORT]:   [514|Wannier90WorkChain|run_wannier90_pp]: wannier90 postproc step - launching Wannier90Calculation<559> in postproc mode
2020-03-19 08:01:35 [111 | REPORT]:   [514|Wannier90WorkChain|inspect_wannier90_pp]: wannier90 postproc Wannier90Calculation successfully finished
2020-03-19 08:01:38 [112 | REPORT]:   [514|Wannier90WorkChain|run_pw2wannier90]: pw2wannier90 step - launching Pw2Wannier90Calculation<567>
2020-03-19 08:02:16 [113 | REPORT]:   [514|Wannier90WorkChain|inspect_pw2wannier90]: Pw2wannier90Calculation successfully finished
2020-03-19 08:02:16 [114 | REPORT]:   [514|Wannier90WorkChain|run_wannier90]: wannier90 step - launching Wannier90Calculation<572>
2020-03-19 08:03:06 [115 | REPORT]:   [514|Wannier90WorkChain|inspect_wannier90]: Wannier90Calculation successfully finished
2020-03-19 08:03:06 [116 | REPORT]:   [514|Wannier90WorkChain|results]: final step - preparing outputs
2020-03-19 08:03:06 [117 | REPORT]:   [514|Wannier90WorkChain|results]: Wannier90WorkChain successfully completed
2020-03-19 08:03:07 [118 | REPORT]: [495|Wannier90BandsWorkChain|results]: wannier90 interpolated bands pk: 575
2020-03-19 08:03:07 [119 | REPORT]: [495|Wannier90BandsWorkChain|results]: Wannier90BandsWorkChain successfully completed

Once the workchain has finished to run, you can look at all its inputs and outputs with

verdi node show <PK>

You should obtain an output similar to what follows:

Property     Value
-----------  ------------------------------------
type         Wannier90BandsWorkChain
state        Finished [0]
pk           404
uuid         df46175b-4634-4edf-a303-e90af98a27fc
label
description
ctime        2020-03-18 18:38:36.294955+00:00
mtime        2020-03-18 18:50:59.245496+00:00
computer     [1] localhost

Inputs                          PK    Type
------------------------------  ----  -------------
code
    wannier90                   17    Code
    projwfc                     6     Code
    pw2wannier90                7     Code
    pw                          1     Code
controls
    kpoints_distance_for_bands  401   Float
    do_disentanglement          400   Bool
    plot_wannier_functions      399   Bool
    retrieve_hamiltonian        398   Bool
    auto_projections            397   Bool
    do_mlwf                     396   Bool
    only_valence                395   Bool
protocol                        403   Dict
structure                       402   StructureData

Outputs                         PK  Type
----------------------------  ----  --------------
nscf_parameters                454  Dict
primitive_structure            409  StructureData
projwfc_bands                  460  BandsData
projwfc_projections            459  ProjectionData
pw2wannier90_remote_folder     477  RemoteData
scf_parameters                 437  Dict
seekpath_parameters            407  Dict
wannier90_interpolated_bands   484  BandsData
wannier90_parameters           485  Dict
wannier90_remote_folder        482  RemoteData
wannier90_retrieved            483  FolderData

Called      PK  Type
--------  ----  ----------------
CALL       423  WorkChainNode
CALL       406  CalcFunctionNode

Log messages
---------------------------------------------
There are 7 log messages for this calculation
Run 'verdi process report 404' to see them

3.5. Analyzing and comparing the band structure

First let’s give a look at the interpolated band structure by exporting it to a PDF file with

verdi data bands export --format mpl_pdf --output band_structure.pdf <PK_bands>

where <PK_bands> stands for the BandsData PK produced by the workflow. You can find it from the output of the verdi node show <PK> command that you run before. You should obtain a PDF like the following:

../../../_images/CsH_wan_band.png

Now we compare the Wannier-interpolated bands with the full DFT bands calculation. For convenience, we have already computed for you all the full DFT band structures for all the compounds for which we provided the XSF above (even if you could easily recompute these band structures using the plugins and workflows included in the aiida-quantumespresso package). You can download the full DFT bands in xmgrace format (.agr) form this list:

In particular, taking CsH as an example, you can first export the Wannier-interpolated bands that you just computed earlier in xmgrace format with

verdi data bands export --format agr --output CsH_wan_bands.agr <PK_bands>

and then you can compare them with the full DFT band structure using xmgrace typing:

xmgrace CsH_dft_bands.agr CsH_wan_bands.agr

where you can replace CsH with the correct chemical formula of the crystal structure that you used.

For reference, here are the two xmgrace files for CsH: CsH_dft_bands.agr CsH_wan_bands.agr.

In the case of CsH, you should obtain something like the following plot (note that we slightly updated the colors and graphical appearance of the plot with respect to the default one you would get by running the command above):

../../../_images/CsH_diff_bands.png

3.6. Analyzing the projectabilities

Now you will see how to look at the projectabilities that have been computed by the workchain and then used to obtain \(\mu\) and \(\sigma\) in the automation protocol. You can download the following script plot_projectabilities.py and run it

verdi run plot_projectabilities.py <PK>

where PK stands for the Wannier90BandsWorkChain pk.

You should obtain a plot similar to the following:

../../../_images/CsH_proj.png

Exercise: Open the script and try to understand what it is doing.

As you can see, the protocol to choose \(\mu\) and \(\sigma\) ensures that the SCDM algorithm is applied to a density-matrix that includes (with a large weight) only those Kohn-Sham states that have a large projection on the manifold spanned by the PAOs.

3.7. Analyzing the provenance graph

We begin by generating the provenance graph with

verdi node graph generate <PK>

where the PK corresponds to the workflow you have just run. You should obtain something like the following:

../../../_images/CsH.dot.jpg

Fig. 3.2 Provenance graph for a single Wannier90BandsWorkChain run. (PDF version CsH.dot.pdf)

As you can see, AiiDA has tracked all the inputs provided to all calculations (and their outputs), allowing you (or anyone else) to reproduce it later on.

3.8. (Optional) Maximal localisation and SCDM

Try to modify the launch_auto-wannier_workflow.py script to disable the MLWF procedure in order to obtain Wannier functions with SCDM projections that are not maximally localised.

Exercise: Run the workflow for 1 or 2 materials of the dataset. Do you notice any difference when using or not the MLWF procedure? Which one gives better results? Do your results agree with the findings of the Automated high-throughput wannierisation paper?

3.9. (Optional) Browse your database with the REST API

Connect to the AiiDA REST API and browse your database! Follow the instructions that you find on the Materials Cloud website.

3.10. (Optional) More on AiiDA

You now have a first taste of the type of problems AiiDA tries to solve, and you have seen how it is possible, thanks to the tools provided by AiiDA, to develop plugins and workflows to automate your research tasks, while preserving the full provenance and guaranteeing reproducibility of your results.

Here are some options for how to continue:

  • Continue with the in-depth tutorial and learn more about the verdi, verdi shell and python interfaces to AiiDA. There is more than enough material to keep you busy for a day. You can check for instance the tutorial held in EPFL in May 2019 (note that this has been tested on a beta version of AiiDA 1.0 and we didn’t check yet if anything needs to be adapted for AiiDA 1.1, that you have in your VM).

  • For advanced Linux & python users: try setting up AiiDA directly on your laptop. Note that AiiDA depends on a number of services and software that require some time to set up. Unfortunately, we will not have the time to help you solve issues related to your setup in a tutorial context, but you can refer to the AiiDA documentation and to the AiiDA mailing list.