Work functions

A work function is the simpler of the two types of workflows in AiiDA. It can call one or more calculations and return data that has been created by the calculations it has called. Moreover, work functions can also call other workflows, allowing you to write nested workflows.

In this section, you will learn to:

  1. Add simple Python functions to the provenance.

  2. Write and launch a simple workflow in AiiDA.

Note that the output of verdi data core.structure import will probably show a different value for the PK of the structure node you just created: make a note of this PK, as you will need to substitute it into code snippets later in this tutorial. You can also download the Si.cif structure file manually and copy it into your work environment instead of using wget.

Calculation functions

Calculation functions are a great way to record, in the AiiDA provenance, the steps of your scientific workflow that are written in Python. To do so, add the calcfunction decorator to the Python function. A simple example is the multiply calculation function from the AiiDA basics section:

from aiida.engine import calcfunction

@calcfunction
def multiply(x, y):
    return x * y

In a sense, this example is deceptively simple. Let’s consider a slightly more complicated example: a rescale function that takes an ASE Atoms structure and rescales the unit cell with a certain scale factor:

def rescale(structure, scale):

    new_cell = structure.get_cell() * scale
    structure.set_cell(new_cell, scale_atoms=True)

    return structure

Open a verdi shell or Jupyter notebook (with the AiiDA magic: %aiida) and use the code snippet above to define the rescale function. Next, load any StructureData, for example using the QueryBuilder:

In [2]: from aiida.orm import QueryBuilder, StructureData
   ...: structure = QueryBuilder().append(StructureData).first()[0]

To test the function, we need to convert the StructureData into an ASE Atoms instance. This is easily done with the get_ase() method:

In [3]: ase_structure = structure.get_ase()

Let’s have a look at what structure we found:

In [4]: ase_structure
Out[4]: Atoms(symbols='NaNbO3', pbc=True, cell=[3.9761497211, 3.9761497211, 3.9761497211], masses=...)

Next, use the rescale function to double the lattice vectors of the unit cell:

In [5]: rescale(ase_structure, 2)
Out[5]: Atoms(symbols='NaNbO3', pbc=True, cell=[7.9522994422, 7.9522994422, 7.9522994422], masses=...)
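
The doubling shown above can also be checked by hand. The following is a minimal plain-Python sketch, not ASE code: it assumes a cubic cell described by its three edge lengths (as in the output above) and assumes that scale_atoms=True keeps the fractional coordinates of the atoms fixed, so that for a uniform scale factor the Cartesian positions are multiplied by the same factor. The rescale_plain helper and the example position are hypothetical.

```python
def rescale_plain(cell_lengths, positions, scale):
    """Uniformly scale cell edge lengths and Cartesian positions (toy model)."""
    new_cell = [scale * length for length in cell_lengths]
    new_positions = [[scale * c for c in pos] for pos in positions]
    return new_cell, new_positions


a = 3.9761497211  # cell edge length taken from the output above
cell = [a, a, a]
# Hypothetical atom positions: one at the origin, one at the cell centre.
positions = [[0.0, 0.0, 0.0], [a / 2, a / 2, a / 2]]

new_cell, new_positions = rescale_plain(cell, positions, 2)

# Fractional coordinates (position divided by edge length) are unchanged.
fractional_before = [c / cell[0] for c in positions[1]]
fractional_after = [c / new_cell[0] for c in new_positions[1]]

print(new_cell)
print(fractional_before == fractional_after)  # True
```

Each edge length doubles while the atom at the cell centre stays at the centre, which is the behaviour we want from rescale.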

Great! That all seems to be working as expected. Now it’s time to convert our Python function into a calculation function.

Working with nodes

Try to adapt the rescale function above into a calculation function by adding a calcfunction decorator:

from aiida.engine import calcfunction

@calcfunction
def rescale(structure, scale):

    new_cell = structure.get_cell() * scale
    structure.set_cell(new_cell, scale_atoms=True)

    return structure

Maybe you already see why just adding the calcfunction decorator is not sufficient. Trying to run the function again with ase_structure and the scaling factor 2 will fail, since neither is a Data node:

In [7]: rescale(ase_structure, 2)
(...)
ValueError: Error occurred validating port 'inputs.structure': value 'structure' is not of the right type.
Got '<class 'ase.atoms.Atoms'>', expected '(<class 'aiida.orm.nodes.data.data.Data'>,)'

However, passing the originally imported StructureData stored in structure and Float(2) won’t work either:

In [8]: rescale(structure, Float(2))
(...)
AttributeError: 'StructureData' object has no attribute 'get_cell'

The reason for these failures is that we need to adjust the rescale function further, so that it both accepts AiiDA nodes as inputs and returns an AiiDA node:

from aiida.engine import calcfunction


@calcfunction
def rescale(structure, scale):
    """Calculation function to rescale a structure

    :param structure: An AiiDA `StructureData` to rescale
    :param scale: The scale factor (for the lattice constant)
    :return: The rescaled structure
    """
    from aiida.orm import StructureData

    ase_structure = structure.get_ase()
    scale_value = scale.value

    new_cell = ase_structure.get_cell() * scale_value
    ase_structure.set_cell(new_cell, scale_atoms=True)

    return StructureData(ase=ase_structure)

Let’s explain the required changes in more detail:

    from aiida.orm import StructureData

Here the StructureData class is imported, since we need it later to convert the ASE Atoms structure into a StructureData node so we can output it.

    ase_structure = structure.get_ase()
    scale_value = scale.value

These two lines simply convert the inputs, which have to be AiiDA nodes, into the corresponding ASE Atoms structure and the Python float base type that we need to scale the unit cell.

    return StructureData(ase=ase_structure)

After the ase_structure has been rescaled, we need to convert it back into a StructureData node that is then returned by the rescale function as an output.

So, in reality we have to do two things in order to adapt a regular Python function into a calculation function that can be tracked in the provenance:

  1. Add the calcfunction decorator.

  2. Make sure the function expects and returns AiiDA Data nodes. This often involves converting the input nodes into other Python objects, and converting the result of the analysis back into an AiiDA Data node.
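
The two steps above can be mimicked without an AiiDA profile. The sketch below is a toy model only, not AiiDA's implementation: ToyData, toy_calcfunction, PROVENANCE and scale_length are hypothetical names invented for illustration. It shows a decorator that rejects inputs and outputs that are not nodes, and a function body that unwraps its inputs, computes on plain values, and wraps the result again.

```python
import functools


class ToyData:
    """Hypothetical stand-in for a Data node: wraps a plain Python value."""

    def __init__(self, value):
        self.value = value


# Hypothetical in-memory log of calls; a real provenance store is a database.
PROVENANCE = []


def toy_calcfunction(func):
    """Toy decorator: accept and return only ToyData, and log each call."""

    @functools.wraps(func)
    def wrapper(*args):
        for arg in args:
            if not isinstance(arg, ToyData):
                raise ValueError(f"{arg!r} is not a ToyData node")
        result = func(*args)
        if not isinstance(result, ToyData):
            raise ValueError("the function must return a ToyData node")
        PROVENANCE.append((func.__name__, args, result))
        return result

    return wrapper


@toy_calcfunction
def scale_length(length, scale):
    # Unwrap the nodes into plain Python values.
    raw_length, raw_scale = length.value, scale.value
    # Do the actual work on the plain values.
    new_length = raw_length * raw_scale
    # Wrap the result in a node again so that it can be returned.
    return ToyData(new_length)


doubled = scale_length(ToyData(3.5), ToyData(2))
print(doubled.value)    # 7.0
print(len(PROVENANCE))  # 1
```

Calling scale_length(3.5, 2) with plain numbers raises a ValueError in this toy model, much as the decorated rescale function rejected the ASE Atoms object earlier.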

Exercises

(1) Run the calculation function version of rescale with AiiDA nodes as inputs. Convert the output StructureData node back into an ASE Atoms structure. Is the result what you expected?

(2) Why was the multiply function so deceptively simple? That is, why was conversion to/from AiiDA nodes not an issue there?

(3) Since calculation functions are tracked in the provenance, you should be able to find those you have just run using the verdi process list command. If you’ve tried the incorrect rescale calculation function above, this list will contain one Excepted result. Use what you’ve learned in the Troubleshooting module to figure out what went wrong here.

Writing a work function

To write a work function whose provenance is stored automatically, write a Python function and decorate it with the workfunction() decorator:

"""Basic calcfunction-based workflows for demonstration purposes."""
from aiida.engine import calcfunction, workfunction


@calcfunction
def add(x, y):
    return x + y


@calcfunction
def multiply(x, y):
    return x * y


@workfunction
def add_multiply(x, y, z):
    """Add two numbers and multiply the sum by a third."""
    addition = add(x, y)
    product = multiply(addition, z)
    return product

It is important to reiterate here that the workfunction()-decorated add_multiply() function does not create any new data nodes. The add() and multiply() calculation functions create the Int data nodes; all the work function does is return the result of the multiply() calculation function. Moreover, both calculation and work functions can only accept and return data nodes, i.e. instances of classes that subclass the Data class.
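
The division of labour described above can be illustrated in plain Python. This is a toy model, not AiiDA code: ToyInt and its creator attribute are hypothetical stand-ins for a data node and for the record of which function created it, and the three functions are undecorated copies of the ones in the snippet.

```python
class ToyInt:
    """Hypothetical stand-in for an Int node that remembers who created it."""

    def __init__(self, value, creator=None):
        self.value = value
        self.creator = creator  # name of the function that created this node


def add(x, y):
    # Creates a new node, like a calculation function would.
    return ToyInt(x.value + y.value, creator="add")


def multiply(x, y):
    # Creates a new node, like a calculation function would.
    return ToyInt(x.value * y.value, creator="multiply")


def add_multiply(x, y, z):
    # Only orchestrates: the returned node is the one created by multiply.
    addition = add(x, y)
    product = multiply(addition, z)
    return product


result = add_multiply(ToyInt(2), ToyInt(3), ToyInt(5))
print(result.value)    # 25
print(result.creator)  # multiply
```

The node returned by add_multiply still names multiply as its creator, because the orchestrating function passes it on without creating anything itself.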

Copy the code snippet above and put it into a Python file (e.g. add_multiply.py), or download it directly using the link next to it. In the terminal, navigate to the folder where you stored the script. Next, import the add_multiply work function in the verdi shell:

In [1]: from add_multiply import add_multiply

Important

If you’re running the verdi shell commands in a Jupyter notebook, you need to restart the notebook kernel before it will recognise any new Python modules, or changes in a Python module. The kernel can be restarted with the circular arrow button at the top or by pressing the 0 key twice in quick succession.

Fig. 10 Restarting the Jupyter Notebook.

As with a calculation function, you run a work function like any other Python function, by calling it with the required input arguments:

In [2]: result = add_multiply(Int(2), Int(3), Int(5))

Here, the add_multiply work function returns the output Int node and we assign it to the variable result. Again, note that each input argument of a work function must be an instance of Data or one of its subclasses. Calling the add_multiply function with regular integers will result in a ValueError, as these cannot be stored in the provenance graph.

We can now check the AiiDA list of all processes from the past day, which includes the work function and the two calculation functions it called:

$ verdi process list -a -p 1
  PK  Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------
...
1859  1m ago     add_multiply     ⏹ Finished [0]
1860  1m ago     add              ⏹ Finished [0]
1862  1m ago     multiply         ⏹ Finished [0]

Copy the PK of the add_multiply work function and check its status with verdi process status (in the above example, the PK is 1859):

$ verdi process status <PK>
add_multiply<1859> Finished [0]
    ├── add<1860> Finished [0]
    └── multiply<1862> Finished [0]

Finally, you can also check the details of the inputs and outputs of the work function:

$ verdi process show <PK>

Notice that each input and output of the work function add_multiply is stored as a node, and that the work function has called both the add and multiply calculation functions:

Property     Value
-----------  ------------------------------------
type         add_multiply
state        Finished [0]
pk           1859
uuid         c65df725-6065-40ec-8343-6ee9ef68ca9a
label        add_multiply
description
ctime        2021-06-07 14:48:06.342948+00:00
mtime        2021-06-07 14:48:06.835870+00:00

Inputs      PK  Type
--------  ----  ------
x         1856  Int
y         1857  Int
z         1858  Int

Outputs      PK  Type
---------  ----  ------
result     1863  Int

Called      PK  Type
--------  ----  --------
CALL      1860  add
CALL      1862  multiply

Exercise

Let’s look at multiple ways to generate the provenance graph and what this can teach us.

(1) Generate the provenance graph of the add_multiply work function without any additional options. Does anything seem missing here?

(2) Try to generate the provenance graph again, but this time with the -i, --process-in option. You can use verdi node graph generate --help for more information about the various options of this command.

(3) Finally, try to generate the data provenance by:

  1. Targeting the multiply calculation function instead of the add_multiply work function.

  2. Using the -l, --link-types option to select the data links only.