A work function is the simpler of the two types of workflows in AiiDA. It can call one or more calculation functions and return data that has been created by the calculation functions it has called. Moreover, work functions can also call other work functions, allowing you to write nested workflows.
In this section, you will learn how to:
Add simple Python functions to the provenance.
Write and launch a simple workflow in AiiDA.
Calculation functions are a great way to add the steps of your scientific workflow that are written in Python to the AiiDA provenance. To do so, you have to add the calcfunction decorator to the Python function. A simple example is the multiply calculation function from the AiiDA basics section:
from aiida.engine import calcfunction

@calcfunction
def multiply(x, y):
    return x * y
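To get a feel for what "adding a function to the provenance" means, here is a minimal plain-Python sketch of the idea behind a decorator like calcfunction: wrap the function and record its inputs and output every time it is called. This is a toy, not AiiDA's implementation; track_provenance and PROVENANCE_LOG are hypothetical names, and the real calcfunction stores inputs and outputs as nodes in the AiiDA database rather than in a Python list.

```python
import functools

# Hypothetical in-memory "provenance log" (not AiiDA code).
PROVENANCE_LOG = []


def track_provenance(func):
    """Toy stand-in for a calcfunction-style decorator: record every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Store which function ran, what went in and what came out.
        PROVENANCE_LOG.append({
            "function": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
        })
        return result
    return wrapper


@track_provenance
def multiply(x, y):
    return x * y


multiply(2, 3)
print(PROVENANCE_LOG)
# [{'function': 'multiply', 'inputs': {'args': (2, 3), 'kwargs': {}}, 'output': 6}]
```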
In a sense, this example is deceptively simple.
Let’s consider a slightly more complicated example: a rescale function that takes an ASE Atoms structure and rescales the unit cell by a certain scale factor:
def rescale(structure, scale):
    new_cell = structure.get_cell() * scale
    structure.set_cell(new_cell, scale_atoms=True)
    return structure
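The rescale function relies on ASE, but the operation itself is simple. As a rough sketch, assuming the cell is a plain 3x3 list whose rows are the lattice vectors (a stand-in for an ASE Atoms object, handled by the hypothetical helper rescale_cell), doubling the cell means scaling every lattice vector, and scale_atoms=True means the atoms move with the cell so that their fractional coordinates stay the same:

```python
def rescale_cell(cell, positions, scale):
    """Hypothetical helper: scale all lattice vectors and move the atoms with the cell.

    Moving the atoms with the cell keeps their fractional coordinates fixed,
    which is what scale_atoms=True requests in ASE.
    """
    new_cell = [[component * scale for component in vector] for vector in cell]
    # For a uniform scale factor, keeping fractional coordinates fixed is the
    # same as scaling every Cartesian coordinate by that factor.
    new_positions = [[coord * scale for coord in pos] for pos in positions]
    return new_cell, new_positions


a = 3.9761497211  # cubic lattice constant from the NaNbO3 example
cell = [[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, a]]
positions = [[0.0, 0.0, 0.0], [a / 2, a / 2, a / 2]]  # two illustrative sites only

new_cell, new_positions = rescale_cell(cell, positions, 2)
print(new_cell[0][0])  # 7.9522994422
```

Unlike rescale above, which modifies the Atoms object it is given through set_cell, this sketch returns new lists and leaves its inputs untouched.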
Open a verdi shell or a Jupyter notebook (with the AiiDA magic %aiida) and use the code snippet above to define the rescale function.
Next, load any StructureData node, for example using the QueryBuilder:
In : from aiida.orm import QueryBuilder, StructureData
...: structure = QueryBuilder().append(StructureData).first()[0]
In order to test the function, we need to convert the StructureData node into an ASE Atoms object. This can easily be done using the get_ase() method:
In : ase_structure = structure.get_ase()
Let’s have a look at what structure we found:
In : ase_structure Out: Atoms(symbols='NaNbO3', pbc=True, cell=[3.9761497211, 3.9761497211, 3.9761497211], masses=...)
Next, use the rescale function to double the lattice vectors of the unit cell:
In : rescale(ase_structure, 2) Out: Atoms(symbols='NaNbO3', pbc=True, cell=[7.9522994422, 7.9522994422, 7.9522994422], masses=...)
Great! That all seems to be working as expected. Now it’s time to convert our Python function into a calculation function.
Working with nodes
Try to adapt the rescale function above into a calculation function by adding the calcfunction decorator:
from aiida.engine import calcfunction

@calcfunction
def rescale(structure, scale):
    new_cell = structure.get_cell() * scale
    structure.set_cell(new_cell, scale_atoms=True)
    return structure
Maybe you already see why just adding the calcfunction decorator is not sufficient. Trying to run the function again with ase_structure and a scaling factor of 2 will fail, since neither of them is an AiiDA node:
In : rescale(ase_structure, 2)
(...)
ValueError: Error occurred validating port 'inputs.structure': value 'structure' is not of the right type. Got '<class 'ase.atoms.Atoms'>', expected '(<class 'aiida.orm.nodes.data.data.Data'>,)'
However, passing the originally imported StructureData node stored in structure together with Float(2) won’t work either:
In : rescale(structure, Float(2))
(...)
AttributeError: 'StructureData' object has no attribute 'get_cell'
The reason for these failures is that we need to adjust the rescale function further, to make sure it both accepts AiiDA nodes as inputs and returns an AiiDA node:
from aiida.engine import calcfunction

@calcfunction
def rescale(structure, scale):
    """Calculation function to rescale a structure

    :param structure: An AiiDA `StructureData` to rescale
    :param scale: The scale factor (for the lattice constant)
    :return: The rescaled structure
    """
    from aiida.orm import StructureData

    ase_structure = structure.get_ase()
    scale_value = scale.value

    new_cell = ase_structure.get_cell() * scale_value
    ase_structure.set_cell(new_cell, scale_atoms=True)

    return StructureData(ase=ase_structure)
Let’s explain the required changes in more detail:
from aiida.orm import StructureData
The StructureData class is imported, since we need it later to convert the ASE Atoms structure into a StructureData node so that we can return it as an output.
ase_structure = structure.get_ase()
scale_value = scale.value
These two lines convert the inputs, which have to be AiiDA nodes, into the corresponding ASE Atoms structure and the plain Python float that we need to scale the unit cell.
return StructureData(ase=ase_structure)
Once ase_structure has been rescaled, we need to convert it back into a StructureData node, which is then returned by the rescale function as its output.
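So, in reality we have to do two things in order to adapt a regular Python function into a calculation function that can be tracked in the provenance: make sure that it accepts AiiDA nodes as inputs, and make sure that it returns AiiDA nodes as outputs.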
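The node-in/node-out pattern of the adapted rescale function (unwrap the input nodes, compute with plain Python values, wrap the result in a new node) can be sketched without AiiDA. FloatNode, CellNode and rescale_node below are hypothetical stand-ins, not aiida.orm classes; in the real function the wrapping step is StructureData(ase=ase_structure).

```python
from dataclasses import dataclass


# Hypothetical stand-ins for AiiDA data nodes (NOT aiida.orm classes).
@dataclass(frozen=True)
class FloatNode:
    value: float


@dataclass(frozen=True)
class CellNode:
    cell: tuple  # 3x3 tuple whose rows are the lattice vectors


def rescale_node(structure: CellNode, scale: FloatNode) -> CellNode:
    # 1. Unwrap the input nodes into plain Python values.
    cell = structure.cell
    scale_value = scale.value
    # 2. Do the actual work on plain Python objects.
    new_cell = tuple(tuple(c * scale_value for c in vector) for vector in cell)
    # 3. Wrap the result in a new node so that it can be returned.
    return CellNode(cell=new_cell)


unit_cell = CellNode(((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))
result = rescale_node(unit_cell, FloatNode(2.0))
print(result.cell[0])  # (2.0, 0.0, 0.0)
```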
(1) Run the calculation function version of rescale with AiiDA nodes as inputs. Then convert the output StructureData node back into an ASE Atoms object. Is the result what you expected?
After redefining the rescale function with the code snippet above, running the calculation function with our originally imported StructureData node and a Float node works without a hitch:
In : new_structure = rescale(structure, Float(2))
Convert the result into an ASE Atoms object using the get_ase() method:
In : new_structure.get_ase() Out: Atoms(symbols='NaNbO3', pbc=True, cell=[7.9522994422, 7.9522994422, 7.9522994422], masses=...)
We can see that the lattice vectors are now twice as long as before, which is the desired result.
(2) Why was the multiply function so deceptively simple? That is, why was conversion to/from AiiDA nodes not an issue there?
In the case of the multiply function, the x and y inputs are simply multiplied using the * operator. Since x and y are AiiDA Int nodes, this results in a new Int node whose value is the product of the two nodes:
In : Int(2) * Int(3) Out: <Int: uuid: cfff6b68-69a2-47e2-8feb-cbd039bb0588 (unstored) value: 6>
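The reason multiply needed no conversion is operator overloading: the * operator on two Int nodes is itself defined to return a new Int node. A minimal sketch of that idea, using a hypothetical toy Int class rather than aiida.orm.Int:

```python
class Int:
    """Hypothetical toy integer node (the real one is aiida.orm.Int)."""

    def __init__(self, value: int):
        self.value = value

    def __mul__(self, other: "Int") -> "Int":
        # Unwrap both values, multiply them and wrap the product in a NEW
        # node, so the result of `x * y` is again a node.
        return Int(self.value * other.value)

    def __repr__(self) -> str:
        return f"<Int value: {self.value}>"


def multiply(x, y):
    return x * y  # no explicit conversion needed: `*` already returns an Int


product = multiply(Int(2), Int(3))
print(product)  # <Int value: 6>
```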
You can see that the result of multiplying two Int nodes is simply another Int node. This can then be returned directly by the multiply function, avoiding the conversion issues we encountered for the rescale function.
(3) Since calculation functions are tracked in the provenance, you should be able to find the ones you have just run using the verdi process list command. If you’ve tried the incorrect rescale calculation function above, this list will contain one excepted process. Use what you’ve learned in the Troubleshooting module to figure out what went wrong here.
Looking at all processes that have completed in the last day:
$ verdi process list -a -p 1
  PK  Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------
(...)
2732  7m ago     rescale          ⨯ Excepted
2734  7m ago     rescale          ⏹ Finished
Looking at the process report:
$ verdi process report <PK>
2021-07-06 23:20:04 : [2732|rescale|on_except]: Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/plumpy/process_states.py", line 230, in execute
    result = self.run_fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.7/site-packages/aiida/engine/processes/functions.py", line 395, in run
    result = self._func(*args, **kwargs)
  File "/tmp/ipykernel_2974/3046260054.py", line 6, in rescale
    new_cell = structure.get_cell() * scale
AttributeError: 'StructureData' object has no attribute 'get_cell'
It’s clear that this corresponds to the case where we attempted to pass a StructureData node, but the function failed because the get_cell() method is defined for the ASE Atoms class, not for the StructureData class.
Writing a work function
Writing a work function whose provenance is automatically stored can be achieved by writing a Python function and decorating it with the workfunction decorator:
"""Basic calcfunction-based workflows for demonstration purposes."""
from aiida.engine import calcfunction, workfunction


@calcfunction
def add(x, y):
    return x + y


@calcfunction
def multiply(x, y):
    return x * y


@workfunction
def add_multiply(x, y, z):
    """Add two numbers and multiply it with a third."""
    addition = add(x, y)
    product = multiply(addition, z)
    return product
It is important to reiterate here that the add_multiply() function does not create any new data nodes. The add() and multiply() calculation functions create the Int data nodes; all the work function does is return the result of the multiply() calculation function.
Moreover, both calculation and work functions can only accept and return data nodes, i.e. instances of classes that subclass the Data class.
Copy the code snippet above and put it into a Python file (e.g. add_multiply.py).
In the terminal, navigate to the folder where you stored the script. Next, import the add_multiply work function in the verdi shell:
In : from add_multiply import add_multiply
Similar to a calculation function, running a work function is as simple as calling a typical Python function: simply call it with the required input arguments:
In : result = add_multiply(Int(2), Int(3), Int(5))
The add_multiply work function returns the output Int node, which we assign to the variable result.
Again, note that the input arguments of a work function must be instances of Data or any of its subclasses. Simply calling the add_multiply function with regular integers will result in a ValueError, as these cannot be stored in the provenance graph.
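The rule that inputs must be Data instances (or instances of a subclass) boils down to an isinstance check on every argument. A minimal sketch with hypothetical toy Data and Int classes and a made-up validate_inputs helper; AiiDA's actual validation goes through the process input ports and produces a different error message:

```python
# Hypothetical toy classes (not aiida.orm): Data is the base class every
# valid input must derive from, Int is one such subclass.
class Data:
    pass


class Int(Data):
    def __init__(self, value: int):
        self.value = value


def validate_inputs(**inputs):
    """Raise ValueError for any input that is not a Data instance (subclasses are fine)."""
    for name, value in inputs.items():
        if not isinstance(value, Data):
            raise ValueError(
                f"input {name!r} must be a Data node, got {type(value).__name__}"
            )


validate_inputs(x=Int(2), y=Int(3), z=Int(5))  # passes: Int subclasses Data

try:
    validate_inputs(x=2, y=3, z=5)
except ValueError as error:
    print(error)  # input 'x' must be a Data node, got int
```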
Now check the list of all processes that have terminated in the past day:
$ verdi process list -a -p 1
  PK  Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------
...
1859  1m ago     add_multiply     ⏹ Finished
1860  1m ago     add              ⏹ Finished
1862  1m ago     multiply         ⏹ Finished
Copy the PK of the add_multiply work function and check its status with verdi process status (in the above example, the PK is 1859):
$ verdi process status <PK>
add_multiply<1859> Finished
    ├── add<1860> Finished
    └── multiply<1862> Finished
Finally, you can also check the details of the inputs and outputs of the work function:
$ verdi process show <PK>
Notice that each input and output of the add_multiply work function is stored as a node, and that the work function has CALLED both the add and multiply calculation functions:
Property     Value
-----------  ------------------------------------
type         add_multiply
state        Finished
pk           1859
uuid         c65df725-6065-40ec-8343-6ee9ef68ca9a
label        add_multiply
description
ctime        2021-06-07 14:48:06.342948+00:00
mtime        2021-06-07 14:48:06.835870+00:00

Inputs      PK  Type
--------  ----  ------
x         1856  Int
y         1857  Int
z         1858  Int

Outputs      PK  Type
---------  ----  ------
result     1863  Int

Called      PK  Type
--------  ----  --------
CALL      1860  add
CALL      1862  multiply
Let’s look at multiple ways to generate the provenance graph and what this can teach us.
(1) Generate the provenance graph of the add_multiply work function without any additional options. Does anything seem to be missing here?
(2) Try to generate the provenance graph again, but this time with the -i, --process-in option. You can use verdi node graph generate -h for more information about the various options of this command.
(3) Finally, try to generate the data provenance by starting from the multiply calculation function instead of the add_multiply work function, and by using the -l, --link-types option to select only the data links.
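Link types are what separate the data provenance from the logical provenance. As a toy model, the graph of add_multiply can be written as a list of (source, target, link type) edges and filtered by group. The link-type names and their grouping below (CREATE and INPUT_CALC for data provenance; INPUT_WORK, RETURN, CALL_CALC and CALL_WORK for logical provenance) follow our reading of the AiiDA documentation and should be checked there; the node labels, including the intermediate sum node, are illustrative.

```python
# Toy provenance graph of add_multiply(x, y, z) as (source, target, link type).
# "sum" is an illustrative label for the Int node created by add.
EDGES = [
    ("x", "add", "INPUT_CALC"),
    ("y", "add", "INPUT_CALC"),
    ("add", "sum", "CREATE"),
    ("sum", "multiply", "INPUT_CALC"),
    ("z", "multiply", "INPUT_CALC"),
    ("multiply", "result", "CREATE"),
    ("x", "add_multiply", "INPUT_WORK"),
    ("y", "add_multiply", "INPUT_WORK"),
    ("z", "add_multiply", "INPUT_WORK"),
    ("add_multiply", "add", "CALL_CALC"),
    ("add_multiply", "multiply", "CALL_CALC"),
    ("add_multiply", "result", "RETURN"),
]

# Assumed grouping of link types (verify against the AiiDA docs).
LINK_GROUPS = {
    "data": {"CREATE", "INPUT_CALC"},
    "logic": {"INPUT_WORK", "RETURN", "CALL_CALC", "CALL_WORK"},
}
LINK_GROUPS["all"] = LINK_GROUPS["data"] | LINK_GROUPS["logic"]


def filter_edges(edges, group):
    """Keep only the edges whose link type belongs to the chosen group."""
    allowed = LINK_GROUPS[group]
    return [edge for edge in edges if edge[2] in allowed]


def nodes_in(edges):
    """Collect every node label that appears in the given edges."""
    return {node for src, dst, _ in edges for node in (src, dst)}


data_edges = filter_edges(EDGES, "data")
print(sorted(nodes_in(data_edges)))
# ['add', 'multiply', 'result', 'sum', 'x', 'y', 'z']
print("add_multiply" in nodes_in(data_edges))  # False
```

In this toy model, keeping only the data links removes add_multiply from the graph entirely, since a work function only adds logical links.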