(workflows-workfunction)=
# Work functions

A *work function* is the simpler of the two types of workflows in AiiDA. It can call one or more calculation functions and *return* data that has been *created* by the calculation functions it has called. Moreover, work functions can also call other work functions, allowing you to write nested workflows.

In this section, you will learn to:

1. Add simple Python functions to the provenance.
2. Write and launch a simple workflow in AiiDA.

(workflows-workfunction-calcfunction)=
## Calculation functions

:::{margin} {{ python }} **Decorators**
A decorator can be used to add functionality to an existing function. You can read more about them [here](https://pythonbasics.org/decorators/).
:::

Calculation functions are a great way to record the steps of your scientific workflow that are written in Python in the provenance of AiiDA. To do so, add the `calcfunction` decorator to the Python function. A simple example is the `multiply` calculation function from the {ref}`AiiDA basics section `:

```{code-block} python
from aiida.engine import calcfunction

@calcfunction
def multiply(x, y):
    return x * y
```

In a sense, this example is _deceptively_ simple. Let's consider a slightly more complicated example: a `rescale` function that takes an ASE `Atoms` structure and rescales the unit cell by a given `scale` factor:

```{code-block} python
def rescale(structure, scale):
    new_cell = structure.get_cell() * scale
    structure.set_cell(new_cell, scale_atoms=True)
    return structure
```

Open a `verdi shell` or Jupyter notebook (with the AiiDA magic: `%aiida`) and use the code snippet above to define the `rescale` function.
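To see what `set_cell(new_cell, scale_atoms=True)` does numerically, here is a stdlib-only sketch that stands in for ASE. The names `ToyCrystal` and `rescale_toy` are hypothetical, and the sketch assumes the ASE conventions that lattice vectors are the rows of the cell matrix and that `scale_atoms=True` keeps the fractional coordinates fixed, so the Cartesian positions stretch together with the cell. Note one difference: `set_cell` modifies the ASE structure it is called on, so `rescale` changes its input in place and returns that same object, whereas this sketch returns a new object.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class ToyCrystal:
    """Hypothetical stand-in for an ASE Atoms object.

    Lattice vectors are the rows of `cell`; atomic positions are stored as
    fractional coordinates, i.e. as multiples of the lattice vectors.
    """

    cell: List[Vector]
    fractional: List[Vector] = field(default_factory=list)

    def cartesian_positions(self) -> List[Vector]:
        # r = f0 * a + f1 * b + f2 * c  (row vector times cell matrix)
        return [
            [sum(f[i] * self.cell[i][k] for i in range(3)) for k in range(3)]
            for f in self.fractional
        ]


def rescale_toy(crystal: ToyCrystal, scale: float) -> ToyCrystal:
    """Mimic `set_cell(cell * scale, scale_atoms=True)` on the toy model.

    The cell is multiplied by `scale` and the fractional coordinates are kept,
    so the Cartesian positions stretch together with the cell. A new object
    is returned and the input is left unchanged.
    """
    new_cell = [[scale * x for x in row] for row in crystal.cell]
    return ToyCrystal(cell=new_cell, fractional=[list(f) for f in crystal.fractional])


# Cubic cell using the lattice constant of the NaNbO3 structure in this section,
# with two illustrative (hypothetical) atomic sites.
a = 3.9761497211
crystal = ToyCrystal(
    cell=[[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, a]],
    fractional=[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
)
doubled = rescale_toy(crystal, 2)
print(doubled.cell)
print(doubled.cartesian_positions())
```

Doubling the cell leaves the fractional coordinates untouched, while the atom at the cell centre moves from `a / 2` to `a` along each axis.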
Next, load _any_ `StructureData`, for example using the `QueryBuilder`:

```{code-block} ipython
In [2]: from aiida.orm import QueryBuilder, StructureData
   ...: structure = QueryBuilder().append(StructureData).first()[0]
```

To test the function, we need to convert the `StructureData` into an ASE `Atoms` instance. This is easily done using the `get_ase()` method:

```{code-block} ipython
In [3]: ase_structure = structure.get_ase()
```

Let's have a look at the structure we found:

```{code-block} ipython
In [4]: ase_structure
Out[4]: Atoms(symbols='NaNbO3', pbc=True, cell=[3.9761497211, 3.9761497211, 3.9761497211], masses=...)
```

Next, use the `rescale` function to double the lattice vectors of the unit cell:

```{code-block} ipython
In [5]: rescale(ase_structure, 2)
Out[5]: Atoms(symbols='NaNbO3', pbc=True, cell=[7.9522994422, 7.9522994422, 7.9522994422], masses=...)
```

Great! That all seems to work as expected. Now it's time to convert our Python function into a calculation function.

### Working with nodes

Try to turn the `rescale` function above into a calculation function by adding the `calcfunction` decorator:

```{code-block} python
from aiida.engine import calcfunction

@calcfunction
def rescale(structure, scale):
    new_cell = structure.get_cell() * scale
    structure.set_cell(new_cell, scale_atoms=True)
    return structure
```

Maybe you already see why just adding the `calcfunction` decorator is not sufficient. Running the function again with `ase_structure` and a scaling factor of `2` fails, since neither input is a `Data` node:

```{code-block} ipython
In [7]: rescale(ase_structure, 2)
(...)
ValueError: Error occurred validating port 'inputs.structure': value 'structure' is not of the right type. Got '', expected '(,)'
```

However, passing the `StructureData` node stored in `structure` together with `Float(2)` doesn't work either:

```{code-block} ipython
In [8]: rescale(structure, Float(2))
(...)
AttributeError: 'StructureData' object has no attribute 'get_cell'
```

The reason for these failures is that the `rescale` function needs further changes, so that it both accepts AiiDA nodes as _inputs_ and _returns_ an AiiDA node:

```{literalinclude} include/code/realworld/rescale.py
:language: python
:emphasize-lines: 12, 14-15, 20
```

Let's explain the required changes in more detail:

```{literalinclude} include/code/realworld/rescale.py
:language: python
:lines: 12
```

Here the `StructureData` class is imported, since we need it later to convert the ASE `Atoms` structure into a `StructureData` node so we can output it.

```{literalinclude} include/code/realworld/rescale.py
:language: python
:lines: 14-15
```

These two lines convert the inputs, which _have_ to be AiiDA nodes, into the corresponding ASE `Atoms` structure and the Python `float` base type that we need to scale the unit cell.

```{literalinclude} include/code/realworld/rescale.py
:language: python
:lines: 20
```

After the `ase_structure` has been rescaled, we need to convert it back into a `StructureData` node, which is then _returned_ by the `rescale` function as an output.

So, in practice we have to do two things to turn a regular Python function into a calculation function that can be tracked in the provenance:

1. Add the `calcfunction` decorator.
2. Make sure the function expects and returns AiiDA {class}`~aiida.orm.nodes.data.data.Data` nodes.

This often involves converting the input nodes into other Python objects, and converting the result of the analysis back into an AiiDA {class}`~aiida.orm.nodes.data.data.Data` node.

### Exercises

(1) Run the calculation function version of `rescale` with AiiDA nodes as inputs. Convert the output `StructureData` node back into an ASE `Atoms` structure. Is the result what you expected?
:::{dropdown} **Solution**
After redefining the `rescale` function with the code snippet above, running the calculation function with our originally loaded `StructureData` node and a `Float` node works without a hitch:

```{code-block} ipython
In [10]: new_structure = rescale(structure, Float(2))
```

Converting the result into an ASE `Atoms` object using the `get_ase()` method:

```{code-block} ipython
In [11]: new_structure.get_ase()
Out[11]: Atoms(symbols='NaNbO3', pbc=True, cell=[7.9522994422, 7.9522994422, 7.9522994422], masses=...)
```

shows that the lattice vectors are twice as long as before, which is the desired result.
:::

(2) Why was the `multiply` function so deceptively simple? That is, why was conversion to/from AiiDA nodes not an issue there?

:::{dropdown} **Solution**
In the `multiply` function, the `x` and `y` inputs are simply multiplied using `*`. Since `x` and `y` are AiiDA `Int` nodes, this results in a new `Int` node whose value is the product of the two nodes:

```{code-block} ipython
In [12]: Int(2) * Int(3)
Out[12]: 
```

You can see that the result of multiplying two `Int` nodes is simply another `Int` node. This can be returned directly by the `multiply` function, avoiding the conversion issues we encountered in the `rescale` example.
:::

(3) Since calculation functions are tracked in the provenance, you should be able to find those you have just run using the `verdi process list` command. If you've tried the _incorrect_ `rescale` calculation function above, this list will contain one `Excepted` result. Use what you've learned in the {ref}`Troubleshooting module ` to figure out what went wrong here.

:::{dropdown} **Solution**
Looking at _all_ processes that have completed _in the last day_:

```{code-block} console
$ verdi process list -a -p 1
  PK  Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------
(...)
2732  7m ago     rescale          ⨯ Excepted
2734  7m ago     rescale          ⏹ Finished [0]
```

Looking at the process report:

```{code-block} console
$ verdi process report 
2021-07-06 23:20:04 [122]: [2732|rescale|on_except]: Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/plumpy/process_states.py", line 230, in execute
    result = self.run_fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.7/site-packages/aiida/engine/processes/functions.py", line 395, in run
    result = self._func(*args, **kwargs)
  File "/tmp/ipykernel_2974/3046260054.py", line 6, in rescale
    new_cell = structure.get_cell() * scale
AttributeError: 'StructureData' object has no attribute 'get_cell'
```

This corresponds to the case where we passed a `StructureData` node: the function failed because `get_cell()` is a method of the ASE `Atoms` class, not of `StructureData`.
:::

## Writing a work function

To write a work function whose provenance is stored automatically, write a Python function and decorate it with the {func}`~aiida.engine.processes.functions.workfunction` decorator:

:::{margin} {{ download }}
{download}`Download the Python file! `
:::

```{literalinclude} include/code/add_multiply.py
:language: python
:start-after: start-marker
```

It is important to reiterate that the {func}`~aiida.engine.processes.functions.workfunction`-decorated `add_multiply()` function does not *create* any new data nodes. The `add()` and `multiply()` calculation functions create the `Int` data nodes; all the work function does is *return* the result of the `multiply()` calculation function. Moreover, both calculation and work functions can only accept and return data nodes, i.e. instances of classes that subclass the {class}`~aiida.orm.nodes.data.data.Data` class.

Copy the code snippet above and put it into a Python file (e.g. `add_multiply.py`), or download it directly using the link next to it.
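The two rules above (only `Data` nodes go in and out, and a work function only *returns* data that a calculation function *created*) can be made concrete without an AiiDA profile. The following is a toy model in plain Python, not AiiDA's implementation and not the `add_multiply.py` script you just saved: `Data`, `toy_calcfunction`, `toy_workfunction` and `LINKS` are hypothetical names, the link labels merely borrow the names of AiiDA's link types, and the toy `add_multiply` assumes, as the function names suggest, that the real one computes `(x + y) * z`.

```python
from functools import wraps

LINKS = []    # recorded (source, target, link_type) triples
_ACTIVE = []  # stack of toy work functions that are currently running


class Data:
    """Hypothetical stand-in for an AiiDA Data node."""

    def __init__(self, value):
        self.value = value
        self.creator = None  # name of the toy calcfunction that created it

    def __repr__(self):
        return f"Data({self.value!r})"


def _check_inputs(name, args):
    # Both toy decorators only accept Data instances as inputs.
    for arg in args:
        if not isinstance(arg, Data):
            raise ValueError(f"{name}: input {arg!r} is not a Data node")


def toy_calcfunction(func):
    @wraps(func)
    def wrapper(*args):
        _check_inputs(func.__name__, args)
        if _ACTIVE:  # called from inside a toy work function
            LINKS.append((_ACTIVE[-1], func.__name__, "CALL_CALC"))
        LINKS.extend((arg, func.__name__, "INPUT_CALC") for arg in args)
        result = func(*args)
        # A calculation function must return a *new* Data node: it creates data.
        if not isinstance(result, Data) or result.creator is not None:
            raise ValueError(f"{func.__name__} must return a new Data node")
        result.creator = func.__name__
        LINKS.append((func.__name__, result, "CREATE"))
        return result
    return wrapper


def toy_workfunction(func):
    @wraps(func)
    def wrapper(*args):
        _check_inputs(func.__name__, args)
        LINKS.extend((arg, func.__name__, "INPUT_WORK") for arg in args)
        _ACTIVE.append(func.__name__)
        try:
            result = func(*args)
        finally:
            _ACTIVE.pop()
        # A work function may only *return* data that a calculation created.
        if not isinstance(result, Data) or result.creator is None:
            raise ValueError(f"{func.__name__} may only return data created by a calcfunction")
        LINKS.append((func.__name__, result, "RETURN"))
        return result
    return wrapper


@toy_calcfunction
def add(x, y):
    return Data(x.value + y.value)


@toy_calcfunction
def multiply(x, y):
    return Data(x.value * y.value)


@toy_workfunction
def add_multiply(x, y, z):
    return multiply(add(x, y), z)


result = add_multiply(Data(2), Data(3), Data(5))
print(result, result.creator)
for link in LINKS:
    print(link)
```

In this toy, calling `add_multiply(2, 3, 5)` with plain integers raises a `ValueError`, and a toy work function that builds and returns its own `Data` node is rejected, because only the calculation functions are allowed to create data.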
In the terminal, navigate to the folder where you stored the script. Then start a `verdi shell` from that folder and import the `add_multiply` work function:

```{code-block} ipython
In [1]: from add_multiply import add_multiply
```

As with a calculation function, running a work function is as simple as calling a regular Python function with the required input arguments:

```{code-block} ipython
In [2]: result = add_multiply(Int(2), Int(3), Int(5))
```

Here, the `add_multiply` work function returns the output `Int` node, which we assign to the variable `result`. Again, note that the input arguments of a work function must be instances of the {class}`~aiida.orm.nodes.data.data.Data` class or one of its subclasses. Calling the `add_multiply` function with regular integers will result in a `ValueError`, as these cannot be stored in the provenance graph.

:::{margin} {{ aiida }} **Further reading**
Although the example above shows the most straightforward way to run the `add_multiply` work function, there are several other ways of running processes that can return more than just the result. For example, the `run_get_node` function from the AiiDA engine returns both the result of the workflow and the work function node. See the {ref}`corresponding topics section for more details `.
:::

Now check the list of _all_ processes that have terminated _in the past day_:

```{code-block} console
$ verdi process list -a -p 1
  PK  Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------
...
1859  1m ago     add_multiply     ⏹ Finished [0]
1860  1m ago     add              ⏹ Finished [0]
1862  1m ago     multiply         ⏹ Finished [0]
```

Copy the PK of the `add_multiply` work function and check its status with `verdi process status` (in the above example, the PK is `1859`):

```{code-block} console
$ verdi process status 
add_multiply<1859> Finished [0]
    ├── add<1860> Finished [0]
    └── multiply<1862> Finished [0]
```

Finally, you can also check the details of the inputs and outputs of the work function:

```{code-block} console
$ verdi process show 
```

Notice that each input and output of the work function `add_multiply` is stored as a node, and that the work function has `CALLED` both the `add` and `multiply` calculation functions:

```{code-block} console
Property     Value
-----------  ------------------------------------
type         add_multiply
state        Finished [0]
pk           1859
uuid         c65df725-6065-40ec-8343-6ee9ef68ca9a
label        add_multiply
description
ctime        2021-06-07 14:48:06.342948+00:00
mtime        2021-06-07 14:48:06.835870+00:00

Inputs      PK  Type
--------  ----  ------
x         1856  Int
y         1857  Int
z         1858  Int

Outputs      PK  Type
---------  ----  ------
result     1863  Int

Called      PK  Type
--------  ----  --------
CALL      1860  add
CALL      1862  multiply
```

### Exercises

Let's look at several ways to generate the provenance graph, and what each of them can teach us.

(1) Generate the provenance graph of the `add_multiply` work function without any additional options. Does anything seem to be missing?

:::{dropdown} **Solution**
You should know the correct command by now:

```{code-block} console
$ verdi node graph generate 
```

You might notice that there are no links between the `Int` nodes and the calculation functions called by the work function. These are not shown by default, but in the next exercise you'll use an option to include them.

```{figure} include/images/workfunction/add_multiply_default.png
:width: 400px

Default provenance graph for the `add_multiply` work function.
```
:::

(2) Generate the provenance graph again, but this time with the `-i, --process-in` option. You can use `verdi node graph generate -h` for more information about the various options of this command.

:::{dropdown} **Solution**
With the `-i, --process-in` option, the `INPUT_CALC` links that were missing from the previous provenance graph are now included:

```{code-block} console
$ verdi node graph generate -i 
```

```{figure} include/images/workfunction/add_multiply_all.png
:width: 500px

Provenance graph for the `add_multiply` work function using the `-i, --process-in` option.
```
:::

(3) Finally, try to generate the _data_ provenance by:

1. Targeting the `multiply` calculation function instead of the `add_multiply` work function.
2. Using the `-l, --link-types` option to select the `data` links only.

:::{dropdown} **Solution**
Use the `-l` option with `data` as an argument:

```{code-block} console
$ verdi node graph generate  -l data
```

Note that the PK passed here should be that of the `multiply` calculation function!

We now only see the _data_ provenance, i.e. the workflows are no longer in the provenance graph.

```{figure} include/images/workfunction/multiply_data_only.png
:width: 400px

Provenance graph for the `multiply` calculation function when only selecting the data provenance with the `-l, --link-types` option.
```
:::
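Why do the workflows disappear when only `data` links are selected? In AiiDA's provenance model, `INPUT_CALC` and `CREATE` links make up the data provenance, while `INPUT_WORK`, `RETURN`, `CALL_CALC` and `CALL_WORK` make up the logical provenance added by workflows. The toy sketch below encodes the `add_multiply` run as a list of links and keeps only the data links that lead into a target node. The node names (including the intermediate `sum` node) are hypothetical labels rather than the PKs from the `verdi` output, and the traversal is a simplification, not the exact rule set of `verdi node graph generate`.

```python
# Toy provenance graph of the add_multiply run: (source, target, link_type).
# Node names are hypothetical labels, not the PKs from the verdi output.
EDGES = [
    ("x", "add", "INPUT_CALC"),
    ("y", "add", "INPUT_CALC"),
    ("add", "sum", "CREATE"),
    ("sum", "multiply", "INPUT_CALC"),
    ("z", "multiply", "INPUT_CALC"),
    ("multiply", "result", "CREATE"),
    ("x", "add_multiply", "INPUT_WORK"),
    ("y", "add_multiply", "INPUT_WORK"),
    ("z", "add_multiply", "INPUT_WORK"),
    ("add_multiply", "add", "CALL_CALC"),
    ("add_multiply", "multiply", "CALL_CALC"),
    ("add_multiply", "result", "RETURN"),
]

DATA_LINKS = {"INPUT_CALC", "CREATE"}
LOGIC_LINKS = {"INPUT_WORK", "RETURN", "CALL_CALC", "CALL_WORK"}


def ancestry(target, edges, allowed):
    """Return the allowed-type edges that lead (transitively) into `target`,
    plus the allowed-type edges leaving `target` directly."""
    kept = [edge for edge in edges if edge[2] in allowed]
    selected, frontier, seen = set(), [target], {target}
    while frontier:
        node = frontier.pop()
        for src, dst, kind in kept:
            if dst == node:
                selected.add((src, dst, kind))
                if src not in seen:
                    seen.add(src)
                    frontier.append(src)
    selected.update(edge for edge in kept if edge[0] == target)
    return selected


data_graph = ancestry("multiply", EDGES, DATA_LINKS)
nodes = {name for src, dst, _ in data_graph for name in (src, dst)}
print(sorted(nodes))
```

Because no `INPUT_CALC` or `CREATE` link touches `add_multiply`, the work function drops out of the data-only graph, while the calculation functions and all `Int` nodes remain.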