5. Querying the AiiDA database {#sec:querybuilder}¶
We will now learn how to query the AiiDA database using the python interface. Queries are, in essence, questions to your database and you will use the answers you obtain in order to identify good candidate materials for methane storage.
5.1. The AiiDA QueryBuilder¶
AiiDA provides a querying tool called the QueryBuilder.
When working in the verdi shell
or a jupyter notebook with %aiida
enabled,
the QueryBuilder
class has already been imported.
Let’s create a new querybuilder for our query:
qb = QueryBuilder()
Now we need to say which nodes of the AiiDA graph we are interested in.
All node types of the AiiDA graph derive from the Node
class, so in order to select the most
generic type of node, simply append Node
:
qb.append(Node)
At this point, we can finish our query by asking for all nodes matching these criteria:
qb.all() # Returns all nodes in the database
Note We could also have used
qb.first()
to get the first node. Always use tab-completion when in doubt which functions can be used on a python object.
This command will return us all the Nodes directly, which may not be the most wise thing to do considering that is the biggest family of AiiDA stored objects that we can query. To understand the size of the result, we can type the following command:
qb.count() # Returns an integer, the number of nodes in the database
If you are interested to retrieve a subclass of a node, append that specific subclass instead of Node:
CifData = DataFactory('cif')
qb = QueryBuilder() # Creating a new QueryBuilder instance
qb.append(CifData) # Telling the QueryBuilder instance that I want a cif data type
qb.all() # Asking for all the results!
Note Remember to create a new
qb = QueryBuilder()
instance for each new query.
Other data types we will need in the tutorial are:
ParameterData = DataFactory('parameter')
NetworkCalculation = CalculationFactory('zeopp.network')
RaspaCalculation = CalculationFactory('raspa')
5.1.1. Exercise¶
Find the number of CifData
and NetworkCalculation
nodes in the database.
Is there one calculation for every structure?
You may also be interested to learn where the structures are coming from.
Node & subclasses |
Number in DB |
---|---|
Node |
2519 |
StructureData |
0 |
CifData |
503 |
ParameterData |
503 |
SinglefileData |
1 |
Code |
1 |
JobCalculation |
503 |
List of some Node subclasses and how many times they occur in our test database.
Note Advanced users who are familiar with SQL (Structured Query Language) syntax, can inspect the generated SQL query via:
str(qb)
5.2. Projections¶
So far, we’ve retrieved node objects but we are interested in the information they contain. In database language, performing a projection means to extract one or more specific columns from a table. In AiiDA language it means to specify which properties of the selected nodes our query should return.
|———| ————————-|
Entity |
Properties |
---|---|
Node |
id, uuid, type, label, description, ctime, mtime |
Computer |
id, uuid, name, hostname, description, enabled, transport_type, scheduler_type |
User |
id, email, first_name, last_name, institution |
Group |
id, uuid, name, type, time, description |
——— |
————————- |
5.2.1. A selection of entities and some of their properties¶
For example, we might be interested only in type and universally unique identifier (UUID) of each node:
qb = QueryBuilder()
qb.append(Node, project=["type", "uuid"])
qb.all()
Note Please use the
id
keyword in QueryBuilder queries to access the PK of a node.
5.3. Filters¶
We already know how to filter by the type of node, but often we would like to filter the results by other node properties.
For example, we might want to select all
the calculations that were launched on a specific date. In database
language, this is called “adding a filter” to a query. A filter is a
boolean operator that returns True
or False
.
Operator |
Datatype |
Example |
---|---|---|
== |
All |
|
in |
All |
|
>,<,<=,>= |
floats, integers, dates |
|
like |
Chars |
|
ilike |
Chars |
|
or |
|
|
and |
|
5.3.1. Selection of filter operators¶
If you want to add filters to your query, use the filters
and provide a dictionary with the filters you would like to apply.
Let’s filter out the structure with label “HKUST1”:
qb = QueryBuilder() # Instantiating a new QueryBuilder
qb.append(CifData, # I want structures
project=["id"], # I want to know the id
filters={"label": {"==":"HKUST1"}}) # Only structures with this label
qb.all()