Query Engine
1. What is the Query Engine?
The GraphRecord Query Engine lets you efficiently find the indices of nodes and edges stored in the graph structure. Its interface supports complex queries that filter nodes and edges by their properties and relationships. This section introduces the basic concepts of querying GraphRecords and then explores more advanced techniques for working with complex datasets.
2. Example dataset
An example dataset for the following demonstrations is created manually with users, products, and their relationships.
import pandas as pd
import graphrecords as gr

# Nodes: users with their age and gender
users = pd.DataFrame(
    [
        ["pat_0", 20, "M"],
        ["pat_1", 30, "F"],
        ["pat_2", 40, "M"],
        ["pat_3", 50, "F"],
        ["pat_4", 60, "M"],
    ],
    columns=["index", "age", "gender"],
)
# Nodes: products with a free-text description
products = pd.DataFrame(
    [
        ["drug_0", "fentanyl injection"],
        ["drug_1", "aspirin tablet"],
        ["drug_2", "insulin pen"],
    ],
    columns=["index", "description"],
)
# Edges: one row per user-to-product relationship
user_product = pd.DataFrame(
    [
        ["pat_0", "drug_0", 100, 1, "2020-01-01"],
        ["pat_1", "drug_0", 150, 2, "2020-02-15"],
        ["pat_1", "drug_1", 50, 1, "2020-03-10"],
        ["pat_2", "drug_1", 75, 12, "2020-04-20"],
        ["pat_2", "drug_2", 200, 1, "2020-05-05"],
        ["pat_3", "drug_2", 180, 12, "2020-06-30"],
        ["pat_4", "drug_0", 120, 1, "2020-07-15"],
        ["pat_4", "drug_1", 60, 2, "2020-08-01"],
    ],
    columns=["source", "target", "cost", "quantity", "time"],
)
graphrecord = (
    gr.GraphRecord.builder()
    .add_nodes((users, "index"), group="user")
    .add_nodes((products, "index"), group="product")
    .add_edges((user_product, "source", "target"), group="user_product")
    .build()
)
The dataset contains a set of users, a set of products, and the edges that connect these two groups. All three are used throughout this section.
users_df.head(10)
       gender  age
pat_0  M       20
pat_1  F       30
pat_2  M       40
pat_3  F       50
pat_4  M       60
products_df.head(10)
        description
drug_0  fentanyl injection
drug_1  aspirin tablet
drug_2  insulin pen
user_product_edges.head(10)
   source  target  cost  time        quantity
0  pat_0   drug_0  100   2020-01-01  1
1  pat_1   drug_0  150   2020-02-15  2
2  pat_1   drug_1  50    2020-03-10  1
3  pat_2   drug_1  75    2020-04-20  12
4  pat_2   drug_2  200   2020-05-05  1
5  pat_3   drug_2  180   2020-06-30  12
6  pat_4   drug_0  120   2020-07-15  1
7  pat_4   drug_1  60    2020-08-01  2
3. Node Queries
The NodeOperand querying class allows you to define specific criteria for selecting nodes within a GraphRecord. Node operands enable flexible and complex queries by combining multiple conditions, such as group membership, selecting and querying attributes, attribute values, and relationships to other nodes or edges. This section introduces the basic usage of node operands as a foundation for your data queries.
The function query_nodes() and its counterpart query_edges() are the main ways to use these queries. They can retrieve different types of data from the GraphRecord, such as the indices of some nodes that fulfill some criteria (using index()), or even the mean age of those nodes (mean()).
# Basic node query
def query_node_in_user(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    return node.index()

graphrecord.query_nodes(query_node_in_user)
['pat_1', 'pat_0', 'pat_2', 'pat_3', 'pat_4']
Methods used in the snippet
in_group(): Query nodes that belong to that group.
index(): Returns a NodeIndicesOperand representing the indices of the nodes queried.
query_nodes(): Retrieves information on the nodes from the GraphRecord given the query.
The same result can often be reached through different approaches, which makes the query engine versatile and adaptable to your specific needs. The next example is slightly more complex and involves more than one operand.
# Intermediate node query
def query_node_user_older_than_30(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    node.index().contains("pat")
    node.has_attribute("age")
    node.attribute("age").greater_than(30)
    return node.index()

graphrecord.query_nodes(query_node_user_older_than_30)
['pat_2', 'pat_3', 'pat_4']
Methods used in the snippet
in_group(): Query nodes that belong to that group.
index(): Returns a NodeIndicesOperand representing the indices of the nodes queried.
contains(): Query node indices containing that argument.
has_attribute(): Query nodes that have that attribute.
attribute(): Returns a NodeMultipleValuesWithIndexOperand to query on the values of the nodes for that attribute.
greater_than(): Query values that are greater than that value.
query_nodes(): Retrieves information on the nodes from the GraphRecord given the query.
Note
The has_attribute() method is not needed in this example, since attribute() already checks whether the nodes have the attribute. It is included purely for educational purposes. Other examples in this user guide do the same so that as many methods as possible are shown.
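To make the filtering logic of the intermediate query concrete, the sketch below repeats the same conditions over a plain dictionary. It is only a cross-check written in plain Python under the assumption that each statement narrows the selection further; the dictionary layout and the helper name users_older_than_30 are illustrative and not part of the GraphRecord API.

```python
# Plain-Python cross-check of the "older than 30" query.
# Assumption: this mirrors the query's conditions; it is not the GraphRecord API.
users = {
    "pat_0": {"age": 20, "gender": "M"},
    "pat_1": {"age": 30, "gender": "F"},
    "pat_2": {"age": 40, "gender": "M"},
    "pat_3": {"age": 50, "gender": "F"},
    "pat_4": {"age": 60, "gender": "M"},
}


def users_older_than_30(nodes: dict) -> list:
    """Keep only the nodes that pass every condition in turn."""
    selected = []
    for index, attributes in nodes.items():
        if "pat" not in index:  # node.index().contains("pat")
            continue
        if "age" not in attributes:  # node.has_attribute("age")
            continue
        if attributes["age"] > 30:  # node.attribute("age").greater_than(30)
            selected.append(index)
    # Sorted so the result can be compared regardless of iteration order.
    return sorted(selected)


result = users_older_than_30(users)
print(result)  # ['pat_2', 'pat_3', 'pat_4']
```

pat_1 (age 30) is left out, which matches the documented output and is consistent with greater_than() being a strict comparison.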
3.1. Reusing Node Queries
As you can see, the query engine is useful for finding nodes that fulfill different criteria, and these criteria can be as specific and narrow as you like. A key feature of the query engine is that previous queries can be reused in new ones. For instance, the previous query can be written as follows:
# Reusing node query
def query_node_reused(node: NodeOperand) -> NodeIndicesOperand:
    query_node_in_user(node)  # Reuse the basic query to apply the group condition
    node.index().contains("pat")
    node.has_attribute("age")
    node.attribute("age").greater_than(30)
    return node.index()

graphrecord.query_nodes(query_node_reused)
['pat_2', 'pat_3', 'pat_4']
Methods used in the snippet
index(): Returns a NodeIndicesOperand representing the indices of the nodes queried.
contains(): Query node indices containing that argument.
has_attribute(): Query nodes that have that attribute.
attribute(): Returns a NodeMultipleValuesWithIndexOperand to query on the values of the nodes for that attribute.
greater_than(): Query values that are greater than that value.
query_nodes(): Retrieves information on the nodes from the GraphRecord given the query.
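One way to picture why reuse works is to treat a query as a function that adds conditions to whatever operand it receives, so calling one query from another simply adds its conditions first. The sketch below illustrates that idea with a toy class; ToyOperand, where() and evaluate() are hypothetical names invented for this illustration and say nothing about how the query engine is implemented.

```python
from typing import Callable, Dict, List


class ToyOperand:
    """Hypothetical stand-in for an operand: it only collects conditions."""

    def __init__(self) -> None:
        self.conditions: List[Callable[[str, dict], bool]] = []

    def where(self, condition: Callable[[str, dict], bool]) -> None:
        # Each call narrows the selection further; nothing is ever removed.
        self.conditions.append(condition)

    def evaluate(self, nodes: Dict[str, dict]) -> List[str]:
        # A node is selected only if it passes every collected condition.
        return sorted(
            index
            for index, attributes in nodes.items()
            if all(condition(index, attributes) for condition in self.conditions)
        )


def in_user_group(operand: ToyOperand) -> None:
    operand.where(lambda index, attributes: attributes["group"] == "user")


def older_than_30(operand: ToyOperand) -> None:
    in_user_group(operand)  # reuse: the earlier query adds its condition first
    operand.where(lambda index, attributes: attributes.get("age", 0) > 30)


nodes = {
    "pat_0": {"group": "user", "age": 20},
    "pat_1": {"group": "user", "age": 30},
    "pat_2": {"group": "user", "age": 40},
    "pat_3": {"group": "user", "age": 50},
    "pat_4": {"group": "user", "age": 60},
    "drug_0": {"group": "product"},
}

operand = ToyOperand()
older_than_30(operand)
result = operand.evaluate(nodes)
print(result)  # ['pat_2', 'pat_3', 'pat_4']
```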
3.2. Neighbors
Another very useful method is neighbors(), which lets you query the nodes that are neighbors of the selected nodes, that is, the nodes connected to them by an edge.
In the following example, we select the nodes that fulfill all of these criteria:
They are in the group user.
Their node index contains the string “pat”.
Their attribute age is greater than 30.
They are connected to nodes whose attribute description contains the word “fentanyl”, in either uppercase or lowercase.
# Node query with neighbors function
def query_node_neighbors(node: NodeOperand) -> NodeIndicesOperand:
    query_node_user_older_than_30(node)
    description_neighbors = node.neighbors().attribute("description")
    description_neighbors.lowercase()
    description_neighbors.contains("fentanyl")
    return node.index()

graphrecord.query_nodes(query_node_neighbors)
['pat_4']
Methods used in the snippet
neighbors(): Returns a NodeOperand() to query the neighbors of those nodes.
attribute(): Returns a NodeMultipleValuesWithIndexOperand() to query on the values of the nodes for that attribute.
lowercase(): Converts the values that are strings to lowercase.
contains(): Query values containing that argument.
index(): Returns a NodeIndicesOperand representing the indices of the nodes queried.
query_nodes(): Retrieves information on the nodes from the GraphRecord given the query.
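The neighbors query above can be cross-checked by hand with a small adjacency map. This is a plain-Python sketch, not the GraphRecord API, and it assumes that a user's neighbors are the products its edges point to (source to target); the variable names are illustrative.

```python
# Plain-Python cross-check of the neighbors query (not the GraphRecord API).
ages = {"pat_0": 20, "pat_1": 30, "pat_2": 40, "pat_3": 50, "pat_4": 60}
descriptions = {
    "drug_0": "fentanyl injection",
    "drug_1": "aspirin tablet",
    "drug_2": "insulin pen",
}
edges = [
    ("pat_0", "drug_0"),
    ("pat_1", "drug_0"),
    ("pat_1", "drug_1"),
    ("pat_2", "drug_1"),
    ("pat_2", "drug_2"),
    ("pat_3", "drug_2"),
    ("pat_4", "drug_0"),
    ("pat_4", "drug_1"),
]

# Assumption: neighbors are reached by following edges from source to target.
neighbors = {}
for source, target in edges:
    neighbors.setdefault(source, set()).add(target)

result = sorted(
    user
    for user, age in ages.items()
    if age > 30
    and any(
        "fentanyl" in descriptions[product].lower()  # lowercase(), then contains()
        for product in neighbors.get(user, ())
    )
)
print(result)  # ['pat_4']
```

pat_0 and pat_1 are also connected to drug_0, but they are removed earlier by the age condition, which leaves only pat_4.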
4. Edge Queries
The EdgeOperand querying class provides a way to query the edges contained in a GraphRecord. Edge operands offer the same functionality as node operands, and the two complement each other when querying your data. This section shows different ways in which edge operands can be used.
# Basic edge query
def query_edge_user_product(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("user_product")
    return edge.index()

edges = graphrecord.query_edges(query_edge_user_product)
edges[0:5]
[6, 7, 0, 3, 5]
Methods used in the snippet
in_group(): Query edges that belong to that group.
index(): Returns an EdgeIndicesOperand representing the indices of the edges queried.
query_edges(): Retrieves information on the edges from the GraphRecord given the query.
The edge operand follows the same principles as the node operand, with some extra queries applicable only to edges like source_node() or target_node() (instead of neighbors()).
# Advanced edge query
def query_edge_old_user_cheap_item(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("user_product")
    edge.attribute("cost").less_than(200)
    edge.source_node().attribute("age").is_max()
    edge.target_node().attribute("description").contains("insulin")
    return edge.index()

graphrecord.query_edges(query_edge_old_user_cheap_item)
[]
Methods used in the snippet
in_group(): Query edges that belong to that group.
attribute(): Returns an EdgeMultipleValuesWithIndexOperand() to query on the values of the edges for that attribute.
less_than(): Query values that are less than that value.
source_node(): Returns a NodeOperand() to query on the source nodes for those edges.
is_max(): Query on the values that hold the maximum value.
target_node(): Returns a NodeOperand() to query on the target nodes for those edges.
contains(): Query values containing that argument.
index(): Returns an EdgeIndicesOperand representing the indices of the edges queried.
query_edges(): Retrieves information on the edges from the GraphRecord given the query.
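The empty result of the advanced edge query is easier to understand when each condition is evaluated separately. The plain-Python sketch below does that over the example edges, using the row position as the edge index as in the edge table; it is not the GraphRecord API, and it assumes that is_max() keeps the edges whose source node has the highest age among the source nodes.

```python
# Plain-Python breakdown of the advanced edge query (not the GraphRecord API).
ages = {"pat_0": 20, "pat_1": 30, "pat_2": 40, "pat_3": 50, "pat_4": 60}
descriptions = {
    "drug_0": "fentanyl injection",
    "drug_1": "aspirin tablet",
    "drug_2": "insulin pen",
}
# (source, target, cost); the list position is the edge index.
edges = [
    ("pat_0", "drug_0", 100),
    ("pat_1", "drug_0", 150),
    ("pat_1", "drug_1", 50),
    ("pat_2", "drug_1", 75),
    ("pat_2", "drug_2", 200),
    ("pat_3", "drug_2", 180),
    ("pat_4", "drug_0", 120),
    ("pat_4", "drug_1", 60),
]

# Condition 1: cost below 200.
cheap = {i for i, (_, _, cost) in enumerate(edges) if cost < 200}

# Condition 2 (assumed reading of is_max): source node has the maximum age.
max_age = max(ages[source] for source, _, _ in edges)
from_oldest = {i for i, (source, _, _) in enumerate(edges) if ages[source] == max_age}

# Condition 3: target description contains "insulin".
to_insulin = {i for i, (_, target, _) in enumerate(edges) if "insulin" in descriptions[target]}

# An edge must satisfy all three conditions at once.
result = sorted(cheap & from_oldest & to_insulin)
print(result)  # []
```

Under this reading, the oldest user (pat_4) only has edges 6 and 7, while the insulin product is only reached by edges 4 and 5, so no edge satisfies every condition.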
5. Combining Node & Edge Queries
The full power of the query engine appears once you combine both operands in a single query. The following query selects nodes that fulfill all of these criteria:
They are in the group user.
Their attribute age is greater than 30, and their attribute gender is equal to “M”.
They have at least one edge in the user_product group whose attribute cost is less than 200 and whose attribute quantity is equal to 1.
# Combined node and edge query
def query_edge_combined(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("user_product")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(1)
    return edge.index()

def query_node_combined(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    node.attribute("age").is_int()
    node.attribute("age").greater_than(30)
    node.attribute("gender").equal_to("M")
    query_edge_combined(node.edges())  # Apply the edge query to the edges of these nodes
    return node.index()

graphrecord.query_nodes(query_node_combined)
['pat_4']
Methods used in the snippet
in_group(): Query edges that belong to that group.
attribute(): Returns an EdgeMultipleValuesWithIndexOperand() to query on the values of the edges for that attribute.
less_than(): Query values that are less than that value.
equal_to(): Query values that are equal to that value.
index(): Returns an EdgeIndicesOperand representing the indices of the edges queried.
is_int(): Query on the values whose format is int.
greater_than(): Query values that are greater than that value.
edges(): Returns an EdgeOperand() to query on the edges of those nodes.
index(): Returns a NodeIndicesOperand representing the indices of the nodes queried.
query_nodes(): Retrieves information on the nodes from the GraphRecord given the query.
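A detail worth checking by hand in the combined query is that both edge conditions have to hold on the same edge. The plain-Python sketch below reproduces the documented result under that reading; it is a cross-check with illustrative names, not the GraphRecord API.

```python
# Plain-Python cross-check of the combined query (not the GraphRecord API).
users = {
    "pat_0": (20, "M"),
    "pat_1": (30, "F"),
    "pat_2": (40, "M"),
    "pat_3": (50, "F"),
    "pat_4": (60, "M"),
}
# (source, cost, quantity)
edges = [
    ("pat_0", 100, 1),
    ("pat_1", 150, 2),
    ("pat_1", 50, 1),
    ("pat_2", 75, 12),
    ("pat_2", 200, 1),
    ("pat_3", 180, 12),
    ("pat_4", 120, 1),
    ("pat_4", 60, 2),
]

# Users with at least one edge where cost < 200 AND quantity == 1 on that same edge.
sources_with_matching_edge = {
    source for source, cost, quantity in edges if cost < 200 and quantity == 1
}

result = sorted(
    user
    for user, (age, gender) in users.items()
    if isinstance(age, int)  # is_int()
    and age > 30  # greater_than(30)
    and gender == "M"  # equal_to("M")
    and user in sources_with_matching_edge  # node.edges() narrowed by the edge query
)
print(result)  # ['pat_4']
```

pat_2 has one edge with a cost below 200 (quantity 12) and another edge with quantity 1 (cost 200), but no single edge meets both conditions, so only pat_4 remains, in line with the documented output.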
6. Clones
Since the statements in the query engine are additive, every operation modifies the state of the query. That means that it is not possible to revert to a previous state unless the entire query is rewritten from scratch for that intermediate step. This can become inefficient and redundant, particularly when multiple branches of a query or comparisons with intermediate results are required.
To address this limitation, the clone() method was introduced. It allows users to create independent copies (clones) of operands or computed values at any point in the query chain. Clones are completely decoupled from the original object: modifications to the clone do not affect the original, and vice versa. This functionality applies to all types of operands.
# Clone query
def query_node_clone(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    node.index().contains("pat")
    mean_age_original = node.attribute("age").mean()
    mean_age_clone = mean_age_original.clone()  # Clone the mean age
    # Subtract 5 from the cloned mean age (original remains unchanged)
    mean_age_clone.subtract(5)
    node.attribute("age").less_than(mean_age_original)  # Mean age
    node.attribute("age").greater_than(mean_age_clone)  # Mean age minus 5
    return node.index()

graphrecord.query_nodes(query_node_clone)
[]
Methods used in the snippet
in_group(): Query nodes that belong to that group.
index(): Returns a NodeIndicesOperand to query on the indices.
contains(): Query node indices containing that argument.
attribute(): Returns a NodeMultipleValuesWithIndexOperand() to query on the values of the nodes for that attribute.
mean(): Returns a NodeSingleValueWithoutIndexOperand containing the mean of those values.
clone(): Returns a clone of the operand.
subtract(): Subtract the argument from the single value operand.
greater_than(): Query values that are greater than that value.
less_than(): Query values that are less than that value.
index(): Returns a NodeIndicesOperand representing the indices of the nodes queried.
query_nodes(): Retrieves information on the nodes from the GraphRecord given the query.
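The arithmetic behind the clone example can be followed with a small stand-in for a value that is modified in place. ToyValue is a hypothetical class written only for this sketch (it is not a GraphRecord type); it shows why subtracting from the clone leaves the original mean untouched, and why the example query returns no nodes.

```python
import copy
from statistics import mean


class ToyValue:
    """Hypothetical mutable value: subtract() changes it in place."""

    def __init__(self, value):
        self.value = value

    def subtract(self, amount):
        self.value -= amount

    def clone(self):
        # An independent copy: later changes do not affect the original.
        return copy.deepcopy(self)


ages = {"pat_0": 20, "pat_1": 30, "pat_2": 40, "pat_3": 50, "pat_4": 60}

mean_age_original = ToyValue(mean(ages.values()))  # mean of the five ages
mean_age_clone = mean_age_original.clone()
mean_age_clone.subtract(5)  # only the clone changes

# Keep ages strictly between (mean - 5) and the mean.
result = sorted(
    user
    for user, age in ages.items()
    if mean_age_clone.value < age < mean_age_original.value
)
print(mean_age_original.value, mean_age_clone.value)  # 40 35
print(result)  # []
```

With a mean age of 40, the query looks for ages strictly between 35 and 40, and no user in the example dataset falls in that range.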
7. Full example Code
The full code for the examples in this chapter is listed below:
import pandas as pd
import graphrecords as gr
from graphrecords.querying import (
    EdgeIndicesOperand,
    EdgeOperand,
    NodeIndicesOperand,
    NodeOperand,
)

# Create example dataset manually
users = pd.DataFrame(
    [
        ["pat_0", 20, "M"],
        ["pat_1", 30, "F"],
        ["pat_2", 40, "M"],
        ["pat_3", 50, "F"],
        ["pat_4", 60, "M"],
    ],
    columns=["index", "age", "gender"],
)
products = pd.DataFrame(
    [
        ["drug_0", "fentanyl injection"],
        ["drug_1", "aspirin tablet"],
        ["drug_2", "insulin pen"],
    ],
    columns=["index", "description"],
)
user_product = pd.DataFrame(
    [
        ["pat_0", "drug_0", 100, 1, "2020-01-01"],
        ["pat_1", "drug_0", 150, 2, "2020-02-15"],
        ["pat_1", "drug_1", 50, 1, "2020-03-10"],
        ["pat_2", "drug_1", 75, 12, "2020-04-20"],
        ["pat_2", "drug_2", 200, 1, "2020-05-05"],
        ["pat_3", "drug_2", 180, 12, "2020-06-30"],
        ["pat_4", "drug_0", 120, 1, "2020-07-15"],
        ["pat_4", "drug_1", 60, 2, "2020-08-01"],
    ],
    columns=["source", "target", "cost", "quantity", "time"],
)
graphrecord = (
    gr.GraphRecord.builder()
    .add_nodes((users, "index"), group="user")
    .add_nodes((products, "index"), group="product")
    .add_edges((user_product, "source", "target"), group="user_product")
    .build()
)

# Basic node query
def query_node_in_user(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    return node.index()

graphrecord.query_nodes(query_node_in_user)

# Intermediate node query
def query_node_user_older_than_30(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    node.index().contains("pat")
    node.has_attribute("age")
    node.attribute("age").greater_than(30)
    return node.index()

graphrecord.query_nodes(query_node_user_older_than_30)

# Reusing node query
def query_node_reused(node: NodeOperand) -> NodeIndicesOperand:
    query_node_in_user(node)
    node.index().contains("pat")
    node.has_attribute("age")
    node.attribute("age").greater_than(30)
    return node.index()

graphrecord.query_nodes(query_node_reused)

# Node query with neighbors function
def query_node_neighbors(node: NodeOperand) -> NodeIndicesOperand:
    query_node_user_older_than_30(node)
    description_neighbors = node.neighbors().attribute("description")
    description_neighbors.lowercase()
    description_neighbors.contains("fentanyl")
    return node.index()

graphrecord.query_nodes(query_node_neighbors)

# Basic edge query
def query_edge_user_product(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("user_product")
    return edge.index()

edges = graphrecord.query_edges(query_edge_user_product)
edges[0:5]

# Advanced edge query
def query_edge_old_user_cheap_item(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("user_product")
    edge.attribute("cost").less_than(200)
    edge.source_node().attribute("age").is_max()
    edge.target_node().attribute("description").contains("insulin")
    return edge.index()

graphrecord.query_edges(query_edge_old_user_cheap_item)

# Combined node and edge query
def query_edge_combined(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("user_product")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(1)
    return edge.index()

def query_node_combined(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    node.attribute("age").is_int()
    node.attribute("age").greater_than(30)
    node.attribute("gender").equal_to("M")
    query_edge_combined(node.edges())
    return node.index()

graphrecord.query_nodes(query_node_combined)

# Either/or query
def query_edge_either(edge: EdgeOperand) -> None:
    edge.in_group("user_product")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(1)

def query_edge_or(edge: EdgeOperand) -> None:
    edge.in_group("user_product")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(12)

def query_node_either_or(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    node.attribute("age").greater_than(30)
    node.edges().either_or(query_edge_either, query_edge_or)
    return node.index()

graphrecord.query_nodes(query_node_either_or)

def query_node_either_or_component(node: NodeOperand) -> None:
    node.in_group("user")
    node.attribute("age").greater_than(30)
    node.edges().either_or(query_edge_either, query_edge_or)

# Exclude query
def query_node_exclude(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    node.exclude(query_node_either_or_component)
    return node.index()

graphrecord.query_nodes(query_node_exclude)

# Clone query
def query_node_clone(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("user")
    node.index().contains("pat")
    mean_age_original = node.attribute("age").mean()
    mean_age_clone = mean_age_original.clone()  # Clone the mean age
    # Subtract 5 from the cloned mean age (original remains unchanged)
    mean_age_clone.subtract(5)
    node.attribute("age").less_than(mean_age_original)  # Mean age
    node.attribute("age").greater_than(mean_age_clone)  # Mean age minus 5
    return node.index()

graphrecord.query_nodes(query_node_clone)

# Node queries as function arguments
graphrecord.unfreeze_schema()
graphrecord.add_group("old_male_user", nodes=query_node_user_older_than_30)
graphrecord.groups
graphrecord.node[query_node_either_or]
graphrecord.groups_of_node(query_node_user_older_than_30)
graphrecord.edge_endpoints(query_edge_old_user_cheap_item)
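The either/or and exclude queries in the listing are not accompanied by outputs, so the sketch below works through one plausible reading in plain Python. It assumes that either_or() keeps edges matching at least one of the two branches and that exclude() removes the nodes selected by the given query; both are assumptions about the semantics, the names are illustrative, and the printed values are what this reading produces rather than confirmed library output.

```python
# Plain-Python reading of the either/or and exclude queries (not the GraphRecord API).
ages = {"pat_0": 20, "pat_1": 30, "pat_2": 40, "pat_3": 50, "pat_4": 60}
# (source, cost, quantity)
edges = [
    ("pat_0", 100, 1),
    ("pat_1", 150, 2),
    ("pat_1", 50, 1),
    ("pat_2", 75, 12),
    ("pat_2", 200, 1),
    ("pat_3", 180, 12),
    ("pat_4", 120, 1),
    ("pat_4", 60, 2),
]


def branch_either(cost: int, quantity: int) -> bool:
    return cost < 200 and quantity == 1


def branch_or(cost: int, quantity: int) -> bool:
    return cost < 200 and quantity == 12


# Assumed either_or semantics: an edge passes if it matches at least one branch.
sources_with_matching_edge = {
    source
    for source, cost, quantity in edges
    if branch_either(cost, quantity) or branch_or(cost, quantity)
}

older_than_30 = {user for user, age in ages.items() if age > 30}
either_or_result = sorted(older_than_30 & sources_with_matching_edge)

# Assumed exclude semantics: drop the users selected by the either/or component.
exclude_result = sorted(set(ages) - set(either_or_result))

print(either_or_result)  # ['pat_2', 'pat_3', 'pat_4']
print(exclude_result)  # ['pat_0', 'pat_1']
```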