Query Engine

1.  What is the Query Engine?

The MedRecord Query Engine lets users efficiently find the indices of nodes and edges stored in the graph structure. Its intuitive interface supports complex queries that filter nodes and edges by their attributes and relationships. This section introduces the basic concepts of querying MedRecords and explores advanced techniques for working with complex datasets.

2.  Example dataset

The following demonstrations use an example dataset generated with the from_simple_example_dataset() method of the MedRecord class.

from medmodels import MedRecord

medrecord = MedRecord().from_simple_example_dataset()

This example dataset includes a set of patients, drugs, diagnoses and procedures. In this section, we only use the patients, the drugs and the edges that connect these two groups.

patients.head(10)
      age gender
pat_1  42      M
pat_2  22      F
pat_3  96      F
pat_4  19      M
pat_5  37      M
drugs.head(10)
                                                    description
drug_1043400  Acetaminophen 21.7 MG/ML / Dextromethorphan Hy...
drug_1049625  Acetaminophen 325 MG / Oxycodone Hydrochloride...
drug_1648755  nitrofurantoin  macrocrystals 25 MG / nitrofur...
drug_1665060                        cefazolin 2000 MG Injection
drug_1808217                 100 ML Propofol 10 MG/ML Injection
drug_1860491  12 HR Hydrocodone Bitartrate 10 MG Extended Re...
drug_245134       72 HR Fentanyl 0.025 MG/HR Transdermal System
drug_309362                       Clopidogrel 75 MG Oral Tablet
drug_310798               Hydrochlorothiazide 25 MG Oral Tablet
drug_311034   insulin  regular  human 100 UNT/ML Injectable ...
patients_drugs_edges.head(10)
   source        target quantity                 time    cost
60  pat_1   drug_856987        3  2014-04-08 12:54:59  215.58
61  pat_1   drug_856987        3  2015-04-14 12:54:59   281.4
62  pat_1   drug_856987        3  2017-04-25 12:54:59  105.15
63  pat_1   drug_856987        3  2018-05-01 12:54:59  386.43
64  pat_1   drug_562251        1  2021-01-05 12:54:59  742.23
65  pat_1   drug_856987        3  2023-05-30 12:54:59  162.03
66  pat_2   drug_834061        1  2014-08-12 09:01:28  381.67
67  pat_2   drug_748856       12  2015-12-18 15:01:28  7822.2
68  pat_2   drug_313782        1  2018-06-15 09:25:40   141.5
69  pat_2  drug_1648755        1  2018-12-12 06:33:04  129.94
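To make the structure behind these tables concrete, here is a plain-Python sketch, not part of the MedModels API, that stores the rows printed above as a small graph: node indices map to attribute dictionaries, and edge indices map to a source node, a target node and the edge's attributes (the time column is left out). Because it only contains the excerpt shown above, the edge counts it computes describe that excerpt, not the full dataset.

```python
from collections import Counter
from typing import Dict, Tuple

# Node attributes, copied from the patients table above
nodes: Dict[str, dict] = {
    "pat_1": {"age": 42, "gender": "M"},
    "pat_2": {"age": 22, "gender": "F"},
    "pat_3": {"age": 96, "gender": "F"},
    "pat_4": {"age": 19, "gender": "M"},
    "pat_5": {"age": 37, "gender": "M"},
}

# Edges: index -> (source node, target node, edge attributes),
# copied from the first ten rows of patients_drugs_edges (time omitted)
edges: Dict[int, Tuple[str, str, dict]] = {
    60: ("pat_1", "drug_856987", {"quantity": 3, "cost": 215.58}),
    61: ("pat_1", "drug_856987", {"quantity": 3, "cost": 281.4}),
    62: ("pat_1", "drug_856987", {"quantity": 3, "cost": 105.15}),
    63: ("pat_1", "drug_856987", {"quantity": 3, "cost": 386.43}),
    64: ("pat_1", "drug_562251", {"quantity": 1, "cost": 742.23}),
    65: ("pat_1", "drug_856987", {"quantity": 3, "cost": 162.03}),
    66: ("pat_2", "drug_834061", {"quantity": 1, "cost": 381.67}),
    67: ("pat_2", "drug_748856", {"quantity": 12, "cost": 7822.2}),
    68: ("pat_2", "drug_313782", {"quantity": 1, "cost": 141.5}),
    69: ("pat_2", "drug_1648755", {"quantity": 1, "cost": 129.94}),
}

# Count how many edges leave each patient in this excerpt
edges_per_patient = Counter(source for source, _target, _attrs in edges.values())
print(edges_per_patient["pat_1"], edges_per_patient["pat_2"])  # 6 4
```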

3.  Node Queries

The NodeOperand querying class lets you define specific criteria for selecting nodes within a MedRecord. Node operands support flexible and complex queries by combining multiple conditions, such as group membership, the presence and values of attributes, and relationships to other nodes or edges. This section introduces the basic usage of node operands, which forms the foundation for your data queries.

The methods query_nodes() and query_edges() are the main entry points for running these queries. Each takes a query function that receives an operand, adds conditions to it, and returns the data to retrieve: for example, the indices of the nodes that fulfill the criteria (using index()), or even the mean age of those nodes (using mean()).

def query_node_in_patient(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")

    return node.index()


medrecord.query_nodes(query_node_in_patient)
['pat_2', 'pat_4', 'pat_3', 'pat_5', 'pat_1']
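The callback style above, where query_nodes() hands an operand to your function, which registers conditions on it, can be mimicked in plain Python. The sketch below is a hypothetical toy model, not how MedModels is implemented: ToyNodeOperand, toy_query_nodes and the node_groups mapping are invented names, and the "drug" group name is an assumption. It only shows the idea that the query function describes the selection, and the conditions are evaluated afterwards.

```python
from typing import Callable, Dict, List, Set

# A condition receives a node index and the node's groups
Condition = Callable[[str, Set[str]], bool]


class ToyNodeOperand:
    """Hypothetical stand-in for NodeOperand: it only records conditions."""

    def __init__(self) -> None:
        self.conditions: List[Condition] = []

    def in_group(self, group: str) -> None:
        # Register a condition instead of filtering immediately
        self.conditions.append(lambda index, groups: group in groups)


def toy_query_nodes(
    node_groups: Dict[str, Set[str]], query: Callable[[ToyNodeOperand], None]
) -> List[str]:
    operand = ToyNodeOperand()
    query(operand)  # the query function only describes what to select
    # The recorded conditions are evaluated afterwards, node by node
    return [
        index
        for index, groups in node_groups.items()
        if all(condition(index, groups) for condition in operand.conditions)
    ]


# Hypothetical group membership for a few nodes ("drug" is an assumed name)
node_groups: Dict[str, Set[str]] = {
    "pat_1": {"patient"},
    "pat_2": {"patient"},
    "drug_309362": {"drug"},
}


def toy_query_in_patient(node: ToyNodeOperand) -> None:
    node.in_group("patient")


print(toy_query_nodes(node_groups, toy_query_in_patient))  # ['pat_1', 'pat_2']
```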

You can often reach the same result through different approaches, which makes the query engine versatile and adaptable to your specific needs. The next query is a bit more complex and involves more than one operand.

def query_node_patient_older_than_30(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")
    node.index().contains("pat")

    node.has_attribute("age")
    node.attribute("age").greater_than(30)

    return node.index()


medrecord.query_nodes(query_node_patient_older_than_30)
['pat_3', 'pat_5', 'pat_1']

Note

The has_attribute() call is not strictly needed in this example, since attribute() already restricts the query to nodes that have the attribute. It is included merely for educational purposes; other examples in this user guide do the same so that as many methods as possible are shown.
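The reasoning in this note can be checked with a plain-Python analogue (not MedModels code). In the sketch below, the hypothetical helper age_greater_than() treats a node without an age attribute as not matching, which is the behavior the note describes for attribute(); with that helper, an explicit presence check in the style of has_attribute() does not change the result. The drug node has no age attribute, as in the drugs table above.

```python
from typing import Dict, List

# Attributes copied from the patients and drugs tables above
nodes: Dict[str, dict] = {
    "pat_1": {"age": 42, "gender": "M"},
    "pat_2": {"age": 22, "gender": "F"},
    "pat_3": {"age": 96, "gender": "F"},
    "pat_4": {"age": 19, "gender": "M"},
    "pat_5": {"age": 37, "gender": "M"},
    "drug_309362": {"description": "Clopidogrel 75 MG Oral Tablet"},
}


def age_greater_than(attributes: dict, threshold: int) -> bool:
    # Hypothetical helper: a missing attribute simply fails the comparison
    return "age" in attributes and attributes["age"] > threshold


# Explicit presence check first, like has_attribute("age")
with_presence_check: List[str] = [
    index
    for index, attributes in nodes.items()
    if "age" in attributes and age_greater_than(attributes, 30)
]
# Comparison only, relying on the helper to skip nodes without an age
without_presence_check: List[str] = [
    index for index, attributes in nodes.items() if age_greater_than(attributes, 30)
]

print(with_presence_check)  # ['pat_1', 'pat_3', 'pat_5']
print(with_presence_check == without_presence_check)  # True
```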

3.1.  Reusing Node Queries

As shown above, the query engine is useful for finding nodes that fulfill several criteria, which can be as specific as needed. A key feature of the query engine is that previous queries can be reused inside new ones. For instance, the previous query can be rewritten by calling query_node_in_patient() instead of repeating the group condition:

def query_node_reused(node: NodeOperand) -> NodeIndicesOperand:
    query_node_in_patient(node)
    node.index().contains("pat")

    node.has_attribute("age")
    node.attribute("age").greater_than(30)

    return node.index()


medrecord.query_nodes(query_node_reused)
['pat_3', 'pat_5', 'pat_1']
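Reusing a query works because query functions are ordinary Python functions: calling one from another simply applies its conditions first. The same idea can be shown with plain predicate functions. The sketch below is an analogy, not MedModels code; the names in_patient and patient_older_than_30 are hypothetical, the ages come from the patients table above, and the "drug" group name is an assumption.

```python
from typing import Dict, List, Set

# Hypothetical group membership and attributes for a few nodes
groups: Dict[str, Set[str]] = {
    "pat_1": {"patient"},
    "pat_2": {"patient"},
    "pat_3": {"patient"},
    "drug_309362": {"drug"},
}
attributes: Dict[str, dict] = {
    "pat_1": {"age": 42},
    "pat_2": {"age": 22},
    "pat_3": {"age": 96},
    "drug_309362": {"description": "Clopidogrel 75 MG Oral Tablet"},
}


def in_patient(index: str) -> bool:
    return "patient" in groups[index]


def patient_older_than_30(index: str) -> bool:
    # Reuse in_patient(), then add the remaining conditions
    age = attributes[index].get("age")
    return in_patient(index) and "pat" in index and age is not None and age > 30


selected: List[str] = [index for index in groups if patient_older_than_30(index)]
print(selected)  # ['pat_1', 'pat_3']
```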

3.2.  Neighbors

Another useful method is neighbors(), which lets you query the neighbors of the selected nodes, that is, the nodes connected to them by an edge. Conditions placed on the neighbors restrict which of the original nodes are kept.

The following example selects the nodes that fulfill these criteria:

  • Are in group patient.

  • Their node index contains the string “pat”.

  • Their attribute age is greater than 30.

  • They are connected to at least one node whose description attribute contains the word “fentanyl”, in either upper or lower case.

def query_node_neighbors(node: NodeOperand) -> NodeIndicesOperand:
    query_node_patient_older_than_30(node)

    description_neighbors = node.neighbors().attribute("description")
    description_neighbors.lowercase()
    description_neighbors.contains("fentanyl")

    return node.index()


medrecord.query_nodes(query_node_neighbors)
['pat_5']
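What neighbors() contributes to this query can be pictured with a small adjacency list: a patient is kept if at least one connected node has a description containing “fentanyl” after lowercasing. The sketch below is plain Python with hypothetical toy data (patient_a, patient_b and their connections are invented); only the two drug descriptions are copied from the drugs table above. It mirrors the lowercase() and contains() steps of the query, not the MedModels implementation.

```python
from typing import Dict, List

# Descriptions copied from the drugs table above
descriptions: Dict[str, str] = {
    "drug_245134": "72 HR Fentanyl 0.025 MG/HR Transdermal System",
    "drug_309362": "Clopidogrel 75 MG Oral Tablet",
}

# Hypothetical toy adjacency: patient -> connected drug nodes
neighbors: Dict[str, List[str]] = {
    "patient_a": ["drug_309362", "drug_245134"],
    "patient_b": ["drug_309362"],
}

selected = [
    patient
    for patient, connected in neighbors.items()
    # Lowercase first so that "Fentanyl" also matches "fentanyl"
    if any("fentanyl" in descriptions[drug].lower() for drug in connected)
]
print(selected)  # ['patient_a']
```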

4.  Edge Queries

The EdgeOperand querying class provides a way to query the edges contained in a MedRecord. Edge operands offer the same functionality as node operands, and together they form a powerful pair for querying your data. This section shows different ways to use edge operands.

def query_edge_patient_drug(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("patient_drug")
    return edge.index()


edges = medrecord.query_edges(query_edge_patient_drug)
edges[0:5]
[90, 106, 83, 71, 64]
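The returned edge indices are plain integers, and in the outputs shown in this guide they do not appear to come back in sorted order. If you need a reproducible slice, a reasonable precaution is to sort the result first. The plain-Python sketch below models an edge group as a set of edge indices (an assumption for illustration, not MedModels internals), using the edge indices from the table above.

```python
from typing import Dict, List, Set

# Hypothetical model: an edge group is a set of edge indices
edge_groups: Dict[str, Set[int]] = {
    "patient_drug": {60, 61, 62, 63, 64, 65, 66, 67, 68, 69},
}

# Do not rely on the iteration order of a set:
# sort before slicing to get the same result on every run
edges: List[int] = sorted(edge_groups["patient_drug"])
print(edges[0:5])  # [60, 61, 62, 63, 64]
```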

The edge operand follows the same principles as the node operand, with some extra methods that only apply to edges, such as source_node() and target_node() (used in place of neighbors()).

def query_edge_old_patient_cheap_insulin(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)

    edge.source_node().attribute("age").is_max()
    edge.target_node().attribute("description").contains("insulin")
    return edge.index()


medrecord.query_edges(query_edge_old_patient_cheap_insulin)
[76]
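The source_node() and target_node() methods let edge conditions depend on the nodes at either end of an edge. The plain-Python sketch below (not MedModels code) applies a simplified variant of the query above to the edge rows printed earlier: it keeps edges that cost less than 200 and whose source patient is older than 30. A fixed age threshold stands in for is_max(), and the insulin condition is left out because the excerpt does not include the descriptions of these target drugs.

```python
from typing import Dict, List, Tuple

# Ages from the patients table above
patient_ages: Dict[str, int] = {"pat_1": 42, "pat_2": 22}

# Edge index -> (source, target, cost), from the edge rows shown earlier
edges: Dict[int, Tuple[str, str, float]] = {
    60: ("pat_1", "drug_856987", 215.58),
    61: ("pat_1", "drug_856987", 281.4),
    62: ("pat_1", "drug_856987", 105.15),
    63: ("pat_1", "drug_856987", 386.43),
    64: ("pat_1", "drug_562251", 742.23),
    65: ("pat_1", "drug_856987", 162.03),
    66: ("pat_2", "drug_834061", 381.67),
    67: ("pat_2", "drug_748856", 7822.2),
    68: ("pat_2", "drug_313782", 141.5),
    69: ("pat_2", "drug_1648755", 129.94),
}

selected: List[int] = [
    index
    for index, (source, _target, cost) in edges.items()
    # A condition on the edge itself plus one on its source node
    if cost < 200 and patient_ages[source] > 30
]
print(selected)  # [62, 65]
```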

5.  Combining Node & Edge Queries

The full power of the query engine appears when you combine both operands in a single query. The following query selects nodes that:

  • Are in group patient.

  • Their attribute age is an integer greater than 30, and their attribute gender is equal to “M”.

  • They have at least one edge in the patient_drug group whose cost attribute is less than 200 and whose quantity attribute is equal to 1.

def query_edge_combined(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(1)

    return edge.index()


def query_node_combined(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")
    node.attribute("age").is_int()
    node.attribute("age").greater_than(30)
    node.attribute("gender").equal_to("M")

    query_edge_combined(node.edges())

    return node.index()


medrecord.query_nodes(query_node_combined)
['pat_5']
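Passing node.edges() into query_edge_combined() turns an edge query into a node condition: a node is kept if at least one of its edges matches. The plain-Python sketch below (an analogy, not MedModels code) reproduces that pattern on the excerpt printed earlier. None of the shown edges of pat_1 has a quantity of 1 and a cost below 200, so, purely for illustration, the sketch uses quantity == 3 instead. The helper names edge_matches and node_matches are hypothetical.

```python
from typing import Dict, List, Tuple

# Attributes from the patients table above
patients: Dict[str, dict] = {
    "pat_1": {"age": 42, "gender": "M"},
    "pat_2": {"age": 22, "gender": "F"},
}

# (source, quantity, cost) for the edge rows shown earlier
edges: List[Tuple[str, int, float]] = [
    ("pat_1", 3, 215.58), ("pat_1", 3, 281.4), ("pat_1", 3, 105.15),
    ("pat_1", 3, 386.43), ("pat_1", 1, 742.23), ("pat_1", 3, 162.03),
    ("pat_2", 1, 381.67), ("pat_2", 12, 7822.2), ("pat_2", 1, 141.5),
    ("pat_2", 1, 129.94),
]


def edge_matches(quantity: int, cost: float) -> bool:
    # Variant of query_edge_combined: quantity 3 instead of 1
    return cost < 200 and quantity == 3


def node_matches(index: str, attributes: dict) -> bool:
    age = attributes.get("age")
    node_ok = isinstance(age, int) and age > 30 and attributes.get("gender") == "M"
    # At least one of the node's edges must satisfy the edge condition
    return node_ok and any(
        edge_matches(quantity, cost)
        for source, quantity, cost in edges
        if source == index
    )


selected = [index for index, attributes in patients.items() if node_matches(index, attributes)]
print(selected)  # ['pat_1']
```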

6.  Clones

Since statements in the query engine are additive, every operation modifies the state of the query. As a result, it is not possible to return to a previous state without rewriting the query from scratch up to that intermediate step. This becomes inefficient and redundant, particularly when a query needs several branches or comparisons with intermediate results.

To address this limitation, the clone() method was introduced. It creates independent copies (clones) of operands or computed values at any point in the query chain. Clones are completely decoupled from the original object: modifying the clone does not affect the original, and vice versa. This functionality applies to all types of operands.

def query_node_clone(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")
    node.index().contains("pat")

    mean_age_original = node.attribute("age").mean()
    mean_age_clone = mean_age_original.clone()  # Clone the mean age

    # Subtract 5 from the cloned mean age (original remains unchanged)
    mean_age_clone.subtract(5)

    node.attribute("age").less_than(mean_age_original)  # Mean age
    node.attribute("age").greater_than(mean_age_clone)  # Mean age minus 5

    return node.index()


medrecord.query_nodes(query_node_clone)
['pat_1']
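The need for clone() comes from the in-place nature of operations such as subtract(). The plain-Python sketch below uses a hypothetical ToyValue class (not a MedModels type) whose subtract() mutates the object, and copy.copy() plays the role of clone(). With the ages from the patients table, the mean is 43.2, so the selected window is 38.2 < age < 43.2, which is consistent with the ['pat_1'] result above.

```python
import copy
from statistics import mean


class ToyValue:
    """Hypothetical mutable value: subtract() changes it in place."""

    def __init__(self, value: float) -> None:
        self.value = value

    def subtract(self, amount: float) -> None:
        self.value -= amount


# Ages from the patients table above
ages = {"pat_1": 42, "pat_2": 22, "pat_3": 96, "pat_4": 19, "pat_5": 37}

# Without a copy, both names point to the same object
original = ToyValue(mean(ages.values()))
alias = original
alias.subtract(5)
print(original is alias)  # True: the original mean has been overwritten

# With a copy (the role clone() plays), the original stays intact
mean_age = ToyValue(mean(ages.values()))
mean_minus_5 = copy.copy(mean_age)
mean_minus_5.subtract(5)

selected = [
    index for index, age in ages.items() if mean_minus_5.value < age < mean_age.value
]
print(selected)  # ['pat_1']
```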

7.  Full Example Code

The full code examples for this chapter can be found here:

from medmodels import MedRecord
from medmodels.medrecord.querying import (
    EdgeIndicesOperand,
    EdgeOperand,
    NodeIndicesOperand,
    NodeOperand,
)

medrecord = MedRecord().from_simple_example_dataset()


# Basic node query
def query_node_in_patient(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")

    return node.index()


medrecord.query_nodes(query_node_in_patient)


# Intermediate node query
def query_node_patient_older_than_30(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")
    node.index().contains("pat")

    node.has_attribute("age")
    node.attribute("age").greater_than(30)

    return node.index()


medrecord.query_nodes(query_node_patient_older_than_30)


# Reusing node query
def query_node_reused(node: NodeOperand) -> NodeIndicesOperand:
    query_node_in_patient(node)
    node.index().contains("pat")

    node.has_attribute("age")
    node.attribute("age").greater_than(30)

    return node.index()


medrecord.query_nodes(query_node_reused)


# Node query with neighbors function
def query_node_neighbors(node: NodeOperand) -> NodeIndicesOperand:
    query_node_patient_older_than_30(node)

    description_neighbors = node.neighbors().attribute("description")
    description_neighbors.lowercase()
    description_neighbors.contains("fentanyl")

    return node.index()


medrecord.query_nodes(query_node_neighbors)


# Basic edge query
def query_edge_patient_drug(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("patient_drug")
    return edge.index()


edges = medrecord.query_edges(query_edge_patient_drug)
edges[0:5]


# Advanced edge query
def query_edge_old_patient_cheap_insulin(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)

    edge.source_node().attribute("age").is_max()
    edge.target_node().attribute("description").contains("insulin")
    return edge.index()


medrecord.query_edges(query_edge_old_patient_cheap_insulin)


# Combined node and edge query
def query_edge_combined(edge: EdgeOperand) -> EdgeIndicesOperand:
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(1)

    return edge.index()


def query_node_combined(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")
    node.attribute("age").is_int()
    node.attribute("age").greater_than(30)
    node.attribute("gender").equal_to("M")

    query_edge_combined(node.edges())

    return node.index()


medrecord.query_nodes(query_node_combined)


# Either/or query
def query_edge_either(edge: EdgeOperand) -> None:
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(1)


def query_edge_or(edge: EdgeOperand) -> None:
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(12)


def query_node_either_or(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")
    node.attribute("age").greater_than(30)

    node.edges().either_or(query_edge_either, query_edge_or)

    return node.index()


medrecord.query_nodes(query_node_either_or)


def query_node_either_or_component(node: NodeOperand) -> None:
    node.in_group("patient")
    node.attribute("age").greater_than(30)

    node.edges().either_or(query_edge_either, query_edge_or)


# Exclude query
def query_node_exclude(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")
    node.exclude(query_node_either_or_component)

    return node.index()


medrecord.query_nodes(query_node_exclude)


# Clone query
def query_node_clone(node: NodeOperand) -> NodeIndicesOperand:
    node.in_group("patient")
    node.index().contains("pat")

    mean_age_original = node.attribute("age").mean()
    mean_age_clone = mean_age_original.clone()  # Clone the mean age

    # Subtract 5 from the cloned mean age (original remains unchanged)
    mean_age_clone.subtract(5)

    node.attribute("age").less_than(mean_age_original)  # Mean age
    node.attribute("age").greater_than(mean_age_clone)  # Mean age minus 5

    return node.index()


medrecord.query_nodes(query_node_clone)

# Node queries as function arguments
medrecord.unfreeze_schema()
medrecord.add_group("old_male_patient", nodes=query_node_patient_older_than_30)
medrecord.groups

medrecord.node[query_node_either_or]
medrecord.groups_of_node(query_node_patient_older_than_30)
medrecord.edge_endpoints(query_edge_old_patient_cheap_insulin)