MedRecord
1. Preface
Every major library has a central object that constitutes its core. For PyTorch, it is the torch.Tensor, whereas for NumPy, it is the np.ndarray. In our case, MedModels centres around the MedRecord as its foundational structure.
MedModels delivers advanced data analytics methods out-of-the-box by utilizing a structured approach to data storage. This is enabled by the MedRecord class, which organizes data of any complexity within a graph structure. With its Rust backend, MedRecord is built for high performance, even when working with extremely large datasets.
import medmodels as mm
2. Adding Nodes to a MedRecord
Let’s begin by introducing some sample medical data:
| ID | Age | Sex | Loc |
|---|---|---|---|
| Patient 01 | 72 | M | USA |
| Patient 02 | 74 | M | USA |
| Patient 03 | 64 | F | GER |
This data, stored for example in a Pandas DataFrame, looks like this:
import pandas as pd  # needed for the DataFrame examples in this chapter

patients = pd.DataFrame(
    [
        ["Patient 01", 72, "M", "USA"],
        ["Patient 02", 74, "M", "USA"],
        ["Patient 03", 64, "F", "GER"],
    ],
    columns=["ID", "Age", "Sex", "Loc"],
)
In the example below, we create a new MedRecord using the builder pattern. We instantiate a MedRecordBuilder and instruct it to add the Pandas DataFrame as nodes, using the ‘ID’ column for indexing. Additionally, we assign these nodes to the group ‘Patients’.
The Builder Pattern simplifies creating complex objects by constructing them step by step. It improves flexibility, readability, and consistency, making it easier to manage and configure objects in a controlled way.
record = mm.MedRecord.builder().add_nodes((patients, "ID"), group="Patients").build()
Methods used in the snippet
builder(): Creates a new MedRecordBuilder instance to build a MedRecord.
add_nodes(): Adds nodes to the MedRecord from different data formats and optionally assigns them to a group.
build(): Constructs a MedRecord instance from the builder’s configuration.
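To make the builder pattern concrete, here is a minimal, library-independent sketch in plain Python. The names TinyRecord and TinyRecordBuilder are hypothetical and used only for illustration; they are not part of MedModels and say nothing about how MedRecordBuilder is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class TinyRecord:
    """Hypothetical stand-in for a graph record (not the MedModels class)."""

    nodes: dict = field(default_factory=dict)   # node index -> attributes
    groups: dict = field(default_factory=dict)  # group name -> set of node indices


class TinyRecordBuilder:
    """Collects configuration step by step, then builds the record in one go."""

    def __init__(self):
        self._pending = []  # list of (node tuples, group name or None)

    def add_nodes(self, node_tuples, group=None):
        self._pending.append((list(node_tuples), group))
        return self  # returning self is what allows method chaining

    def build(self):
        record = TinyRecord()
        for node_tuples, group in self._pending:
            for index, attributes in node_tuples:
                record.nodes[index] = dict(attributes)
                if group is not None:
                    record.groups.setdefault(group, set()).add(index)
        return record


sample_nodes = [("Patient 01", {"Age": 72}), ("Patient 02", {"Age": 74})]
tiny = TinyRecordBuilder().add_nodes(sample_nodes, group="Patients").build()
print(sorted(tiny.nodes))  # ['Patient 01', 'Patient 02']
```

Nothing is constructed until build() is called, so any number of add_nodes() calls can be chained beforehand.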
The MedModels MedRecord object, record, now contains three patients. Each patient is identified by a unique index and has specific attributes, such as age, sex, and location. These patients serve as the initial nodes in the graph structure of our MedRecord.

We can now proceed by adding additional data, such as the following medications.
medications = pd.DataFrame(
    [["Med 01", "Insulin"], ["Med 02", "Warfarin"]], columns=["ID", "Name"]
)
Using the builder pattern to construct the MedRecord allows us to pass as many nodes and edges as needed. If nodes are not added during the initial graph construction, they can easily be added later to an existing MedRecord by calling add_nodes(), where you provide the DataFrame and specify the column containing the node indices.
record.add_nodes((medications, "ID"), group="Medications")
Methods used in the snippet
add_nodes(): Adds nodes to the MedRecord from different data formats and optionally assigns them to a group.
This expands the MedRecord by adding the two medication nodes to the graph. However, these nodes are not yet connected, so let’s establish relationships between them!

Note
Nodes can be added to the MedRecord in many different formats: from a Pandas DataFrame (as previously shown), but also from a list of NodeTuples, each pairing a node index with a dictionary of attributes:
patient_tuples = [
    ("Patient 04", {"Age": 45, "Sex": "F", "Loc": "CHI"}),
    ("Patient 05", {"Age": 26, "Sex": "M", "Loc": "SPA"}),
]
record.add_nodes(patient_tuples, group="Patients")
Or from a Polars DataFrame:
import polars as pl  # needed for the Polars example

patient_polars = pl.DataFrame(
    [
        ["Patient 06", 55, "F", "GER"],
        ["Patient 07", 61, "F", "USA"],
        ["Patient 08", 73, "M", "CHI"],
    ],
    schema=["ID", "Age", "Sex", "Loc"],
    orient="row",
)
record.add_nodes((patient_polars, "ID"), group="Patients")
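The tabular and tuple formats carry the same information. As a rough, library-independent illustration, the helper below converts table rows into the (index, attributes) shape used by patient_tuples. The function rows_to_node_tuples is hypothetical and not a MedModels function; how MedModels converts DataFrames internally is not shown here.

```python
def rows_to_node_tuples(rows, columns, index_column):
    """Turn table rows into (index, attributes) tuples.

    Hypothetical helper for illustration only; it is not part of MedModels.
    """
    index_position = columns.index(index_column)
    node_tuples = []
    for row in rows:
        # Every column except the index column becomes a node attribute.
        attributes = {
            column: value
            for position, (column, value) in enumerate(zip(columns, row))
            if position != index_position
        }
        node_tuples.append((row[index_position], attributes))
    return node_tuples


rows = [
    ["Patient 06", 55, "F", "GER"],
    ["Patient 07", 61, "F", "USA"],
]
node_tuples = rows_to_node_tuples(rows, ["ID", "Age", "Sex", "Loc"], "ID")
print(node_tuples[0])  # ('Patient 06', {'Age': 55, 'Sex': 'F', 'Loc': 'GER'})
```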
3. Adding Edges to a MedRecord
To capture meaningful relationships between nodes, such as linking patients to prescribed medications, we add edges to the MedRecord. These edges must be specified in a relation table, such as the one shown below:
| Pat_ID | Med_ID | Date |
|---|---|---|
| Patient 02 | Med 01 | 2020/06/07 |
| Patient 02 | Med 02 | 2018/02/02 |
| Patient 03 | Med 02 | 2019/03/02 |
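With this table stored in a Pandas DataFrame named patient_medication (defined in the full example at the end of this chapter), we can add the edges to our MedRecord graph by passing the DataFrame together with the two columns that hold the node indices: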
record.add_edges((patient_medication, "Pat_ID", "Med_ID"))
Methods used in the snippet
add_edges(): Adds edges to the MedRecord from different data formats and optionally assigns them to a group.
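Conceptually, each row of the relation table becomes one edge: two connected node indices plus the remaining columns as edge attributes. The sketch below illustrates this in plain Python with hypothetical names (relation_rows, edges). Numbering the edges from 0 in insertion order is an assumption, based only on the record.edge[0] access shown later in this chapter.

```python
from datetime import datetime

# Rows of the relation table: (patient index, medication index, date)
relation_rows = [
    ("Patient 02", "Med 01", datetime(2020, 6, 7)),
    ("Patient 02", "Med 02", datetime(2018, 2, 2)),
    ("Patient 03", "Med 02", datetime(2019, 3, 2)),
]

# Assumption: edges are numbered in insertion order, starting at 0.
edges = {
    edge_index: {"source": source, "target": target, "attributes": {"Date": date}}
    for edge_index, (source, target, date) in enumerate(relation_rows)
}

# Medications linked to a given patient
meds_of_patient_02 = [
    edge["target"] for edge in edges.values() if edge["source"] == "Patient 02"
]
print(meds_of_patient_02)  # ['Med 01', 'Med 02']
```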
The graph now contains three edges that connect patients to their medications.

4. Adding Groups to a MedRecord
For certain analyses, we may want to define specific subcohorts within our MedRecord for easier access. We can do this by defining named groups within our MedRecord.
record.add_group("US-Patients", nodes=["Patient 01", "Patient 02"])
Methods used in the snippet
add_group(): Adds a group to the MedRecord instance with an optional list of node and/or edge indices.
This group will include all the defined nodes, allowing for easier access during complex analyses. Both nodes and edges can be added to a group, with no limitations on group size. Additionally, nodes and edges can belong to multiple groups without restriction.
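One way to picture groups is as named sets of indices layered on top of the graph. The plain-Python sketch below uses hypothetical names and is not how MedModels stores groups internally; it only illustrates that a node can belong to several groups at once.

```python
# Group name -> set of node indices (illustrative model only)
groups = {
    "Patients": {"Patient 01", "Patient 02", "Patient 03"},
    "US-Patients": {"Patient 01", "Patient 02"},
}


def groups_of_node(node_index):
    """Return the sorted names of all groups that contain the node."""
    return sorted(name for name, members in groups.items() if node_index in members)


print(groups_of_node("Patient 01"))  # ['Patients', 'US-Patients']
print(groups_of_node("Patient 03"))  # ['Patients']
```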

5. Saving and Loading MedRecords
When building a MedRecord, you may want to save it to create a persistent version. This can be done by storing it as a RON (Rusty Object Notation) file. The MedRecord can then be reloaded, allowing you to create a new instance from the saved RON file.
record.to_ron("record.ron")
new_record = mm.MedRecord.from_ron("record.ron")
Methods used in the snippet
to_ron(): Writes the MedRecord instance to a RON file.
from_ron(): Creates a MedRecord instance from a RON file.
6. Overview Tables
The MedRecord class is designed to efficiently handle large datasets while maintaining a standardized data structure that supports complex analysis methods. As a result, the structure within the MedRecord can become intricate and difficult to manage. To address this, MedModels offers tools to help keep track of the graph-based data. One such tool is the overview_nodes() method, which prints an overview of all nodes in the MedRecord.
record.overview_nodes()
---------------------------------------------------------------------------
Nodes Group     Count Attribute Type        Data
---------------------------------------------------------------------------
Medications     2     Name      Categorical Categories: Insulin, Warfarin
Patients        8     Age       Continuous  min: 26
                                            max: 74
                                            mean: 58.75
                      Loc       Categorical Categories: CHI, GER, SPA, USA
                      Sex       Categorical Categories: F, M
US-Patients     2     Age       Continuous  min: 72
                                            max: 74
                                            mean: 73.00
                      Loc       Categorical Categories: USA
                      Sex       Categorical Categories: M
Ungrouped Nodes 1     -         -           -
---------------------------------------------------------------------------
Methods used in the snippet
overview_nodes(): Gets a summary for all nodes in groups and their attributes.
As shown, we have two groups of nodes, Patients and Medications, created when adding the nodes. Additionally, there is a group called ‘US-Patients’ that we created ourselves. For each group, the overview lists the attributes along with a brief summary: the minimum, maximum, and mean for numeric attributes, and the categories for categorical ones. It also shows the number of ungrouped nodes in the MedRecord; here that is Patient 09, which the full example at the end of this chapter adds without assigning a group.
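The summary values can be checked by hand. The snippet below recomputes the Age and Loc summaries of the Patients group with the Python standard library, using the values of Patient 01 to Patient 08 from this chapter. It is an independent check, not the code that overview_nodes() runs.

```python
from statistics import mean

# Ages and locations of Patient 01 .. Patient 08, as added in this chapter
ages = [72, 74, 64, 45, 26, 55, 61, 73]
locations = ["USA", "USA", "GER", "CHI", "SPA", "GER", "USA", "CHI"]

age_summary = {"min": min(ages), "max": max(ages), "mean": mean(ages)}
loc_categories = sorted(set(locations))

print(age_summary)     # {'min': 26, 'max': 74, 'mean': 58.75}
print(loc_categories)  # ['CHI', 'GER', 'SPA', 'USA']
```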
We can get a similar overview of the edges in our MedRecord by using the overview_edges() method:
record.overview_edges()
------------------------------------------
Edges Group     Count Attribute Type Data
------------------------------------------
Ungrouped Edges 3     -         -    -
------------------------------------------
However, edges need to belong to a group for their attributes to appear in the overview.
record.add_group("Patient-Medication", edges=record.edges)
record.overview_edges()
---------------------------------------------------------------------
Edges Group        Count Attribute Type     Data
---------------------------------------------------------------------
Patient-Medication 3     Date      Temporal min: 2018-02-02 00:00:00
                                            max: 2020-06-07 00:00:00
---------------------------------------------------------------------
Methods used in the snippet
edges: Lists the edge indices in the MedRecord instance.
add_group(): Adds a group to the MedRecord instance with an optional list of node and/or edge indices.
overview_edges(): Gets a summary for all edges in groups and their attributes.
Note
In this case, we use the edges property to add all edges in the MedRecord to the group, since our MedRecord contains only one kind of edge. In any other case, you should provide the specific indices of the edges you want to add to that group. You can learn how to select specific edges in the Query Engine user guide.
7. Accessing Elements in a MedRecord
Now that we have stored some structured data in our MedRecord, we might want to access certain elements of it. The main way to do this is by either selecting the data with their indices or via groups that they are in.
We can, for example, get all available nodes:
record.nodes
['Patient 05', 'Patient 06', 'Patient 07', 'Patient 08', 'Patient 03', 'Med 01', 'Patient 04', 'Patient 01', 'Patient 02', 'Med 02', 'Patient 09']
Or access the attributes of a specific node:
record.node["Patient 01"]
{'Loc': 'USA', 'Age': 72, 'Sex': 'M'}
Or a specific edge:
record.edge[0]
{'Date': datetime.datetime(2020, 6, 7, 0, 0)}
Or get all available groups:
record.groups
['Patients', 'US-Patients', 'Medications', 'Patient-Medication']
Or get all nodes that belong to a certain group:
record.nodes_in_group("Medications")
['Med 01', 'Med 02']
Methods used in the snippet
nodes: Lists the node indices in the MedRecord instance.
node[]: Provides access to node information within the MedRecord instance via an indexer, returning a dictionary with node indices as keys and node attributes as values.
edge[]: Provides access to edge attributes within the MedRecord via an indexer, returning a dictionary with edge indices as keys and edge attributes as values.
groups: Lists the groups in the MedRecord instance.
nodes_in_group(): Retrieves the node indices associated with the specified group(s) in the MedRecord.
The MedRecord also supports advanced queries that find specific nodes based on time, relations, neighbors, and other criteria. These querying methods are covered in a later section of the user guide, Query Engine.
8. Full Example Code
The full code for this chapter is shown below:
import pandas as pd
import polars as pl

import medmodels as mm

# Patients DataFrame (Nodes)
patients = pd.DataFrame(
    [
        ["Patient 01", 72, "M", "USA"],
        ["Patient 02", 74, "M", "USA"],
        ["Patient 03", 64, "F", "GER"],
    ],
    columns=["ID", "Age", "Sex", "Loc"],
)

# Medications DataFrame (Nodes)
medications = pd.DataFrame(
    [["Med 01", "Insulin"], ["Med 02", "Warfarin"]], columns=["ID", "Name"]
)

# Patients-Medication Relation (Edges)
patient_medication = pd.DataFrame(
    [
        ["Patient 02", "Med 01", pd.Timestamp("20200607")],
        ["Patient 02", "Med 02", pd.Timestamp("20180202")],
        ["Patient 03", "Med 02", pd.Timestamp("20190302")],
    ],
    columns=["Pat_ID", "Med_ID", "Date"],
)

# Building the MedRecord with the patients as initial nodes
record = mm.MedRecord.builder().add_nodes((patients, "ID"), group="Patients").build()

# Adding nodes to the existing MedRecord
record.add_nodes((medications, "ID"), group="Medications")

# Adding nodes from a list of node tuples
patient_tuples = [
    ("Patient 04", {"Age": 45, "Sex": "F", "Loc": "CHI"}),
    ("Patient 05", {"Age": 26, "Sex": "M", "Loc": "SPA"}),
]
record.add_nodes(patient_tuples, group="Patients")

# Adding nodes from a Polars DataFrame
patient_polars = pl.DataFrame(
    [
        ["Patient 06", 55, "F", "GER"],
        ["Patient 07", 61, "F", "USA"],
        ["Patient 08", 73, "M", "CHI"],
    ],
    schema=["ID", "Age", "Sex", "Loc"],
    orient="row",
)
record.add_nodes((patient_polars, "ID"), group="Patients")

# Adding edges between patients and medications
record.add_edges((patient_medication, "Pat_ID", "Med_ID"))

# Defining a named group of nodes
record.add_group("US-Patients", nodes=["Patient 01", "Patient 02"])

# Adding a node without a group (shown as an ungrouped node in the overview)
record.add_nodes(
    (
        pd.DataFrame(
            [["Patient 09", 65, "M", "USA"]], columns=["ID", "Age", "Sex", "Loc"]
        ),
        "ID",
    ),
)

record.overview_nodes()
record.overview_edges()

# Adding edges to a certain group so that they are shown in the overview
record.add_group("Patient-Medication", edges=record.edges)
record.overview_edges()

# Getting all available nodes
record.nodes

# Accessing a certain node
record.node["Patient 01"]

# Accessing a certain edge
record.edge[0]

# Getting all available groups
record.groups

# Getting the nodes that are within a certain group
record.nodes_in_group("Medications")

# Saving the MedRecord to a RON file and loading it again
record.to_ron("record.ron")
new_record = mm.MedRecord.from_ron("record.ron")