# medmodels.matching.algorithms.propensity_score

#### calculate_propensity

```
def calculate_propensity(
    x_train: NDArray[Union[np.int64, np.float64]],
    y_train: NDArray[Union[np.int64, np.float64]],
    treated_test: NDArray[Union[np.int64, np.float64]],
    control_test: NDArray[Union[np.int64, np.float64]],
    model: Model = "logit",
    hyperparam: Optional[Dict[str, Any]] = None
) -> Tuple[NDArray[np.float64], NDArray[np.float64]]
```

Trains a classification algorithm on training data, predicts the probability of being in the last class for treated and control test datasets, and returns these probabilities.

This function supports multiple classification algorithms and accepts manually specified hyperparameters. It is designed for binary classification, where the last class is the positive class.

**Arguments**:

`x_train`

*NDArray[Union[np.int64, np.float64]]* - Feature matrix for training.

`y_train`

*NDArray[Union[np.int64, np.float64]]* - Target variable for training.

`treated_test`

*NDArray[Union[np.int64, np.float64]]* - Feature matrix of the treated group, for which probabilities are predicted.

`control_test`

*NDArray[Union[np.int64, np.float64]]* - Feature matrix of the control group, for which probabilities are predicted.

`model`

*Model, optional* - Classification algorithm to use. Options: "logit", "dec_tree", "forest". Defaults to "logit".

`hyperparam`

*Optional[Dict[str, Any]], optional* - Manual hyperparameter settings. Uses the default settings if None.

**Returns**:

`Tuple[NDArray[np.float64], NDArray[np.float64]]`

- Probabilities of the positive class for the treated and control groups, respectively.

**Example**:

With the "dec_tree" model and inputs taken from the iris dataset, the function returns the last-class probabilities for the treated and control sets, e.g., `([0.], [0.])`.
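The behaviour described for `calculate_propensity` can be illustrated with a minimal sketch. It assumes scikit-learn's `LogisticRegression` as a stand-in for the "logit" option and uses made-up toy data; it is not the library's implementation, only an illustration of "train, then take the probability of the last class".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one covariate and a binary target (1 = positive class).
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
treated_test = np.array([[4.5], [3.5]])
control_test = np.array([[0.5], [1.5]])

# Assumption: LogisticRegression plays the role of the "logit" option.
classifier = LogisticRegression()
classifier.fit(x_train, y_train)

# predict_proba returns one column per class, ordered as in classifier.classes_,
# so the last column holds the probability of the last (positive) class.
treated_prob = classifier.predict_proba(treated_test)[:, -1]
control_prob = classifier.predict_proba(control_test)[:, -1]

print(treated_prob, control_prob)
```

Both results are one-dimensional arrays with one probability per test row, which mirrors the tuple of arrays in the documented return type.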

#### run_propensity_score

```
def run_propensity_score(
    treated_set: pl.DataFrame,
    control_set: pl.DataFrame,
    model: Model = "logit",
    metric: Metric = "absolute",
    number_of_neighbors: int = 1,
    hyperparam: Optional[Dict[str, Any]] = None,
    covariates: Optional[MedRecordAttributeInputList] = None
) -> pl.DataFrame
```

Runs propensity score matching with a specified classification algorithm. The training target is built by assigning 1 to the treated set and 0 to the control set, and the classifier then predicts a propensity score for each unit. These scores are matched with the nearest neighbor method.

This function wraps the whole propensity score matching workflow in one call and uses the propensity score as the sole covariate for matching.

**Arguments**:

`treated_set`

*pl.DataFrame* - Data for the treated group.

`control_set`

*pl.DataFrame* - Data for the control group.

`model`

*Model, optional* - Classification algorithm for predicting probabilities. Options include "logit", "dec_tree", "forest". Defaults to "logit".

`metric`

*Metric, optional* - Metric for matching. Options include "absolute", "mahalanobis", "exact". Defaults to "absolute".

`number_of_neighbors`

*int, optional* - Number of nearest neighbors to find for each treated unit. Defaults to 1.

`hyperparam`

*Optional[Dict[str, Any]], optional* - Hyperparameters for model tuning. Increases computation time if set. Uses the default settings if None.

`covariates`

*Optional[MedRecordAttributeInputList], optional* - Features for matching. Uses all features if None.

**Returns**:

`pl.DataFrame`

- Matched subset from the control set corresponding to the treated set.
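The workflow described for `run_propensity_score` can be sketched as follows. This is a simplified illustration under stated assumptions: plain NumPy arrays replace the `pl.DataFrame` inputs, scikit-learn's `LogisticRegression` stands in for the "logit" option, and each treated unit simply takes the control unit with the closest score (so a control unit could be reused). The library's own matching may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data with a single covariate per unit.
treated = np.array([[5.0], [6.0]])
control = np.array([[1.0], [2.0], [4.8], [6.2]])

# Training target: 1 for every treated row, 0 for every control row.
x_train = np.vstack([treated, control])
y_train = np.concatenate(
    [np.ones(len(treated), dtype=int), np.zeros(len(control), dtype=int)]
)

# Assumption: LogisticRegression plays the role of the "logit" option.
classifier = LogisticRegression().fit(x_train, y_train)
treated_score = classifier.predict_proba(treated)[:, -1]
control_score = classifier.predict_proba(control)[:, -1]

# The propensity score is the only matching covariate: for each treated unit,
# pick the control unit with the smallest absolute score difference.
score_distance = np.abs(treated_score[:, None] - control_score[None, :])
nearest = score_distance.argmin(axis=1)
matched_control = control[nearest]

print(matched_control)
```

Matching on a single score rather than on every covariate keeps the distance computation one-dimensional, which is the simplification the function description refers to.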

# medmodels.matching.algorithms.classic_distance_models

#### nearest_neighbor

```
def nearest_neighbor(
    treated_set: pl.DataFrame,
    control_set: pl.DataFrame,
    metric: metrics.Metric,
    number_of_neighbors: int = 1,
    covariates: Optional[MedRecordAttributeInputList] = None
) -> pl.DataFrame
```

Performs nearest neighbor matching between two dataframes using a specified metric. This method uses a greedy algorithm that pairs each element of the treated set with its closest match in the control set according to the given metric. The algorithm does not search for the best overall matching; it is a simple, commonly used approach.

The method works with different metrics. The sizes of the treated and control sets must be compared beforehand to determine the direction of matching. Covariates can optionally be specified to restrict the matching to selected variables.

**Arguments**:

`treated_set`

*pl.DataFrame* - DataFrame for which matches are sought.

`control_set`

*pl.DataFrame* - DataFrame from which matches are selected.

`metric`

*metrics.Metric* - Metric to measure closeness between units, e.g., "absolute", "mahalanobis". The metric must be available in the metrics module.

`number_of_neighbors`

*int, optional* - Number of nearest neighbors to find for each treated unit. Defaults to 1.

`covariates`

*Optional[MedRecordAttributeInputList], optional* - Covariates considered for matching. Defaults to all variables.

**Returns**:

`pl.DataFrame`

- Matched subset from the control set.
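The greedy strategy described for `nearest_neighbor` can be illustrated with a small NumPy sketch. The helper `greedy_nearest_neighbor` is hypothetical and makes several assumptions that the documentation does not confirm: inputs are arrays rather than `pl.DataFrame` objects, the "absolute" metric is taken to be the summed absolute difference across covariates, and a control unit is not reused once matched.

```python
import numpy as np


def greedy_nearest_neighbor(treated, control, number_of_neighbors=1):
    """Greedy matching sketch (hypothetical helper, not the library's code)."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    # Assumes control has at least len(treated) * number_of_neighbors rows.
    available = np.ones(len(control), dtype=bool)
    matched_rows = []
    for unit in treated:
        # Assumed "absolute" metric: sum of absolute covariate differences.
        distance = np.abs(control - unit).sum(axis=1)
        # Controls that are already matched can no longer be chosen.
        distance[~available] = np.inf
        nearest = np.argsort(distance)[:number_of_neighbors]
        available[nearest] = False
        matched_rows.extend(nearest.tolist())
    return control[matched_rows]


treated = [[2.0], [3.0]]
control = [[2.9], [1.0]]
matched = greedy_nearest_neighbor(treated, control)
print(matched)
```

In this toy input the first treated unit (2.0) takes the control at 2.9, which leaves only 1.0 for the second treated unit (3.0). The total distance is 0.9 + 2.0 = 2.9, whereas the swapped pairing would give 1.0 + 0.1 = 1.1, so the sketch shows why a greedy pass does not guarantee the best overall matching.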