{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example: Generating data\n",
    "========================\n",
    "\n",
    "Motivation\n",
    "----------\n",
    "\n",
    "Given the fundamental problem of Causal Inference, simulating or\n",
    "generating data is of particular relevance when working with ATE or\n",
    "CATE estimation: it allows to have a ground truth that we don't get\n",
    "from the real world.\n",
    "\n",
    "For instance, when generating data, we can have access to the\n",
    "Individual Treatment Effect and use that ground truth to evaluate a\n",
    "treatment effect method at hand.\n",
    "\n",
    "In the following example we will describe how the modules\n",
    "{mod}`metalearners.data_generation` and\n",
    "{mod}`metalearners.outcome_functions` can be used to generate data in\n",
    "light of treatment effect estimation.\n",
    "\n",
    "\n",
    "How-to\n",
    "------\n",
    "\n",
    "In the context of treatment effect estimation, our data usually\n",
    "consists of 3 ingredients:\n",
    "\n",
    "- Covariates\n",
    "- Treatment assignments\n",
    "- Observed outcomes\n",
    "\n",
    "In this particular scenario of simulating data, we can add some\n",
    "quantities of interest which are not available in the real world:\n",
    "\n",
    "- Potential outcomes\n",
    "- True CATE or true ITE\n",
    "\n",
    "Let's generate those quantities one after another.\n",
    "\n",
    "\n",
    "### Covariates\n",
    "\n",
    "Let's start by generating covariates. We will use\n",
    "{func}`metalearners.data_generation.generate_covariates` for that\n",
    "purpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "from metalearners.data_generation import generate_covariates\n",
    "\n",
    "features, categorical_features_idx, n_categories = generate_covariates(\n",
    "        n_obs=1000,\n",
    "        n_features=8,\n",
    "        n_categoricals=3,\n",
    "        format=\"pandas\",\n",
    ")\n",
    "features.head() # type: ignore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that we generated a DataFrame with 8 columns of which the last\n",
    "three are categoricals.\n",
    "\n",
    "\n",
    "### Treatment assignments\n",
    "\n",
    "In this example we will replicate the setup of an RCT, i.e. where the\n",
    "treatment assignments are independent of the covariates. We rely on\n",
    "{func}`metalearners.data_generation.generate_treatment`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from metalearners.data_generation import generate_treatment\n",
    "\n",
    "# We use a fair conflip as a reference.\n",
    "propensity_scores = .5 * np.ones(1000)\n",
    "treatment = generate_treatment(propensity_scores)\n",
    "type(treatment), np.unique(treatment), treatment.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we would expect, an array of binary assignments is generated. The\n",
    "average approximately corresponds to the universal propensity score of\n",
    ".5.\n",
    "\n",
    "\n",
    "### Potential outcomes\n",
    "\n",
    "In this example we will rely on\n",
    "{func}`metalearners.outcome_functions.linear_treatment_effect`, which\n",
    "generates additive treatment effects which are linear in the features.\n",
    "Note that there are other potential outcome functions available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "from metalearners._utils import get_linear_dimension\n",
    "from metalearners.outcome_functions import linear_treatment_effect\n",
    "\n",
    "dim = get_linear_dimension(features)\n",
    "outcome_function = linear_treatment_effect(dim)\n",
    "potential_outcomes = outcome_function(features)\n",
    "potential_outcomes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see it generates one column with the potential outcome {math}`Y(0)` and one column\n",
    "with the potential outcome {math}`Y(1)`. The individual treatment\n",
    "effect can be inferred as a subtraction of both.\n",
    "\n",
    "### Observed outcomes\n",
    "\n",
    "Lastly, we can combine the treatment assignments and potential\n",
    "outcomes to generate the observed outcomes. Note that there might be\n",
    "noise which distinguishes the potential outcome from the observed\n",
    "outcome. For that purpose we can use\n",
    "{func}`metalearners.data_generation.compute_experiment_outputs` and run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "from metalearners.data_generation import compute_experiment_outputs\n",
    "\n",
    "observed_outcomes, true_cate = compute_experiment_outputs(\n",
    "    potential_outcomes,\n",
    "    treatment,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}