{"cells": [{"cell_type": "code", "execution_count": null, "id": "618bcfd4", "metadata": {"tags": []}, "outputs": [], "source": "%matplotlib inline\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom scipy import stats\nimport scipy.io as spio"}, {"cell_type": "markdown", "id": "eb240ca9", "metadata": {"tags": []}, "source": "# Bayesian linear regression\n\nIn this exercise session we consider the supervised regression problem of finding a function $f(x)$ that describes the relationship between a scalar input $x$ and a scalar output $y$:\n\n$$\ny = f(x) + \\epsilon, \\qquad \\epsilon \\sim \\mathcal{N}(0, \\beta^{-1}).\n$$\n\nWe model this with a Bayesian linear regression model\n\n$$\nf(x) = \\boldsymbol{\\phi}(x)^{\\mathsf{T}} \\mathbf{w}, \\qquad \\mathbf{w} \\sim \\mathcal{N}(\\boldsymbol{\\mu}_0, \\mathbf{S}_0),\n$$\n\nwhere $\\boldsymbol{\\phi}(x)$ is a vector of the input features.\nNote that we used the notation $\\mathbf{x}$ for the input features in the lecture.\nWe changed this notation to $\\boldsymbol{\\phi}(x)$ here in order to not mix it up with the scalar input $x$.\n\nThe Bayesian linear regression model is then given by\n\n$$\n\\begin{aligned}\np(\\mathbf{y} \\,|\\, \\mathbf{w}) &= \\mathcal{N}(\\mathbf{y}; \\boldsymbol{\\Phi}\\mathbf{w}, \\beta^{-1}\\mathbf{I}_N) \\qquad && \\text{(likelihood)}, \\\\\np(\\mathbf{w}) &= \\mathcal{N}(\\mathbf{w}; \\mathbf{m}_0, \\mathbf{S}_0) \\qquad && \\text{(prior)},\n\\end{aligned}\n$$\n\nwhere\n\n$$\n\\boldsymbol{\\Phi} = \\begin{bmatrix} \\boldsymbol{\\phi}(x_1)^{\\mathsf{T}} \\\\ \\vdots \\\\ \\boldsymbol{\\phi}(x_N)^{\\mathsf{T}} \\end{bmatrix}\n\\qquad \\text{and} \\qquad\n\\mathbf{y} = \\begin{bmatrix} y_1 \\\\ \\vdots \\\\ y_N \\end{bmatrix}.\n$$\n\nGiven a set of training data of inputs and outputs $\\mathcal{D} = \\{(x_i, y_i)\\}_{i=1}^N$, we are interested in finding the posterior of the weights $p(\\mathbf{w} \\,|\\, \\mathbf{y})$ and also the predictive distribution $p(f(x_{\\star}) \\,|\\, \\mathbf{y})$ of unseen input $x_{\\star}$.\nFor further information about the Bayesian linear regression model see Lecture 3 and/or Christopher Bishop's book [\"Pattern recognition and machine learning\"](https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/)."}, {"cell_type": "markdown", "id": "c8bc748e", "metadata": {"tags": []}, "source": "\n## Exercise 3.1: Understanding the code\n\nDownload the files <a href=\"https://uni-tuebingen.de/fileadmin/Uni_Tuebingen/Fakultaeten/MatNat/Fachbereiche/Informatik/Lehrstuehle/MethMaschLern/Probabilistic_ML/Notebook_Vorlesung_7___9/lindata.mat\" download=\"lindata.mat\">`lindata.mat`</a> and <a href=\"https://uni-tuebingen.de/fileadmin/Uni_Tuebingen/Fakultaeten/MatNat/Fachbereiche/Informatik/Lehrstuehle/MethMaschLern/Probabilistic_ML/Notebook_Vorlesung_7___9/nlindata.mat\" download=\"nlindata.mat\">`nlindata.mat`</a> and save them to the folder of this notebook.\nThese datasets are borrowed from Philipp Hennig's course [\"Probabilistic machine learning\"](https://uni-tuebingen.de/en/180804), given at the University of T\u00fcbingen.\n\nThe following code cell loads inputs, outputs, and precision parameter from `lindata.mat` and plots the feature vector\n\n$$\n\\boldsymbol{\\phi}(x)^{\\mathsf{T}} = [1, x].\n$$"}, {"cell_type": "code", "execution_count": null, "id": "0f6b1173", "metadata": {"tags": []}, "outputs": [], "source": "# Load data from disk\n# File should be in the same folder as the Jupyter notebook,\n# otherwise you have to adjust the path\ndata = spio.loadmat(\"lindata.mat\")\nx = data[\"X\"].flatten() # inputs\ny = data[\"Y\"].flatten() # outputs\nbeta = float(data[\"sigma\"])**(-2) # measurement noise precision\nN = x.size\n\n# Define the feature vector\ndef Phi(a):  # Phi(a) = [1, a]\n    return np.power(np.reshape(a, (-1, 1)), range(2))\n\n# Plot the features\nplt.plot(x, Phi(x), '-o')\nplt.title('features')\nplt.xlabel('x')\nplt.ylabel('y')\nplt.show()"}, {"cell_type": "markdown", "id": "94fd062e", "metadata": {"tags": []}, "source": "We then compute the posterior distribution of the weights $\\mathbf{w}$ of a Bayesian linear regression model using these features."}, {"cell_type": "code", "execution_count": null, "id": "b51efc98", "metadata": {"tags": []}, "outputs": [], "source": "# Define the prior on the weights\n# p(w) = N(w; m0, S0)\nD = Phi(0).size  # number of features\nm0 = np.zeros(D)\nS0 = 10*np.eye(D) / D\n\n# Compute the posterior distribution of the Bayesian linear regression model\n# p(w | y) = N(w; mN, SN)\nSN = np.linalg.inv(np.linalg.inv(S0) + beta * Phi(x).T @ Phi(x))\nmN = SN @ (np.linalg.inv(S0) @ m0 + beta * Phi(x).T @ y)"}, {"cell_type": "markdown", "id": "49a08758", "metadata": {"tags": []}, "source": "We visualize the posterior distribution by plotting the functions $f$ corresponding to different samples of $\\mathbf{w}$."}, {"cell_type": "code", "execution_count": null, "id": "6f9ebde2", "metadata": {"tags": []}, "outputs": [], "source": "# Generate grid of new inputs x* for plotting\nn = 100  # number of grid-points\nxs = np.linspace(-8, 8, n)\n\n# Visualize the posterior p(w | y) = N(w; mN, SN)\n# For samples of w, f(x) = phi(x)^T w is evaluated at inputs xs\n# Draw samples of w from the posterior\nsamples = 5\nseed = 100\nws = stats.multivariate_normal(mean=mN, cov=SN, allow_singular=True).rvs(samples, random_state=seed)\n\n# Compute corresponding values f(x*)\nfs = Phi(xs) @ ws.T\n\n# Plot the samples\nplt.plot(xs, fs, 'gray') # samples\nplt.scatter(x, y, zorder=3)\nplt.title('posterior - samples')\nplt.xlabel('x')\nplt.ylabel('y')\nplt.show()"}, {"cell_type": "markdown", "id": "2415e984", "metadata": {"tags": []}, "source": "Next we plot samples from and credibility regions of the predictive distribution."}, {"cell_type": "code", "execution_count": null, "id": "b0ec3962", "metadata": {"tags": []}, "outputs": [], "source": "# Compute the predictive distribution of the outputs y*\n# p(y* | y) = N(y*; m*, S*)\nmstar = Phi(xs) @ mN\nSstar = Phi(xs) @ SN @ Phi(xs).T + beta**(-1) * np.eye(n)\n\n# Extract standard deviation of predictive distribution\nstdpred= np.sqrt(np.diag(Sstar))\n\n# Plot credibility regions\nplt.plot(xs, mstar, 'black') # predictive mean\nplt.fill_between(xs, mstar + 3*stdpred, mstar - 3*stdpred, color='lightgray')\nplt.fill_between(xs, mstar + 2*stdpred, mstar - 2*stdpred, color='darkgray')\nplt.fill_between(xs, mstar + 1*stdpred, mstar - 1*stdpred, color='gray')\nplt.scatter(x, y, zorder=3)\nplt.title('predictive distribution')\nplt.xlabel('x')\nplt.ylabel('y')\nplt.show()"}, {"cell_type": "markdown", "id": "0013d607", "metadata": {"tags": []}, "source": "### (a)\n\nGo through the code and make sure that you can map the code to the model and regression method explained in Lecture 3.\nAlso run the code.\nMake sure you understand the figures."}, {"cell_type": "markdown", "id": "1ec0520b", "metadata": {"tags": []}, "source": "### (b)\n\nReduce the training data to only the first 5 data points in the training data.\nWhat impact does this have on the predictive distribution?"}, {"cell_type": "code", "execution_count": null, "id": "7f58d784", "metadata": {"tags": []}, "outputs": [], "source": ""}, {"cell_type": "markdown", "id": "7862dce0", "metadata": {"tags": []}, "source": "## Exercise 3.2: Feature vectors\n\n### (a)\n\nLoad `nlindata.mat` instead of `lindata.mat` and run the code for this data.\nUse all data, not only the first five data points as in Exercise 3.1 (b).\nDo you think the model performs well on this data?"}, {"cell_type": "code", "execution_count": null, "id": "17323022", "metadata": {"tags": []}, "outputs": [], "source": ""}, {"cell_type": "markdown", "id": "7cb5f804", "metadata": {"tags": []}, "source": "### (b)\n\nIn order to improve the performance, consider instead a feature vector with an additional quadratic term\n\n$$\n\\boldsymbol{\\phi}(x)^{\\mathsf{T}} = [1, x, x^2].\n$$\n\nChange the code accordingly and run it.\n\n*Hint:* Only a very minor modification in the code is required to accommodate for this change."}, {"cell_type": "code", "execution_count": null, "id": "741dc020", "metadata": {"tags": []}, "outputs": [], "source": ""}, {"cell_type": "markdown", "id": "971f6a00", "metadata": {"tags": []}, "source": "### (c)\n\nUse step functions\n\n$$\nh(x) = \\begin{cases}\n1, \\qquad x \\geq 0,\\\\\n0, \\qquad x < 0,\n\\end{cases}\n$$\n\nas features and change the code accordingly.\nPlace in total 9 of these features with two steps apart between $x = -8$ and $x = 8$.\nThe feature vector is then\n\n$$\n\\boldsymbol{\\phi}(x)^{\\mathsf{T}} = [h(x-8), h(x - 6), \\ldots, h(x + 8)].\n$$"}, {"cell_type": "code", "execution_count": null, "id": "a131e515", "metadata": {"tags": []}, "outputs": [], "source": ""}, {"cell_type": "markdown", "id": "3f78e53b", "metadata": {"tags": []}, "source": "### (d)\n\nCan you come up with any other features that improve performance even further?"}, {"cell_type": "code", "execution_count": null, "id": "989217ea", "metadata": {"tags": []}, "outputs": [], "source": ""}, {"cell_type": "markdown", "id": "29bbcaea", "metadata": {"tags": []}, "source": "## Exercise 3.3: Marginal likelihood\n\nTo get a quantitative measure of the performance of the proposed feature vectors, we want to compare them by computing the marginal likelihood $p(\\mathbf{y})$ for each of the models.\nRefer to Exercise 2.12(a) for the expression of the marginal likelihood of the Bayesian linear regression model."}, {"cell_type": "markdown", "id": "cb469eb1", "metadata": {"tags": []}, "source": "### (a)\n\nExtend the code to also compute the logarithm of the marginal likelihood.\nWhich one of the three feature vectors in Exercise 3.2 gives the largest log marginal likelihood on the data `nlindata.mat`?"}, {"cell_type": "code", "execution_count": null, "id": "1740bcd3", "metadata": {"tags": []}, "outputs": [], "source": ""}, {"cell_type": "markdown", "id": "7b657800", "metadata": {"tags": []}, "source": "### (b)\n\nPerform the same comparison on the data `lindata.mat`.\nWhat are your conclusions?"}, {"cell_type": "code", "execution_count": null, "id": "725372bc", "metadata": {"tags": []}, "outputs": [], "source": ""}, {"cell_type": "markdown", "id": "cfd64388", "metadata": {"tags": []}, "source": "### (c)\n\nCan you come up with any other feature vectors and/or values for prior/likelihood precisions that give an even larger log marginal likelihood?"}], "metadata": {"@webio": {"lastCommId": null, "lastKernelId": null}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6"}, "vscode": {"interpreter": {"hash": "e0466e4bbc1f6f77653f92f7ee99fe375173484495b8b5339e7493ccb72bc580"}}}, "nbformat": 4, "nbformat_minor": 5}