{"cells": [{"cell_type": "code", "execution_count": null, "metadata": {"tags": []}, "outputs": [], "source": "import pandas as pd\nimport numpy as np\nimport matplotlib\nimport matplotlib.pyplot as plt\n\nfrom sklearn import tree\nfrom sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier\n\n%matplotlib inline"}, {"cell_type": "markdown", "metadata": {"tags": []}, "source": "# AdaBoost\n## 9.1 Exponential loss vs misclasification loss\nIn boosting, we label (for convenience) the classes as $+1$ and $-1$, respectively. We furthermore let our classifier $\\widehat{y}(x)$ be on the form $\\widehat{y}(x) = \\operatorname{sign}\\left(c(x)\\right)$, i.e., a thresholding at $0$ of some real-valued function $c(x)$. Assume that we have learned some function $\\widehat{c}(x)$ from training data such that we can make predictions $\\widehat{y}(x) = \\operatorname*{sign}\\left(\\widehat{c}(x)\\right)$. Fill in the missing columns in the table below:\n\n| ${\\widehat{c}{(x_\\star)}}$ | $\\widehat{y}_\\star$ | Exponential loss
$\\mathrm{exp}{\\left(-y_\\star\\widehat{c}{(x_\\star)}\\right)}$ | Misclassification loss
$\\mathbb{I}\\left(y_\\star\\neq\\widehat{y}_\\star\\right)$ | $y_\\star$|\n|:---:|:---:|:---:|:---:|:---:|\n| 0.3 | | | | -1 |\n|-0.2 | | | | -1 |\n| 1.5 | | | | 1 |\n|-4.3 | | | | 1 |\n\nIn what sense is the exponential loss more informative than the misclassification loss, and how can that information be used when training a classifier?"}, {"cell_type": "markdown", "metadata": {"tags": []}, "source": "## 9.2 AdaBoost for spam classification\nConsider the very same setting data set for the data set email.csv as in problem 8.2, but now use AdaBoost instead. AdaBoost is available in sklearn.ensemble as the command AdaBoostClassifier(). What test error do you achieve?"}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": []}, "outputs": [], "source": "# url = 'data/email.csv'\nurl = 'https://uu-sml.github.io/course-sml-public/data/email.csv'\nemail = pd.read_csv(url)"}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": []}, "outputs": [], "source": ""}, {"cell_type": "markdown", "metadata": {"tags": []}, "source": "## 9.3 Exploring the AdaBoost algorithm\nThe script below illustrates the AdaBoost algorithm in the same way as was done in lecture 7. Familiarize yourself with the code and compare it to the pseudocode given in the course literature (Chapter 7 in the lecture notes). Explore what happens if you make changes in the input data and the number of trees (stumps), B, used!"}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": []}, "outputs": [], "source": "X1 = np.array([0.2358,0.1252,0.4278,0.6398,0.6767,0.8733,0.3648,0.6336,0.3433,0.8410])\nX2 = np.array([0.1761,0.4465,0.7539,0.9037,0.7111,0.8414,0.6060,0.5107,0.1430,0.1994])\nX = np.column_stack([X1,X2])\ny = np.array([1,1,1,1,1,-1,-1,-1,-1,-1])\nn = len(y)\n# note: the class labels are 1 and -1\n\nfig, ax = plt.subplots()\nax.set_xlim((0, 1))\nax.set_ylim((0, 1))\n\n# plot the data\nax.plot(X1[y==1], X2[y==1], 'x', color='red')\nax.plot(X1[y==-1], X2[y==-1], 'o', color='blue')\n\nax.set_xlabel('$X_1$')\nax.set_ylabel('$X_2$')\n\n\n# define some variables we will fill out in the loop below\nstumps = []\nalpha = []\nw = np.repeat(1/n, n) # the boosting weights, start with equal weights\nB = 3\n\n# learn the boosting classifier\nfor b in range(B):\n # prepare plots\n fig, ax = plt.subplots()\n ax.set_xlim((0, 1))\n ax.set_ylim((0, 1))\n # use a decision stump (tree with depth 1) as base classifier\n model = tree.DecisionTreeClassifier(max_depth=1)\n model.fit(X, y, sample_weight=w)\n stumps.append(model) # Saving the model in each step\n # plot the decision boundary\n split_variable = model.tree_.feature\n split_value = model.tree_.threshold\n if split_variable == 0:\n x1 = [0, split_value, split_value, 0]\n x2 = [0, 0, 1, 1]\n ax.fill(x1, x2, 'lightsalmon', alpha=0.5)\n x1 = [split_value, 1, 1, split_value]\n ax.fill(x1, x2, 'skyblue', alpha=0.5)\n elif split_variable == 1:\n x1 = [0, 1, 1, 0]\n x2 = [split_value, split_value, 1, 1]\n ax.fill(x1, x2, 'lightsalmon', alpha=0.5)\n x2 = [0, 0, split_value, split_value]\n ax.fill(x1, x2, 'skyblue', alpha=0.5)\n \n ax.set_xlabel('$x_1$')\n ax.set_ylabel('$x_2$')\n \n # plot the data\n ax.scatter(X1[y==1], X2[y==1], s=50, facecolors='red')\n ax.scatter(X1[y==-1], X2[y==-1], s=50, facecolors='blue')\n \n # compute and plot the boosting weights for the next iteration\n correct = model.predict(X) == y\n ax.scatter(X1[~correct], X2[~correct], s=300, facecolors='none', edgecolors='k')\n\n W = np.sum(w) \n We = np.sum(w[~correct]) # Sum of incorrect weights\n\n em = We/W\n a = 1/2*np.log((1-em)/em) # compute the alpha weights\n alpha.append(a)\n\n w[correct] = w[correct] * np.exp(-a) # Omega weights\n w[~correct] = w[~correct] * np.exp(a)\n\n \n# Using the boosting model to predict test inputs\ndef boost_predict(X):\n pred = []\n for j in range(len(X)):\n predictions = np.empty(B)\n for i in range(B):\n predictions[i] = stumps[i].predict(X[j,:].reshape(1,-1))\n C = np.sum(np.array(alpha)[predictions==1]) - np.sum(np.array(alpha)[predictions==-1])\n pred.append(np.sign(C))\n return pred\n\n\n# predict all pixels in the plot, i.e., plot the decision boundary\nfig, ax = plt.subplots()\nax.set_xlim((0, 1))\nax.set_ylim((0, 1))\nax.set_xlabel('$x_1$')\nax.set_ylabel('$x_2$')\nres = 0.1 # resolution of the squares\nxs1 = np.arange(0, 1.02, 0.03)\nxs2 = np.arange(0, 1.02, 0.03)\nx1, x2 = np.meshgrid(xs1, xs2) \nX_all = np.column_stack([x1.flatten(),x2.flatten()])\nprediction = boost_predict(X_all)\n\ncolors = np.where(np.array(prediction)==1,'lightsalmon', 'skyblue')\nax.scatter(x1.flatten(), x2.flatten(), s = 90, marker='s', c=colors)\nax.scatter(X1[y==1], X2[y==1], s=50, facecolors='red')\nax.scatter(X1[y==-1], X2[y==-1], s=50, facecolors='blue')\n\nplt.show()"}, {"cell_type": "markdown", "metadata": {"tags": []}, "source": "## 9.4 Deriving the AdaBoost weights\nThe AdaBoost classifier can be written as \n\\begin{equation}\n \\widehat{y}_{\\text{boost}}(\\mathbf{x}) = \\text{sign}\\left( c^{(B)}(\\mathbf{x}) \\right)\n\\end{equation}\nwhere the functions $c^{(1)}(\\mathbf{x}), \\, \\dots, c^{(B)}(\\mathbf{x})$ are constructed sequentially as\n\\begin{equation}\n c^{(b)}(\\mathbf{x}) = c^{(b-1)}(\\mathbf{x}) + \\alpha^{(b)} \\widehat y^{(b)}(\\mathbf{x}),\n\\end{equation}\ninitialized with $c^{(0)}(x) \\equiv 0$. The $b$th ensemble member $\\widehat y^{(b)}(x)$ is found by applying the chosen base classifier to a weighted version of the training data.\nOnce this has been found, we also need to compute the corresponding\n\"confidence\" coefficient $\\alpha^{(b)}$. This is done by minimizing the weighted exponential loss of the training data,\n\\begin{align}\n \\alpha^{(b)} = \\arg\\min_{\\alpha}\\left\\{\\sum_{i=1}^n w_i^{(b)} \\exp \\left(- \\alpha y_i \\widehat y^{(b)}(\\mathbf{x}_i) \\right) \\right\\}\n\\end{align}\nwhere $w_i^{(b)} = \\exp\\left(-y_i c^{(b-1)}(\\mathbf{x}_i)\\right)$.\n\t\t\nShow that the optimal solution is given by\n\\begin{equation}\n \\alpha^{(b)} = \\frac{1}{2}\\ln\\left( \\frac{1 - E_{\\text{train}}^{(b)}}{E_{\\text{train}}^{(b)}} \\right)\n\\end{equation}\nwhere \n\\begin{equation}\n E_{\\text{train}}^{(b)} = \\sum_{i=1}^n \\frac{w_i^{(b)}}{\\sum_{j=1}^n w_{j}^{(b)}} \\mathbb{I}(y_i \\neq \\widehat{y}^{(b)}(\\mathbf{x}_i)).\n\\end{equation}\n\t\t\n_Hint 1: We use class labels $-1$ and $1$, i.e. $y_i \\in \\{-1, 1\\}$ and $\\widehat{y}^{(b)}(\\mathbf{x}_i) \\in \\{-1, 1\\}$. Using this fact, divide the sum in the objective function in the equation for $\\alpha^{(b)}$ into one sum ranging over all correctly classified training data points and one sum ranging over all misclassified training data points._\n\t\t\n_Hint 2: You are allowed to use the fact that the objective function in the equation for $\\alpha^{(b)}$ has a single stationary point corresponding to the global minima._"}, {"cell_type": "markdown", "metadata": {"tags": []}, "source": "## 9.5 Gradient boosting\nExplore gradient boosting by using GradientBoostingClassifier() on the spam data set email.csv"}, {"cell_type": "code", "execution_count": null, "metadata": {"tags": []}, "outputs": [], "source": ""}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4"}}, "nbformat": 4, "nbformat_minor": 2}