{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# LFA - Laboratory 0\n", "## Introduction to Python\n", "\n", " Modified from lab-1 MLBD, courtesy of Julien Rebetez, Aitana Lebrand. \n", "- Professor: Carlos Peña (carlos.pena@heig-vd.ch)\n", "- Assistant: Héctor Satizábal (hector-fabio.satizaba-mejia@heig-vd.ch)\n", "\n", "Date: Spring 2016" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Throughout the laboratories, questions that you should answer are highlighted as follows:\n", ">
0. This is a question.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Introduction to IPython notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this laboratory you are going to learn how to perform interactive computing using **IPython**. **IPython** is an interactive shell for Python that offers more features than the standard one. This guide does not start from the basics of the general-purpose language **Python**. If you do not know this language, we recommend following a **Python** tutorial to learn the basic concepts and commands. You can have a look at the [official python tutorial](https://docs.python.org/2/tutorial/) or [Google's python tutorial](https://developers.google.com/edu/python/) for example.\n", "\n", "Note that for this course, we will use the Python 2.7.X series. (Important: do not use Python 3.5.X, as the current version of the laboratories would need many adaptations to run on it.)\n", "\n", "You will use a browser-based notebook to interactively explore a dataset by:\n", "- Reading raw data from ASCII files\n", "- Reading typed data (data frames) from ASCII files\n", "- Selecting specific columns and/or rows from a dataset\n", "- Filtering datasets\n", "- Plotting the information in the dataset (e.g., scatter-plot, boxplot, histogram)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are in an IPython notebook right now. An IPython notebook is a web interface to a Python interpreter.\n", "\n", "A notebook is made of cells. Each cell has a type which defines what happens when it is run. \n", "\n", "- Markdown cells allow you to write [Markdown](http://daringfireball.net/projects/markdown/) text in them. They are just displayed as HTML when run.\n", "- Code cells contain Python code. 
When the cell is run, the code is sent to the Python interpreter and executed, and the result appears in the cell output.\n", "- Header cells allow you to structure your document.\n", "\n", "You can change the type of a cell using the drop-down menu in the toolbar.\n", "\n", "You can click (for Code cells) or double-click (for headers and markdown cells) on cells to edit their content. You can then use keyboard shortcuts to run them:\n", "\n", "- Ctrl + Enter : run in place\n", "- Shift + Enter : run and move to next cell\n", "- Alt + Enter : run and insert new cell after" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4\n" ] } ], "source": [ "# This is a code cell containing python code !\n", "print(2+2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Python interpreter that executes the code you write in the notebook is called a *Kernel*. You can restart the kernel (the interpreter) using the *Kernel* menu. This is useful if you want to delete all your variables." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "IPython also has \"magic\" functions that start with %. They allow you to do a lot of useful things with your IPython environment:\n", "\n", "http://nbviewer.ipython.org/github/ipython/ipython/blob/1.x/examples/notebooks/Cell%20Magics.ipynb\n", "\n", "The %who magic gives you a list of the defined Python variables. object? 
can be used to get documentation about an object :" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a\t \n" ] } ], "source": [ "a = 2\n", "%who" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n", "This is the traditional python help() function :\n", "\n", "Help on function my_documented_function in module __main__:\n", "\n", "my_documented_function(a)\n", " This is a revolutionary function that returns a + 1\n", "\n", "End of help() command output \n", "\n" ] } ], "source": [ "def my_documented_function(a):\n", " \"\"\"This is a revolutionary function that returns a + 1\"\"\"\n", " return a + 1\n", "\n", "print(my_documented_function(2))\n", "print('This is the traditional python help() function :\\n')\n", "help(my_documented_function)\n", "print('End of help() command output \\n')\n", "\n", "# We can access the same info with just ? (note that you have to run this cell to view the effect).\n", "my_documented_function?\n", "# The output may be provided in a separate pane in your browser" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Scientific computing with NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python has a number of packages (libraries) dedicated to scientific programming :\n", "\n", "The foundation is [numpy](http://www.numpy.org/) which provides a N-dimensional array implementation with a nice indexing syntax (similar to MATLAB).\n", "\n", "Then comes [scipy](http://www.scipy.org/) which contains a number of algorithms (signal processing, distance computation, etc...) 
built on top of numpy.\n", "\n", "[matplotlib](http://matplotlib.org/) is a library to create 2D plots.\n", "\n", "[pandas](http://pandas.pydata.org/) provides a DataFrame implementation, which is a layer on top of numpy arrays that makes some things (handling missing values, date indexing) easier. It is heavily inspired by the [R](http://www.r-project.org/) statistical computing language.\n", "\n", "[scikit-learn](http://scikit-learn.org/stable/) is a machine learning library that contains implementations of many of the most popular machine learning algorithms.\n", "\n", "[theano](http://deeplearning.net/software/theano/) allows you to write programs that are compiled and run on a GPU.\n", "\n", "Finally, this is not a Python package, but [stack overflow](http://stackoverflow.com/) is a really good question-and-answer site where you can probably find answers to the most common problems you'll have :-)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this course, we strongly suggest that you install and use [anaconda](https://store.continuum.io/cshop/anaconda/). It is a \"python distribution\" that comes with a package manager (conda) and all of the scientific packages listed above (and many others) pre-installed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Quick numpy introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numpy allows you to define [multidimensional arrays](http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html) (recommended reading)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2, 3)\n" ] }, { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Makes the numpy function available as np.4.1. 
By looking at this scatter plot, do you feel that building a classifier of iris class based on petal length and petal width will be an easy task? (Justify briefly)
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your answer..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To build and evaluate a machine learning model, we need to split our data into training and testing sets. Scikit-learn has a [cross_validation](http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation) module that helps with this task." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "train shape : (90,)\n", "test shape : (60,)\n" ] } ], "source": [ "from sklearn import cross_validation\n", "# X_train/X_test hold the features and y_train/y_test the labels of the train/test samples\n", "X_train, X_test, y_train, y_test = cross_validation.train_test_split(\n", " X, Y, test_size=0.4, random_state=0 # we fix random state for reproducibility\n", ")\n", "\n", "# print the shape of the label arrays (one entry per sample)\n", "print(\"train shape : \", y_train.shape)\n", "print(\"test shape : \", y_test.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use the [k Nearest Neighbors](http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) algorithm to classify the iris flower dataset. This classifier assigns a new sample the majority class among its k nearest neighbors.\n", "\n", "The algorithm therefore consists of the following three steps:\n", "- compute the distance between a sample in the test set and all the samples in the training set\n", "- find the k nearest neighbors in the training set\n", "- assign a class to the sample in the test set by taking the majority class among the k nearest neighbors." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn import neighbors\n", "\n", "n_neighbors = 5 # k=5, i.e. 
5-nearest neighbors\n", "\n", "clf = neighbors.KNeighborsClassifier(n_neighbors) \n", "clf.fit(X_train, y_train)\n", "\n", "# predict class of entries in the test set\n", "pred_kNN = clf.predict(X_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to visualize the performance of the classifier, represent the confusion matrix:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "[base64-encoded PNG omitted: confusion matrix plot]", "text/plain": [ "4.2. Explain (briefly) what this confusion matrix shows.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your answer..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to have a quantitative measure of performance, several metrics can be used. In the following, we use accuracy, i.e. the proportion of correctly classified samples among the total number of samples examined. Note that in general, you should use several metrics, depending on the problem." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy = 95.0%\n" ] } ], "source": [ "from sklearn.metrics import accuracy_score\n", "acc = accuracy_score(y_test, pred_kNN)\n", "print('Accuracy = '+str(100*acc)+'%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4.3. In our example, we only used two features (petal length and petal width). Try classifying with all four features and compare the performance. Try also varying the number of neighbors, k.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your answer... Briefly describe the results you obtain and your conclusions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### GOING FURTHER...\n", "\n", "If machine learning appeals to you and you would like to see a more realistic application rather than a \"toy dataset\", you can also consult the following notebook and go directly to the \"WINE\" section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Cool links" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scikit-learn cheatsheet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a [cheatsheet](http://scikit-learn.org/stable/tutorial/machine_learning_map/) available on scikit-learn's website that helps you choose an algorithm when you're lost. It is of course simplified and doesn't include all algorithms and criteria, but it is a good starting point." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interesting ipython notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[nbviewer.ipython.org](http://nbviewer.ipython.org/) has a nice collection of ipython notebooks that showcase various libraries (and even different languages) that you can use with ipython." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }