{"id":1511,"date":"2017-04-18T16:58:43","date_gmt":"2017-04-18T14:58:43","guid":{"rendered":"http:\/\/www.datascience.rs\/?p=1511"},"modified":"2017-04-18T16:58:43","modified_gmt":"2017-04-18T14:58:43","slug":"inner-life-machine-learning-search-delicate-balance-insights-sas-forum-2017-milan-italy","status":"publish","type":"post","link":"http:\/\/imuno-srbija.com\/data-science\/en\/2017\/04\/18\/inner-life-machine-learning-search-delicate-balance-insights-sas-forum-2017-milan-italy\/","title":{"rendered":"The Inner Life of Machine Learning in Search of a Delicate Balance: Insights from SAS Forum 2017, Milan, Italy"},"content":{"rendered":"<div class=\"_2cuy _3dgx _2vxa\">Goran S. Milovanovi\u0107, PhD<\/div>\n<div class=\"_2cuy _3dgx _2vxa\"><\/div>\n<div class=\"_2cuy _3dgx _2vxa\">We are becoming an evidence driven culture: it&#8217;s all about data these days, but big and small data are nevertheless accompanied by big dilemmas in relation to their use and misuse in guiding our choices. Given the plenitude of statistical models and machine learning algorithms that we have at our disposal nowadays, how do we decide which one should decide in our place in the digital environment of evergrowing complexity? And when the recommendation that should help us guide further information search or decision making is automatically made, will we follow it blindly without reminding ourselves of the ultimate question of &#8220;<span class=\"_4yxp\">why<\/span>&#8221; &#8211; a question that proved to be so central to our human existence? The quote (famously (mis)attributed to Albert Einstein) &#8211; &#8220;<span class=\"_4yxp\">Every fool can know, the point is to understand<\/span>&#8221; &#8211; could prove to be the best piece of advice for anyone connected to the Internet, whenever and for whatever reason.<\/div>\n<div class=\"_2cuy _3dgx _2vxa\">Or to <span class=\"_4yxp\">anything <\/span>online, perhaps? 
Our evidence-driven culture will be driven mostly by autonomous algorithms doing the number crunching to provide recommendations, risk estimates, classifications, and inferences. We as a species could never compute all these data products with our natural cognitive systems &#8211; simply because evolution never designed us to integrate data on the scale that is characteristic of our contemporary digital environments. That is why our machines <span class=\"_4yxp\">have to learn to learn<\/span>, and faster than we do: their adaptation is indeed our adaptation. But then, what is left for us once they have mastered recognizing the optimal structures found in the omnipresent data and inferring the best course of action for us?<\/div>\n<div class=\"_2cuy _3dgx _2vxa\">Two answers come to mind. First, it is we who define what is useful. Second &#8211; and maybe more important, because we can differ widely in our positions on what is &#8216;useful&#8217; &#8211; <span class=\"_4yxp\">we need to understand<\/span>, while the machines, in principle, do not. 
A sudden opportunity to attend the <a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fwww.sas.com%2Fsas%2Fevents%2F17%2Fsas-forum-milan.html&amp;h=ATNghS7QZsL2Li40UwyRvseiuGuRNw1a89Bff32J1F2sup24OJz6ICPCG20BMYaFEJin3Dww_akx13v3oA82WC6qo_8hgAAJKT5S8_caA21IOIvShgXfkwEe1xQSgfY&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">SAS Forum 2017 in Milan, Italy<\/span><\/a>, together with the <a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fwww.sas.com%2Fen_si%2Fcompany-information.html&amp;h=ATOQj9BQmvpu6dY7mhV-MmFoqczc8nlMz3giAwkwEpT3vveBIwHPsJv3fpklLD-vZfcppSfOVu8eAPtkCo7HBseqzneToZ2acIhkbGCjNvuckDiZNBQg4AiwNar286w&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">SAS Adriatic<\/span><\/a> team, and as a representative of <a href=\"https:\/\/l.facebook.com\/l.php?u=http%3A%2F%2Fwww.datascience.rs%2F&amp;h=ATNo21Fx7YNOJJTLQ9jZMmXa1fB1cBNmID0g8a5s2z_jxE-19lKTrWgQ_s4-14OtQA1EigHwHIt51r1iLn4lWGgmx6fswIv5ZhzPOLM2Cc7xStwh0nxz5RucnaQAlVo&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">Data Science Serbia<\/span><\/a>, inspired another round of my questioning of the present situation in Data Science and Analytics along these lines.<\/div>\n<div class=\"_2cuy _3dgx _2vxa\"><\/div>\n<div class=\"_2cuy _3dgx _2vxa\" style=\"text-align: center;\">* * *<\/div>\n<figure class=\"_2cuy _4nuy _2vxa\">\n<div class=\"_h2x _h2y\"><img id=\"u_t_2\" class=\"_h2z _297z _usd img\" src=\"https:\/\/scontent.fbeg5-1.fna.fbcdn.net\/v\/t31.0-8\/s960x960\/17991591_1846412095611071_1251746256930589718_o.png?oh=b0438ccc230484dd4f2d705cebfa2a92&amp;oe=59916200\" alt=\"\" \/><\/p>\n<div class=\"_h2w _50f8 _50f4\">Cellular automaton: Rule 193 with random conditions. 
Wikimedia Commons, 21 September 2013, 09:17:51, Author: Sofeykov.<\/div>\n<\/div>\n<\/figure>\n<div class=\"_2cuy _3dgx _2vxa\"><\/div>\n<div class=\"_2cuy _3dgx _2vxa\" style=\"text-align: center;\">* * *<\/div>\n<div class=\"_2cuy _3dgx _2vxa\">In his <a href=\"https:\/\/www.linkedin.com\/pulse\/difference-between-statistical-modeling-machine-i-see-schabenberger\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxp _4yxr\">The difference between Statistical Modeling and Machine Learning, as I see it<\/span><\/a> (2016), Mr Oliver Schabenberger, EVP and Chief Technology Officer at SAS, has provided an attempt at a concise delineation between statistical modeling and machine learning, relying on the following proposal that differentiates between (a) <span class=\"_4yxp\">statistical modeling<\/span>, (b) <span class=\"_4yxp\">classic machine learning<\/span>, and (c) <span class=\"_4yxp\">modern machine learning<\/span>:<\/div>\n<div class=\"_2cuy _3dgx _2vxa\">(a) The basic goal of <span class=\"_4yxp\">Statistical Modeling <\/span>is to answer the question, \u201c<span class=\"_4yxp\">Which probabilistic model could have generated the data I observed?<\/span>\u201d<\/div>\n<div class=\"_2cuy _3dgx _2vxa\">(b) Classical machine learning is a data-driven effort, focused on algorithms for regression and classification, and motivated by pattern recognition. The underlying stochastic mechanism is often secondary and not of immediate interest [&#8230;] the primary concern is to identify the algorithm or technique (or ensemble thereof) that performs the specific task.<\/div>\n<div class=\"_2cuy _3dgx _2vxa\">(c) A machine learning system is truly a learning system if it is not programmed to perform a <span class=\"_4yxp\">task<\/span>, but is programmed to <span class=\"_4yxp\">learn<\/span> to perform the task. [&#8230;] Like the classical variant, it is a data-driven exercise. 
Unlike the classical variant, <span class=\"_4yxp\">modern machine learning <\/span>does not rely on a rich set of algorithmic techniques. Almost all applications of this form of machine learning are based on deep neural networks.<\/div>\n<div class=\"_2cuy _3dgx _2vxa\">I was granted an opportunity to hear some elaborations of this line of thinking directly from Mr Schabenberger during the SAS Forum 2017. In my interpretation &#8211; and this is necessary to stress, given the immense complexity of the topic under discussion &#8211; his words reassured me that the following trade-offs hold between (a) our understanding of what we do with data analytics, and (b) simply being able to develop more and more complex methods to accomplish progressively complicated tasks:<\/div>\n<ul class=\"_5a_q _5yj1\" dir=\"ltr\">\n<li class=\"_2cuy _509q _2vxa\">In mathematical statistics (i.e. statistical modeling) as we know it, our understanding of the data generating process is guaranteed. Using binary or multiple logistic regression, cumulative logit models, or even ordinary least-squares regression or various ANOVA experimental designs can, as we all know, bring about some problems of interpretation, but those problems are minuscule compared with our ability to understand the data generating process in general &#8211; simply because we know the assumptions under which such techniques work and have strict mathematical proofs that support our understanding. The drawback evidently lies in the question of whether the assumed data generating process captures the true complexity of the empirical reality that we need to model and predict. 
At some point, the realistic underlying stochastic processes become too complex to be even approximated by our simplifying assumptions, which are more often than not introduced only to make possible the mathematical proofs that some generating process that we can conceptualize can be estimated by a model whose parameters we can understand.<\/li>\n<li class=\"_2cuy _509q _2vxa\">In what Schabenberger recognizes as &#8220;classical machine learning&#8221;, we can still establish some sort of interpretation of the results. Given a typical back-propagation network, a multi-layer perceptron, for example, one can still <span class=\"_4yxp\">at least in principle<\/span> build an understanding of its inner workings by tracking the changes in the weights of the connections in the hidden layers and then performing analyses (e.g. multivariate techniques like PCA) that reveal the patterns present in the model&#8217;s evolution towards an optimal state (i.e. one where the model predicts or classifies correctly according to some criteria). 
Such methods have already been used to provide an interpretation of the dynamical evolution of recurrent neural networks: for example, Rogers and McClelland use multidimensional scaling in their book &#8220;<a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fmitpress.mit.edu%2Fbooks%2Fsemantic-cognition&amp;h=ATNVwuzwY5YIQCNo7AkvOpnGLisK0PLdxcDbL4PY4iLbqVTyaXA4P35pLGGqJ5G_JbMsguhegwwnKC2ou1_uhwgGNmTXY-R4ZyoloXGJtTtBsHfDUmC14_Qta8vZ2l0&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxp _4yxr\">Semantic Cognition<\/span><\/a><a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fmitpress.mit.edu%2Fbooks%2Fsemantic-cognition&amp;h=ATMKhtEcQp1ST2dwUKAIImbWZTjdntHYn6681-9UmQ_0Q1CeekR1aBHkEXku-Uy-0pdou8ZSychTBu8bOdGKFM_6Z-NZl9bgMqLXsW0-uLRQs4_2sx1JQvZ75zkqULE&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxp _4yxr\">: A Parallel Distributed Processing Approach<\/span><\/a>&#8221; (2004, Chapter 3, p. 89) to trace the evolution of a conceptual system modeled by a recurrent back-prop network.<\/li>\n<li class=\"_2cuy _509q _2vxa\">It seems that the problem of model interpretation &#8211; and consequently, the problem of understanding the analytical machinery on whose results we have to rely &#8211; emerges very seriously in relation to what Schabenberger recognizes as &#8220;modern machine learning&#8221;. The gap between (a) our ability to solve very complex problems (i.e. the ability of our machines to dig out patterns from very complex datasets), and (b) our understanding of how the solution was reached &#8211; the understanding that provides the explanatory foundations for the decisions that we are about to make &#8211; could prove to be a true abyss in this case. 
Even a peek <a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Farxiv.org%2Fpdf%2F1611.04558v1.pdf&amp;h=ATP-wiBkzQBbw1zaQxPn_y4WlSXckgSVbSZkFaFdf-1L4U6-C6GO4c75bdtbvYVtfUAd4p-Iz6IbnuPSHfxp_HcY8cCL4dh4UjxZHPnglTwtySWiVqSDcJt3HYqzUuU&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">into <\/span><\/a><a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Farxiv.org%2Fpdf%2F1611.04558v1.pdf&amp;h=ATP-5aX4rFVrxS_t6XySNm9z1o9ENzPBb1p-FkhVbaa2YxL7v8Dnb0KL60UFSFzwjaD2QvI8FoA1SFi1H-YJJQBFVvIf7268qlzQvVFiW9XC509InPzLdK2uSJ2pTvI&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">the results of <\/span><\/a><a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Farxiv.org%2Fpdf%2F1611.04558v1.pdf&amp;h=ATMSHhljP_dNGqPkCFx5hBGUmuxc0ryddbTlzuZ9sHXCShilt1z-w4mLMR_h-U5iVgedu5IMG_ealq9k3CGzOUoFjgqkqKSZ8sesm6-9kmPP5KaLGjAZpo3wEiwYwF0&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">Google&#8217;s <\/span><\/a><a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Farxiv.org%2Fpdf%2F1611.04558v1.pdf&amp;h=ATOSpOuRr-m3rsouPADu5Rokngv7xj3yMRwf8A2oxz4uQUXHGFvgfEEX3KU5yXFfpjz5clHO0R6a69zyAIRT99MFRlzBwJxqjFvdNSFoSWtr2SXklSPelijCrTcKhwQ&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">recent revolution in machine translation <\/span><\/a>reveals the heroic struggle that the research team faced in trying to understand analytically the inner workings of a complex learning system that has achieved previously unimaginable performance on an extremely difficult task.<\/li>\n<\/ul>\n<div class=\"_2cuy _3dgx _2vxa\">Needless to say, the gap will not (and should not) slow down the skyrocketing rise of modern machine learning; it will probably spawn a &#8220;machine learning of machine learning&#8221; paradigm around our efforts to understand the machines that we have designed to serve our ends. 
However, it should serve as a friendly reminder of a fascinating characteristic of the <a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fwww.weforum.org%2Fagenda%2F2016%2F01%2Fthe-fourth-industrial-revolution-what-it-means-and-how-to-respond%2F&amp;h=ATP0celHn7jbXAlOgg2ZGPpW9Ug69i2nxQ7WifRW5XrJVRVC6FCmfHFb3bZcIcbU7Ek97RCXL1VEiAbg74_EDXrty40Rifd8NdMghvH0ixfWj3R1_Fvkey9hqEJP1dY&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">Fourth <\/span><\/a><a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fwww.weforum.org%2Fagenda%2F2016%2F01%2Fthe-fourth-industrial-revolution-what-it-means-and-how-to-respond%2F&amp;h=ATPh5hccO4xtTLPXdEMwe3ewhU_OSZh_gh0CFI6M30m-W0LV2OSHoHSyXCWcLj2nDcQZTgbuk1t3Z0cJEIBVv_QLtr9X_xveObzBEyyBRZsugFKbQuMvXzl0nFK15_o&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">Industrial Revolution<\/span><\/a>: we are beginning to rely widely on automated systems whose inner workings we need to study scientifically and can only hope to understand fully eventually, even though they were designed by humans from beginning to end. Any use of learning systems whose final outputs can be described as highly complex emergent properties &#8211; such as complex neural networks, evolutionary computation, and the like &#8211; will pose a similar problem to us.<\/div>\n<div class=\"_2cuy _3dgx _2vxa\" style=\"text-align: center;\">* * *<\/div>\n<div class=\"_2cuy _3dgx _2vxa\">The Data Analytics world will have to search for a delicate balance with respect to this dilemma. A typical Data Analyst (and, maybe more importantly, a typical user of his or her recommendations) is not at ease with buying an algorithm simply because it works, no matter how well motivated its development was. 
When I perform a logistic regression, and provided that the model assumptions hold, I can safely conclude that the exponential of a regression coefficient is the factor by which the odds change for a unit increase in the corresponding predictor, and I can rely confidently on my model because I can trace my way back to an explanation of exactly why that is so. I know <span class=\"_4yxp\">how<\/span> the model &#8220;reached the conclusion&#8221; that I have read out from its parameters, and thus I understand <span class=\"_4yxp\">why <\/span>some models work better than others. Also, it is sometimes possible to demonstrate <a href=\"https:\/\/www.smartcat.io\/blog\/2017\/hybrid-content-based-and-collaborative-filtering-recommendations-with-ordinal-logistic-regression-2-recommendation-as-discrete-choice\/\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">how classic statistical modeling <\/span><\/a><a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fwww.smartcat.io%2Fblog%2F2017%2Fhybrid-content-based-and-collaborative-filtering-recommendations-with-ordinal-logistic-regression-2-recommendation-as-discrete-choice%2F&amp;h=ATPhWCu6g0cC0H8ToMfLw3jephCqm21u1DFxwDF6w2G-23W5apTW7drQWIWzQaV66Db3h_6krfZ3iSJU6sgRtnmoF1uEeTH62qTOshbv5Gv3FUTrSipami5QiEpAC3o&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">can solve even very &#8220;modern&#8221; <\/span><\/a><a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fwww.smartcat.io%2Fblog%2F2017%2Fhybrid-content-based-and-collaborative-filtering-recommendations-with-ordinal-logistic-regression-2-recommendation-as-discrete-choice%2F&amp;h=ATP8dn88v7gIz0DIepNEk85_XRlimAGoMUv0pBbYEyyYlgob-MfVRrkeEqLVMsXZGirj0Itc5PPP9aOYn1qjLAPF-LRy1gTUNTllLyxUhRwA9N1X_EDlT7JMRz528Z8&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">&#8211; very complex &#8211; <\/span><\/a><a 
href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fwww.smartcat.io%2Fblog%2F2017%2Fhybrid-content-based-and-collaborative-filtering-recommendations-with-ordinal-logistic-regression-2-recommendation-as-discrete-choice%2F&amp;h=ATPZ_OR7gnATCZf-k1U24OY5bC-R8nf-AqQlaPhn0yRacbyh9LYYucrJk_gDOd47Z-S3LFeXFCUyVwJvhJFHAlnkoNlwfT9AgcQqY0vYNRyMDZM5OHtqvk3z6MqUr5I&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">problems<\/span><\/a> when applied to <a href=\"https:\/\/l.facebook.com\/l.php?u=https%3A%2F%2Fwww.smartcat.io%2Fblog%2F2017%2Fhybrid-content-based-and-collaborative-filtering-recommendations-with-ordinal-logistic-regression-1-feature-engineering%2F&amp;h=ATMMrMctWj8e1eNHzNoswKS2eqbT0ZLNIyt_R2idK67_rNySi8RkUO4iD-l-Tecq9KYFCf721ct0T3OqfOdcnQlnkFD4-jbirOm8Vokl1LoEWyD9fsrI6cykQLZMje0&amp;s=1\" target=\"_blank\" rel=\"nofollow noopener\"><span class=\"_4yxr\">an elaborate description of the dataset<\/span><\/a> under consideration. As an analyst and a scientist, and no matter how complex the future that awaits us, I don&#8217;t think that the interpretation game is over, or that we should ever give up the effort to apply machine learning thoughtfully until we are able to fully understand its &#8220;inner life&#8221;. If that calls for opening a whole new scientific arena in Data Science, even one so constrained by the complexity of the processes under study that it forever remains a field of empirical, experimental study of artificial learning systems &#8211; so be it. The challenge will only get harder as more and more advanced learning machinery becomes available, but I would avoid at any cost the attitude of just letting it go.<\/div>\n<!--themify_builder_content-->\n<div id=\"themify_builder_content-1511\" data-postid=\"1511\" class=\"themify_builder_content themify_builder_content-1511 themify_builder tf_clear\">\n    <\/div>\n<!--\/themify_builder_content-->","protected":false},"excerpt":{"rendered":"<p>Goran S. 
Milovanovi\u0107, PhD We are becoming an evidence driven culture: it&#8217;s all about data these days, but big and small data are nevertheless accompanied by big dilemmas in relation to their use and misuse in guiding our choices. Given the plenitude of statistical models and machine learning algorithms that we have at our disposal [&hellip;]<\/p>","protected":false},"author":1,"featured_media":1512,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[25],"tags":[],"_links":{"self":[{"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/posts\/1511"}],"collection":[{"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/comments?post=1511"}],"version-history":[{"count":0,"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/posts\/1511\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/media\/1512"}],"wp:attachment":[{"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/media?parent=1511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/categories?post=1511"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/imuno-srbija.com\/data-science\/en\/wp-json\/wp\/v2\/tags?post=1511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}