Optimization Methods for Big Data

Project Description

Objectives

LEARNING OUTCOMES:

The aim of the course is to introduce constrained optimization, with specific attention to its applications in training Support Vector Machines (SVMs) and in defining clustering techniques. Decision trees for classification are also covered.

KNOWLEDGE AND UNDERSTANDING:
The student is introduced to non-linear optimization, especially constrained optimization, and to supervised learning, with particular attention to classification and regression problems. Two families of techniques are described: SVMs and decision trees. The student is also introduced to unsupervised classification, which is addressed with standard clustering techniques such as k-means and hierarchical clustering.

APPLYING KNOWLEDGE AND UNDERSTANDING:

A significant part of the course is dedicated to the practical use of the techniques described theoretically. The student is encouraged, through the use of dedicated software (implemented in R), to apply the proposed tools to real datasets.

MAKING JUDGEMENTS:
Developing classification models for real datasets and implementing them with software packages allows the quality of the students' solutions to be verified quantitatively. Part of the examination consists of developing specific projects in teams of up to two people.

COMMUNICATION SKILLS:

Interpreting the results obtained after the implementation and computation phase is one of the fundamental steps in defining a solution to a real problem starting from a classification model. This activity requires students to discuss their work with colleagues and with the teacher, which stimulates their critical and dialectical skills.

LEARNING SKILLS:

Through the proposed teaching material, the student is exposed to reference texts (both books and research articles) and to recent, continuously evolving software tools. The student has to engage with different tools in order to (i) acquire new skills, (ii) learn how to update his/her preparation continuously and independently, and (iii) undertake more in-depth courses in the discipline.

Syllabus

Introduction to optimization: modeling approach

Optimization problems: classification

Mathematical programming problems: conditions for the existence of a solution
Unconstrained optimization: optimality conditions; solution algorithms: global convergence conditions, line search, an outline of the gradient method.
Constrained optimization: optimality conditions and solution algorithms
Wolfe duality and the SVM dual problem. Algorithms for SVM training: SVM_light and the dual coordinate method.
Unsupervised clustering: formulation and the k-means algorithm (batch and online versions). The k-medoids algorithm. Agglomerative and divisive hierarchical clustering
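
As an illustration of the batch k-means algorithm listed above, the following is a minimal sketch in Python. It is not the course software (which is implemented in R); the function names `sq_dist` and `kmeans`, the random initialization from the data points, and the stopping rule are assumptions chosen for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Batch k-means: alternate assignment and centroid update steps.

    Assumption: initial centers are k distinct data points drawn at random.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns the two centers and one cluster label per point. The online variant mentioned in the syllabus would instead update a single center after each point is processed.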

Decision trees: Decision trees and classification. CART (Classification And Regression Trees). Induction task: TDIDT (Top-Down Induction of Decision Trees) and the top-down approach. Choice of the split test. Node impurity measures: Gini index, chi-square, entropy, information gain, gain ratio and classification error. Notes on computational complexity.
Design of an Optimal Classification Tree (OCT) as a mixed-integer optimization (MIO) problem. Bertsimas and Dunn model: OCT-MIO. Univariate case: definition of the variables. Modeling the tree structure. Pruning. Consistency with the split test outcomes. Class-label assignment to leaf nodes.
Extension of OCT-MIO to the multivariate case: OCT-H.
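
Some of the node impurity measures listed under decision trees can be sketched in a few lines of Python. This is an illustrative example only, not the course software (which is implemented in R); the function names are assumptions, and the chi-square and gain ratio criteria are left out for brevity.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of samples in each class at a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits of the class distribution at a node."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def classification_error(labels):
    """Share of samples that are not in the majority class."""
    return 1.0 - max(class_proportions(labels))


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted
```

A split test can then be chosen by evaluating `information_gain` (or the reduction in `gini`) for each candidate split and keeping the one with the largest value.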

Instructor

Veronica Piccialli

0 credits

60 hours of lectures

Year 0

Master's Degree (Laurea Magistrale)

Semester 0