Linear Classification

The approach developed here has two major components: a score function that maps the raw data to class scores, and a loss function that quantifies the agreement between the predicted scores and the ground truth labels. We will then cast this as an optimization problem in which we minimize the loss function with respect to the parameters of the score function.

Parameterized mapping from images to label scores

The first component is a score function that maps the pixel values of an image to confidence scores for each class. Let's assume a training dataset of images $x_i \in R^D$, each associated with a label $y_i$. Here $i \in \{1, \dots, N\}$ and $y_i \in \{1, \dots, K\}$. That is, we have N examples (each with a dimensionality of D) and K distinct categories.

For example, in CIFAR-10 we have a training set of N = 50,000 images, each with D = 32 x 32 x 3 = 3072 pixels, and K = 10, since there are 10 distinct classes (dog, cat, car, etc.). We will now define the score function $f: R^D \mapsto R^K$ that maps the raw image pixels to class scores.
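
The dimension count above can be checked with a short NumPy sketch. The random array is only a stand-in for a real CIFAR-10 image (no dataset is loaded here), and the variable names are illustrative assumptions.

```python
import numpy as np

# Stand-in for one CIFAR-10 image: 32 x 32 pixels, 3 color channels.
# Random pixel values are an assumption for illustration; no real data is loaded.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flatten the image into a single vector x_i in R^D.
x_i = image.reshape(-1).astype(np.float64)
D = x_i.shape[0]
print(D)  # 3072
```

Flattening discards the spatial layout of the pixels, which is acceptable here because the score function only needs a vector of length D.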

We start with the simplest possible function, a linear mapping: $f(x_i, W, b) = W x_i + b$
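
A minimal NumPy sketch of the linear mapping above follows. It assumes W has shape K x D and b has shape K, which is what a function from R^D to R^K requires. The helper name `score`, the small random weights, and the random input vector are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

def score(x_i, W, b):
    """Linear score function f(x_i, W, b) = W x_i + b."""
    return W @ x_i + b

D, K = 3072, 10  # CIFAR-10 sizes from the text
rng = np.random.default_rng(0)

W = 0.01 * rng.standard_normal((K, D))  # assumed: small random weights, shape K x D
b = np.zeros(K)                         # assumed: zero bias, shape K
x_i = rng.random(D)                     # stand-in for one flattened image

scores = score(x_i, W, b)
print(scores.shape)  # (10,)
```

Each of the K rows of W produces one class score, so a single matrix-vector product evaluates all K classes at once.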