Friday, December 3, 2010

stanford machine learning course - lecture#2

I watched the 2nd lecture video and followed the lecture notes from the Stanford machine learning course. At a very high level, the following is what I took away from it.

It formally explains what supervised learning is and the difference between a regression problem and a classification problem. Then it defines the least-squares cost function, called J($\theta$), and discusses algorithms that find the $\theta$ that minimizes J($\theta$).
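For reference, this is the cost function from the lecture, written over $m$ training examples, where $h_\theta(x) = \theta^T x$ is the linear hypothesis:

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$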
First, it describes the gradient descent algorithm (in batch and stochastic variants), which leads to the least mean squares (LMS), or Widrow-Hoff, learning rule.
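The two variants can be sketched as follows. This is a minimal illustration, not the course's code; the learning rate `alpha` and iteration counts are arbitrary choices, and `X` is assumed to already include a column of ones for the intercept term:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for linear regression (LMS rule).

    Each step uses the whole training set:
        theta := theta + alpha * X^T (y - X theta)
    """
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        errors = y - X @ theta           # residuals over all examples
        theta = theta + alpha * X.T @ errors  # update every theta_j at once
    return theta

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=100):
    """Stochastic (incremental) variant: update after each single example."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            theta = theta + alpha * (yi - xi @ theta) * xi
    return theta
```

The batch version takes one exact gradient step per pass over the data, while the stochastic version makes many cheap, noisy steps, which is why the lecture recommends it for large training sets.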
Then, using some notation for matrix derivatives, it derives the normal equation, which determines the minimizing $\theta$ directly (no iteration) by solving $\nabla_\theta J(\theta) = 0$, giving $\theta = (X^T X)^{-1} X^T y$.
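The closed-form solution is a one-liner. A small sketch (my own, assuming $X^T X$ is invertible; `np.linalg.solve` is used instead of an explicit matrix inverse for numerical stability):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```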