SVM hypothesis:

$$\min_\theta\ C\sum^m_{i=1}\left[y^{(i)}\mathrm{cost}_1(\theta^Tx^{(i)})+(1-y^{(i)})\mathrm{cost}_0(\theta^Tx^{(i)})\right]+\frac{1}{2}\sum^n_{j=1}\theta^2_j$$

$$h_{\theta}(x)=\begin{cases}1 &\text{if }\ \theta^Tx\geqslant 0\\0 &\text{otherwise}\end{cases}$$

Large Margin Classifier / Intuition: when $C$ is very large, the SVM acts as a large margin classifier; when $C$ is not too large, it behaves as an ordinary SVM and tolerates some misclassified examples. The margin is the distance between the decision boundary and the nearest training examples; maximizing it makes the SVM robust. The SVM separates positive and negative examples with the largest possible margin:

$$\min_\theta\ \frac{1}{2}\sum_{j=1}^n\theta_j^2=\frac{1}{2}\Vert\theta\Vert^2$$
$$\text{s.t.}\quad\theta^Tx^{(i)}\ge 1\quad\text{if }y^{(i)}=1;\qquad\theta^Tx^{(i)}\le -1\quad\text{if }y^{(i)}=0$$
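The objective above can be sketched in numpy. This is a minimal illustration, assuming the usual hinge-style surrogates $\mathrm{cost}_1(z)=\max(0,1-z)$ and $\mathrm{cost}_0(z)=\max(0,1+z)$; the helper names are hypothetical, not from any library.

```python
import numpy as np

# Hinge-style surrogate costs assumed for cost1/cost0 in the SVM objective.
def cost1(z):
    return np.maximum(0.0, 1.0 - z)   # cost for y = 1: zero once z >= 1

def cost0(z):
    return np.maximum(0.0, 1.0 + z)   # cost for y = 0: zero once z <= -1

def svm_objective(theta, X, y, C):
    """C * sum of per-example costs + (1/2) * sum_j theta_j^2 (theta_0 unregularized)."""
    z = X @ theta                                  # theta^T x^(i) for every example
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)        # skip the intercept theta_0
    return data_term + reg_term
```

With both examples outside the margin, only the regularization term remains, which matches the large-margin view: the data term vanishes and the SVM is left minimizing $\frac{1}{2}\Vert\theta\Vert^2$.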
Simplification: $\theta_0=0,\ n=2$.

Given $(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})$, choose landmarks $l^{(1)}=x^{(1)},\ l^{(2)}=x^{(2)},\ \dots,\ l^{(m)}=x^{(m)}$. Given an example $x$:

$$f_1=\mathrm{similarity}(x,l^{(1)})=\exp\left(-\frac{\Vert x-l^{(1)}\Vert^2}{2\sigma^2}\right)$$
$$f_2=\mathrm{similarity}(x,l^{(2)})=\exp\left(-\frac{\Vert x-l^{(2)}\Vert^2}{2\sigma^2}\right)$$
$$\dots$$
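The Gaussian similarity above is straightforward to compute directly; here is a small sketch (the `similarity` helper mirrors the formula, its name is taken from the text):

```python
import numpy as np

# f_i = exp(-||x - l^(i)||^2 / (2 sigma^2)), the Gaussian (RBF) kernel.
def similarity(x, l, sigma):
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
l1 = np.array([1.0, 2.0])   # landmark identical to x
l2 = np.array([5.0, 6.0])   # distant landmark

f1 = similarity(x, l1, sigma=1.0)   # exactly 1.0, since x == l^(1)
f2 = similarity(x, l2, sigma=1.0)   # near 0, since x is far from l^(2)
```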
For training example $(x^{(i)},y^{(i)})$:

$$f^{(i)}_1=\mathrm{sim}(x^{(i)},l^{(1)})$$
$$f^{(i)}_2=\mathrm{sim}(x^{(i)},l^{(2)})$$
$$\dots$$
$$f^{(i)}_m=\mathrm{sim}(x^{(i)},l^{(m)})$$
Hypothesis: given $x$, compute features $f\in\mathbb{R}^{m+1}$. Predict "$y=1$" if $\theta^Tf\ge 0$.

Training:

$$\min_{\theta}\ C\sum^m_{i=1}\left[y^{(i)}\mathrm{cost}_1(\theta^Tf^{(i)})+(1-y^{(i)})\mathrm{cost}_0(\theta^Tf^{(i)})\right]+\frac{1}{2}\sum^m_{j=1}\theta^2_j$$

Note: since $\sum^m_{j=1}\theta^2_j=\theta^T\theta$ (ignoring $\theta_0$), implementations replace it with $\theta^TM\theta$ for a kernel-dependent matrix $M$, which lets the algorithm scale to large training sets.
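The hypothesis above can be sketched end to end: map $x$ to $f=[1,\mathrm{sim}(x,l^{(1)}),\dots,\mathrm{sim}(x,l^{(m)})]\in\mathbb{R}^{m+1}$ and threshold $\theta^Tf$. The helper names and the example $\theta$ below are made up for illustration.

```python
import numpy as np

def gaussian_sim(x, l, sigma):
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

def kernel_features(x, landmarks, sigma):
    # f_0 = 1 (intercept term), then one similarity per landmark -> f in R^{m+1}
    return np.concatenate(([1.0], [gaussian_sim(x, l, sigma) for l in landmarks]))

def predict(theta, x, landmarks, sigma):
    # Predict y = 1 iff theta^T f >= 0
    return 1 if theta @ kernel_features(x, landmarks, sigma) >= 0 else 0
```

For example, with landmarks at $(0,0)$ and $(3,3)$ and $\theta=(-0.5,\,1,\,0)$, a point near the first landmark is classified 1 and a point near the second is classified 0.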
Large $C$ $\Longleftrightarrow$ small $\lambda$ $\Longrightarrow$ low bias, high variance $\Longleftrightarrow$ overfit
Small $C$ $\Longleftrightarrow$ large $\lambda$ $\Longrightarrow$ high bias, low variance $\Longleftrightarrow$ underfit
If $x \approx l^{(1)}$, then $f_1 \approx 1$; if $x$ is far from $l^{(1)}$, then $f_1 \approx 0$.
As the figure above shows, large $\sigma^2$ $\Longleftrightarrow$ feature $f_i$ varies more smoothly $\Longleftrightarrow$ the model changes slowly with the input $x$ $\Longleftrightarrow$ high bias, low variance $\Longleftrightarrow$ underfit. Conversely, small $\sigma^2$ $\Longleftrightarrow$ feature $f_i$ varies less smoothly $\Longleftrightarrow$ low bias, high variance $\Longleftrightarrow$ overfit.
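The $\sigma^2$ trade-off is easy to see numerically. In this small sketch (values chosen for illustration), a fixed point two units from a landmark still looks "similar" under a large $\sigma^2$, but looks completely dissimilar under a small one:

```python
import numpy as np

# Gaussian similarity parameterized directly by sigma^2.
def similarity(x, l, sigma2):
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma2))

x = np.array([2.0, 0.0])
l1 = np.array([0.0, 0.0])                  # ||x - l1||^2 = 4

f_large = similarity(x, l1, sigma2=10.0)   # exp(-0.2) ~ 0.82: f varies smoothly
f_small = similarity(x, l1, sigma2=0.1)    # exp(-20) ~ 0: f drops off sharply
```

A smooth, wide bump per landmark means the hypothesis changes slowly with $x$ (underfitting risk); narrow spikes let it fit each training point individually (overfitting risk).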
Use an SVM software package (e.g. liblinear, libsvm, …) to solve for the parameters $\theta$.
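As a hedged usage sketch: scikit-learn's `SVC` is backed by libsvm, so training an RBF-kernel SVM looks like the following (the toy data here is made up; `gamma` plays the role of $1/(2\sigma^2)$):

```python
import numpy as np
from sklearn.svm import SVC  # SVC wraps libsvm under the hood

# Two well-separated clusters as a toy training set.
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
y = np.array([0, 0, 1, 1])

# kernel="rbf" selects the Gaussian kernel; C and gamma are the
# regularization and kernel-width knobs discussed above.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

pred = clf.predict([[0.1, 0.0], [3.0, 3.1]])
```

The package handles the numerical optimization for $\theta$ (and the $\theta^TM\theta$ trick) internally, which is why the notes recommend using one rather than writing your own solver.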
Repost source: http://pymzi.baihongyu.com/