Okay, let’s have a look at using cross-validation to determine the best values for model hyperparameters. We will only be looking at the SVM and SGD classifiers, as the Perceptron model doesn’t have any hyperparameters. Not that the other two models have that many; some models have considerably more.

I will also only scale the data, using the StandardScaler. Refer to the related notebook for the code/notebook setup (imports, loading data, train/test split, scaling, etc.).

Hyperparameter Tuning

Parameters which define the model architecture are referred to as hyperparameters and thus this process of searching for the ideal model architecture is referred to as hyperparameter tuning.
…
I want to be absolutely clear, hyperparameters are not model parameters and they cannot be directly trained from the data.

Hyperparameter tuning for machine learning models, Jeremy Jordan, 2017.11.02
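As a quick, purely illustrative aside (this snippet is not from the related notebook and uses a small synthetic dataset), the distinction looks like this in scikit-learn: hyperparameters such as C are chosen when the estimator is constructed, while the model’s parameters only come into existence after fitting on data.

# illustrative sketch only: hyperparameters are chosen before training,
# model parameters are learned from the data
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=100, random_state=0)
svm_demo = SVC(C=1.0, kernel="linear")    # C and kernel are hyperparameters we pick
print(svm_demo.get_params()["C"])         # known before the model ever sees data
svm_demo.fit(X_demo, y_demo)
print(svm_demo.coef_.shape, svm_demo.intercept_.shape)  # learned parameters, only exist after fit()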

SVM

Let’s start with svm.SVC.

class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)

The hyperparameter we are likely most interested in is C. We will stick with the defaults for the other function parameters.

C: float, default=1.0

Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.

I propose that we pick a set of possible values, run cross-validation on the training dataset for each of those values, select the best 2 or 3 results, fit models using those values, and measure the performance of each against the test dataset.

Tuning

I wanted to see how long a single 5-fold cross-validation would take. The notebook said about 0.1 s.

In [6]:
# let's see how long a single cross val, k=5, takes
svm = SVC(C=0.5, random_state=0)
svm_scr = ms.cross_val_score(svm, X_trn_std, y_trn, cv=5, scoring="accuracy")
print(svm_scr)
[0.96703297 0.96703297 0.98901099 0.98901099 0.93406593]
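I didn’t include the timing code above. If you want to time it outside of a notebook cell magic, a minimal sketch would look something like the following (it assumes the ms alias for sklearn.model_selection and the X_trn_std / y_trn arrays from the notebook setup).

# sketch: time a single 5-fold cross-validation run in plain Python
import time

start = time.perf_counter()
svm = SVC(C=0.5, random_state=0)
svm_scr = ms.cross_val_score(svm, X_trn_std, y_trn, cv=5, scoring="accuracy")
print(f"5-fold CV took {time.perf_counter() - start:.2f} s")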

Let’s create a list of potential regularization values, and then run cross-validation against the SVM model using each of them.

In [7]:
# let's start with the C hyperparameter for svm.SVC
cv_scores_1 = {}
reg_cs_1 = [i/1000 for i in range(1, 10, 2)]
reg_cs_2 = [i/1000 for i in range(10, 100, 20)]
reg_cs_3 = [i/1000 for i in range(100, 1000, 200)]
reg_cs_4 = [i/1000 for i in range(1000, 10000, 2000)]
reg_cs = np.concatenate([reg_cs_1, reg_cs_2, reg_cs_3, reg_cs_4])

for cr in reg_cs:
  svm = SVC(C=cr, random_state=0)
  svm_scr = ms.cross_val_score(svm, X_trn_std, y_trn, cv=5, scoring="accuracy")
  cv_scores_1[str(cr)] = svm_scr

And, let’s look at the results.

In [8]:
for lbl, scrs in cv_scores_1.items():
  print(f"{lbl}: {[round(scr,5) for scr in scrs]} -> {scrs.mean()}")
0.001: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.003: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.005: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.007: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.009: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
 0.01: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
 0.03: [0.92308, 0.93407, 0.96703, 0.93407, 0.9011]  -> 0.9318681318681319
 0.05: [0.92308, 0.92308, 0.96703, 0.95604, 0.93407] -> 0.9406593406593406
 0.07: [0.91209, 0.93407, 0.96703, 0.95604, 0.92308] -> 0.9384615384615385
 0.09: [0.91209, 0.93407, 0.97802, 0.95604, 0.93407] -> 0.9428571428571428
  0.1: [0.92308, 0.93407, 0.97802, 0.95604, 0.93407] -> 0.945054945054945
  0.3: [0.95604, 0.94505, 0.98901, 0.96703, 0.93407] -> 0.9582417582417582
  0.5: [0.96703, 0.96703, 0.98901, 0.98901, 0.93407] -> 0.9692307692307693
  0.7: [0.97802, 0.96703, 0.98901, 0.98901, 0.93407] -> 0.9714285714285715
  0.9: [0.97802, 0.96703, 0.98901, 0.98901, 0.93407] -> 0.9714285714285715
  1.0: [0.97802, 0.96703, 0.98901, 0.98901, 0.95604] -> 0.9758241758241759
  3.0: [0.97802, 0.96703, 0.98901, 0.98901, 0.95604] -> 0.9758241758241759
  5.0: [0.98901, 0.96703, 0.98901, 0.97802, 0.95604] -> 0.9758241758241759
  7.0: [0.98901, 0.97802, 0.98901, 0.97802, 0.95604] -> 0.9780219780219781
  9.0: [0.98901, 0.97802, 0.98901, 0.97802, 0.95604] -> 0.9780219780219781

OK, the scores still look to be going up, so let’s try a few more values. I am going to use integer values in range(10, 20, 2).

In [9]:
# looks like it just keeps getting better, let's try a few more values, 10 and up
reg_cs_5 = [i for i in range(10, 20, 2)]
reg_cs_2 = np.concatenate([reg_cs, reg_cs_5])
cv_scores_2 = {}

for cr in reg_cs_2:
  svm = SVC(C=cr, random_state=0)
  svm_scr = ms.cross_val_score(svm, X_trn_std, y_trn, cv=5, scoring="accuracy")
  cv_scores_2[str(cr)] = svm_scr

In [10]:
for lbl, scrs in cv_scores_2.items():
  print(f"{lbl}: {[round(scr,5) for scr in scrs]} -> {scrs.mean()}")
0.001: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.003: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.005: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.007: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.009: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.01: [0.63736, 0.62637, 0.62637, 0.62637, 0.62637] -> 0.6285714285714286
0.03: [0.92308, 0.93407, 0.96703, 0.93407, 0.9011] -> 0.9318681318681319
0.05: [0.92308, 0.92308, 0.96703, 0.95604, 0.93407] -> 0.9406593406593406
0.07: [0.91209, 0.93407, 0.96703, 0.95604, 0.92308] -> 0.9384615384615385
0.09: [0.91209, 0.93407, 0.97802, 0.95604, 0.93407] -> 0.9428571428571428
0.1: [0.92308, 0.93407, 0.97802, 0.95604, 0.93407] -> 0.945054945054945
0.3: [0.95604, 0.94505, 0.98901, 0.96703, 0.93407] -> 0.9582417582417582
0.5: [0.96703, 0.96703, 0.98901, 0.98901, 0.93407] -> 0.9692307692307693
0.7: [0.97802, 0.96703, 0.98901, 0.98901, 0.93407] -> 0.9714285714285715
0.9: [0.97802, 0.96703, 0.98901, 0.98901, 0.93407] -> 0.9714285714285715
1.0: [0.97802, 0.96703, 0.98901, 0.98901, 0.95604] -> 0.9758241758241759
3.0: [0.97802, 0.96703, 0.98901, 0.98901, 0.95604] -> 0.9758241758241759
5.0: [0.98901, 0.96703, 0.98901, 0.97802, 0.95604] -> 0.9758241758241759
7.0: [0.98901, 0.97802, 0.98901, 0.97802, 0.95604] -> 0.9780219780219781
9.0: [0.98901, 0.97802, 0.98901, 0.97802, 0.95604] -> 0.9780219780219781
10.0: [0.97802, 0.97802, 0.98901, 0.97802, 0.94505] -> 0.9736263736263737
12.0: [0.97802, 0.97802, 0.96703, 0.97802, 0.94505] -> 0.9692307692307691
14.0: [0.97802, 0.97802, 0.96703, 0.97802, 0.94505] -> 0.9692307692307691
16.0: [0.97802, 0.97802, 0.96703, 0.97802, 0.93407] -> 0.9670329670329669
18.0: [0.97802, 0.97802, 0.96703, 0.97802, 0.93407] -> 0.9670329670329669
In [11]:
sv_means = [scrs.mean() for _, scrs in cv_scores_2.items()]
ax = plt.axes()
ax.plot(reg_cs_2, sv_means, color='k', linestyle='-')
ax.scatter(reg_cs_2, sv_means, color='blue', linestyle='-')
Out[11]:
[<matplotlib.lines.Line2D at 0x23755b52f10>]
Out[11]:
<matplotlib.collections.PathCollection at 0x23755b64460>
plot of cross-validation tuning scores for C parameter for SVM model
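Because the C values span four orders of magnitude, the linear x-axis squashes everything below 1 into the left edge. A log-scaled axis makes that region easier to read; here’s a quick sketch reusing reg_cs_2 and sv_means from above (not shown in the notebook).

# sketch: same data, log-scaled x-axis
ax = plt.axes()
ax.set_xscale("log")
ax.plot(reg_cs_2, sv_means, color='k', linestyle='-')
ax.scatter(reg_cs_2, sv_means, color='blue')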

It looks like the best values lie in the range 7 <= C <= 9. Given that the SVC default is C=1, let’s test with C=1 and C=7.

Testing

In [12]:
# looks like C>=7 and C<=9 gives us the best results
# C=1 is the default, so let's test using C=1 & C=7
c_tst = [1.0, 7.0]
ct_scr = {}
for ct in c_tst:
  svm = SVC(C=ct, random_state=0)
  t_f = svm.fit(X_trn_std, y_trn)
  t_preds = svm.predict(X_tst_std)
  ct_scr[str(ct)] = accuracy_score(y_tst, t_preds)
In [13]:
for lbl, scr in ct_scr.items():
  print(f"{lbl}: {scr}")
1.0: 0.9824561403508771
7.0: 0.9736842105263158

And, there you go. Our tuning attempt appears to have failed. I guess real world data makes things complicated. I am guessing that the training and testing dataset split we created is the likely culprit; I will have to do some more research (a quick way to check that hunch is sketched below). But that isn’t going to stop us from trying to tune the SGD classifier.
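If the split really is the culprit, repeating the C=1 versus C=7 comparison over several different random splits should show it. A rough sketch of that check (not run here; it assumes the unscaled feature matrix X and labels y from the notebook setup are available under those names):

# sketch: how sensitive is the C=1 vs C=7 ranking to the train/test split?
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

for seed in range(5):
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)
  scaler = StandardScaler().fit(X_tr)   # fit the scaler on the training split only
  accs = []
  for c in (1.0, 7.0):
    svm = SVC(C=c, random_state=0).fit(scaler.transform(X_tr), y_tr)
    accs.append(accuracy_score(y_te, svm.predict(scaler.transform(X_te))))
  print(f"split {seed}: C=1 -> {accs[0]:.4f}, C=7 -> {accs[1]:.4f}")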

SGD

Okay, onto sklearn.linear_model.SGDClassifier.

class sklearn.linear_model.SGDClassifier(loss='hinge', *, penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, max_iter=1000, tol=0.001, shuffle=True, verbose=0, epsilon=0.1, n_jobs=None, random_state=None, learning_rate='optimal', eta0=0.0, power_t=0.5, early_stopping=False, validation_fraction=0.1, n_iter_no_change=5, class_weight=None, warm_start=False, average=False)

Once again, there is only one obvious hyperparameter: alpha, the regularization strength.

We might also want to look at learning_rate.

And there might also be method parameters that would qualify for tuning.

Let’s go with the first, alpha, the regularization penalty. This one defaults to a value considerably lower than that used for the SVM model. Based on the sklearn documentation, I think we will try the following values for alpha: 10.0**-np.arange(1,7).

In [14]:
cv_scores_3 = {}
# reg_alphas = [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3]  # wider range, not used
reg_alphas = 10.0**-np.arange(1, 7)

for alpha in reg_alphas:
  sgd = linear_model.SGDClassifier(alpha=alpha, max_iter=1000, random_state=0)
  sgd_scr = ms.cross_val_score(sgd, X_trn_std, y_trn, cv=5, scoring="accuracy")
  cv_scores_3[str(alpha)] = sgd_scr

In [15]:
for lbl, scrs in cv_scores_3.items():
  print(f"{lbl}: {[round(scr,5) for scr in scrs]} -> {scrs.mean()}")
0.1: [0.98901, 0.96703, 0.97802, 0.97802, 0.95604] -> 0.9736263736263737
0.01: [0.97802, 0.96703, 1.0, 0.98901, 0.96703] -> 0.9802197802197803
0.001: [0.96703, 0.96703, 0.98901, 0.97802, 0.94505] -> 0.9692307692307693
0.0001: [1.0, 0.96703, 0.95604, 0.97802, 0.92308] -> 0.964835164835165
1e-05: [0.98901, 0.95604, 0.97802, 0.97802, 0.92308] -> 0.964835164835165
1e-06: [0.96703, 0.94505, 0.98901, 0.95604, 0.92308] -> 0.956043956043956
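Rather than eyeballing the list, we could also let Python pick the winner from the cv_scores_3 dictionary; a small sketch:

# sketch: select the alpha with the highest mean cross-validation accuracy
best_lbl = max(cv_scores_3, key=lambda lbl: cv_scores_3[lbl].mean())
print(f"best alpha: {best_lbl} -> {cv_scores_3[best_lbl].mean()}")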
In [16]:
gd_means = [scrs.mean() for _, scrs in cv_scores_3.items()]
ax = plt.axes()
ax.plot(reg_alphas, gd_means, color='k', linestyle='-')
ax.scatter(reg_alphas, gd_means, color='blue', linestyle='-')
Out[16]:
[<matplotlib.lines.Line2D at 0x23755bd31c0>]
Out[16]:
<matplotlib.collections.PathCollection at 0x23755bd3640>
plot of cross-validation tuning scores for alpha parameter for SGDClassifier model

OK, let’s take the best value from cross-validation, alpha=0.01, fit on the training dataset, and measure performance against the test dataset.

In [17]:
sgd = linear_model.SGDClassifier(alpha=0.010, max_iter=1000, random_state=0)
sg_f = sgd.fit(X_trn_std, y_trn)
sg_preds = sgd.predict(X_tst_std)
print(f'SGD, alpha=0.010 -> {accuracy_score(y_tst, sg_preds)}')
SGD, alpha=0.010 -> 0.9824561403508771

That score is pretty much the same as the SVM model with C=1.

If we wanted to tune multiple hyperparameters by hand, a reasonable approach would be to find the best value for one of them, then, using that selected value, tune the second, and so on for as many parameters as we wish to test. That’s where the scikit-learn methods really help: they tune them all at one time, at the expense of computing cycles of course.

scikit-learn Hyperparameter Optimizers

scikit-learn provides a number of hyperparameter optimizers. The two I’ve seen the most in online articles, posts, and tutorials are GridSearchCV and RandomizedSearchCV.

GridSearchCV pretty much does what we did above; the randomized version is probably more efficient time-wise if you have a lot of possibilities. Both accept a dictionary specifying the parameters to tune, which means they can tune multiple parameters at the same time. The grid version will literally test every possible combination, while the randomized version will sample a given number of candidates from a parameter space with a specified distribution.
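To make the parameter-dictionary idea concrete, here is a minimal GridSearchCV sketch for the SGDClassifier that tunes alpha and penalty together. The candidate values are purely illustrative, and this isn’t run in the notebook; it assumes the linear_model, X_trn_std and y_trn names from the setup.

# sketch: exhaustive grid search over two SGDClassifier hyperparameters at once
from sklearn.model_selection import GridSearchCV

param_grid = {
  "alpha": [1e-4, 1e-3, 1e-2, 1e-1],
  "penalty": ["l2", "l1"],
}
sgd = linear_model.SGDClassifier(max_iter=1000, random_state=0)
gsearch = GridSearchCV(estimator=sgd, param_grid=param_grid, cv=5, scoring="accuracy")
gsearch.fit(X_trn_std, y_trn)   # tries all 4 x 2 = 8 combinations, 5 folds each
print(gsearch.best_params_, gsearch.best_score_)

With 8 combinations and 5 folds that is already 40 fits; that combinatorial growth is why the randomized version becomes attractive as the grid gets larger.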

I thought I might actually run one of them against our data. But as of the afternoon of 2021.10.11 I am not sure; I might leave it for a later post when I am working on a real machine learning exercise.

sklearn.model_selection.RandomizedSearchCV

Ah, decided to at least have a quick look at one of the scikit-learn methods.

I will use scipy.stats.uniform to provide the distribution function for the alpha parameter in the SGDClassifier model.

In [18]:
# let's try one of sklearn's tuning methods
from scipy.stats import uniform as sp_rand
from sklearn.model_selection import RandomizedSearchCV

param_grid = {'alpha': sp_rand()}
sgd = linear_model.SGDClassifier(max_iter=1000, random_state=0)

rsearch = RandomizedSearchCV(estimator=sgd, param_distributions=param_grid, n_iter=100)
rsearch.fit(X_tst_std, y_tst)
rs_tbl = pd.DataFrame(rsearch.cv_results_)
display(rs_tbl)
# summarize the results of the random parameter search
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)

Out[18]:
RandomizedSearchCV(estimator=SGDClassifier(random_state=0), n_iter=100,
                   param_distributions={'alpha': <scipy.stats._distn_infrastructure.rv_frozen object at 0x0000026EE37DA4C0>})
    mean_fit_time  std_fit_time  mean_score_time  std_score_time  param_alpha  params                          split0_test_score  split1_test_score  split2_test_score  split3_test_score  split4_test_score  mean_test_score  std_test_score  rank_test_score
0   0.00           0.00          0.00             0.00            0.39         {'alpha': 0.3862038768185039}   1.00               0.96               0.96               0.96               1.00               0.97             0.02            1
1   0.00           0.00          0.00             0.00            0.53         {'alpha': 0.5287128751060116}   1.00               0.91               0.96               0.96               1.00               0.97             0.03            34
2   0.00           0.00          0.00             0.00            0.73         {'alpha': 0.7274908233473413}   1.00               0.91               0.96               0.96               1.00               0.97             0.03            34
3   0.00           0.00          0.00             0.00            0.21         {'alpha': 0.21497964544322512}  1.00               0.96               0.96               0.96               1.00               0.97             0.02            1
4   0.00           0.00          0.00             0.00            0.5          {'alpha': 0.5017156246455314}   1.00               0.96               0.96               0.96               1.00               0.97             0.02            1
..  ...            ...           ...              ...             ...          ...                             ...                ...                ...                ...                ...                ...              ...             ...
95  0.00           0.00          0.00             0.00            0.93         {'alpha': 0.9267266658038729}   1.00               0.91               0.96               0.96               1.00               0.97             0.03            34
96  0.00           0.00          0.00             0.00            0.32         {'alpha': 0.321020089526599}    1.00               0.96               0.96               0.96               1.00               0.97             0.02            1
97  0.00           0.00          0.00             0.00            0.77         {'alpha': 0.7742128565090565}   1.00               0.91               0.96               0.96               1.00               0.97             0.03            34
98  0.00           0.00          0.00             0.00            0.1          {'alpha': 0.0962207696348325}   1.00               0.96               0.96               0.91               1.00               0.97             0.03            34
99  0.00           0.00          0.00             0.00            0.13         {'alpha': 0.13382122884026693}  1.00               0.96               0.96               0.91               1.00               0.97             0.03            34

100 rows × 14 columns

0.9739130434782609
0.3862038768185039
In [19]:
# And test the suggested value
sgd_sr = linear_model.SGDClassifier(alpha=rsearch.best_estimator_.alpha, max_iter=1000, random_state=0)
sgsr_f = sgd_sr.fit(X_trn_std, y_trn)
sgsr_preds = sgd_sr.predict(X_tst_std)
print(f'SGD, alpha={rsearch.best_estimator_.alpha:.4f} -> {accuracy_score(y_tst, sgsr_preds)}')
SGD, alpha=0.3862 -> 0.9824561403508771

Pretty much in line with the number we got using regular cross-validation on a selection of potential values.

sklearn.model_selection.RandomizedSearchCV may have taken a little longer to run, but it looks to have tested a lot more values.
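One refinement that might be worth trying: since alpha appears to live on a log scale, sampling it from a log-uniform distribution (scipy.stats.loguniform, available in recent SciPy versions) instead of uniform(0, 1) would explore the small values as thoroughly as the large ones. A hedged sketch, searching on the training split and not run in the notebook:

# sketch: randomized search with a log-uniform distribution for alpha
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

param_grid = {'alpha': loguniform(1e-6, 1e0)}
sgd = linear_model.SGDClassifier(max_iter=1000, random_state=0)
rsearch_log = RandomizedSearchCV(estimator=sgd, param_distributions=param_grid,
                                 n_iter=100, cv=5, scoring="accuracy", random_state=0)
rsearch_log.fit(X_trn_std, y_trn)
print(rsearch_log.best_score_, rsearch_log.best_estimator_.alpha)

loguniform is worth remembering whenever a hyperparameter naturally varies over several orders of magnitude.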

Done

And with that, I do think this post is ready to be finished. Perhaps not terribly informative; maybe a different problem and a model with more hyperparameters would have been a better choice. But it does provide a workable introduction to the concepts and the tools available.

Feel free to download and play with my version of this post’s related notebook.

Resources