A.A The purpose of why we split the data in to two sets is so the “train set” can build the model and the “test sets” are how the computer checks the accuracy of the model. This is done so that the model has a base line to run off of.
B.A. The Softmax function, or activation function that turns numbers aka logits into probabilities that sum to one. Softmax function outputs represent the probability distributions of a list of potential outcomes. The 10 neurons in the last layer corresponds to the clothing type.
C.A. the loss function will measure the accuracy of the model while during training Different loss functions will give different errors for the same prediction, and thus have a considerable effect on the performance of the model. This is important as the quality of the model will dictate the value of our output. The optimizer argument They tie together the loss function and model parameters by updating the model in response to the output of the loss function. In simpler terms, “optimizers shape and mold your model into its most accurate possible form by futzing with the weights.”( https://algorithmia.com/blog/introduction-to-optimizers)
D.A. 1.What is the shape of the images training set (how many and the dimension of each)? Th shape was 60000 and the image 28x28 pixels 2.What is the length of the labels training set? 60,000 image is 28x28 pixels 3.What is the shape of the images test set? 10,000 and image is 28x28 pixels 4.Estimate a probability model and apply it to the test set in order to produce the array of probabilities that a randomly selected image is each of the possible numeric outcomes (look towards the end of the basic image classification exercises for how to do this — you can apply the same method applied to the Fashion MNIST dataset but now apply it to the hand written letters MNIST dataset). The output array = [[2.3905121e-11 6.7778480e-08 9.9999988e-01 2.5209381e-11 1.4328035e-25 5.8861547e-13 1.0552294e-12 2.8645500e-20 8.1970752e-10 1.4504924e-21]]