Ace_Code

A

What is a one-hot-encoded column and why might it be needed when transforming a feature? Are the source values continuous or discrete?

It refers to splitting the column which contains numerical categorical data to many columns. the fields in CATEGORICAL_COLUMNS are transformed from categorical columns to one-hot-encoded columns (indicator column).this coverts the variable so that it is used with a higher degree of accuracy when predicting. The source veriables are descrite because they are intigers.

What is a dense feature? For example, if you execute example = dict(dftrain) and then tf.keras.layers.DenseFeatures(your_features)(your_object).numpy(), how has the content of your data frame been transformed? Why might this be useful?

A dense feature training data is described with Feature Columns. At the first layer of the model, this column-oriented data should be converted to a single Tensor. This layer can be called multiple times with different features/ transforming the object into a indicator column. this is use full for its another way to work the model to get a better understanding of the data being analyzed.

Provide a histogram of the probabilities for the logistic regression as well as your boosted tree model. How do you interpret the two different models? Are their predictions essentially the same or is there some area where they are noticeable different. Plot the probability density function of the resulting probability predictions from the two models and use them to further illustrate your argument. Include the ROC plot and interpret it with regard to the proportion of true to false positive rates, as well as the area under the ROC curve. How does the measure of the AUC reflect upon the predictive power of your model?

ROC_Plot

Roc_plt

Linear

Pic_July_22

Boosted

Other_Hist

the roc curve dose verify that the model is becoming overfit. the true positives are staying the same wich indicates that the moddle is overfitting. In comparasent to the original histogram, the BoostedTree model preformed a little better. The histogram from the boostedtree produced a more accurate model that showed a prediction a higher faitality rate, which proved to be correct.

B

Boosted Trees continued (with model understanding)

1.Upload your feature values contribution to predicted probability horizontal bar plot as well as your violin plot. Interpret and discuss the two plots. Which features appear to contribute the most to the predicted probability?

Value_1 Value_2

in the two plots above, one can see the veriables that most perdicted the likley hood of death on the Titanic. the fiest was whether or not you were male or femal. Given the rules of the day, wemon and children were to be put on the limited life boats afailable. thus, being a male was a huge predicter in the prediction of a faitality. The next graph shows how the class of the passenger was also a lager predictor of the faitality of a passenger on board the H.M.S Titanic.

2.Upload at least 2 feature importance plots. Which features are the most important in their contribution to your models predictive power?

Value_3 value_4

In the first graph, one can see that the most important veriable in this list of embarkation locations was southampton. As most of the passengers embarked from this location. In the second Graph we see that the most important perdiction veriable of servivability was being a female.