Tutorial Solutions – Week 7 (DFA)


Source: Manly, Bryan F.J. Multivariate Statistical Methods: A Primer, Third Edition, CRC Press, 2004.

Question 1:

a) What is the goal of DFA?

Solution:
DFA is primarily a classification method that incorporates dimension reduction. Its goals are:
• To determine which variables discriminate between two or more naturally occurring groups.
• To model a function that can be used to predict membership in groups based on measured variables.

b) Describe the relationship between DFA and MANOVA.

Solution:
DFA is MANOVA reversed. In MANOVA, the independent variable is the group variable and the dependent variables are the multiple continuous measurement variables (or the DFs). In DFA, the independent variables are the multiple continuous measurement variables and the dependent variable is the group variable. Schematically, with five measured variables:

DFA:    groups = X1 + X2 + X3 + X4 + X5
MANOVA: X1 + X2 + X3 + X4 + X5 = groups
MANOVA: Z1 + Z2 + Z3 = groups   (using the DFs as the responses)

c) How many DFs are produced through DFA?

Solution:
The smaller of the number of variables and the number of groups minus 1.

d) Unless otherwise stated in the model, how are prior probabilities calculated? What is their relationship to the posterior probabilities?

Solution:
They are the observed proportions of individuals in each group: the probability of group membership absent any other information on the measured variables. The posterior probabilities of group membership take into account the prior probabilities as well as the additional information from the DFs based on the original measured variables.

Question 2:
Complete exercise 1 at the end of Chapter 8 of Manly using the data set 'mandiblefull.dat'. As well as classifying individuals by species as stated in Manly, we will also classify by sex.

a) If there are more than 20 individuals in each of the sex or species groups to be classified, then we can use training and test data sets for that analysis. Create frequency tables to determine the number of cases in each species and sex.

Solution:
• Understand the contents of the data set

> head(mf)
  Case Group  X1   X2 X3 X4 X5  X6 X7 X8  X9 Sex
1    1     1 123 10.1 23 23 19 7.8 32 33 5.6   1
2    2     1 137  9.6 19 22 19 7.8 32 40 5.8   1
3    3     1 121 10.2 18 21 21 7.9 35 38 6.2   1
4    4     1 130 10.7 24 22 20 7.9 32 37 5.9   1
5    5     1 149 12.0 25 25 21 8.4 35 43 6.6   1
6    6     1 125  9.5 23 20 20 7.8 33 37 6.3   1

> str(mf)
'data.frame': 77 obs. of 12 variables:
 $ Case : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Group: int 1 1 1 1 1 1 1 1 1 1 ...
 $ X1   : int 123 137 121 130 149 125 126 125 121 122 ...
 $ X2   : num 10.1 9.6 10.2 10.7 12 9.5 9.1 9.7 9.6 8.9 ...
 $ X3   : int 23 19 18 24 25 23 20 19 22 20 ...
 $ X4   : int 23 22 21 22 25 20 22 19 20 20 ...
 $ X5   : int 19 19 21 20 21 20 19 19 18 19 ...
 $ X6   : num 7.8 7.8 7.9 7.9 8.4 7.8 7.5 7.5 7.6 7.6 ...
 $ X7   : int 32 32 35 32 35 33 32 32 31 31 ...
 $ X8   : int 33 40 38 37 43 37 35 37 35 35 ...
 $ X9   : num 5.6 5.8 6.2 5.9 6.6 6.3 5.5 6.2 5.3 5.7 ...
 $ Sex  : int 1 1 1 1 1 1 1 1 2 2 ...

The data are nine different mandible measurements sampled from 5 canine species groups, distinguishing between males and females: 77 observations on 11 variables (excluding case number, which is a variable in this dataset but not an interesting variable to analyse!). The nine continuous variables will be included in the DFA, with species group and sex as the classifiers.
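The import step itself is not shown in these solutions. A minimal sketch, assuming 'mandiblefull.dat' is a whitespace-delimited file with no header row and columns in the order shown above (the column names here are assumptions, not taken from the file):

> mf <- read.table("mandiblefull.dat", header = FALSE,
+                  col.names = c("Case", "Group", paste0("X", 1:9), "Sex"))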
• Frequency tables for the classifiers

> table(mf$Group)

 1  2  3  4  5
16 20 17 14 10

Sample sizes in each of the species groups are 20 or fewer, which is not large enough to use training and test datasets. If we tried to create a training set based on 70% from each group, the best-case scenario would be species group 2, which would have 14 cases in the training set and only 6 in the test set. For species group 5 we would have 7 in the training group and only 3 in the test set. We could do this partitioning and the analysis would work (we would get output results), but the analysis would not be very convincing based on such small sets. In these situations it is often best to acknowledge the limitations of the data, proceed with the analysis based on the whole data set, and acknowledge the bias inherent in predictions because we have not had the benefit of an independent testing set. For the species group analysis we will not use training and test sets.

For sex, I will create a temporary data frame that excludes species 5, which has cases of unknown sex:

> mfsex <- mf[mf$Group != 5, ]
> table(mfsex$Sex)

 1  2
35 32

After excluding group 5 (rows 68 to 77, which had unknown sex) there are enough females and males to use a training and test set. We will proceed with creating these partitions by sex next.

b) Try creating training and test sets for sex. Has the partitioning created samples of sizes that you would expect? Discuss.

Solution:

> library(caret)
> set.seed(42)
> inTrain <- createDataPartition(y = mfsex$Sex, p = 0.75, list = FALSE)
> mfstrain <- mfsex[inTrain, ]
> mfstest <- mfsex[-inTrain, ]
> table(mfstrain$Sex)

 1  2
25 26

> table(mfstest$Sex)

 1  2
10  6

Let's check that our partition worked properly. It seems a bit odd that we have one more case in the training set for sex 2 (26 individuals), which had n = 32, than for sex 1 (25 individuals), which had more individuals to start with (n = 35). If we check the calculation it should be 0.75*35 = 26.25 and 0.75*32 = 24. This odd partitioning has happened because R does not recognise Sex as a factor. If we convert Sex to a factor and then re-run the partitioning we get:

> mfsex$Sex <- as.factor(mfsex$Sex)
> set.seed(42)
> inTrain <- createDataPartition(y = mfsex$Sex, p = 0.75, list = FALSE)
> mfstrain <- mfsex[inTrain, ]
> mfstest <- mfsex[-inTrain, ]
> table(mfstrain$Sex)

 1  2
27 24

> table(mfstest$Sex)

 1  2
 8  8

We now have 27 and 24 individuals in the training set, as we expected from the calculation above. When dealing with large data sets, conversion of the partitioning variable to a factor isn't really important, as we are generally after a 'rough' partition, and when you have lots of data a few cases here or there make very little difference. When working on smaller data sets, or where the number of cases in each group is very unbalanced, not converting to a factor first may make a difference in the performance of your model. However, if I felt that one or two cases were going to make a big difference to the performance of my model, then I would be questioning whether my sample size was large enough to be using DFA (a predictive model) and/or large enough to be using training/test sets.
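To see directly why the factor conversion changes the split: for a numeric y, createDataPartition stratifies on quantile groupings of the values, whereas for a factor it samples within each level. A small side-by-side sketch (illustrative, not part of the original solutions):

> set.seed(42)
> idx.num <- createDataPartition(as.numeric(mfsex$Sex), p = 0.75, list = FALSE)
> set.seed(42)
> idx.fac <- createDataPartition(factor(mfsex$Sex), p = 0.75, list = FALSE)
> table(mfsex$Sex[idx.num])   # numeric y: stratified on quantile groups
> table(mfsex$Sex[idx.fac])   # factor y: close to 75% of each sex level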
c) Convert Group and Sex to factors in your original dataframe. What do you notice about Sex as a factor variable?

Solution:

> mf$Group <- as.factor(mf$Group)
> mf$Sex <- as.factor(mf$Sex)
> str(mf)
'data.frame': 77 obs. of 12 variables:
 $ Case : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Group: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ X1   : int 123 137 121 130 149 125 126 125 121 122 ...
 $ X2   : num 10.1 9.6 10.2 10.7 12 9.5 9.1 9.7 9.6 8.9 ...
 $ X3   : int 23 19 18 24 25 23 20 19 22 20 ...
 $ X4   : int 23 22 21 22 25 20 22 19 20 20 ...
 $ X5   : int 19 19 21 20 21 20 19 19 18 19 ...
 $ X6   : num 7.8 7.8 7.9 7.9 8.4 7.8 7.5 7.5 7.6 7.6 ...
 $ X7   : int 32 32 35 32 35 33 32 32 31 31 ...
 $ X8   : int 33 40 38 37 43 37 35 37 35 35 ...
 $ X9   : num 5.6 5.8 6.2 5.9 6.6 6.3 5.5 6.2 5.3 5.7 ...
 $ Sex  : Factor w/ 3 levels "0","1","2": 2 2 2 2 2 2 2 2 3 3 ...

Sex has 3 factor levels because of species group 5, which is of unknown sex (coded 0). We need to remember when further analysing by sex that group 5 must be excluded. This is a good example of using the structure (str) function frequently when modifying your data, to double-check that the changes you make actually create the expected outcome or changes to your data.

d) Perform whatever methods you feel would be appropriate before starting DFA to classify by species group.

Solution:
We are not using training/test sets for this analysis by species group, so let's look at the data as a whole.

• Graphical display of variables

> library(lattice)
> splom(mf[,3:11], groups=mf$Group)

There appear to be some strong bivariate correlations and clustering of colours/grouping of the data, and no obvious extreme outliers in any pairing of the 9 variables. Notice that with the splom function we can specify that only numeric variables are plotted without having to subset the data, and we could also define the cases to be included by adding row numbers before the comma. For example, to plot rows 1 to 20 and columns 3 to 6 we would use:

> splom(mf[1:20,3:6], groups=mf$Group)

• Check correlations

> (cormf <- cor(mf[,3:11]))
> abs(cormf) > 0.7

• Check descriptives and univariate normality. In the MVN package the default tests are the univariate Shapiro-Wilk test and the multivariate Mardia's test.

> mvn(mf[,3:11], multivariatePlot="qq")
$multivariateNormality
             Test        Statistic              p value Result
1 Mardia Skewness  320.04584828967  5.3464953551982e-12     NO
2 Mardia Kurtosis 3.96172222565234 7.44110799459907e-05     NO
3             MVN                                            NO

$univariateNormality
          Test Variable Statistic p value Normality
1 Shapiro-Wilk    X1       0.9208  1e-04     NO
2 Shapiro-Wilk    X2       0.9824  0.3672    YES
3 Shapiro-Wilk    X3       0.9359  8e-04     NO
4 Shapiro-Wilk    X4       0.9538  0.007     NO
5 Shapiro-Wilk    X5       0.9116  1e-04     NO
6 Shapiro-Wilk    X6       0.9861  0.5705    YES
7 Shapiro-Wilk    X7       0.8588  ...
...

> mvn(mf[,3:11], mvnTest="royston", desc=FALSE)
$multivariateNormality
     Test        H      p value MVN
1 Royston 42.32811 1.582013e-08  NO

All tests and the QQ plot conclude that the data are not normal. Identifying which cases are not univariate normal can be tricky.

> library(car)
> scatterplotMatrix(mf[,3:11])

From the bivariate scatter plots nothing stands out.
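The chi-square QQ plot produced by mvn(..., multivariatePlot = "qq"), and the Mahalanobis distances used in the outlier checks below, can also be computed directly in base R. A sketch (illustrative, not from the original solutions):

> # Under MVN, squared Mahalanobis distances are approximately chi-square
> # distributed with df = number of variables (9 here).
> X <- as.matrix(mf[, 3:11])
> d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
> qqplot(qchisq(ppoints(length(d2)), df = ncol(X)), d2,
+        xlab = "Chi-square quantiles (df = 9)",
+        ylab = "Squared Mahalanobis distance")
> abline(0, 1)   # points should follow this line if MVN holds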
This is the curse of multivariate data. We are trying to identify cases that behave out of character for the dataset based on their measurements for 9 variables, but we can only visualise (and understand) the relationships between pairs of these 9 variables.

In the MVN package there is an mvOutlier facility (the multivariateOutlierMethod argument of the mvn function):

> result <- mvn(mf[,3:11], multivariateOutlierMethod = "quan",
+               showOutliers = TRUE, showNewData = TRUE)
> names(result)
[1] "multivariateNormality" "univariateNormality" "Descriptives"
    "multivariateOutliers"  "newData"

We can call these by:

> result$multivariateOutliers
   Observation Mahalanobis Distance Outlier
1            1              153.943    TRUE
2            2              152.914    TRUE
3            3              133.364    TRUE
4            4              125.414    TRUE
5            5               92.065    TRUE
6            6               90.909    TRUE
7            7               89.420    TRUE
8            8               88.330    TRUE
9            9               74.451    TRUE
10          10               73.719    TRUE
11          11               69.982    TRUE
12          12               60.571    TRUE
13          13               49.095    TRUE
14          14               47.564    TRUE
15          15               38.585    TRUE
16          16               32.398    TRUE
17          17               24.855    TRUE
18          18               19.042    TRUE

> result$newData
    X1  X2 X3 X4 X5  X6 X7 X8  X9
19 110 8.1 18 16 19 7.1 31 32 4.7
20 116 8.5 20 18 18 7.1 32 33 4.7
21 114 8.2 19 18 19 7.9 32 33 5.1
22 111 8.5 19 16 18 7.1 30 33 5.0
23 113 8.5 17 18 19 7.1 30 34 4.6
...

The output for newData is too long to include here, but the outlier command provides a list of the Mahalanobis distances and whether or not each case is an outlier. The newData command produces a new data set, apparently with the outliers removed; however, care needs to be taken to double-check this new data set. If you rerun the mvOutlier analysis on the new data:

> result1 <- mvn(result$newData, multivariateOutlierMethod = "quan",
+                showOutliers = TRUE, showNewData = TRUE)
...

e) Run the DFA to classify by species group.

Solution:

> library(MASS)
> (mf.lda=lda(Group~X1+X2+X3+X4+X5+X6+X7+X8+X9, data=mf))
Call:
lda(Group ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, data = mf)

Prior probabilities of groups:
        1         2         3         4         5
0.2077922 0.2597403 0.2207792 0.1818182 0.1298701

Group means:
        X1       X2       X3       X4       X5       X6       X7       X8       X9
1 125.9375  9.72500 21.37500 21.12500 19.37500 7.675000 32.06250 36.62500 5.868750
2 111.0000  8.18000 18.60000 17.00000 18.20000 6.815000 30.35000 33.35000 4.805000
3 133.2353 10.72353 24.05882 23.64706 21.47059 8.488235 29.00000 37.70588 6.611765
4 157.3571 11.57857 26.21429 24.71429 24.71429 9.335714 40.21429 44.78571 7.407143
5 122.8000 10.34000 20.00000 22.90000 19.30000 8.190000 32.80000 35.90000 6.170000

Coefficients of linear discriminants:
          LD1         LD2         LD3         LD4
X1  0.1262457 -0.02141331 -0.07415646 -0.09390916
X2 -0.1080702  0.02953726  0.56106589 -0.07130602
X3 -0.2910043 -0.03424176 -0.10744062 -0.14141364
X4  0.2270426  0.04637768  0.43534619  0.09499400
X5  0.8891596 -0.74578775 -1.12909329  0.68431718
X6  0.8115373  0.10875761  0.42482620  1.09992590
X7 -1.3427053 -0.17820451  0.33210847  0.04612700
X8 -0.2269895 -0.09534583  0.01837503 -0.07252568
X9  1.6386618  0.44639830  1.20670414 -0.78224140

Proportion of trace:
   LD1    LD2    LD3    LD4
0.6505 0.2592 0.0860 0.0042

The first 2 DFs explain about 91% of the multivariate differences between groups. DF1 most strongly represents X7 (length of first to third molar), X9 (breadth of lower canine), X6 (breadth of first molar) and X5 (length of first molar). The difference in the sign of the loadings means that an individual with a high score on DF1 would have a small X7 and large X9, X6 and X5.
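The printed priors are just the observed group proportions, as discussed in Question 1 d). A quick sketch confirming this, and showing that lda accepts a prior argument if you want to override the default (the equal-prior refit is purely illustrative):

> table(mf$Group) / nrow(mf)   # reproduces the 'Prior probabilities of groups'
> mf.lda.eq <- lda(Group ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
+                  data = mf, prior = rep(1/5, 5))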
f) Classify each individual based on your DFA model. What percentage of individuals was correctly classified?

Solution:

> group.pred=predict(mf.lda)
> table(mf$Group, group.pred$class)

     1  2  3  4  5
  1 16  0  0  0  0
  2  0 20  0  0  0
  3  0  0 17  0  0
  4  0  0  0 14  0
  5  2  0  0  0  8

Nearly perfect discrimination between groups: 75/77, or 97%, of individuals were correctly classified. Two individuals from group 5 (prehistoric Thai dogs) were incorrectly classified as group 1 (modern dogs from Thailand). Because we were unable to keep some data separate and isolated from the model-building process (i.e. we could not create training and test sets for this analysis), we must remember that our model is positively biased towards the best possible predictions for this data, as the same data was used to build and test the model.

g) Plot individuals for DF1 v DF2 with individuals classified by original group and then predicted group. Interpret. Can you identify the two incorrectly classified individuals? Plot DF3 v DF4 using both original and predicted group labels. Interpret.

Solution:

> # DF1 and DF2 with individuals grouped by original Group classifications
> lda.temp <- data.frame(group.pred$x, class = mf$Group)
> xyplot(LD1~LD2, data=lda.temp, groups=class,
+   auto.key=list(title="Sampled Group", space = "top", cex=1.0))

Group 3 (cuons) and group 4 (Indian wolves) are very distinct groups. The centre of group 2 (golden jackals) is distinct from the other groups, but there is some overlap of individuals with groups 1 (modern dogs in Thailand) and 5 (prehistoric Thai dogs), which themselves overlap almost completely with each other.

> # DF1 and DF2 with individuals grouped by predicted Group classifications
> lda.temp <- data.frame(group.pred$x, class = group.pred$class)
> xyplot(LD1~LD2, data=lda.temp, groups=class,
+   auto.key=list(title="Predicted Group", space = "top", cex=1.0))

The misclassified individuals are circled. Given the overlap in groups 1 and 5 on DF1 and DF2 this is not surprising. Note: I have identified the individuals visually only (no code was used to do this).

> # DF3 and DF4 with individuals grouped by original Group classifications
> lda.temp <- data.frame(group.pred$x, class = mf$Group)
> xyplot(LD3~LD4, data=lda.temp, groups=class,
+   auto.key=list(title="Sampled Group", space = "top", cex=1.0))

> # DF3 and DF4 with individuals grouped by predicted Group classifications
> lda.temp <- data.frame(group.pred$x, class = group.pred$class)
> xyplot(LD3~LD4, data=lda.temp, groups=class,
+   auto.key=list(title="Predicted Group", space = "top", cex=1.0))

DF3 and DF4 account for only 9.02% of the between-group variance, but they do help discriminate between groups 1 and 5 along DF3.
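Rather than circling points by eye, the misclassified cases can also be pulled out with code. A short sketch using the objects created above:

> mis <- which(mf$Group != group.pred$class)
> mf[mis, c("Case", "Group")]   # the two group-5 cases predicted as group 1
> group.pred$x[mis, 1:2]        # their DF1/DF2 coordinates, for marking the plot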
h) What would you conclude about this analysis? Include any limitations you would place on interpretation.

Solution:
Despite small group samples and deviations from both univariate normality and MVN, the analysis was still successful and accurately predicted species group membership. This analysis provides a good demonstration of the difference between analysis undertaken purely for prediction of group membership and analysis undertaken for inference about data-generating processes. When group membership prediction is the only goal of analysis, the 'success' or value of the analysis can be judged on whether prediction is accurate (as it was in this case). Since we accurately predicted species group based on the 9 variables, does it matter that we may have violated the assumption of MVN? Presumably, if we had met MVN, the prediction would have only been more accurate, not less. The lack of an independent data set for testing (prediction) purposes is a limitation of this analysis, as it has introduced some positive bias into the prediction outcomes (the model is probably more likely to accurately classify data that was used to build the model than if it were tested on an independent sample from the same population).

Alternatively, if we had been performing hypothesis-testing analysis from which we hoped to infer something about what was driving differences between the data observed in each group (such as MANOVA), then not being able to tell whether we have violated the test assumptions, or by how much we may have violated them, introduces another dimension of uncertainty into our interpretation.

i) Run DFA to classify by Sex. Species group 5 individuals are of unknown sex, so exclude all of this species from this analysis. As part of your interpretation comment on the prior probabilities for Sex. Give the total % correctly classified and the % of each sex misclassified.

Solution:
First, save the cases for species 1 to 4 into a new data frame, check the structure and create a frequency table for sex.

> mfsex <- mf[mf$Group != 5, ]
> str(mfsex)
'data.frame': 67 obs. of 12 variables:
 $ Case : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Group: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ X1   : int 123 137 121 130 149 125 126 125 121 122 ...
 $ X2   : num 10.1 9.6 10.2 10.7 12 9.5 9.1 9.7 9.6 8.9 ...
 $ X3   : int 23 19 18 24 25 23 20 19 22 20 ...
 $ X4   : int 23 22 21 22 25 20 22 19 20 20 ...
 $ X5   : int 19 19 21 20 21 20 19 19 18 19 ...
 $ X6   : num 7.8 7.8 7.9 7.9 8.4 7.8 7.5 7.5 7.6 7.6 ...
 $ X7   : int 32 32 35 32 35 33 32 32 31 31 ...
 $ X8   : int 33 40 38 37 43 37 35 37 35 35 ...
 $ X9   : num 5.6 5.8 6.2 5.9 6.6 6.3 5.5 6.2 5.3 5.7 ...
 $ Sex  : Factor w/ 3 levels "0","1","2": 2 2 2 2 2 2 2 2 3 3 ...

> table(mfsex$Sex)

 0  1  2
 0 35 32

Note: to exclude the unused zero category from the table (not from the dataset) we can use

> table(droplevels(mfsex)$Sex)

 1  2
35 32

To remove the zero level from the dataframe we can just redefine Sex as a factor in the dataframe:

> mfsex$Sex <- factor(mfsex$Sex)
> table(mfsex$Sex)

 1  2
35 32

Now create the training and test sets:

> library(caret)
> set.seed(42)
> inTrain <- createDataPartition(y = mfsex$Sex, p = 0.75, list = FALSE)
> mfstrain <- mfsex[inTrain, ]
> mfstest <- mfsex[-inTrain, ]
> table(mfstrain$Sex)

 1  2
27 24

> table(mfstest$Sex)

 1  2
 8  8

Now run the analysis:

> (mfs.lda=lda(Sex~X1+X2+X3+X4+X5+X6+X7+X8+X9, data=mfstrain))
Call:
lda(Sex ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, data = mfstrain)

Prior probabilities of groups:
        1         2
0.5294118 0.4705882

Group means:
        X1       X2       X3       X4       X5       X6       X7       X8       X9
1 134.5185 10.30741 23.51852 21.96296 21.29630 8.207407 33.33333 38.48148 6.337037
2 126.0417  9.58750 21.79167 20.75000 20.20833 7.758333 31.50000 36.87500 5.758333

Coefficients of linear discriminants:
            LD1
X1  0.002746617
X2 -0.143949431
X3 -0.111656881
X4  0.099646647
X5  0.601249353
X6  0.108912931
X7 -0.221870446
X8  0.212480164
X9 -2.256060335

There is only one DF because Sex has only 2 levels.
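With a single discriminant axis, the separation between the sexes can be inspected directly. MASS provides a plot method for lda fits which, with one DF, draws a histogram of LD1 scores for each group:

> plot(mfs.lda)   # group-wise histograms of the LD1 scores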
The prior probabilities of males and females are almost 50:50. This may have been an aspect of experimental design (data collection) and not an indication of population proportions; the true proportions may be unknown for this data.

> sex.pred <- predict(mfs.lda)
> sex.predtest <- predict(mfs.lda, newdata = mfstest)
> # table of predicted vs actual
> table(mfstrain$Sex, sex.pred$class)

     1  2
  1 19  8
  2  9 15

> table(mfstest$Sex, sex.predtest$class)

    1 2
  1 6 2
  2 4 4

Predictions of the training data set, which contributed to the model construction, correctly classified 34/51 = 67% of individuals into male (1) and female (2). Eight males (8/27 = 30% of males) and nine females (9/24 = 38% of females) were misclassified.

Predictions using the DFA model applied to the test set correctly classified 10/16 = 63% of individuals. 25% of males were misclassified, while 50% of females were misclassified. Classification by sex has not been as successful as classification by species group.

j) See if you can adjust the code from part f) to predict group membership based on only the first two DFs. Why might this be useful to consider?

Solution:

> group.pred2=predict(mf.lda, dimen=2)
> table(mf$Group, group.pred2$class)

     1  2  3  4  5
  1 14  1  0  0  1
  2  2 18  0  0  0
  3  0  0 17  0  0
  4  0  0  0 14  0
  5  7  0  0  0  3

The addition of 'dimen=2' to the code means use the first 2 dimensions only; dimen=3 would mean use the first 3, and so on. There is no easy way to choose, say, DF1 and DF3 only, and I am not sure there would be any good reason to. If you run '?predict.lda' and have a look at the help file, you should be able to see an explanation of this. The package and function documentation can take a while to get the hang of, and sometimes it can help to work backwards.

In general, for DFA we use all DFs for prediction. Remember that the number of DFs is determined by either the number of variables or the number of groups minus 1, whichever is smaller. Generally, we have fewer groups that we want to classify into than we have variables measured on our individual cases. So finding the DFs by the groups-minus-1 rule usually reduces our dimensionality quite a bit (in the example above we went from 9 variables to 4 DFs), and it is easy to include all the DFs produced in the prediction. However, using only the first 1, 2, then 3, etc. is a good way to see how your classification improves, and this might be a very useful tool in some cases; a sketch of this follows.
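A compact way to do that stepping, reusing mf.lda from part e) (the loop is illustrative, not part of the original solutions):

> # Resubstitution accuracy using only the first d discriminant functions.
> sapply(1:4, function(d) mean(predict(mf.lda, dimen = d)$class == mf$Group))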
Question 2 i) update regarding a change to the createDataPartition function, leading to some different results.

There has been an update to the caret package which affects the way the createDataPartition function works. Even if we use the same seed value, the updated package selects different cases for inclusion in the training and test sets than the older version of the package did. This then leads to different loadings on the discriminant function. In DFA, however, the loadings are not the thing we are most interested in; instead, we interpret the success of the prediction/classification of the test set into the groups. Using caret package version 6.0-86 you should see the following output:

> set.seed(42)
> inTrain <- createDataPartition(y = mfsex$Sex, p = 0.75, list = FALSE)
> mfstrain <- mfsex[inTrain, ]
> mfstest <- mfsex[-inTrain, ]
> table(mfstrain$Sex)

 1  2
27 24

> table(mfstest$Sex)

 1  2
 8  8

> library(MASS)
> (mfs.lda=lda(Sex~X1+X2+X3+X4+X5+X6+X7+X8+X9, data=mfstrain))
Call:
lda(Sex ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, data = mfstrain)

Prior probabilities of groups:
        1         2
0.5294118 0.4705882

Group means:
        X1        X2       X3       X4       X5       X6       X7       X8       X9
1 132.3704 10.125926 22.51852 21.66667 20.92593 8.133333 32.81481 38.07407 6.218519
2 124.7917  9.420833 21.58333 20.54167 20.08333 7.720833 31.54167 36.33333 5.737500

Coefficients of linear discriminants:
          LD1
X1 -0.1308965
X2 -1.1746677
X3  0.3137531
X4  0.2591029
X5  0.3214093
X6  0.5931156
X7  0.0616575
X8  0.1703411
X9 -0.8683559

> sex.pred <- predict(mfs.lda)
> sex.predtest <- predict(mfs.lda, newdata = mfstest)
> # table of predicted vs actual
> table(mfstrain$Sex, sex.pred$class)

     1  2
  1 19  8
  2 11 13

> table(mfstest$Sex, sex.predtest$class)

    1 2
  1 5 3
  2 5 3

Predictions of the training data set, which contributed to the model construction, correctly classified 32/51 = 63% of individuals into male (1) and female (2). Eight males (8/27 = 30% of males) and eleven females (11/24 = 46% of females) were misclassified.

Predictions using the DFA model applied to the test set correctly classified 8/16 = 50% of individuals. 38% of males were misclassified, while 63% of females were misclassified. Classification by sex has not been as successful as classification by species group.

This model has been even less successful than the one above, which was based on a different training and test set. This is a good example of using repeated analysis to help us understand how sensitive the method is to small changes in our data. In this example, we can see that the 'luck' of selecting a 'better' sample can lead to a better-trained model. If we had more data overall, some of this sensitivity to small changes in sample composition could be reduced.
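One way to quantify that sensitivity is to repeat the partition/fit/test cycle over many random splits and look at the spread of test accuracies. A sketch, assuming caret and MASS are loaded and mfsex is the species 1 to 4 data frame with Sex as a two-level factor:

> # Repeat the 75/25 partition, refit the DFA, and record test accuracy.
> accs <- sapply(1:100, function(s) {
+   set.seed(s)
+   idx <- createDataPartition(mfsex$Sex, p = 0.75, list = FALSE)
+   fit <- lda(Sex ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
+              data = mfsex[idx, ])
+   mean(predict(fit, newdata = mfsex[-idx, ])$class == mfsex$Sex[-idx])
+ })
> summary(accs)   # spread of test accuracy across 100 random partitions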
