CSCI433/CSCI933: Machine Learning – Algorithms and Applications
Assignment Problem Set #2

Lecturer: Prof. Philip O. Ogunbona (firstname.lastname@example.org)
School of Computing and Information Technology
University of Wollongong

Due date: Saturday May 1, 6:00 p.m.

Introduction

In this assignment the tasks are set to help you hone your practical skills in building and testing machine learning models, and to develop theoretical insights for understanding algorithms. You will design and compare the performance of four regression predictors using the dataset provided. Furthermore, you will study and gain insight into the theoretical relationship between the algorithms so as to understand the basis of their performance.

The assignment starts by requiring you to complete a reading and practice exercise covering three chapters of the book by Géron (2019, Chp. 1, 2, & 4). A copy of the book is on Moodle for your personal educational use in this subject only. By the end of this preliminary exercise you should understand:

1. how to install Python and the necessary libraries on your personal computer;
2. key practical issues involved in building a machine learning model;
3. the pipeline of an end-to-end machine learning project;
4. how to load, clean, wrangle, visualize and understand data as an essential initial step in building a practical predictive model;
5. how to build a simple regression model.

What needs to be done

1. Read, study and understand the three chapters of the book (Géron, 2019, Chp. 1, 2, & 4). Ensure you write and run the associated code in the chapters.
2. The popular Housing dataset is provided along with this specification in a "zip" archive file. It contains a training dataset and a test dataset. Also included in the archive file is the description of the variables (features) in the dataset. Ensure that you really understand the organisation of the dataset. This is absolutely important – check the size, shape, etc.
3. Using the Python programming language and the scikit-learn machine learning library, implement the following regression models on the housing dataset: (i) ordinary least squares; (ii) ridge regression; (iii) PCA-regression; (iv) elastic-net regression. Your model is to predict the sale price of houses.
4. Write a report on your experiments and the performance evaluation of the models.
5. Your report will include a section that describes, mathematically, the connection between ordinary least squares regression, ridge regression and PCA-regression.
6. Your report will be presented in a conference paper format (see accompanying template) and should detail your understanding of the theory of the techniques and give a succinct description of the experiments. You will describe the techniques in your own words with appropriate equations. When you write an equation, the meaning of the symbols must be explained, as well as the intuition behind the equation itself. Your report MUST not be more than four (4) pages in the format specified by the template. The 4-page count excludes the list of references.
7. You may need to look at some of the books available on the Moodle site of this subject for more insight.
8. Please cite appropriately any other paper or book you have read in gaining a deeper understanding of the concepts and methods.

What needs to be submitted

• You will prepare a "zip" or "rar" file containing your report (4-page PDF file) and your Python code file (named "eval_regression.py").
• Your code must run from the command line as:
      python3 eval_regression.py
  and write results indicating that your code works (e.g. prediction errors for each method) to standard output (stdout).
• The report should be typed (or typeset using LaTeX) with a 12-point font, and with spacings as specified in the template. The submitted report MUST be a PDF file. Any WORD document should have been converted to PDF before submission.
  Non-PDF reports will not be marked.
• Submit the "zip" or "rar" file via the Moodle dropbox provided, on or before the deadline.

Report marking scheme

Your report should follow this format (i.e. headings):

Title (5 marks) – Give your report a nice title and write your name and student number. See template.

Introduction (5 marks) – Describe the problem of regression-based prediction. Explain why this scheme is important and the role it plays in practical machine learning.

Theory and properties of predictors (40 marks) – Describe the four predictors you have tested in your experimentation. This is very important because it shows how well you understand the properties of the predictors. It is expected that you will write mathematical equations that describe the predictor models. The marks awarded to this section give an indication of the amount of work expected.

Theoretical links (10 marks) – Briefly derive the mathematics supporting the theoretical link amongst ordinary least squares regression, ridge regression and PCA-regression. Highlight the implications of varying the parameters.

Data preparation (20 marks) – Describe the data in your own words and highlight various statistics (mean, variance, etc.) along with any significant observation that could be gleaned from the data. You may include some graphs, but they must be described in your report (4 pages may not give you room!). Describe the various methods and implications of the data preparation steps you undertook. Note that this is very important as it will have a significant impact on the accuracy obtained from your predictor. You should discuss how you split the data for training, validation and testing.

Experiments and evaluation (20 marks) – Describe the experiments you carried out to demonstrate your understanding of the models/algorithms, and justify the methods of performance evaluation you have adopted. State the comparative evaluation estimates and justify the differences. This section definitely requires that you show the results in a table or graph.

Discussion and conclusions (25 marks) – You are required to reflect and write about the differences amongst the various predictor models relative to their parameters, the amount of data required for training, the nature/format of data required and the accuracy obtained. In addition, you are required to reflect on and describe any significant trend/observation you discovered with regard to what features may be dominant in determining the sale price of a house. For example, for a given house type, is there a subgroup of features that are more likely to fetch a high sale value?

References

Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). Sebastopol, CA 95472: O'Reilly Media, Inc.
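
Illustrative sketch of the command-line requirement

The requirement that eval_regression.py runs from the command line and writes a prediction error for each method to stdout could look roughly like the sketch below. It is only an outline under stated assumptions, not a reference solution: the housing files and column names are not given in this specification, so a synthetic dataset from scikit-learn stands in for them; the function names (build_models, evaluate), the hyperparameter values, the number of principal components and the choice of RMSE as the error measure are all placeholders chosen for illustration.

```python
"""Sketch of eval_regression.py: fit four regressors and print test RMSE."""
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_models():
    """Return the four predictors. All hyperparameters are placeholders."""
    return {
        "OLS": make_pipeline(StandardScaler(), LinearRegression()),
        "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        # PCA-regression: project onto leading components, then least squares.
        "PCA-regression": make_pipeline(
            StandardScaler(), PCA(n_components=5), LinearRegression()
        ),
        "Elastic-net": make_pipeline(
            StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5)
        ),
    }


def evaluate(X_train, y_train, X_test, y_test):
    """Fit each model on the training split and return its test RMSE."""
    results = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        results[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))
    return results


def main():
    # Assumption: synthetic data stands in for the housing dataset, whose
    # file names and columns are not stated in the specification.
    X, y = make_regression(
        n_samples=300, n_features=10, n_informative=6, noise=10.0, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    for name, rmse in evaluate(X_train, y_train, X_test, y_test).items():
        print(f"{name:<15s} test RMSE = {rmse:.3f}")


if __name__ == "__main__":
    main()
```

Wrapping each estimator in a pipeline keeps the scaling (and, for PCA-regression, the projection) fitted on the training split only, so the test error is not contaminated by test-set statistics. In a real submission the synthetic data would be replaced by the provided training and test files, and the placeholder hyperparameters by values selected on a validation split.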