Virtual University Journals

Comparative Analysis of Supervised Machine Learning Regression Models on California Housing Dataset

Zahid Khan, Muhammad Sohail, Habiba Mehak
Abstract: This paper presents a comparative analysis of supervised machine learning regression models on the California Housing dataset (published in the UCI Machine Learning Repository). The four algorithms analyzed are Linear Regression, Decision Tree Regression, Random Forest Regression, and Support Vector Regression, all of which are widely used for regression tasks. Model performance is evaluated with standard predictive accuracy measures: mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R²). The empirical evidence shows that ensemble-based methods, especially Random Forest Regression, are more reliable than single-model methods, including in terms of predictive accuracy. The findings highlight the importance of ensemble approaches for modelling complex real-world housing data and offer insights into their use in large-scale regression tasks.
Keywords: supervised machine learning, regression model, linear regression, decision tree regression, random forest regression, support vector regression
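
The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The function name `regression_metrics` and the toy values are assumptions made for illustration only; they are not taken from the paper and do not reproduce its results. With residual sum of squares SS_res and total sum of squares SS_tot, the sketch uses MSE = SS_res / n, RMSE = sqrt(MSE), and R² = 1 - SS_res / SS_tot.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    # Residual sum of squares: squared gaps between targets and predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between targets and their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    mse = ss_res / n
    rmse = math.sqrt(mse)
    # R^2 is undefined when every target is identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mse, rmse, r2


# Toy values chosen for easy arithmetic (an assumption, not data from the paper).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

Lower MSE and RMSE and a higher R² indicate a better fit, which is the sense in which the abstract compares the four models. RMSE is expressed in the same units as the target, so it is usually the easier of the two error measures to interpret.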