

Summaries of the analysis as Tableau graphs


The data consist of n = 57,258 real estate listings from Arizona, Alaska, Oregon, California, New Mexico, and Texas, as well as Mexico. Only 11 listings are outside Arizona, so they are excluded and the analysis uses Arizona listings only. The data set has 15 variables in total. The variables describe each listing: geolocation, list price, closing price, number of bedrooms, house size, etc.
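As a minimal sketch of the exclusion step described above, the filter below keeps only Arizona rows. The field names (`state`, `list_price`, `close_price`), the state code `"AZ"`, and the sample rows are all hypothetical; the actual column names in the data set are not given here.

```python
# Hypothetical sample rows; the real data set has 15 variables per listing.
listings = [
    {"state": "AZ", "list_price": 250000, "close_price": 245000},
    {"state": "CA", "list_price": 610000, "close_price": 598000},
    {"state": "AZ", "list_price": 180000, "close_price": 182500},
]


def keep_state(rows, code="AZ", state_key="state"):
    """Return only the rows whose state field equals the given code."""
    return [row for row in rows if row.get(state_key) == code]


arizona = keep_state(listings)
excluded = len(listings) - len(arizona)
print(f"kept {len(arizona)} listings, excluded {excluded}")
```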


Complete Summary tab: A summary of all listing data, without cleaning or filtering. For simplicity, the listings are divided into 13 neighborhoods in this tab.

Predictive Model Evaluation tab: In the geo map, the number on top is the actual closing price; the number on the bottom is the difference between the predicted price and the actual price. Difference_range is a filter that segments listings by absolute difference.
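One way a filter like Difference_range could be derived is to bucket the absolute difference into labeled ranges. The sketch below is only an illustration: the bucket edges ($10,000, $25,000, $50,000) and the label format are assumptions, not the values used in the Tableau workbook.

```python
def difference_range(predicted, actual, edges=(10_000, 25_000, 50_000)):
    """Label the bucket that the absolute prediction error falls into.

    The bucket edges are hypothetical; each bucket includes its lower
    bound and excludes its upper bound.
    """
    diff = abs(predicted - actual)
    lower = 0
    for upper in edges:
        if diff < upper:
            return f"${lower:,} to ${upper:,}"
        lower = upper
    return f"${lower:,}+"


# Usage: a prediction of 205,000 against a closing price of 200,000
print(difference_range(205_000, 200_000))
```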

1000 Trials Summary tab: The data set is randomly split for cross-validation under 1000 different seeds (set.seed(i)). Pred_diff is the ABSOLUTE difference between the predicted price and the actual price. On average, the model's prediction is off by $29,388, compared to human prediction.
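The repeated-split procedure can be sketched as follows. The `set.seed(i)` call suggests the original analysis was done in R; this Python version uses `random.Random(seed)` as a rough analogue. Everything else is an assumption made for illustration: the data are synthetic, the 70/30 split is a guess, and the one-feature least-squares line is a stand-in for whatever model the analysis actually used.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares on one feature; returns (slope, intercept)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def one_trial(rows, seed, test_fraction=0.3):
    """Split with the given seed, fit on train, return mean Pred_diff on test."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    slope, intercept = fit_line([r[0] for r in train], [r[1] for r in train])
    # Pred_diff is the absolute difference between predicted and actual price.
    return statistics.fmean(
        abs(slope * size + intercept - price) for size, price in test
    )


def run_trials(rows, n_trials=1000):
    """Repeat the split-and-evaluate step once per seed 1..n_trials."""
    return [one_trial(rows, seed) for seed in range(1, n_trials + 1)]


# Synthetic (house size, closing price) rows, for illustration only.
data_rng = random.Random(0)
rows = []
for _ in range(200):
    size = data_rng.uniform(800, 4000)
    rows.append((size, 150 * size + data_rng.gauss(0, 30_000)))

pred_diffs = run_trials(rows, n_trials=100)
print(f"average Pred_diff over {len(pred_diffs)} trials: "
      f"{statistics.fmean(pred_diffs):,.0f}")
```

Averaging over many seeds reduces the chance that a single lucky or unlucky split drives the reported error.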


The goal is to predict the actual market prices of properties in Arizona, using all available information, including external data sources.

Data Cleaning & Methodologies

Versions of the analysis

Version 1: Build the predictive model from the existing data, without the variable "list price"

Version 2: Build the predictive model from the existing data, with the variable "list price"

Version 3: Importing external data sources:

Version 4: An ideal data set would include the age of the house, number of garages, decoration cost, buyer's payment method, etc.


Version 1 conclusion: The average Pred_diff is around $35,000, and the model both underestimates and overestimates prices, suggesting that other unknown factors affect pricing.


Version 2 conclusion: (Done in another file) When the additional variable "list price" is added, the model's estimates are almost perfect. This suggests that human estimation provides valuable information beyond the current 15 variables.

Version 3 conclusion: (To be continued)

Support or Contact

You can contact me through my LinkedIn.