Predicting prices of used BMW cars

Jens Svensmark

May 10, 2021

Source

Source code for this project is available on GitHub.

Problem statement

Background

  • Cars are used throughout the world
  • Big resale market (due to cost and durability)
  • Many consumers have no clear idea about car prices
  • Makes navigating the market and negotiating with car dealers difficult

Goal

  • Predict resale prices of cars based on historic data
    • Target variable is continuous
    • Will use the R-squared (R²) metric
    • This should be close to 1
  • Make predictions available to consumers

Data

Source

  • Provided by Datacamp
  • No details about collection known

Features

Feature       Description           Type
price         Price in USD          numerical
year          Production year       numerical
mileage       Distance driven       numerical
tax           Road tax              numerical
mpg           Miles per gallon      numerical
engineSize    Size of engine        numerical
model         Car model             categorical
transmission  Type of transmission  categorical
fuelType      Fuel type             categorical

Which features are the most important?

Simple data model

Full data model

Exploring the data

Year and mileage

price_of_year_mileage.png

Car model

Transmission

Tax, mpg and engine size

price_of_tax_mpg_enginesize.png

Fuel type

Predictive model

Linear model

mileage_fit.png
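
The factor tables in this deck are reported as 10^coef, which suggests that the linear model is fitted to log10(price) rather than to the price itself. That is an inference, not something the slides state. Under that assumption, a single-feature fit on mileage can be sketched as follows. The function name fit_line and all data values are made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic, made-up data: a car that loses 6% of its price per 10,000
# units of mileage, starting from a hypothetical price of 30,000.
mileages = [0, 10_000, 20_000, 40_000, 80_000]
prices = [30_000 * 0.94 ** (m / 10_000) for m in mileages]

# Assumption: the target is log10(price), so the relationship is a straight line.
log_prices = [math.log10(p) for p in prices]
intercept, slope = fit_line(mileages, log_prices)

# Convert the slope back into a price factor per 10,000 units of mileage.
factor_per_10k = 10 ** (slope * 10_000)
```

Because the synthetic prices decay exponentially, the fit recovers the 0.94 factor. Real data would scatter around the line.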

Feature selection

Last added feature  Mean R² test score
mileage             0.543242
year                0.643062
model               0.885855
engineSize          0.918769
transmission        0.924562
fuelType            0.925534
mpg                 0.928286
tax                 0.928287

Feature selection

  • Include: mileage, year, car model, engine size and transmission.
  • Exclude: fuel type, mpg and tax.
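
The "last added feature" column suggests a greedy forward selection, where the feature that improves the score most is added at each step. The slides do not show the procedure, so the sketch below is one plausible reading. The scoring function is a hypothetical stand-in with made-up gains. A real one would refit the model and return a mean R² test score.

```python
def forward_selection(candidates, score):
    """Greedy forward selection.

    candidates: iterable of feature names.
    score: callable taking a list of feature names and returning a number
           where higher is better (for example a mean R² test score).
    Returns a list of (last_added_feature, score) pairs in selection order.
    """
    remaining = list(candidates)
    selected = []
    history = []
    while remaining:
        # Try each remaining feature and keep the one that scores best.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, score(selected)))
    return history


# Hypothetical stand-in for model scoring: each feature contributes a
# fixed, made-up gain. These numbers are not from the project.
TOY_GAINS = {"a": 0.5, "b": 0.3, "c": 0.01}

def toy_score(features):
    return sum(TOY_GAINS[f] for f in features)

history = forward_selection(["c", "a", "b"], toy_score)
for feature, value in history:
    print(f"{feature}: {value:.2f}")
```

A natural cut-off is the point where the score stops improving noticeably, which matches the choice above to stop after transmission.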

Parameter interpretation

observable     10^coef  10^coef - 1
year           1.106    11%
engineSize     1.206    21%
10000*mileage  0.941    -6%
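
The conversion from a coefficient to the two table columns can be sketched as below, assuming the model is linear in log10(price) (an inference from the 10^coef notation). The helper names are made up for illustration.

```python
import math

def price_factor(coef):
    """Multiplicative price factor for a one-unit increase in the feature."""
    return 10 ** coef

def percent_change(coef):
    """Relative price change in percent for a one-unit increase."""
    return (price_factor(coef) - 1) * 100

# The table lists 10^coef = 1.106 for year. Recover the coefficient that
# value implies, then convert it back into a percentage.
year_coef = math.log10(1.106)
year_change = percent_change(year_coef)  # about 10.6, shown rounded as 11%
```

Read this way, each additional production year multiplies the predicted price by 1.106, all else being equal.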

Parameter interpretation

Price relative to "Automatic"

transmission  10^coef  10^coef - 1
Manual        0.913    -9%
Semi-Auto     1.02     2%

Parameter interpretation

Price relative to "1 Series"

model     10^coef  10^coef - 1
2 Series  1.027    3%
3 Series  1.13     13%
4 Series  1.151    15%
5 Series  1.228    23%
6 Series  1.302    30%

Web interface prototype

https://svensmark.jp/dc_cert/predict_price/

web_page_screenshot.png

Conclusion

  • Built a linear model for predicting resale prices of BMW cars
  • Works fairly well (mean R² test score of about 0.92 with the selected features)
  • Model coefficients are explainable
  • Demonstrated web interface prototype

Going forward

Follow up with data collection team

  • Suspicious values in mpg, engine size and tax

If more accuracy is required

  • More complex model might help
  • But risk of overfitting and less explainability

Web interface

  • Improve design of web front end
  • Ensure scalability of back end depending on expected usage

Thank you for your attention

Any questions?

Additional background

Metric

R-squared (R²)

  • A single number, at most 1
  • Measure of how well the model describes the data
  • The closer to one the better
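
For reference, R² is commonly defined as one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with made-up prices:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up prices, purely for illustration.
actual = [10_000, 20_000, 30_000]
perfect = r_squared(actual, actual)                     # 1.0
baseline = r_squared(actual, [20_000, 20_000, 20_000])  # 0.0
```

A perfect prediction scores 1, while always predicting the mean price scores 0.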

Data

Data model 2

Predictive model

Additional assumption

  • All car prices fall at the same rate with age and mileage, independent of car model and other factors
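
One way to see this assumption: with no interaction terms, the predicted price is a product of independent factors. The sketch below uses factors from the parameter interpretation tables and assumes the model is linear in log10(price). Engine size is left out for brevity, and the function name is made up.

```python
# Factors copied from the parameter interpretation tables (10^coef).
YEAR_FACTOR = 1.106      # per production year
MILEAGE_FACTOR = 0.941   # per 10,000 units of mileage
MODEL_FACTOR = {"1 Series": 1.0, "3 Series": 1.13, "X5": 1.762}
TRANSMISSION_FACTOR = {"Automatic": 1.0, "Manual": 0.913, "Semi-Auto": 1.02}

def relative_price(model, transmission, years_newer, extra_mileage):
    """Price relative to a reference car (1 Series, Automatic).

    years_newer and extra_mileage are measured against the reference car.
    """
    return (MODEL_FACTOR[model]
            * TRANSMISSION_FACTOR[transmission]
            * YEAR_FACTOR ** years_newer
            * MILEAGE_FACTOR ** (extra_mileage / 10_000))

def mileage_drop(model):
    """Price ratio after 20,000 extra units of mileage for a given model."""
    before = relative_price(model, "Automatic", 0, 0)
    after = relative_price(model, "Automatic", 0, 20_000)
    return after / before

drop_1_series = mileage_drop("1 Series")
drop_x5 = mileage_drop("X5")
```

Both ratios equal 0.941 squared, so an X5 and a 1 Series lose the same fraction of their price over the same mileage. A model with interaction terms would relax this.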

Parameter interpretation

Price relative to "1 Series"

model     10^coef  10^coef - 1
2 Series  1.027    3%
3 Series  1.13     13%
4 Series  1.151    15%
5 Series  1.228    23%
6 Series  1.302    30%
7 Series  1.542    54%
8 Series  2.07     107%
X1        1.162    16%
X2        1.204    20%
X3        1.435    44%
X4        1.492    49%
X5        1.762    76%
X6        1.791    79%
X7        2.382    138%
M2        1.488    49%
M3        2.183    118%
M4        1.672    67%
M5        1.754    75%
Z4        1.259    26%

90% Prediction interval

  • 90% of car prices expected to be within this interval
  • Indicates model uncertainty

Example:

  • Predicted price (p): $10,000
  • Relative half-width (h): 25%
  • 90% of cars between p/(1+h) and p*(1+h), that is from $8,000 to $12,500
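
The interval bounds in this example can be computed with a small helper (the function name is made up for illustration):

```python
def prediction_interval(predicted_price, half_width):
    """Return (low, high) as p / (1 + h) and p * (1 + h)."""
    low = predicted_price / (1 + half_width)
    high = predicted_price * (1 + half_width)
    return low, high

low, high = prediction_interval(10_000, 0.25)  # (8000.0, 12500.0)
```

The bounds are not symmetric around the predicted price in dollars, but they are symmetric as ratios: the price is divided and multiplied by the same factor.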

90% Prediction interval with partial data

Last added feature  Relative half-width
mileage             70%
model               41%
year                30%
engineSize          25%
transmission        24%