Machine Learning for Cost Estimation

This project uses machine learning techniques to predict the material cost of bespoke manufactured equipment. It was built in Python within the Jupyter Notebook environment, using the scikit-learn, pandas, NumPy, Matplotlib and seaborn packages.

Background

Cost estimation for bespoke manufacturing can be tedious and resource intensive. For large manufacturing projects with thousands of individual components, it is difficult to provide an accurate bill of materials without having engineers perform a minimum level of pre-design work. Depending on the complexity of the machine or system, engineers may take days or weeks to develop a pre-design.

Whilst this pre-design time is useful to the business if it has already secured an order for the manufacturing project, it is completely non-productive if used only to estimate material costs for a tender bid that is ultimately lost.

This is particularly harmful for manufacturers in industries with a high frequency of bespoke competitive tenders and a typically low “win rate” for securing them. Engineers can waste valuable time pre-designing basic systems to generate cost estimates, only for the tender to be awarded elsewhere.

Furthermore, if the manufacturer neglects to dedicate these engineering resources to estimating material costs accurately, projects can easily be overpriced or underpriced, both of which are problematic.

How can we quickly and cost-effectively estimate the cost of projects to avoid overpricing and underpricing?

In this project we use traditional ML techniques to predict the material costs of projects from a minimal number of predictor features. This allows engineers to predict material costs quickly and accurately without a comprehensive pre-design stage.

Prior Research

Several researchers have applied machine learning to similar problems:

Rickenbacher et al. (2013) - Linear Regression CNC machined parts

Rickenbacher et al. (2013) developed a cost model that can estimate the actual cost of a single part. The cost model uses linear regression to estimate manufacturing times for 24 different manufacturing operations.

Duran et al. (2012) - ANN for cost estimation for shell and tube heat exchangers

Duran et al. (2012) developed and tested a model for estimating the manufacturing cost of piping elements during the early design phase using neural networks. The developed model demonstrates that neural networks can improve the accuracy of cost estimation for shell and tube heat exchangers.

Kurasova et al. (2021) - Multiple machine learning models in customized furniture production

Kurasova et al. (2021) explored several ML models for predicting labor costs in bespoke furniture manufacturing.

“One more issue arises, when the cost has been estimated before manufacturing while the information about the actual costs is most limited, it is called early cost estimation. The accuracy of this estimation can strongly affect company profits: too low price will reduce profits, too high price can deter customers.”

The following shows the different approaches considered by Kurasova et al. (2021) to address the problem of cost estimation:

Labor Intensive Approach
Machine Learning Approach
Results (Kurasova et al., 2021)

The project yielded strong results, with an R² score of 0.85 for the Random Forest model.
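For readers unfamiliar with the metric, the R2 figure quoted above is the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal sketch in plain Python, not code from Kurasova et al. or from this project. The function name `r_squared` and the cost values are hypothetical and exist only for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean (the baseline).
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical costs, for illustration only.
actual_costs = [100.0, 150.0, 200.0, 250.0]
predicted_costs = [110.0, 140.0, 210.0, 240.0]
score = r_squared(actual_costs, predicted_costs)
```

A score of 1 means the predictions match the actual costs exactly, while a score of 0 means the model does no better than always predicting the mean cost.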

The project concluded:

Usually, the cost estimation is a complicated and time-consuming process due to the need to evaluate many components in the early design stage, when information is most limited. Moreover, a lot of human resources are required. 

Application of the machine learning techniques allows simplifying and accelerating this process by providing accurate and effective cost estimation.

Project Workflow

The typical project workflow is shown below: data preparation and cleaning, followed by exploratory data analysis (EDA), modeling and evaluation.

Data Set and Preparation

The dataset consists of historical cost accounting data from a bespoke HVAC equipment manufacturer. The data is organized by a Job ID representing each unique manufactured instance of bespoke equipment. Features include short and long written descriptions, part group identifiers, completed date, sale value and the target variable of material cost.

Exploratory Data Analysis

The cost distribution and actual gross margin of each job are shown below. The median margin is 0.33. The strategy is to eliminate all jobs whose gross margin falls below a critical value.
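The margin calculation behind this analysis can be sketched as follows. This is an illustrative example rather than the project's notebook code. It assumes gross margin is defined as (sale value - cost) / sale value, and the job records, field names and the 0.2 critical value used here are hypothetical.

```python
from statistics import median


def gross_margin(sale_value, cost):
    """Gross margin as a fraction of sale value (assumed definition)."""
    if sale_value <= 0:
        raise ValueError("sale_value must be positive")
    return (sale_value - cost) / sale_value


# Hypothetical job records; the real dataset is not reproduced here.
jobs = [
    {"job_id": "J001", "sale_value": 10000.0, "cost": 6000.0},
    {"job_id": "J002", "sale_value": 8000.0, "cost": 6000.0},
    {"job_id": "J003", "sale_value": 12000.0, "cost": 10800.0},
    {"job_id": "J004", "sale_value": 5000.0, "cost": 3000.0},
    {"job_id": "J005", "sale_value": 20000.0, "cost": 14000.0},
]

# Margin per job, keyed by Job ID.
margins = {j["job_id"]: gross_margin(j["sale_value"], j["cost"]) for j in jobs}
median_margin = median(margins.values())

# Jobs that would be eliminated under an assumed critical value.
critical_margin = 0.2
below_critical = [job_id for job_id, m in margins.items() if m < critical_margin]
```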

Modeling

Several ML models were used to predict cost, ranging from basic models to ensemble methods. XGBoost had the highest accuracy.
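A comparison of this kind can be sketched with scikit-learn's cross-validation utilities. The example below is a hedged illustration, not the project's actual pipeline. It runs on synthetic data from `make_regression`, the choice of candidate models is an assumption, and `GradientBoostingRegressor` is used as a stand-in for XGBoost so that the sketch depends only on scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the project's real features are not reproduced here.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Assumed candidate set, from a basic model to ensembles.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    # Stand-in for XGBoost so the sketch needs only scikit-learn.
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate.
mean_r2 = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best_model = max(mean_r2, key=mean_r2.get)
```

Because the synthetic data is linear by construction, the ranking here says nothing about which model suits real cost data. The sketch only shows the comparison procedure.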

Results

To evaluate the model, we first use it to predict the gross margin for all systems manufactured in the 2019-2020 FY. The predictions are overlaid with the actuals in the histogram below.

We can then choose a minimum gross margin threshold of 0.2 and eliminate all jobs that do not meet this criterion. In practice this would be analogous to rejecting all tenders where our cost estimates resulted in gross margins lower than 0.2.

Once these jobs are rejected, we can evaluate the new total gross profit, gross margin and reduction in labour hours for the year.
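The rejection step can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code. Jobs are accepted or rejected on the margin implied by the predicted cost, and the outcome is then measured with actual costs. The function name, field names and job figures are all hypothetical.

```python
def evaluate_threshold(jobs, threshold):
    """Reject jobs whose predicted gross margin is below the threshold,
    then summarise the accepted jobs using their actual costs."""
    def predicted_margin(job):
        return (job["sale_value"] - job["predicted_cost"]) / job["sale_value"]

    accepted = [j for j in jobs if predicted_margin(j) >= threshold]
    rejected = [j for j in jobs if predicted_margin(j) < threshold]

    sales = sum(j["sale_value"] for j in accepted)
    gross_profit = sum(j["sale_value"] - j["actual_cost"] for j in accepted)
    return {
        "accepted_ids": [j["job_id"] for j in accepted],
        "gross_profit": gross_profit,
        # Margin is undefined if every job is rejected.
        "gross_margin": gross_profit / sales if sales else None,
        "labour_hours_saved": sum(j["labour_hours"] for j in rejected),
    }


# Hypothetical jobs, for illustration only.
jobs = [
    {"job_id": "A", "sale_value": 10000.0, "predicted_cost": 6500.0,
     "actual_cost": 6000.0, "labour_hours": 100},
    {"job_id": "B", "sale_value": 8000.0, "predicted_cost": 7000.0,
     "actual_cost": 6800.0, "labour_hours": 120},
    {"job_id": "C", "sale_value": 12000.0, "predicted_cost": 9000.0,
     "actual_cost": 9600.0, "labour_hours": 150},
    {"job_id": "D", "sale_value": 5000.0, "predicted_cost": 4500.0,
     "actual_cost": 4600.0, "labour_hours": 80},
]

result = evaluate_threshold(jobs, threshold=0.2)
```

Filtering on the predicted margin rather than the actual one mirrors the tender situation, where only the estimate is available when the accept or reject decision is made.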


By rejecting lower-margin projects, the manufacturer can save 5,676 labour hours over FY19/20, representing a 22.25% increase in available production capacity. Assuming replacement orders can be taken at the average gross margin, the new gross profit would be $3.18M, a $580,835 improvement. There are additional benefits, such as lower working capital and lower overhead.

GM Threshold 2017-2021

We can examine how gross profit scales as the gross margin threshold is raised, and the additional replacement sales required to fill the freed capacity.
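A threshold sweep of this kind might look like the sketch below. The replacement model is an assumption made for illustration: the sale value of rejected jobs is taken as the replacement sales required, and those replacement sales are assumed to earn the average gross margin of the accepted jobs. The project's own calculation may differ. All names and figures are hypothetical.

```python
def sweep_thresholds(jobs, thresholds):
    """For each threshold, report the replacement sales required and the
    projected gross profit (assumed replacement model, see lead-in)."""
    total_sales = sum(j["sale_value"] for j in jobs)
    rows = []
    for threshold in thresholds:
        accepted = [j for j in jobs if j["predicted_margin"] >= threshold]
        kept_sales = sum(j["sale_value"] for j in accepted)
        kept_profit = sum(j["gross_profit"] for j in accepted)
        replacement_sales = total_sales - kept_sales
        if kept_sales:
            average_margin = kept_profit / kept_sales
            projected = kept_profit + replacement_sales * average_margin
        else:
            # No accepted jobs, so there is no average margin to apply.
            projected = None
        rows.append({
            "threshold": threshold,
            "replacement_sales": replacement_sales,
            "projected_gross_profit": projected,
        })
    return rows


# Hypothetical jobs, for illustration only.
jobs = [
    {"sale_value": 10000.0, "predicted_margin": 0.35, "gross_profit": 4000.0},
    {"sale_value": 8000.0, "predicted_margin": 0.125, "gross_profit": 1200.0},
    {"sale_value": 12000.0, "predicted_margin": 0.25, "gross_profit": 2400.0},
    {"sale_value": 5000.0, "predicted_margin": 0.10, "gross_profit": 400.0},
]

rows = sweep_thresholds(jobs, thresholds=[0.0, 0.2, 0.3])
```

Under this assumption, a higher threshold raises the projected gross profit but also raises the volume of replacement sales that must actually be won, which is the trade-off the chart is meant to expose.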

Conclusion

This project shows that bespoke equipment costs can be estimated using machine learning models. Accurate cost estimation can increase profits by eliminating lower-margin projects. Further work is needed to investigate underpricing and overpricing on projects that are yet to be awarded.

References

Rickenbacher et al. (2013) - Linear Regression CNC machined parts

Duran et al. (2012) - ANN for cost estimation for shell and tube heat exchangers

Kurasova et al. (2021) - Multiple machine learning models in customized furniture production

GitHub

The code for this project can be found on my GitHub.
