In this project, we are going to implement parallel versions of the Gradient Boosting Decision Tree (GBDT) algorithm. First, we will write a serial version of GBDT, and then build CPU-parallel versions with OpenMP and MPI, respectively. If GPU resources are available, we also plan to implement a GPU-parallel version. Performance will be compared against the sequential and parallel implementations of standard libraries such as XGBoost and LightGBM.
In recent years, Gradient Boosting Decision Tree has been widely used in many industrial applications such as advertising and recommendation systems. As a boosting method, GBDT ensembles weak learners to improve accuracy: in each iteration, it builds a new tree that fits the residual left by the previously built trees. The pseudocode of the algorithm can be found in https://www.stat.auckland.ac.nz/~yee/784/files/ch10BoostingAdditiveTrees.pdf.
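For concreteness, below is a minimal C++ sketch of that boosting loop for squared-error regression; it is an illustration, not our final design. Decision stumps stand in for full regression trees, and names such as `fit_stump` and `learning_rate` are our own choices rather than anything from the linked notes.

```cpp
// Minimal sketch of the GBDT boosting loop for squared-error regression.
// Decision stumps stand in for full regression trees to keep the example
// short; the real project will grow deeper trees and support other losses.
#include <cstdio>
#include <vector>

struct Stump {                 // weak learner: a single split on one feature
    int feature;
    double threshold;
    double left_value, right_value;

    double predict(const std::vector<double>& x) const {
        return x[feature] <= threshold ? left_value : right_value;
    }
};

// Fit a stump to the residuals by exhaustive search over (feature, threshold).
Stump fit_stump(const std::vector<std::vector<double>>& X,
                const std::vector<double>& residual) {
    const int n = static_cast<int>(X.size());
    const int d = static_cast<int>(X[0].size());
    Stump best{0, 0.0, 0.0, 0.0};
    double best_err = 1e300;
    for (int f = 0; f < d; ++f) {
        for (int i = 0; i < n; ++i) {
            const double thr = X[i][f];
            double sum_l = 0.0, sum_r = 0.0;
            int n_l = 0, n_r = 0;
            for (int j = 0; j < n; ++j) {
                if (X[j][f] <= thr) { sum_l += residual[j]; ++n_l; }
                else                { sum_r += residual[j]; ++n_r; }
            }
            const double mean_l = n_l ? sum_l / n_l : 0.0;
            const double mean_r = n_r ? sum_r / n_r : 0.0;
            double err = 0.0;
            for (int j = 0; j < n; ++j) {
                const double pred = (X[j][f] <= thr) ? mean_l : mean_r;
                err += (residual[j] - pred) * (residual[j] - pred);
            }
            if (err < best_err) {
                best_err = err;
                best = Stump{f, thr, mean_l, mean_r};
            }
        }
    }
    return best;
}

int main() {
    // Toy data (y = 2 * x) just to exercise the loop.
    std::vector<std::vector<double>> X = {{1}, {2}, {3}, {4}, {5}};
    std::vector<double> y = {2, 4, 6, 8, 10};
    const int n = static_cast<int>(X.size());

    const int num_rounds = 50;
    const double learning_rate = 0.1;

    // F_0 = mean(y), the best constant prediction under squared error.
    double base = 0.0;
    for (double v : y) base += v;
    base /= n;
    std::vector<double> F(n, base);       // current ensemble prediction
    std::vector<Stump> model;

    for (int m = 0; m < num_rounds; ++m) {
        // Negative gradient of the squared error is the residual y - F(x).
        std::vector<double> residual(n);
        for (int i = 0; i < n; ++i) residual[i] = y[i] - F[i];

        // Fit the next weak learner to the residuals, shrink, and add it in.
        Stump tree = fit_stump(X, residual);
        model.push_back(tree);
        for (int i = 0; i < n; ++i) F[i] += learning_rate * tree.predict(X[i]);
    }

    for (int i = 0; i < n; ++i)
        std::printf("y = %.1f   prediction = %.3f\n", y[i], F[i]);
    return 0;
}
```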
To address the main bottleneck of GBDT, the cost of training many deep trees on large datasets, researchers have proposed various parallel and distributed versions to accelerate training. For example, LightGBM uses a fork-join paradigm in which worker nodes train different decision trees. Among these implementations, XGBoost stands out and has been widely used in many applications.
We know that powerful and well-researched parallel implementations already exist, and we are unlikely to propose new and better ones. Therefore, our aim in this project is to further familiarize ourselves with the parallelism skills learned in the course and apply them to a practical project. After the implementation and a detailed round of experiments and comparisons, we believe we will have a more complete understanding of how to parallelize algorithms on different platforms.
Get familiar with the GBDT algorithm. We have learned about GBDT before but have never actually implemented it, so there must be many details waiting for us to figure out during development.
We will reference sequential GBDT implementations from the following open-source projects:
For the OpenMP and MPI versions, we will implement our own solutions; a rough sketch of the kind of parallel split search we have in mind is shown below.
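This is only a sketch under our own assumptions, not a committed design: the gain formula, data layout, and the helper `best_gain_for_feature` are illustrative placeholders. It shows the part of tree building we expect to parallelize first, evaluating candidate split features independently with an OpenMP `parallel for`.

```cpp
// Sketch: evaluating candidate split features in parallel with OpenMP.
// Each feature's best split gain is computed independently, so the loop over
// features is embarrassingly parallel; the argmax at the end stays serial.
#include <omp.h>
#include <cstdio>
#include <vector>

// Best SSE-reduction gain over all thresholds of feature `f`, where the
// targets are the current residuals (placeholder gain formula).
double best_gain_for_feature(const std::vector<std::vector<double>>& X,
                             const std::vector<double>& residual, int f) {
    const int n = static_cast<int>(X.size());
    double total = 0.0;
    for (double r : residual) total += r;

    double best = 0.0;
    for (int i = 0; i < n; ++i) {
        const double thr = X[i][f];
        double sum_l = 0.0;
        int n_l = 0;
        for (int j = 0; j < n; ++j)
            if (X[j][f] <= thr) { sum_l += residual[j]; ++n_l; }
        const int n_r = n - n_l;
        if (n_l == 0 || n_r == 0) continue;
        const double sum_r = total - sum_l;
        // SSE reduction from splitting into the two children.
        const double gain = sum_l * sum_l / n_l + sum_r * sum_r / n_r
                          - total * total / n;
        if (gain > best) best = gain;
    }
    return best;
}

int main() {
    // Toy data just to exercise the parallel loop.
    std::vector<std::vector<double>> X = {
        {1, 5}, {2, 4}, {3, 3}, {4, 2}, {5, 1}};
    std::vector<double> residual = {-2, -1, 0, 1, 2};
    const int d = static_cast<int>(X[0].size());

    std::vector<double> gain(d, 0.0);
    #pragma omp parallel for schedule(dynamic)
    for (int f = 0; f < d; ++f)
        gain[f] = best_gain_for_feature(X, residual, f);

    int best_f = 0;
    for (int f = 1; f < d; ++f)
        if (gain[f] > gain[best_f]) best_f = f;
    std::printf("best feature = %d, gain = %.3f\n", best_f, gain[best_f]);
    return 0;
}
```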
For the CUDA version, we will try our own implementation but may reference:
Goals
Stretch Goals
Deliverables
For the CPU versions, we are going to use OpenMP and Open MPI with C++ to implement Gradient Boosting Decision Tree. We have gained some experience with both through the course assignments.
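One direction we are considering for the MPI version, again only a sketch with placeholder data and bin counts rather than a finalized plan, is data parallelism: each rank holds a shard of the training rows, builds local gradient histograms, and the ranks combine them with `MPI_Allreduce` so that every rank can select the same best split.

```cpp
// Sketch of the data-parallel idea for an MPI GBDT: every rank owns a shard
// of the rows, computes local gradient/count sums per histogram bin, and
// MPI_Allreduce produces the global statistics each rank needs to pick splits.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int num_bins = 8;                      // histogram bins per feature
    std::vector<double> local_grad_sum(num_bins, 0.0);
    std::vector<int>    local_count(num_bins, 0);

    // Placeholder "local shard": pretend each rank has a few rows whose
    // gradients fall into bins derived from the rank id.
    for (int i = 0; i < 4; ++i) {
        const int bin = (rank + i) % num_bins;
        local_grad_sum[bin] += 0.5 * (rank + 1);
        local_count[bin]    += 1;
    }

    // Combine per-rank histograms so every rank sees the global statistics.
    std::vector<double> global_grad_sum(num_bins, 0.0);
    std::vector<int>    global_count(num_bins, 0);
    MPI_Allreduce(local_grad_sum.data(), global_grad_sum.data(), num_bins,
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(local_count.data(), global_count.data(), num_bins,
                  MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) {
        std::printf("global histogram across %d ranks:\n", size);
        for (int b = 0; b < num_bins; ++b)
            std::printf("bin %d: grad_sum=%.2f count=%d\n",
                        b, global_grad_sum[b], global_count[b]);
    }

    MPI_Finalize();
    return 0;
}
```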
For the GPU version, we are going to use CUDA.
We will develop and debug the code on the GHC machines and then collect performance results on the PSC machines, which have more processors.
Date | Task |
---|---|
11/14 | Finish the sequential version |
11/20 | Finish the OpenMP version |
11/26 | Finish the MPI version |
12/5 | Finish the experiments and analysis |
12/8 | Finish the poster |