Original Article
Using machine learning techniques to generate laboratory diagnostic pathways—a case study
Abstract
Background: Diagnostic pathways are based on expert rules (“if…then…else”), which can be visualized as decision trees. Machine learning algorithms may be used to validate existing decision trees or to suggest new ones.
Methods: We present and compare two machine learning algorithms that automatically generate decision trees from laboratory data. The underlying functions (rpart and ctree) are available in the free statistical software environment R (www.r-project.org).
Results: Using input data from a published study on hepatitis C patients, we demonstrate that both algorithms are easy to apply and produce plausible decision trees. These algorithms confirm common knowledge about the power of laboratory testing to detect liver fibrosis and cirrhosis. For example, we show that conventional measurands such as choline esterase or albumin can separate cases with no or slight fibrosis from those with severe fibrosis and cirrhosis, while intermediate stages are often misclassified. Adding the newer ELF score to the test panel makes the decision trees simpler and may improve the quality of classification. Validation of the automatically generated decision trees with the leave-one-out method shows that the accuracy of all models created by machine learning is significantly higher than that of pure guessing. However, small changes to the input data result in markedly different decision trees, which need to be validated further with respect to clinical plausibility.
Conclusions: The machine learning algorithms presented here can support but do not replace the medical expert when designing decision trees for diagnostic pathways.
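The leave-one-out validation mentioned in the Results can be illustrated with a minimal sketch. It is written in Python rather than R and does not use rpart or ctree. Instead it fits a hypothetical one-split tree (a decision stump) to a single invented marker, so the function names, the toy data, and the majority-class stand-in for "pure guessing" are all assumptions made for illustration, not the study's method or data.

```python
from collections import Counter


def majority(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(xs, ys):
    """Fit a one-split tree: pick the threshold with the fewest training errors."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # all marker values identical: no split is possible
        label = majority(ys)
        return (float("inf"), label, label)
    return best[1:]


def predict(stump, x):
    """Apply the rule: if x <= threshold then left label, else right label."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def loo_accuracy(xs, ys):
    """Leave-one-out: refit without case i, then test the model on case i."""
    hits = 0
    for i in range(len(xs)):
        stump = fit_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict(stump, xs[i]) == ys[i]
    return hits / len(xs)


def guess_accuracy(ys):
    """Assumed baseline: always predict the most frequent class."""
    return Counter(ys).most_common(1)[0][1] / len(ys)


# Invented values of a single hypothetical marker (not data from the study).
marker = [2.1, 2.4, 2.8, 3.0, 3.3, 5.9, 6.2, 6.5, 7.0, 7.4]
stage = ["mild"] * 5 + ["severe"] * 5

print("leave-one-out accuracy:", loo_accuracy(marker, stage))
print("guessing baseline:", guess_accuracy(stage))
```

Because every model is refitted without the case it is tested on, the reported accuracy is not inflated by cases the tree has already seen. The same loop also hints at the instability noted above: each refit may select a different threshold.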