Big data, bigger opportunities
It is likely that 70–80% of the information in the electronic health record is pathology data. These data are relatively readily available and represent the collective experience of the pathology profession. The volume is enormous: more than 500 million pathology tests are performed each year in Australia alone. We are now in what Sikaris has called the third phase of medical learning [(I) masters; (II) journals; (III) databases], and we have the capability of turning this enormous resource of data into information and knowledge.
Pathology is the study and diagnosis of disease. New database mining tools, combined with the pathology database, allow us to use these data in a variety of ways, including determination of reference intervals, new quality control techniques, diagnostic algorithms, defining the usefulness of newer and current tests, guiding treatment and, perhaps most excitingly, knowledge discovery. Pathology data can be used to explore many relationships between data sets, using techniques including anomaly detection, association, clustering, classification and regression.
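As a concrete illustration of one of these uses, the sketch below estimates a reference interval as the central 95% of stored results, taking the 2.5th and 97.5th percentiles by linear interpolation. It is a minimal sketch, not a recommended protocol: the function names `percentile` and `reference_interval` are hypothetical, and it assumes the results have already been restricted to presumptively healthy individuals, which routine laboratory databases do not guarantee. Missing results, represented here as `None`, are simply dropped.

```python
import math


def percentile(sorted_values, p):
    """Return the p-th quantile (0 <= p <= 1) by linear interpolation."""
    k = (len(sorted_values) - 1) * p
    lower_index = math.floor(k)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = k - lower_index
    low = sorted_values[lower_index]
    high = sorted_values[upper_index]
    return low + (high - low) * fraction


def reference_interval(results, lower=0.025, upper=0.975):
    """Estimate a nonparametric reference interval from stored results.

    Assumption: `results` holds numeric values from presumptively
    healthy individuals; missing values (None) are discarded.
    """
    clean = sorted(value for value in results if value is not None)
    if len(clean) < 2:
        raise ValueError("at least two non-missing results are required")
    return percentile(clean, lower), percentile(clean, upper)


# Illustrative data only: the integers 1..101 with two missing results.
example_results = list(range(1, 102)) + [None, None]
low_limit, high_limit = reference_interval(example_results)
print(low_limit, high_limit)
```

In practice the choice of percentile method, the handling of outliers and the partitioning by age or sex would all need to be decided for the analyte in question; this sketch leaves them out.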
Pathology (medical) data mining has unique problems related to the heterogeneity of data classification, the ethical and legal issues of dealing with patient information, and the statistical philosophy of large data sets with missing elements.
Recognising the importance of pathology data mining, the Journal has introduced a new category of manuscript, the “Clinical Database in Laboratory Medicine Research Column”, which will highlight hot topics and major advances in the field. The Editors are willing to support articles that lead to greater insights into the issues described earlier. Papers that provide broad practical results and help readers apply the techniques to their own data are sought. The column will serve as a source of advice and direction, and as a place for laboratory staff and informaticians to exchange ideas and results, furthering this exciting move towards the widespread mining of big laboratory data.