Deep Feature Representations and Multi-Instance Multi-Label Learning of Whole Slide Breast Histopathology Images
The examination of a tissue sample has traditionally involved a pathologist investigating the case under a microscope. Whole slide imaging technology has recently been utilized for the digitization of biopsy slides, replicating the microscopic examination procedure on a computer screen. This technology has made it possible to scan slides at very high resolutions, reaching up to 100,000 x 100,000 pixels. The advancements in imaging technology have enabled the development of automated tools that can help reduce the workload of pathologists during the diagnostic process by performing analysis directly on whole slide histopathology images.
One of the challenges of whole slide image analysis is the ambiguity of the correspondence between the diagnostically relevant regions in a slide and the slide-level diagnostic labels in the pathology forms provided by the pathologists. Another challenge is the lack of feature representation methods for the variable number of variable-sized regions of interest (ROIs) in breast histopathology images, because state-of-the-art deep convolutional networks can operate only on small fixed-sized patches, which may cause loss of structural and contextual information. The last and arguably most important challenge involves the clinical significance of breast histopathology, as the misdiagnosis or missed diagnosis of a case may lead to unnecessary surgery, radiation, or hormonal therapy.
We address these challenges with the following contributions. The first contribution is the formulation of the whole slide breast histopathology image analysis problem as a multi-instance multi-label learning (MIMLL) task, where a slide corresponds to a bag that is associated with the slide-level diagnoses provided by the pathologists, and the ROIs inside the slide correspond to the instances in the bag. The second contribution is a novel feature representation method for the variable number of variable-sized ROIs using the activations of deep convolutional networks. Our final contribution is a more advanced MIMLL formulation that can simultaneously perform multi-class slide-level classification and ROI-level inference.
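To make the bag-instance formulation concrete, the following is a minimal sketch of the data structures involved, not the paper's actual method: each slide is a "bag" holding a variable number of ROI instances, each ROI is summarized by a fixed-length vector (standing in for pooled deep CNN activations), and the bag carries the slide-level multi-label diagnosis vector. All names, dimensions, and the max-pooling choice are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants (not from the paper):
NUM_CLASSES = 4    # e.g., benign, atypia, DCIS, invasive
FEATURE_DIM = 512  # length of the pooled deep feature vector per ROI

def make_bag(num_rois):
    """Build one slide-level bag of ROI-level instance features.

    In a real pipeline the rows would be deep CNN activations pooled
    over each variable-sized ROI; random vectors stand in here.
    """
    instances = rng.normal(size=(num_rois, FEATURE_DIM))  # one row per ROI
    labels = rng.integers(0, 2, size=NUM_CLASSES)         # slide-level multi-label target
    return {"instances": instances, "labels": labels}

# A dataset is a list of bags, each with a different number of instances.
slides = [make_bag(n) for n in (3, 7, 1)]

# One simple way to get a fixed-length bag representation regardless of
# bag size: max-pool the instance features over each bag.
bag_features = np.stack([b["instances"].max(axis=0) for b in slides])
print(bag_features.shape)  # (3, 512)
```

The key point of the sketch is that bags of any size map to a common fixed-length representation, which is what allows slide-level multi-label learning despite the variable number of ROIs per slide.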
Through quantitative and qualitative experiments, we show that the proposed MIMLL methods are capable of learning multi-class classification of whole slide breast histopathology images from slide-level information alone, and that the novel deep feature representations outperform traditional features in both fully supervised and weakly supervised settings.