# Using Image Processing and Statistical Analysis to Quantify Cell Scattering for Cancer Drug Research

By Gretchen Argast, OSI Pharmaceuticals, LLC and Paul Fricker, MathWorks

Epithelial to mesenchymal transition (EMT), a process vital to embryonic development, has been linked to the spread of cancer in adults. As a result, there is increased interest in developing cancer drugs that target EMT in addition to drugs that target cell proliferation and survival.

Until recently, measuring how a drug affected one aspect of EMT, cell scattering, was a manual process that involved subjectively assessing the relative closeness of cells in a culture. Researchers at OSI Pharmaceuticals worked with MathWorks consultants to develop an automated system for quantifying the scattering of cells in a sample. Based on MATLAB®, Image Processing Toolbox™, and Statistics and Machine Learning Toolbox™, the system measures nucleus-to-nucleus distances of nearest-neighbor cells. The ability to measure scattering is essential to evaluating the efficacy of drugs that may inhibit or reverse EMT because it gives researchers a reliable way to compare the effects of one drug against another.

### What is EMT?

In humans and other vertebrates, there are two basic cell types: epithelial and mesenchymal. Several morphological and functional characteristics differentiate the two cell types. For example, epithelial cells depend on cell-to-cell contact for survival. Mesenchymal cells, in contrast, are characterized by their independence from nearby cells and by their mobility, two requirements for cell scattering.

In EMT, cells lose their epithelial traits and acquire mesenchymal traits. EMT is essential for developing embryos because it produces mesenchymal cells that can migrate to form bone, cartilage, and other tissue where needed. In adults, however, EMT is associated with pathologies such as cancer and fibrosis. Because mesenchymal tumor cells are more mobile, and thus more invasive, than epithelial tumor cells, scientists believe that they facilitate *metastasis*, or the spread of tumor cells. EMT also diminishes the effectiveness of chemotherapy treatments that target epithelial cells.

## Analyzing Cell Sample Images

OSI researchers have developed pancreatic and lung tumor models and identified a set of *ligands*, or binding molecules, that drive EMT in these models. Two of these ligands, hepatocyte growth factor (HGF) and oncostatin M (OSM), induced EMT in the models, enabling us to produce samples that demonstrate the cell scattering associated with EMT. The samples are stained so that the nucleus of each cell shows blue in the images captured by our microscopes (Figure 1).

To quantify the scattering of the cells, we developed a numerical procedure that uses image processing and statistical analyses. Measuring the spatial density of the cells would be relatively straightforward if the images were completely covered by the cells: We would simply count the number of nuclei in each image and then divide by the total image area. The images that we generate are almost always partially covered, however, making it difficult to estimate the cell density correctly. We decided to develop an alternative approach to quantify the scattering, based on measurements of the distances between the cell nuclei.

To analyze the cell images, we used an algorithm consisting of four main steps:

- Threshold the entire image to segment the cell nuclei, or clusters of nuclei.
- Analyze the resulting *blobs* to determine their sizes (areas).
- Zoom in on larger blobs to perform a localized analysis, to identify individual cells within the blob.
- Identify the (*x*, *y*)-location for each cell nucleus in the image.

Because the intensity scaling is consistent across all the captured images, we can capture most of the individual blobs using a single hard-coded threshold value. This thresholding procedure produces a binary image in which the cell nucleus is indicated by 1, or white, and its absence is indicated by 0, or black (Figure 2). Using Image Processing Toolbox, we analyzed these black and white images to find the locations and sizes (areas) of all the blobs.
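As an illustration only, the global thresholding and blob measurement described above could be sketched as follows. The original system was built with MATLAB and Image Processing Toolbox; this standard-library Python version is a stand-in, and the function names, the threshold level of 100, the 8-connectivity rule, and the tiny test image are all assumptions made for the example.

```python
from collections import deque

def threshold_image(gray, level):
    """Return a binary image: 1 where the pixel exceeds `level`, else 0."""
    return [[1 if px > level else 0 for px in row] for row in gray]

def find_blobs(binary):
    """Label 8-connected regions of 1s and return a list of dicts,
    each holding the blob's area (pixel count) and (x, y) centroid."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = pr + dr, pc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            area = len(pixels)
            cx = sum(p[1] for p in pixels) / area  # column index = x
            cy = sum(p[0] for p in pixels) / area  # row index = y
            blobs.append({"area": area, "centroid": (cx, cy)})
    return blobs

# Hypothetical 5x6 grayscale image with one 2x2 nucleus and one lone bright pixel.
gray = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 210, 10, 10, 10],
    [10, 205, 220, 10, 10, 10],
    [10, 10, 10, 10, 10, 180],
    [10, 10, 10, 10, 10, 10],
]
binary = threshold_image(gray, level=100)  # assumed hard-coded level
blobs = find_blobs(binary)
```

Here `blobs` holds one entry per connected region, so the area and centroid of each candidate nucleus are available for the size-based sorting that follows.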

In some cases, a few cells are so close together that their nuclei appear to be touching one another, and they cannot be distinguished as separate nuclei. To enhance the processing of the images, we sorted the blobs into three categories based on their size. Those with areas below a certain size were deemed to be noise or partially occluded cells, and were discarded from the subsequent analysis. Blobs of intermediate size were classified as individual nuclei that had already been successfully segmented. The largest blobs were presumed to be clusters of overlapping cells requiring further analysis.
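The three-way sorting by size can be expressed as a small classification function. The sketch below is in Python for illustration; the article does not give the area cutoffs, so `min_area` and `max_single_area` are hypothetical values, as are the category labels.

```python
def classify_blob(area, min_area, max_single_area):
    """Sort a blob into one of three size categories.

    Both cutoffs are assumed parameters; in practice they would be tuned
    to the magnification and the typical nucleus size in the images.
    """
    if area < min_area:
        return "discard"   # noise or a partially occluded cell
    if area <= max_single_area:
        return "single"    # one nucleus, already segmented
    return "cluster"       # overlapping nuclei needing local analysis

# Hypothetical blob areas in pixels.
areas = [3, 45, 60, 180]
labels = [classify_blob(a, min_area=20, max_single_area=100) for a in areas]
```

Only blobs labeled `"cluster"` would be passed on to the more expensive localized analysis.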

To distinguish the individual nuclei within the larger blobs, the algorithm crops the subregions of the image containing the largest blobs and performs local, adaptive thresholding to more accurately distinguish the individual cells (Figure 3).
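The article does not say which local thresholding method was used. One common choice, shown here purely as an assumption, is to compute Otsu's threshold from the pixels of the cropped subregion alone, so that the level adapts to the local brightness of the cluster. The Python sketch below assumes 8-bit intensities; `region` stands in for a cropped subregion containing two nuclei joined by a dim bridge, and all values are invented for the example.

```python
def otsu_threshold(gray):
    """Return the 8-bit level that maximizes between-class variance (Otsu's method).
    Pixels strictly greater than the returned level count as foreground."""
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_level, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for level in range(256):
        w0 += hist[level]            # pixels at or below this level
        sum0 += level * hist[level]
        if w0 == 0:
            continue
        w1 = total - w0              # pixels above this level
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_level, best_var = level, between_var
    return best_level

# Hypothetical cropped subregion: two bright nuclei (220) joined by a dim bridge (60).
region = [
    [10, 10, 10, 10, 10, 10, 10],
    [10, 220, 220, 60, 220, 220, 10],
    [10, 220, 220, 60, 220, 220, 10],
    [10, 10, 10, 10, 10, 10, 10],
]
global_level = 50                    # assumed global threshold; keeps the bridge
local_level = otsu_threshold(region)
merged = [[1 if px > global_level else 0 for px in row] for row in region]
split = [[1 if px > local_level else 0 for px in row] for row in region]
```

With the assumed global level the two nuclei form a single blob, while the locally computed level removes the bridge column and leaves two separate blobs.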

At the end of the image analysis procedure, the algorithm has identified the location of most of the cell nuclei in the image, and stored this data in an array. The success of the algorithm can be verified visually by overlaying the input images with markers at each measured nucleus location (Figure 4).

## Measuring and Analyzing Distances Between Cells

Once we have processed the images and obtained an array of cell nucleus coordinates, we use basic MATLAB matrix operations to compute the distances between an individual nucleus and all the other nuclei in the cell cluster. To assess the scattering of the cells, we compute the distance between each cell and its nearest neighbor. Each image generates a set of nearest-neighbor distances, with one value for each cell. The distance values computed from the image data are initially measured in pixels, and are converted to microns using a known length scale.
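The nearest-neighbor calculation can be illustrated with a brute-force sketch. It is written in Python rather than as MATLAB matrix operations, and the function name, the sample coordinates, and the 0.5 microns-per-pixel scale are assumptions chosen for the example.

```python
import math

def nearest_neighbor_distances(points, microns_per_pixel=1.0):
    """For each (x, y) nucleus, return the distance to its closest neighbor,
    converted from pixels to microns. Requires at least two points."""
    result = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i  # skip the zero distance from a nucleus to itself
        )
        result.append(nearest * microns_per_pixel)
    return result

# Hypothetical nucleus coordinates in pixels.
nuclei = [(0, 0), (3, 4), (3, 0), (10, 0)]
distances = nearest_neighbor_distances(nuclei, microns_per_pixel=0.5)
```

Each image yields one distance per nucleus. The brute-force comparison is quadratic in the number of nuclei, which is adequate for a single field of view; a spatial index would be a natural substitute for much larger point sets.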

MATLAB histograms of these nearest-neighbor distances show clearly that the data fit meaningful distribution patterns. These patterns reveal distinct differences among the four types of cells that we were studying: untreated, HGF-treated, OSM-treated, and HGF+OSM-treated lung cancer cells (Figure 5).

These histogram results suggested that the data could be characterized using a statistical distribution. Using Statistics and Machine Learning Toolbox, we fitted the measured distance values to a series of probability distributions. We narrowed our search to asymmetric, continuous distributions and, after an iterative process, found that the *loglogistic* distribution provided the best fit for the nearest-neighbor distance results.

In addition to characterizing the scattering of the cells, one of the main objectives of this project was to develop a method for differentiating the degree of scattering produced by treating cell samples with different ligands. To accomplish this, we used MATLAB to compute the μ and σ parameters of the loglogistic distribution (the mean and scale of the log-transformed distances) for each of the four samples (Figure 6).
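To make the two parameters concrete: a variable is loglogistic when its logarithm follows a logistic distribution with location μ and scale σ. The Python sketch below estimates μ and σ by moment matching on the log-distances, using the fact that a logistic distribution has variance σ²π²/3. This is a deliberately simple stand-in and not the toolbox fit described in the article, so its estimates will generally differ somewhat from a maximum-likelihood fit; the function names and sample data are hypothetical.

```python
import math
import statistics

def loglogistic_moment_estimates(distances):
    """Estimate (mu, sigma) of a loglogistic distribution by matching the
    mean and standard deviation of log(distance) to a logistic distribution."""
    logs = [math.log(d) for d in distances]
    mu = statistics.mean(logs)
    # Logistic variance is sigma^2 * pi^2 / 3, so sigma = stdev * sqrt(3) / pi.
    sigma = statistics.stdev(logs) * math.sqrt(3) / math.pi
    return mu, sigma

def loglogistic_pdf(x, mu, sigma):
    """Density of the loglogistic distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(z) / (sigma * x * (1 + math.exp(z)) ** 2)

# Hypothetical nearest-neighbor distances in microns.
sample = [math.exp(1), math.exp(2), math.exp(3)]
mu, sigma = loglogistic_moment_estimates(sample)
density_at_median = loglogistic_pdf(math.exp(mu), mu, sigma)
```

In this parameterization exp(μ) is the median distance, so a larger μ corresponds to nuclei that sit farther from their nearest neighbors, and a larger σ to a broader spread of distances.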

The statistical fitting plots show that the computed values of μ and σ capture distinct differences in the magnitude of cell scattering in the four data sets. Conversely, when these parameters are computed for a given data set, they can be used to identify which ligand (HGF, OSM, or HGF+OSM) was used to treat the original cell sample. The distributions show that either ligand alone induced scattering in the cells, and that the combined ligand treatment resulted in a further increase in scattering. These distributions reflect what we observe qualitatively in the cells after treatment with ligands. From these results we concluded that the μ and σ parameters of the loglogistic distribution fitted to the computed nearest-neighbor distances could be used to reliably quantify the scattering of cell nuclei in a given sample.

In addition to characterizing the responses of the cells to different ligands, we also looked at the effect of drug treatment on the degree of cell scattering. We computed the loglogistic distributions for samples treated with HGF+OSM that were also treated with increasing concentrations (50 nM to 2 μM) of a drug that blocks the effects of HGF (Figure 7). At concentrations of 500 nM and above, the drug inhibited the effects of HGF and reduced the degree of scattering to a level that approximated the effects of OSM by itself. This type of analysis is essential for determining the optimal dose for a new drug.

At the beginning of the EMT quantification project, our goal was to use image analysis techniques with our microscope data to quantify the scattering or density of cells in our samples. After successfully analyzing the basic attributes of the cell nuclei using MATLAB and Image Processing Toolbox, we realized that the resulting data could best be characterized in terms of a statistical distribution. It was easy to transition to a statistical analysis of the data using Statistics and Machine Learning Toolbox. MATLAB enabled us to work within a single development environment, from the initial image thresholding and nearest-neighbor distance calculations, through selecting and validating an appropriate statistical distribution, to the final comparison of different ligand dose responses.

With a system in place for quantifying the scattering of cells in a sample, OSI researchers now have an objective computational method for measuring the ability of drugs in development to reduce or reverse EMT and, potentially, for increasing a drug’s ability to inhibit cancer metastasis.

Published 2012 - 92038v00