trainCellpose

Train custom Cellpose model

Since R2023b

Description

trainCellpose(dataFolder,outputModelFile) trains a custom Cellpose model by providing an interface to the Cellpose Library. Use this syntax to train a model with default options. The function identifies pairs of training and label images in the dataFolder folder, and assumes that each label image has the same file name as the corresponding training image, plus the suffix "_labels".
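As a rough illustration of the default naming convention described above, this sketch derives the label file name that would pair with a training image. The file name cells_01.png is hypothetical, and the sketch assumes the label image has the same file extension as the training image, which is not required.

```matlab
% Hypothetical training image file name (not from the documentation).
imageFile = "cells_01.png";

% Split the file name into its base name and extension.
[~,baseName,ext] = fileparts(imageFile);

% Default label suffix described above.
labelSuffix = "_labels";

% Assumption: the label image uses the same extension as the training image.
labelFile = baseName + labelSuffix + ext;
disp(labelFile)
```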

trainCellpose(dataFolder,outputModelFile,Name=Value) specifies options using one or more name-value arguments. For example, ImageSuffix="_imRGB" trains the model using only images in the specified data folder whose file names, excluding the file extension, end in _imRGB.

Note

This functionality requires Deep Learning Toolbox™, Computer Vision Toolbox™, and the Medical Imaging Toolbox™ Interface for Cellpose Library. You can install the Medical Imaging Toolbox Interface for Cellpose Library from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

Examples

Train a custom Cellpose model. This example shows how to use the trainCellpose function for a hypothetical training data set.

The function requires two inputs: the path to the training data and the path at which to save the newly trained model.

dataFolder = "C:\trainingData";
outputModelFile = "C:\cellposeModels\retrainedCyto2Model";

By default, the function retrains a copy of the cyto2 model from the Cellpose Library. This code uses an ImageSuffix value of _imRGB and a LabelSuffix value of _mask to specify the suffixes for the training and label images, respectively. For example, the function recognizes files named im1_imRGB.png and im1_mask.png as a training image and its ground truth label image.

trainCellpose(dataFolder,outputModelFile,...
    MaxEpochs=2,...
    ImageSuffix="_imRGB",...
    LabelSuffix="_mask");

Input Arguments

Path to the data folder, specified as a string scalar or character vector. Specify dataFolder as the path to a folder that contains training images and their corresponding ground truth label images.

  • Training images must be in the TIFF, JPEG, or PNG file format.

  • Ground truth label images must be in the TIFF or PNG file format. Each ground truth image must have the same name as the corresponding training image, with a suffix specified by LabelSuffix.
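Before training, it can help to confirm that every training image has a matching label image. This is a minimal sketch of such a check, not part of trainCellpose. The file listing is hypothetical, the suffixes are the ones used in the example above, and the sketch assumes that all files are PNG files.

```matlab
% Hypothetical folder listing (file names only).
files = ["im1_imRGB.png" "im1_mask.png" "im2_imRGB.png" ...
    "im2_mask.png" "notes.txt" "im3_imRGB.png"];

% Suffixes used in the example above.
imageSuffix = "_imRGB";
labelSuffix = "_mask";

% Keep only the files that end in the image suffix plus the extension.
isImage = endsWith(files,imageSuffix + ".png");
imageFiles = files(isImage);

% Build the label file name expected for each training image.
expectedLabels = replace(imageFiles,imageSuffix + ".png",labelSuffix + ".png");

% Report training images that have no matching label file.
hasLabel = ismember(expectedLabels,files);
missingLabels = imageFiles(~hasLabel);
disp(missingLabels)
```

In this listing, im3_imRGB.png has no matching im3_mask.png file, so it is the only image reported.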

Note

Because the function writes intermediate flow files to the data folder, the data folder must have write permissions. The function reuses the intermediate files if you perform training multiple times, to make training faster.

Data Types: char | string
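Because the data folder must be writable, you might check its permissions before you start training. This is a minimal sketch that uses the fileattrib function. The sketch uses tempdir as a stand-in for your data folder only so that it runs on any machine.

```matlab
% Stand-in for your data folder; replace with the path to your training data.
dataFolder = tempdir;

% fileattrib returns a status flag and a structure of attribute values.
[status,attribs] = fileattrib(dataFolder);

% UserWrite is 1 when the current user can write to the folder.
canWrite = status && isequal(attribs.UserWrite,1);

if ~canWrite
    warning("The folder %s does not appear to be writable.",dataFolder)
end
```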

Output model file, specified as a string scalar or character vector. Specify the full path to the folder where you want the function to write the trained model.

Data Types: char | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: trainCellpose(dataFolder,outputModelFile,PretrainedModel="") trains an uninitialized Cellpose model, rather than retraining a pretrained model.

Training image suffix, specified as a string scalar or character vector. Use this argument to specify a suffix, excluding the file extension, by which to filter training images in dataFolder. When you specify a suffix, trainCellpose excludes from training any images whose file names do not end in that suffix.

Data Types: char | string

Main channel for the trained network to segment, specified as one of these options.

  • "average" — Use the average value across channels for training. Use this value for grayscale images.

  • "R" — Use the first image channel for training, corresponding to the red channel of an RGB image.

  • "G" — Use the second image channel for training, corresponding to the green channel of an RGB image.

  • "B" — Use the third image channel for training, corresponding to the blue channel of an RGB image.

This argument corresponds to the chan parameter in the Cellpose Library.

Data Types: char | string
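To make the channel options concrete, this sketch shows which data each option selects from a small RGB array. It illustrates the channel selection only, and is an assumption about the meaning of the options rather than a description of how the Cellpose Library preprocesses images. The pixel values are hypothetical.

```matlab
% Hypothetical 2-by-2 RGB image with constant red, green, and blue channels.
rgb = cat(3,30*ones(2),60*ones(2),120*ones(2));

% "average": mean value across the channel dimension.
avgChannel = mean(rgb,3);

% "R", "G", and "B": first, second, and third channels, respectively.
redChannel = rgb(:,:,1);
greenChannel = rgb(:,:,2);
blueChannel = rgb(:,:,3);

disp(avgChannel)
```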

Auxiliary channel to use for training, specified as "none", "R", "G", or "B". If this value is "none", then the function uses an auxiliary image containing all zeros during training. This argument corresponds to the chan2 parameter in the Cellpose Library.

Data Types: char | string

Label suffix, specified as a string scalar or character vector. The function uses the label suffix to search for ground truth label files in dataFolder. By default, the function uses files ending in "_labels" as the ground truth labels.

Data Types: char | string

Pretrained model to use as a base model for training using transfer learning, specified as one of these values.

  • "" — Start training with an uninitialized Cellpose network.

  • Absolute path — Start training with a custom trained model by specifying the absolute path to the model on your machine.

  • Name of a pretrained Cellpose Library model — Start training with a pretrained Cellpose model, specified as one of these options. To learn more about the pretrained models and their training data, see the Cellpose Library Documentation.

    • "cyto"

    • "cyto2"

    • "CP"

    • "CPx"

    • "nuclei"

    • "livecell"

    • "LC1"

    • "LC2"

    • "LC3"

    • "LC4"

    • "tissuenet"

    • "TN1"

    • "TN2"

    • "TN3"

Data Types: char | string
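The three kinds of values listed above can be told apart with a simple check. This is a hypothetical sketch for illustration, and does not show how trainCellpose interprets the argument internally. The variable names are placeholders.

```matlab
% Pretrained Cellpose Library model names listed above.
knownModels = ["cyto" "cyto2" "CP" "CPx" "nuclei" "livecell" ...
    "LC1" "LC2" "LC3" "LC4" "tissuenet" "TN1" "TN2" "TN3"];

% Hypothetical requested value.
requested = "nuclei";

if strlength(requested) == 0
    % Empty string: train an uninitialized network.
    kind = "uninitialized";
elseif ismember(requested,knownModels)
    % Name of a pretrained Cellpose Library model.
    kind = "library model";
else
    % Anything else is assumed to be a path to a custom trained model.
    kind = "custom model path";
end
disp(kind)
```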

Pretrained model folder path, specified as a string scalar or character vector. This argument must be the full path to a folder containing the Cellpose model you want to train. By default, ModelFolder is a subfolder called cellposeModels within the folder returned by the userpath function. This argument has no effect when you train an uninitialized model by specifying PretrainedModel as "".

Data Types: char | string
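Based on the description above, you can reconstruct the default model folder location by using the userpath function. This is a minimal sketch. The folder might not exist yet on your machine, so the sketch checks for it before relying on it.

```matlab
% Default location described above: a cellposeModels subfolder of userpath.
defaultModelFolder = string(fullfile(userpath,"cellposeModels"));

% Check whether the folder currently exists.
folderExists = isfolder(defaultModelFolder);

disp(defaultModelFolder)
disp(folderExists)
```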

Detectable cell diameter, specified as a numeric scalar. This argument specifies the cell diameter that you want the trained model to detect. This argument has an effect only when you train an uninitialized model by specifying PretrainedModel as "". If you start training from a pretrained model, the detectable cell diameter of the newly trained model is the same as that of the pretrained model. This argument corresponds to the diam_mean parameter in the Cellpose Library.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Hardware resource used for training, specified as one of these values.

  • "auto" — Use a GPU if one is available. Otherwise, use the CPU.

  • "cpu" — Use the CPU.

  • "gpu" — Use the GPU.

The "gpu" option requires Parallel Computing Toolbox™. To use a GPU for deep learning, you must also have a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). If you choose the "gpu" option and Parallel Computing Toolbox or a suitable GPU is not available, then the function returns an error.

Data Types: char | string
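To avoid the error that the "gpu" option can produce, you can check for a usable GPU before choosing a hardware resource. This is a minimal sketch that mirrors the behavior described for "auto" by using the canUseGPU function. The variable name hardwareResource is a placeholder.

```matlab
% canUseGPU returns true only when a supported GPU and the required
% software are available.
if canUseGPU
    hardwareResource = "gpu";
else
    hardwareResource = "cpu";
end
disp(hardwareResource)
```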

Initial learning rate used for training, specified as a numeric scalar. This argument corresponds to the learning_rate parameter in the Cellpose Library.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Weight decay, specified as a numeric scalar. This argument corresponds to the weight_decay parameter in the Cellpose Library.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Path for saving the checkpoint model files, specified as a string scalar or character vector. By default, trainCellpose saves intermediate model files in the same parent folder as outputModelFile, within a subfolder named model. This argument corresponds to the save_path parameter in the Cellpose Library.

Data Types: char | string

Frequency for saving checkpoint model files during training, specified as a positive integer, in epochs. The function saves model files every CheckpointFrequency epochs. This argument corresponds to the save_every parameter in the Cellpose Library.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Maximum number of epochs to use for training, specified as a positive integer. This argument corresponds to the n_epochs parameter in the Cellpose Library.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
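As a rough guide to how the checkpoint frequency and the maximum number of epochs interact, this sketch lists the epochs at which you might expect checkpoint files. It assumes that epochs are counted from 1 and that a checkpoint is saved at every multiple of the checkpoint frequency. The exact schedule depends on the Cellpose Library and is not stated here. The values are hypothetical.

```matlab
% Hypothetical training settings.
maxEpochs = 10;
checkpointFrequency = 4;

% Assumption: one checkpoint at each multiple of the checkpoint frequency.
checkpointEpochs = checkpointFrequency:checkpointFrequency:maxEpochs;
disp(checkpointEpochs)
```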

GPU batch size, specified as a positive integer. This argument has an effect only when training on a GPU. The batch size specifies the number of images per batch. Increasing the batch size increases speed, but also increases memory requirements. This argument corresponds to the batchsize parameter in the Cellpose Library.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
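To estimate how the batch size affects training, you can compute the number of batches needed to cover the training set once. This is a simplified sketch that assumes each epoch processes every training image exactly once. The image count and batch size are hypothetical.

```matlab
% Hypothetical training set size and batch size.
numImages = 50;
batchSize = 8;

% The last batch can be smaller than batchSize, so round up.
batchesPerEpoch = ceil(numImages/batchSize);
disp(batchesPerEpoch)
```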

References

[1] Stringer, Carsen, Tim Wang, Michalis Michaelos, and Marius Pachitariu. “Cellpose: A Generalist Algorithm for Cellular Segmentation.” Nature Methods 18, no. 1 (January 2021): 100–106. https://doi.org/10.1038/s41592-020-01018-x.

[2] Pachitariu, Marius, and Carsen Stringer. “Cellpose 2.0: How to Train Your Own Model.” Nature Methods 19, no. 12 (December 2022): 1634–41. https://doi.org/10.1038/s41592-022-01663-4.

Version History

Introduced in R2023b