Train/test split

class detectree.TrainingSelector(*, img_filepaths=None, img_dir=None, img_filename_pattern=None, gabor_frequencies=None, gabor_num_orientations=None, response_bins_per_axis=None, num_color_bins=None)

Select the images/tiles to be used to train the classifier(s).

__init__(*, img_filepaths=None, img_dir=None, img_filename_pattern=None, gabor_frequencies=None, gabor_num_orientations=None, response_bins_per_axis=None, num_color_bins=None)

Initialize the training selector.

The arguments provided to the initialization method will determine how the image descriptors are computed. See the background example notebook for more details.

Parameters:
  • img_filepaths (list-like, optional) – List of paths to the input tiles whose features will be used to train the classifier.

  • img_dir (str representing path to a directory, optional) – Path to the directory containing the images whose filenames match img_filename_pattern. Ignored if img_filepaths is provided.

  • img_filename_pattern (str representing a file-name pattern, optional) – Filename pattern to be matched in order to obtain the list of images. If no value is provided, the value set in settings.IMG_FILENAME_PATTERN is used. Ignored if img_filepaths is provided.

  • gabor_frequencies (tuple, optional) – Set of frequencies used to build the Gabor filter bank. If no value is provided, the value set in settings.GIST_GABOR_FREQUENCIES is used.

  • gabor_num_orientations (int or tuple, optional) – Number of orientations used to build the Gabor filter bank. If an integer is provided, that number of orientations is used at every scale (determined by gabor_frequencies). If a tuple is provided, each element determines the number of orientations used at its matching scale, so the tuple must have the same length as gabor_frequencies. If no value is provided, the value set in settings.GIST_GABOR_NUM_ORIENTATIONS is used.

  • response_bins_per_axis (int, optional) – Number of spatial bins per axis into which the responses to the Gabor filter bank are aggregated. For example, a value of 2 aggregates the responses into the four quadrants of the image (i.e., a 2x2 grid with 2 bins along each axis). If no value is provided, the value set in settings.GIST_RESPONSE_BINS_PER_AXIS is used.

  • num_color_bins (int, optional) – Number of bins along each dimension of the joint color histogram computed in the L*a*b* color space. If no value is provided, the value set in settings.GIST_NUM_COLOR_BINS is used.
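To make the interplay between gabor_frequencies, gabor_num_orientations, response_bins_per_axis and num_color_bins more concrete, here is a minimal sketch that expands gabor_num_orientations into one orientation count per scale and computes the resulting descriptor length. The helpers normalize_orientations and descriptor_length are hypothetical (not part of detectree), and the length formula assumes the descriptor concatenates one pooled value per Gabor filter and spatial bin with a flattened joint color histogram. The numbers used below are illustrative, not the defaults stored in settings.

```python
from typing import Sequence, Tuple, Union


def normalize_orientations(
    gabor_frequencies: Sequence[float],
    gabor_num_orientations: Union[int, Sequence[int]],
) -> Tuple[int, ...]:
    """Return the number of orientations to use at each scale (hypothetical helper)."""
    if isinstance(gabor_num_orientations, int):
        # A single integer applies the same number of orientations to every scale.
        return tuple(gabor_num_orientations for _ in gabor_frequencies)
    orientations = tuple(gabor_num_orientations)
    if len(orientations) != len(gabor_frequencies):
        # A tuple must provide exactly one entry per frequency (scale).
        raise ValueError(
            f"expected {len(gabor_frequencies)} orientation counts, "
            f"got {len(orientations)}"
        )
    return orientations


def descriptor_length(
    gabor_frequencies: Sequence[float],
    gabor_num_orientations: Union[int, Sequence[int]],
    response_bins_per_axis: int,
    num_color_bins: int,
) -> int:
    """Descriptor length under the assumptions stated above (hypothetical helper)."""
    # One Gabor filter per (scale, orientation) pair.
    num_filters = sum(normalize_orientations(gabor_frequencies, gabor_num_orientations))
    # Each filter response is pooled into response_bins_per_axis ** 2 spatial bins.
    gist_length = num_filters * response_bins_per_axis**2
    # Joint L*a*b* histogram: num_color_bins bins along each of the three axes.
    color_length = num_color_bins**3
    return gist_length + color_length


# Illustrative values only; these are not detectree's defaults.
print(normalize_orientations((0.1, 0.25, 0.4), 6))  # (6, 6, 6)
print(descriptor_length((0.1, 0.25, 0.4), (4, 8, 8), 4, 8))  # 832
```

With an integer the filter bank grows linearly with the number of scales, while a tuple lets coarser scales use fewer orientations; in both cases the spatial term grows quadratically with response_bins_per_axis and the color term cubically with num_color_bins.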
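The spatial aggregation controlled by response_bins_per_axis can be pictured as pooling each filter-response map over a regular grid. The sketch below assumes mean pooling over a bins_per_axis x bins_per_axis grid, and the function pool_response is hypothetical; detectree's own aggregation may differ in details such as how image sizes that are not divisible by the number of bins are handled.

```python
from statistics import fmean
from typing import List, Sequence


def pool_response(response: Sequence[Sequence[float]], bins_per_axis: int) -> List[float]:
    """Mean-pool a 2-D response map over a bins_per_axis x bins_per_axis grid.

    Bins are returned in row-major order (top-left, top-right, ..., bottom-right).
    """
    n_rows, n_cols = len(response), len(response[0])
    if bins_per_axis > min(n_rows, n_cols):
        raise ValueError("more bins per axis than pixels along an axis")
    pooled = []
    for i in range(bins_per_axis):
        # Integer bin edges so that every pixel falls in exactly one bin.
        r0, r1 = i * n_rows // bins_per_axis, (i + 1) * n_rows // bins_per_axis
        for j in range(bins_per_axis):
            c0, c1 = j * n_cols // bins_per_axis, (j + 1) * n_cols // bins_per_axis
            block = [response[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            pooled.append(fmean(block))
    return pooled


# With bins_per_axis=2, each value summarizes one quadrant of the image.
response = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
print(pool_response(response, 2))  # [1.0, 2.0, 3.0, 4.0]
```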
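Likewise, num_color_bins sets the resolution of the joint L*a*b* histogram: with num_color_bins bins per axis, the histogram has num_color_bins ** 3 cells. The sketch below assumes the pixels have already been converted to L*a*b*, with L* in [0, 100] and a*, b* in [-128, 127], and normalizes the counts so they sum to one. The function joint_lab_histogram, the channel ranges and the normalization are illustrative assumptions, not detectree's implementation.

```python
from typing import List, Sequence, Tuple

# Assumed value ranges for the L*, a* and b* channels.
LAB_RANGES: Tuple[Tuple[float, float], ...] = (
    (0.0, 100.0),
    (-128.0, 127.0),
    (-128.0, 127.0),
)


def joint_lab_histogram(
    pixels: Sequence[Tuple[float, float, float]], num_color_bins: int
) -> List[float]:
    """Normalized joint histogram with num_color_bins bins along each L*a*b* axis."""
    counts = [0] * num_color_bins**3
    for pixel in pixels:
        flat_index = 0
        for value, (low, high) in zip(pixel, LAB_RANGES):
            bin_index = int((value - low) / (high - low) * num_color_bins)
            # Clamp so that values on the upper edge land in the last bin.
            bin_index = min(max(bin_index, 0), num_color_bins - 1)
            # Row-major flattening of the (L*, a*, b*) bin triple.
            flat_index = flat_index * num_color_bins + bin_index
        counts[flat_index] += 1
    return [count / len(pixels) for count in counts]


pixels = [(0.0, -128.0, -128.0), (100.0, 127.0, 127.0), (50.0, 0.0, 0.0)]
hist = joint_lab_histogram(pixels, 4)
print(len(hist))  # 64
print([i for i, v in enumerate(hist) if v > 0])  # [0, 42, 63]
```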

train_test_split(*, method='cluster-II', n_components=12, num_img_clusters=4, train_prop=0.01, return_evr=False, pca_kwargs=None, kmeans_kwargs=None)

Select the images/tiles to be used for training.

See the background example notebook for more details.

Parameters:
  • method ({'cluster-I', 'cluster-II'}, optional (default 'cluster-II')) – Method used in the train/test split.

  • n_components (int, optional (default 12)) – Number of principal components onto which the image descriptors are projected before applying k-means clustering.

  • num_img_clusters (int, optional (default 4)) – Number of first-level image clusters of the ‘cluster-II’ method. Ignored if method is ‘cluster-I’.

  • train_prop (float, optional (default 0.01)) – Overall proportion of the images/tiles to be selected for training.

  • return_evr (bool, optional (default False)) – Whether the explained variance ratio of the principal component analysis should also be returned.

  • pca_kwargs (dict, optional) – Keyword arguments to be passed to the sklearn.decomposition.PCA class constructor (except for n_components).

  • kmeans_kwargs (dict, optional) – Keyword arguments to be passed to the sklearn.cluster.KMeans class constructor (except for n_clusters).
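The two selection methods can be sketched roughly as follows, assuming that 'cluster-I' runs k-means with as many clusters as tiles to be selected and keeps the tile closest to each centroid, and that 'cluster-II' first groups the tiles into num_img_clusters clusters and then applies the 'cluster-I' idea within each group. This is a simplified, dependency-free illustration of that reading, not detectree's code: the actual method reduces the descriptors with sklearn.decomposition.PCA and clusters them with sklearn.cluster.KMeans, whereas here the points are assumed to be already reduced and a plain Lloyd's k-means with deterministic farthest-point initialization stands in. The function names and the rule of selecting at least one tile per non-empty first-level cluster are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: Sequence[Point], k: int, n_iter: int = 50) -> List[List[float]]:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    if k <= 0:
        return []
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(group) for coord in zip(*group)]
    return centroids


def cluster_i(points: Sequence[Point], n_train: int) -> List[int]:
    """Pick n_train distinct points, each the one closest to a k-means centroid."""
    chosen: List[int] = []
    for centroid in kmeans(points, n_train):
        ranked = sorted(range(len(points)), key=lambda i: math.dist(points[i], centroid))
        # Skip points already picked so the selection has n_train distinct tiles.
        chosen.append(next(i for i in ranked if i not in chosen))
    return sorted(chosen)


def cluster_ii(
    points: Sequence[Point], num_img_clusters: int, train_prop: float
) -> Tuple[List[int], List[bool]]:
    """Two-level split: first-level clusters, then cluster_i inside each one."""
    centroids = kmeans(points, num_img_clusters)
    labels = [nearest(p, centroids) for p in points]
    train = [False] * len(points)
    for c in range(num_img_clusters):
        members = [i for i, label in enumerate(labels) if label == c]
        if not members:
            continue
        # Assumption: select at least one tile from every non-empty cluster.
        n_train = max(1, round(train_prop * len(members)))
        for local in cluster_i([points[i] for i in members], n_train):
            train[members[local]] = True
    return labels, train


# Two well-separated groups of (already reduced) descriptors.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(cluster_i(points, 2))  # [0, 4]
labels, train = cluster_ii(points, num_img_clusters=2, train_prop=0.25)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
print([i for i, t in enumerate(train) if t])  # [0, 4]
```

The point of the two-level variant is that a small train_prop still draws tiles from every first-level cluster, so visually distinct groups of tiles are all represented in the training set.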

Returns:

  • split_df (pandas.DataFrame) – The train/test split data frame.

  • evr (numeric, optional) – Explained variance ratio of the principal component analysis. Only returned if return_evr is True.
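For reference, the explained variance ratio of a principal component is its eigenvalue divided by the sum of all eigenvalues of the descriptor covariance matrix. Assuming the returned evr is the total over the n_components retained components (which would be consistent with it being documented as a single number), and writing λ_1 ≥ … ≥ λ_d for the eigenvalues of the covariance matrix of the d-dimensional descriptors and k for n_components, it would read:

```latex
\mathrm{evr} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{d} \lambda_j},
\qquad 0 < \mathrm{evr} \le 1 .
```

A value close to 1 indicates that the k retained components preserve most of the variability of the image descriptors that the k-means clustering operates on.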