Custom score matrices

NBLAST calculates the similarity between neurons’ morphology. For more information on NBLAST in navis, see the other tutorials.

The core of the algorithm is the function which converts point matches (a distance and a dot product) into a score expressing how likely they are to have come from the same point cloud. This is generally a 2D lookup table, referred to as the score matrix, generated by “training” it on neurons known to be matching or non-matching.

navis provides (and uses by default) the score matrix used in the original publication (Costa et al., 2016) which is based on FlyCircuit, a light-level database of Drosophila neurons. This works quite well in many cases. However, how appropriate it is for your data depends on a number of factors:

  • How big your neurons are (commonly addressed by scaling the distance axis of the built-in score matrix)

  • How you have pre-processed your neurons (pruning dendrites, resampling etc.)

  • Your actual task (matching left-right pairs, finding lineages etc.)

  • How distinct you expect your matches and non-matches to be (e.g. how large a body volume you’re drawing neurons from)

Utilities in navis.nbl.smat allow you to train your own score matrix.

import navis
from navis.nbl.smat import Lookup2d, LookupDistDotBuilder, Digitizer

Lookup2d is the lookup table, and can be given to any nblast-related class or function. Each axis represented by a Digitizer, which converts continuous values into discrete indices. These indices are used to look up score values in an array.

The LookupDistDotBuilder is a class which generates Lookup2d instances from training data.

First, we need some training data. Let’s augment our set of example neurons by randomly mutating them:

import numpy as np

# use a replicable random number generator
RNG = np.random.default_rng(2021)


def augment_neuron(
    nrn: navis.TreeNeuron, scale_sigma=0.1, translation_sigma=50, jitter_sigma=10
):
    nrn = nrn.copy(deepcopy=True)
    nrn.name += "_aug"
    dims = list("xyz")
    coords = nrn.nodes[dims].to_numpy()

    # translate whole neuron
    coords += RNG.normal(scale=translation_sigma, size=coords.shape[-1])
    # jitter individual coordinates
    coords += RNG.normal(scale=jitter_sigma, size=coords.shape)
    # rescale
    mean = np.mean(coords, axis=0)
    coords -= mean
    coords *= RNG.normal(loc=1.0, scale=scale_sigma)
    coords += mean

    nrn.nodes[dims] = coords
    return nrn


original = list(navis.example_neurons())
jittered = [augment_neuron(n) for n in original]

dotprops = [navis.make_dotprops(n, k=5, resample=False) for n in original + jittered]
matching_pairs = [[idx, idx + len(original)] for idx in range(len(original))]

The score matrix builder needs some neurons as a list of navis.Dotprops objects, and then to know which neurons should match with each other as indices into that list. It’s assumed that matches are relatively rare among the total set of possible pairings, so non-matching pairs are drawn randomly (although a non-matching list can be given explicitly).

Then it needs to know where to draw the boundaries between bins in the output lookup table. These can be given explicitly as a list of 2 ``Digitizer``s, or can be inferred from the data: bins will be drawn to evenly partition the matching neuron scores.

The resulting Lookup2d can be imported/exported as a pandas.DataFrame for ease of viewing and storing.

builder = LookupDistDotBuilder(
    dotprops, matching_pairs, use_alpha=True, seed=2021
).with_bin_counts([8, 5])
smat = builder.build()
as_table = smat.to_dataframe()
as_table
[0.0,0.14588644700379452) [0.14588644700379452,0.29912252023057806) [0.29912252023057806,0.4802508950780032) [0.4802508950780032,0.7351365037506351) [0.7351365037506351,0.9988406500803395)
[2.0140850483155233,57.8493661406481) 0.486661 0.379042 0.387260 0.445648 0.855775
[57.8493661406481,81.31283353333698) 0.404766 0.343702 0.406445 0.417802 0.799595
[81.31283353333698,104.08576202392578) 0.257971 0.311827 0.217626 0.248743 1.013926
[104.08576202392578,128.14104591262304) 0.255728 0.089663 0.171599 0.115997 1.296510
[128.14104591262304,155.36119651794434) -0.136171 -0.107249 -0.125751 0.252883 1.987175
[155.36119651794434,202.6728515625) -0.575078 -0.448307 -0.475147 -0.221016 1.407061
[202.6728515625,395.9569088293945) -1.025938 -0.948679 -0.863801 -0.620512 0.054148
[395.9569088293945,4709.61474609375) -0.615558 -0.737251 -0.679764 -0.454779 -0.197384

Now that we can have this score matrix, we can use it for a problem which can be solved by NBLAST: we’ve mixed up a bag of neurons which look very similar to some of our examples, and need to know which they match with.

original_dps = dotprops[:len(original)]
new_dps = [
    navis.make_dotprops(
        augment_neuron(n),
        k=5,
        resample=False
    )
    for n in original
]
RNG.shuffle(new_dps)

result = navis.nblast(
    original_dps, new_dps,
    use_alpha=True, scores="mean", normalized=True,
    smat=smat,
    n_cores=1,
)
result.index = [dp.name for dp in original_dps]
result.columns = [dp.name for dp in new_dps]
result
NBLAST is optimized for data in microns and it looks like your queries are not in microns.
NBLAST is optimized for data in microns and it looks like your targets are not in microns.
754538881_aug 722817260_aug 1734350908_aug 1734350788_aug 754534424_aug
1734350788 -0.271504 -0.361381 -0.272018 0.159048 -0.325920
1734350908 -0.400432 -0.491376 0.858478 -0.437819 -0.216206
722817260 -0.157433 0.127931 -0.365195 -0.407436 -0.393940
754534424 -0.237769 -0.413532 -0.193794 -0.401148 0.857898
754538881 -0.021153 -0.213349 -0.200214 -0.303390 -0.115017

You can see that while there aren’t any particularly good scores (this is a very small amount of training data, and one would normally preprocess the neurons), in each case the original’s best match is its augmented partner.