Custom score matrices
NBLAST measures morphological similarity between neurons. For more information on NBLAST in
navis, see the other tutorials.
The core of the algorithm is the function which converts point matches (a distance and a dot product) into a score expressing how likely they are to have come from the same point cloud. This is generally a 2D lookup table, referred to as the score matrix, generated by “training” it on neurons known to be matching or non-matching.
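The training idea can be sketched with plain NumPy. This is only an illustration of the log-odds approach described in Costa et al. (2016), not the navis internals: histogram the (distance, dot product) pairs observed for matching and for non-matching neurons, then take the log ratio of the two normalized distributions per cell. All data here is fake.

```python
import numpy as np

rng = np.random.default_rng(0)

# fake point-match samples: matches tend to have small distances and aligned tangents
match = np.column_stack([rng.exponential(1.0, 5000), rng.beta(5, 1, 5000)])
nonmatch = np.column_stack([rng.exponential(5.0, 5000), rng.uniform(0, 1, 5000)])

# illustrative bin edges for the distance and dot product axes
dist_bins = np.array([0, 1, 2, 4, 8, np.inf])
dot_bins = np.linspace(0, 1, 6)

h_match, _, _ = np.histogram2d(match[:, 0], match[:, 1], bins=[dist_bins, dot_bins])
h_non, _, _ = np.histogram2d(nonmatch[:, 0], nonmatch[:, 1], bins=[dist_bins, dot_bins])

# log2 odds of a point pair coming from a match vs. a non-match;
# eps avoids division by zero in empty cells
eps = 1e-9
smat_sketch = np.log2((h_match / h_match.sum() + eps) / (h_non / h_non.sum() + eps))
print(smat_sketch.shape)  # (5, 5): rows = distance bins, cols = dot product bins
```

Cells where matches are common relative to non-matches (small distance, high dot product) get positive scores; cells dominated by non-matches get negative ones.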
navis provides (and uses by default) the score matrix used in the original publication (Costa et al., 2016) which is based on FlyCircuit, a light-level database of Drosophila neurons. This works quite well in many cases. However, how appropriate it is for your data depends on a number of factors:
- How big your neurons are (commonly addressed by scaling the distance axis of the built-in score matrix)
- How you have pre-processed your neurons (pruning dendrites, resampling etc.)
- Your actual task (matching left-right pairs, finding lineages etc.)
- How distinct you expect your matches and non-matches to be (e.g. how large a body volume you’re drawing neurons from)
navis.nbl.smat allows you to train your own score matrix.
```python
import navis
from navis.nbl.smat import Lookup2d, LookupDistDotBuilder, Digitizer
```
Lookup2d is the lookup table and can be given to any NBLAST-related class or function. Each axis is represented by a Digitizer, which converts continuous values into discrete indices; these indices are used to look up score values in the underlying array.
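The role of a digitizer can be illustrated with numpy.digitize. The bin edges here are hypothetical, and this is a sketch of the concept rather than the navis Digitizer API itself:

```python
import numpy as np

# hypothetical bin boundaries for a distance axis: values are mapped to the
# index of the bin they fall into, with out-of-range values clipped to the ends
boundaries = np.array([0.0, 0.75, 1.5, 3.0, 6.0])

distances = np.array([0.2, 1.0, 2.9, 10.0])
indices = np.clip(np.digitize(distances, boundaries) - 1, 0, len(boundaries) - 2)
print(indices)  # [0 1 2 3] -- indices into the score array's distance axis
```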
First, we need some training data. Let’s augment our set of example neurons by randomly mutating them:
```python
import numpy as np

# use a replicable random number generator
RNG = np.random.default_rng(2021)


def augment_neuron(
    nrn: navis.TreeNeuron, scale_sigma=0.1, translation_sigma=50, jitter_sigma=10
):
    nrn = nrn.copy(deepcopy=True)
    nrn.name += "_aug"
    dims = list("xyz")
    coords = nrn.nodes[dims].to_numpy()

    # translate whole neuron
    coords += RNG.normal(scale=translation_sigma, size=coords.shape[-1])

    # jitter individual coordinates
    coords += RNG.normal(scale=jitter_sigma, size=coords.shape)

    # rescale
    mean = np.mean(coords, axis=0)
    coords -= mean
    coords *= RNG.normal(loc=1.0, scale=scale_sigma)
    coords += mean

    nrn.nodes[dims] = coords
    return nrn


original = list(navis.example_neurons())
jittered = [augment_neuron(n) for n in original]

dotprops = [navis.make_dotprops(n, k=5, resample=False) for n in original + jittered]
matching_pairs = [[idx, idx + len(original)] for idx in range(len(original))]
```
The score matrix builder needs some neurons as a list of navis.Dotprops objects, and needs to know which neurons should match with each other, given as indices into that list. Matches are assumed to be relatively rare among the total set of possible pairings, so non-matching pairs are drawn randomly (although a non-matching list can be given explicitly).
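The random draw of non-matching pairs can be sketched in plain Python. This is an illustration of the idea with made-up indices, not the builder's actual sampling code:

```python
import random

# hypothetical setup: 8 dotprops, where index i matches index i + 4
rng = random.Random(2021)
n = 8
matching = {(0, 4), (1, 5), (2, 6), (3, 7)}

# keep drawing random index pairs until we have enough that are not known matches
nonmatching = set()
while len(nonmatching) < 10:
    i, j = rng.sample(range(n), 2)
    pair = (min(i, j), max(i, j))
    if pair not in matching:
        nonmatching.add(pair)

print(len(nonmatching))  # 10
```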
Then it needs to know where to draw the boundaries between bins in the output lookup table. These can be given explicitly as a list of two Digitizers, or can be inferred from the data: bins will be drawn to evenly partition the matching neuron scores.
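"Evenly partition" here means quantile-style bins, so that each bin holds roughly the same number of observed matching values. A sketch with fake data (an illustration of the idea, not the builder's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)
# fake distances observed between matching neurons
matching_dists = rng.exponential(scale=1.0, size=1000)

# bin edges at evenly spaced quantiles of the observed values
n_bins = 8
edges = np.quantile(matching_dists, np.linspace(0, 1, n_bins + 1))

counts, _ = np.histogram(matching_dists, bins=edges)
print(counts)  # roughly equal counts per bin
```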
Lookup2d can be imported/exported as a
pandas.DataFrame for ease of viewing and storing.
```python
builder = LookupDistDotBuilder(
    dotprops, matching_pairs, use_alpha=True, seed=2021
).with_bin_counts([8, 5])
smat = builder.build()
as_table = smat.to_dataframe()
as_table
```
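Because the table is a plain pandas.DataFrame, storing it is just a matter of standard pandas I/O. A minimal round-trip sketch, using a small hypothetical table with made-up values in place of the real interval-labelled one:

```python
import io
import pandas as pd

# hypothetical 2x2 score table; a real one would come from smat.to_dataframe()
table = pd.DataFrame(
    [[-0.5, 1.2], [-1.0, 0.3]],
    index=["[0.0, 1.5)", "[1.5, inf)"],
    columns=["[0.0, 0.5)", "[0.5, 1.0)"],
)

# write to CSV (here an in-memory buffer standing in for a file) and read back
buffer = io.StringIO()
table.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# check the round-trip preserved index, columns and values
print(restored.equals(table))
```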
Now that we have this score matrix, we can use it for a problem NBLAST is suited to: we have mixed up a bag of neurons which look very similar to some of our examples, and need to know which matches which.
```python
original_dps = dotprops[: len(original)]
new_dps = [
    navis.make_dotprops(augment_neuron(n), k=5, resample=False) for n in original
]
RNG.shuffle(new_dps)

result = navis.nblast(
    original_dps,
    new_dps,
    use_alpha=True,
    scores="mean",
    normalized=True,
    smat=smat,
    n_cores=1,
)
result.index = [dp.name for dp in original_dps]
result.columns = [dp.name for dp in new_dps]
result
```
Note: running this produces warnings that the queries and targets do not appear to be in microns; NBLAST is optimized for micron-scale data.
You can see that while there aren’t any particularly good scores (this is a very small amount of training data, and one would normally preprocess the neurons), in each case the original’s best match is its augmented partner.
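Reading the predicted pairing off a score table like this is a one-liner with idxmax, shown here on a small hypothetical DataFrame standing in for result:

```python
import pandas as pd

# made-up scores: rows are queries, columns are shuffled targets
scores = pd.DataFrame(
    [[0.6, 0.1, 0.2], [0.0, 0.5, 0.1], [0.2, 0.3, 0.7]],
    index=["a", "b", "c"],
    columns=["a_aug", "b_aug", "c_aug"],
)

best = scores.idxmax(axis=1)  # best-scoring target for each query
print(best.to_dict())  # {'a': 'a_aug', 'b': 'b_aug', 'c': 'c_aug'}
```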