# Custom score matrices¶

NBLAST calculates the similarity between neurons’ morphology. For more information on NBLAST in `navis`

, see the other tutorials.

The core of the algorithm is the function which converts point matches (a distance and a dot product) into a score expressing how likely they are to have come from the same point cloud. This is generally a 2D lookup table, referred to as the score matrix, generated by “training” it on neurons known to be matching or non-matching.

`navis`

provides (and uses by default) the score matrix used in the original publication (Costa et al., 2016) which is based on FlyCircuit, a light-level database of Drosophila neurons. This works quite well in many cases. However, how appropriate it is for your data depends on a number of factors:

How big your neurons are (commonly addressed by scaling the distance axis of the built-in score matrix)

How you have pre-processed your neurons (pruning dendrites, resampling etc.)

Your actual task (matching left-right pairs, finding lineages etc.)

How distinct you expect your matches and non-matches to be (e.g. how large a body volume you’re drawing neurons from)

Utilities in `navis.nbl.smat`

allow you to train your own score matrix.

```
import navis
from navis.nbl.smat import Lookup2d, LookupDistDotBuilder, Digitizer
```

`Lookup2d`

is the lookup table, and can be given to any nblast-related class or function.
Each axis represented by a `Digitizer`

, which converts continuous values into discrete indices. These indices are used to look up score values in an array.

The `LookupDistDotBuilder`

is a class which generates `Lookup2d`

instances from training data.

First, we need some training data. Let’s augment our set of example neurons by randomly mutating them:

```
import numpy as np
# use a replicable random number generator
RNG = np.random.default_rng(2021)
def augment_neuron(
nrn: navis.TreeNeuron, scale_sigma=0.1, translation_sigma=50, jitter_sigma=10
):
nrn = nrn.copy(deepcopy=True)
nrn.name += "_aug"
dims = list("xyz")
coords = nrn.nodes[dims].to_numpy()
# translate whole neuron
coords += RNG.normal(scale=translation_sigma, size=coords.shape[-1])
# jitter individual coordinates
coords += RNG.normal(scale=jitter_sigma, size=coords.shape)
# rescale
mean = np.mean(coords, axis=0)
coords -= mean
coords *= RNG.normal(loc=1.0, scale=scale_sigma)
coords += mean
nrn.nodes[dims] = coords
return nrn
original = list(navis.example_neurons())
jittered = [augment_neuron(n) for n in original]
dotprops = [navis.make_dotprops(n, k=5, resample=False) for n in original + jittered]
matching_pairs = [[idx, idx + len(original)] for idx in range(len(original))]
```

The score matrix builder needs some neurons as a list of `navis.Dotprops`

objects, and then to know which neurons should match with each other as indices into that list. It’s assumed that matches are relatively rare among the total set of possible pairings, so non-matching pairs are drawn randomly (although a non-matching list can be given explicitly).

Then it needs to know where to draw the boundaries between bins in the output lookup table. These can be given explicitly as a list of 2 ``Digitizer``s, or can be inferred from the data: bins will be drawn to evenly partition the matching neuron scores.

The resulting `Lookup2d`

can be imported/exported as a `pandas.DataFrame`

for ease of viewing and storing.

```
builder = LookupDistDotBuilder(
dotprops, matching_pairs, use_alpha=True, seed=2021
).with_bin_counts([8, 5])
smat = builder.build()
as_table = smat.to_dataframe()
as_table
```

[0.0,0.14588644700379452) | [0.14588644700379452,0.29912252023057806) | [0.29912252023057806,0.4802508950780032) | [0.4802508950780032,0.7351365037506351) | [0.7351365037506351,0.9988406500803395) | |
---|---|---|---|---|---|

[2.0140850483155233,57.8493661406481) | 0.486661 | 0.379042 | 0.387260 | 0.445648 | 0.855775 |

[57.8493661406481,81.31283353333698) | 0.404766 | 0.343702 | 0.406445 | 0.417802 | 0.799595 |

[81.31283353333698,104.08576202392578) | 0.257971 | 0.311827 | 0.217626 | 0.248743 | 1.013926 |

[104.08576202392578,128.14104591262304) | 0.255728 | 0.089663 | 0.171599 | 0.115997 | 1.296510 |

[128.14104591262304,155.36119651794434) | -0.136171 | -0.107249 | -0.125751 | 0.252883 | 1.987175 |

[155.36119651794434,202.6728515625) | -0.575078 | -0.448307 | -0.475147 | -0.221016 | 1.407061 |

[202.6728515625,395.9569088293945) | -1.025938 | -0.948679 | -0.863801 | -0.620512 | 0.054148 |

[395.9569088293945,4709.61474609375) | -0.615558 | -0.737251 | -0.679764 | -0.454779 | -0.197384 |

Now that we can have this score matrix, we can use it for a problem which can be solved by NBLAST: we’ve mixed up a bag of neurons which look very similar to some of our examples, and need to know which they match with.

```
original_dps = dotprops[:len(original)]
new_dps = [
navis.make_dotprops(
augment_neuron(n),
k=5,
resample=False
)
for n in original
]
RNG.shuffle(new_dps)
result = navis.nblast(
original_dps, new_dps,
use_alpha=True, scores="mean", normalized=True,
smat=smat,
n_cores=1,
)
result.index = [dp.name for dp in original_dps]
result.columns = [dp.name for dp in new_dps]
result
```

```
NBLAST is optimized for data in microns and it looks like your queries are not in microns.
NBLAST is optimized for data in microns and it looks like your targets are not in microns.
```

754538881_aug | 722817260_aug | 1734350908_aug | 1734350788_aug | 754534424_aug | |
---|---|---|---|---|---|

1734350788 | -0.271504 | -0.361381 | -0.272018 | 0.159048 | -0.325920 |

1734350908 | -0.400432 | -0.491376 | 0.858478 | -0.437819 | -0.216206 |

722817260 | -0.157433 | 0.127931 | -0.365195 | -0.407436 | -0.393940 |

754534424 | -0.237769 | -0.413532 | -0.193794 | -0.401148 | 0.857898 |

754538881 | -0.021153 | -0.213349 | -0.200214 | -0.303390 | -0.115017 |

You can see that while there aren’t any particularly good scores (this is a very small amount of training data, and one would normally preprocess the neurons), in each case the original’s best match is its augmented partner.