NBLAST using light-level data¶
One of the applications of NBLAST is to match neurons across data sets. Here we will illustrate this by taking a light-level, confocal image stack and finding the same neuron in an EM connectome.
Specifically, we will use an image from Janelia’s collection of split-Gal4 driver lines and match it against the neurons in the hemibrain connectome.
Before we get started make sure that:
- navis
is installed and up-to-date
- navis-flybrains <https://github.com/navis-org/navis-flybrains> is installed and you have downloaded the Saalfeld lab’s and VFB bridging transforms (see flybrains.download_...
functions)
- download and extract hemibrain-v1.2-skeletons.tar (kindly provided by Stuart Berg, Janelia)
Next we need to pick an image stack to use as query. You can browse the expression patterns of the Janelia split-Gal4 lines here. Here I picked LH1112
which is a very clean line containing a couple of WED projection neurons. Among other data, you can download these stacks as “gendered” (i.e. female or male) or “unisex” space. Unfortunately, all image stacks are in Janelia’s .h5j
format which I haven’t figured out how to import straight from Python. Two options:
Load them into Fiji and save the GFP signal channel as .nrrd file.
Go to VirtualFlyBrain, search for your line of interested LH1112 (not all lines are be available on VFB) and download the “Signal(NRRD)” at bottom of Term Info panel.
I went for option two here and downloaded a VFB_001013cg.nrrd
. This is the neuron we’ll be looking for:
Let’s get started!
import navis
First we need to load the image stack and turn it into dotprops:
query = navis.read_nrrd('VFB_001013cg.nrrd', output='dotprops', k=20, threshold=100)
query.id = 'LH1112' # manually set the ID to the Janelia identifier
query
type | navis.Dotprops |
---|---|
name | VFB_001013cg |
id | LH1112 |
k | 20 |
units | 1 micrometer |
n_points | 10375 |
fig = navis.plot3d(query)
Note
Depending on your image you will have to play around with the threshold to get a decent representation.
Next we need to load the hemibrain skeletons and convert them to dotprops:
sk = navis.read_swc('/Users/philipps/Downloads/hemibrain-v1.2-skeletons/',
include_subdirs=True, fmt='{name,id:int}.swc')
These 97k skeletons include lots of small fragments - there are only ~25k proper neurons or substantial fragments thereof in the hemibrain dataset. So to make our life a little easier, we will keep only the largest 30k skeletons:
sk.sort_values('n_nodes')
sk = sk[:30_000]
sk
type | name | n_nodes | n_connectors | n_branches | n_leafs | cable_length | soma | units | |
---|---|---|---|---|---|---|---|---|---|
0 | navis.TreeNeuron | 425790257 | 182514 | None | 20286 | 20851 | 1.001564e+07 | None | 1 dimensionless |
1 | navis.TreeNeuron | 5813105172 | 159749 | None | 22416 | 22866 | 7.610828e+06 | None | 1 dimensionless |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
29998 | navis.TreeNeuron | 5813035385 | 1018 | None | 90 | 100 | 6.213784e+04 | None | 1 dimensionless |
29999 | navis.TreeNeuron | 2375870447 | 1018 | None | 151 | 156 | 3.505048e+04 | None | 1 dimensionless |
Next up: turning those skeletons into dotprops:
# Note that we are resampling to 1 point for every micron of cable
# Because these skeletons are in 8nm voxels we have to use 1000/8
targets = navis.make_dotprops(sk, k=5, parallel=True, resample=1000/8)
Last but not least we need to convert the image’s dotprops from their current brain space (JRC2018U
, U
for “unisex”) to hemibrain (JRCFIB2018Fraw
, raw
for voxels) space.
import flybrains
query_xf = navis.xform_brain(query, source='JRC2018U', target='JRCFIB2018Fraw')
Transform path: JRC2018U -> JRC2018F -> JRCFIB2018Fum -> JRCFIB2018F -> JRCFIB2018Fraw
Now we can run the NBLAST:
# Note that we convert from voxels (8x8x8nm) to microns
scores = navis.nblast(query_xf / 125, targets / 125, scores='mean')
scores.head()
425790257 | 5813105172 | 5813057579 | 5813050499 | 859839499 | 5813039148 | 1640909284 | 423101189 | 1321564092 | 5813024698 | ... | 2032311172 | 2283477509 | 1686751696 | 2180970798 | 1904079437 | 1348510839 | 5813047115 | 516672812 | 5813035385 | 2375870447 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LH1112 | -0.793901 | -0.870922 | -0.767066 | -0.627762 | -0.612109 | -0.591567 | -0.827376 | -0.878581 | -0.680821 | -0.832845 | ... | -0.881125 | -0.817378 | -0.88186 | -0.881415 | -0.881596 | -0.549413 | -0.881259 | -0.880987 | -0.88139 | -0.667554 |
1 rows × 30000 columns
Note
You can greatly speed up NBLAST by installing the optional dependency pykdtree
:
pip3 install pykdtree
.
Now we can sort those scores to find the top matches:
scores.loc['LH1112'].sort_values(ascending=False)
2030342003 0.218951
2214504597 0.214873
1069223047 0.183628
856131667 0.119994
2214846055 0.094881
...
2056511651 -0.883308
296501634 -0.883329
2275102579 -0.883354
5813091313 -0.883436
910404284 -0.883766
Name: LH1112, Length: 30000, dtype: float64
So we did get a couple of hits here. Let’s visualize the top 3:
fig = navis.plot3d([query_xf, targets.idx[[2030342003, 2214504597, 1069223047]]])
On a final note: the scores for those matches are rather low (1 = perfect match).
The main reason for this is that our image stack contains two neurons (the left and the right version) so half of our query
won’t have a match in any of the individual hemibrain neurons. We could have fixed that by subsetting the query to the approximate hemibrain bounding box. This is also a good idea for bilateral neurons that have parts of their arbour outside the hemibrain volume:
# Remove the left-hand-side neuron based on the position
# along the x-axis (this is just one of the possible approaches)
# This is the approximate LHS boundary of the volume
flybrains.JRCFIB2018Fraw.mesh.vertices[:, 0].max()
TrackedArray(34499.97265625)
query_ss = navis.subset_neuron(query_xf, query_xf.points[:, 0] <= 35_000)
query_ss
type | navis.Dotprops |
---|---|
name | VFB_001013cg |
k | 20 |
units | 9.999999999999998 nanometer |
n_points | 5093 |
Using query_ss
should yield much improved scores.
Another potential pitfall is the generation of dotprops from the image itself: if you compare the image- against the skeleton-derived dotprops, you might notice that the latter have fewer and less dense points. That’s a natural consequence the image containing multiple individuals of the same cell type but we could have tried to ameliorate this by some pre-processing (e.g. downsampling or thinning the image).