Methods and Parameters for Linking

Linking assigns the detected features and segments in each timestep to trajectories, to enable an analysis of their time evolution. We will cover the topics:

We start by loading the usual libraries:

[1]:
import tobac
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

import seaborn as sns

sns.set_context("talk")

Understanding the Linking Output

Since it has a time dimension, the sample data from the testing utilities is also suitable for a demonstration of this step. The chosen method creates 2-dimensional sample data with up to 3 moving blobs at the same. So, our expectation is to obtain up to 3 tracks.

At first, loading in the data and performing the usual feature detection is required:

[2]:
%%capture

data = tobac.testing.make_sample_data_2D_3blobs_inv()
dxy, dt = tobac.utils.get_spacings(data)
features = tobac.themes.tobac_v1.feature_detection_multithreshold(
    data, dxy, threshold=1, position_threshold="weighted_abs"
)

Now the linking via trackpy can be performed. Notice that the temporal spacing is also a required input this time. Additionally, it is necessary to provide either a maximum speed v_max or a maximum search range d_max. Here we use a maximum speed of 100 m/s:

[3]:
track = tobac.themes.tobac_v1.linking_trackpy(features, data, dt=dt, dxy=dxy, v_max=100)
Frame 69: 2 trajectories present.

The output tells us, that in the final frame 69 two trajecteries where still present. If we checkout this frame via xarray.plot method, we can see that this corresponds to the two present features there, which are assigned to different trajectories.

[4]:
plt.figure(figsize=(6, 9))
data.isel(time=69).plot(x="x", y="y")
[4]:
<matplotlib.collections.QuadMesh at 0x7f064377b6d0>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_9_1.png

The track dataset contains two new variables, cell and time_cell. The first assigns the features to one of the found trajectories by an integer and the second specifies how long the feature has been present already in the data.

[5]:
track[["cell", "time_cell"]]
[5]:
<xarray.Dataset>
Dimensions:    (index: 110)
Coordinates:
  * index      (index) int64 0 1 2 3 4 5 6 7 ... 102 103 104 105 106 107 108 109
Data variables:
    cell       (index) float64 1.0 1.0 1.0 1.0 1.0 1.0 ... 3.0 2.0 3.0 2.0 3.0
    time_cell  (index) timedelta64[ns] 00:00:00 00:01:00 ... 00:29:00 00:19:00

Since we know that this dataset contains 3 features, we can visualize the found tracks with masks created from the cell variable:

[6]:
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 9))

data.isel(time=50).plot(ax=ax, x="x", y="y", cmap="Greys")

for i, cell_track in track.groupby("cell"):

    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )


ax.set_title("trajectories in the data")
plt.legend()
[6]:
<matplotlib.legend.Legend at 0x7f064d306590>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_13_1.png

The cells have been specified with different velocities that also change in time. tobac retrieves the following values for our test data:

[7]:
vel = tobac.analysis.calculate_velocity(track)
[8]:
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(12, 6))

for i, vc in vel.groupby("cell"):

    # get velocity as xarray Dataset
    v = vc.to_xarray()["v"]
    v = v.assign_coords({"frame": vc.frame})

    v.plot(x="frame", ax=ax, label="cell {0}".format(int(i)))

    ax.set_ylabel("estimated velocity / (m/s)")

ax.set_title("velocities of the cell tracks")
plt.legend()
[8]:
<matplotlib.legend.Legend at 0x7f0642a4b490>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_16_1.png

Interpretation:

  • “cell 1” is actually specified with constant velocity. The initial increase and the final decrease come from edge effects, i.e. that the blob corresponding to “cell 1” has parts that that go beyond the border.

  • “cell 2” and “cell 3” are specified with linearily increasing x-component of the velocity. The initial speed-up is due to the square-root behavior of the velocity magnitude. The final decrease however come again from edge effects.

The starting point of the cell index is of course arbitrary. If we want to change it from 1 to 0 we can do this with the cell_number_start parameter:

[9]:
track = tobac.themes.tobac_v1.linking_trackpy(
    features, data, dt=dt, dxy=dxy, v_max=100, cell_number_start=0
)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 9))

data.isel(time=50).plot(ax=ax, x="x", y="y", cmap="Greys")

for i, cell_track in track.groupby("cell"):

    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )

ax.set_title("trajectories in the data")
plt.legend()
Frame 69: 2 trajectories present.
[9]:
<matplotlib.legend.Legend at 0x7f0642918cd0>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_19_2.png

The linking in tobac is performed with methods of the trackpy-library. Currently, there are two methods implemented, ‘random’ and ‘predict’. These can be selected with the method keyword. The default is ‘random’.

[10]:
track_p = tobac.themes.tobac_v1.linking_trackpy(
    features, data, dt=dt, dxy=dxy, v_max=100, method_linking="predict"
)
Frame 69: 2 trajectories present.

Defining a Search Radius

If you paid attention to the input of the linking_trackpy() function you noticed that we used the parameter v_max there. The reason for this is that it would take a lot of computation time to check the whole data of a frame for the next position of a feature from the previous frame with the Crocker-Grier linking algorithm. Therefore the search is restricted to a circle around a position predicted by different methods implemented in trackpy.

The speed we defined before specifies such a radius timestep via

\[r_{max} = v_{max} \Delta t.\]

By reducing the maximum speed from 100 m/s to 70 m/s we will find more cell tracks, because a feature that is not assigned to an existing track will be used as a starting point for a new one:

[11]:
track = tobac.themes.tobac_v1.linking_trackpy(features, data, dt=dt, dxy=dxy, v_max=70)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 9))

data.isel(time=50).plot(ax=ax, x="x", y="y", cmap="Greys")

for i, cell_track in track.groupby("cell"):
    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )

plt.legend()
Frame 69: 2 trajectories present.
[11]:
<matplotlib.legend.Legend at 0x7f0642957790>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_23_2.png

Above you see that previously orange track is split into five parts because the considered cell moves so fast.

The same effect can be achieved by directly reducing the maximum search range with the d_max parameter to

\[d_{max} = 4200 m\]

because

\[v_{max} \Delta t = 4200 m\]

for \(v_{max} = 70\) m/s in our case.

[12]:
track = tobac.themes.tobac_v1.linking_trackpy(
    features, data, dt=dt, dxy=dxy, d_max=4200
)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 9))

data.isel(time=50).plot(ax=ax, x="x", y="y", cmap="Greys")

for i, cell_track in track.groupby("cell"):
    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )

plt.legend()
Frame 69: 2 trajectories present.
[12]:
<matplotlib.legend.Legend at 0x7f0641dbe510>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_26_2.png

v_max and d_max should not be used together, because they both specify the same qunatity, namely the search range, but it is neccery to set at least one of these parameters.

Working with the Subnetwork

If we increase v_max or d_max it can happen that more than one feature of the previous timestep is in the search range. The selection of the best matching feature from the set of possible features (which is called the subnetwork, for a more in depth explanation have a look here) is the most time consuming part of the linking process. For this reason, it is possible to limit the number of features in the search range via the subnetwork_size parameter. The tracking will result in a trackpy.SubnetOversizeException if more features than specified there are in the search range. We can simulate this situation by setting a really high value for v_max and 1 for the subnetwork_size:

[13]:
from trackpy import SubnetOversizeException

try:

    track = tobac.themes.tobac_v1.linking_trackpy(
        features, data, dt=dt, dxy=dxy, v_max=1000, subnetwork_size=1
    )
    print("tracking successful")

except SubnetOversizeException as e:

    print("Tracking not possible because the {}".format(e))
Frame 57: 3 trajectories present.
Tracking not possible because the Subnetwork contains 2 points

The default value for this parameter is 30 and can be accessed via:

[14]:
import trackpy as tp

tp.linking.Linker.MAX_SUB_NET_SIZE
[14]:
1

It is important to highlight that if the linking_trackpy()-function is called with a subnetwork_size different from 30, this number will be the new default. This is the reason, why the value is 1 at this point.

Timescale of the Tracks

If we want to limit our analysis to long living features of the dataset it is possible to exclude tracks which only cover few timesteps. This is done by setting a lower bound for those via the stups parameter. In our test data, only the first cell exceeds 50 time steps:

[17]:
track = tobac.themes.tobac_v1.linking_trackpy(
    features, data, dt=dt, dxy=dxy, v_max=100, stubs=50
)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 9))

data.isel(time=50).plot(ax=ax, x="x", y="y", cmap="Greys")

for i, cell_track in track.groupby("cell"):
    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )

plt.legend()
Frame 69: 2 trajectories present.
[17]:
<matplotlib.legend.Legend at 0x7f0641d4d990>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_39_2.png

Analogius to the search range, there is a second option for that called time_cell_min. This defines a minimum of time in seconds for a cell to appear as tracked:

[18]:
track = tobac.themes.tobac_v1.linking_trackpy(
    features,
    data,
    dt=dt,
    dxy=dxy,
    v_max=100,
    time_cell_min=60 * 50,  # means 50 steps of 60 seconds
)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 9))

data.isel(time=50).plot(ax=ax, x="x", y="y", cmap="Greys")

for i, cell_track in track.groupby("cell"):
    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )

plt.legend()
Frame 69: 2 trajectories present.
[18]:
<matplotlib.legend.Legend at 0x7f06412030d0>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_41_2.png

Gappy Tracks

If the data is noisy, an object may disappear for a few frames and then reappear. Such features can still be linked by setting the memory parameter with an integer. This defines the number of timeframes a feature can vanish and still get tracked as one cell. We demonstrate this on a simple dataset with only one feature:

[19]:
data = tobac.testing.make_simple_sample_data_2D()
dxy, dt = tobac.utils.get_spacings(data)
features = tobac.themes.tobac_v1.feature_detection_multithreshold(
    data, dxy, threshold=1
)
track = tobac.themes.tobac_v1.linking_trackpy(
    features, data, dt=dt, dxy=dxy, v_max=1000
)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(16, 6))

data.isel(time=10).plot(ax=ax, x="x", y="y", cmap="Greys")

for i, cell_track in track.groupby("cell"):
    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )

plt.legend()
Frame 59: 1 trajectories present.
[19]:
<matplotlib.legend.Legend at 0x7f0641ccabd0>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_43_2.png

To simulate a gap in the data we set 5 timeframes to 0:

[20]:
data[30:35] = data[30] * 0

If we apply feature detection and linking to this modified dataset, we will now get two cells:

[21]:
features = tobac.themes.tobac_v1.feature_detection_multithreshold(
    data, dxy, threshold=1
)
track = tobac.themes.tobac_v1.linking_trackpy(
    features, data, dt=dt, dxy=dxy, v_max=1000
)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(16, 6))

data.isel(time=10).plot(ax=ax, x="x", y="y", cmap="Greys")

for i, cell_track in track.groupby("cell"):
    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )

plt.legend()
Frame 59: 1 trajectories present.
[21]:
<matplotlib.legend.Legend at 0x7f064d10b110>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_47_2.png

We can avoid that by setting memory to a sufficiently high number. 5 is of course sufficient in our case as the situation is artificially created, but with real data this would require some fine tuning. Keep in mind that the search radius needs to be large enough to reach the next feature position. This is the reason for setting v_max to 1000 in the linking process.

[22]:
features = tobac.themes.tobac_v1.feature_detection_multithreshold(
    data, dxy, threshold=1
)
track = tobac.themes.tobac_v1.linking_trackpy(
    features, data, dt=dt, dxy=dxy, v_max=1000, memory=5
)

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(16, 6))

data.isel(time=10).plot(ax=ax, cmap="Greys")

for i, cell_track in track.groupby("cell"):
    cell_track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=ax,
        marker="o",
        alpha=0.2,
        label="cell {0}".format(int(i)),
    )

plt.legend()
Frame 59: 1 trajectories present.
[22]:
<matplotlib.legend.Legend at 0x7f064d07b9d0>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_49_2.png
[23]:
vel = tobac.analysis.calculate_velocity(track)
[24]:
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(12, 6))

for i, vc in vel.groupby("cell"):

    # get velocity as xarray Dataset
    v = vc.to_xarray()["v"]
    v = v.assign_coords({"frame": vc.frame})

    v.plot(x="frame", ax=ax, label="cell {0}".format(int(i)), marker="x")

    ax.set_ylabel("estimated velocity / (m/s)")

ax.axvspan(30, 35, color="gold", alpha=0.3)
ax.set_title("velocities of the cell tracks")
plt.legend()
[24]:
<matplotlib.legend.Legend at 0x7f0642a91650>
../../_images/tutorials_Basics_Methods-and-Parameters-for-Linking_51_1.png

Velocity calculations work well across the gap (marked with yellow above). The initial and final drop in speed comes again from edge effects.

Other Parameters

The parameters d_min_, extrapolate and order do not have a working implmentation in tobac V2 at the moment.