Idealized Case 1: Tracking of a Test Blob in 2D

This tutorial shows the most important steps of tracking with tobac using an idealized case:

  1. Input Data

  2. Feature Detection

  3. Tracking / Trajectory Linking

  4. Segmentation

  5. Statistical Analysis

Import Libraries

We start by importing tobac:

[1]:
import tobac

We will also need matplotlib in inline-mode for plotting and numpy:

[2]:
import matplotlib.pyplot as plt

%matplotlib inline

import numpy as np

For a better readability of the graphs:

[3]:
import seaborn as sns

sns.set_context("talk")

Tobac works with a Python package called xarray, which introduces DataArrays. In a nutshell these are numpy-arrays with labels. For a more extensive description have a look at the xarray Documentation.

1. Input Data

There are several utilities implemented in tobac to create simple examples of such arrays. In this tutorial we will use the function make_simple_sample_data_2D() to create a moving test blob in 2D:

[4]:
test_data = tobac.testing.make_simple_sample_data_2D()
test_data
[4]:
<xarray.DataArray 'w' (time: 100, y: 50, x: 100)>
[500000 values with dtype=float64]
Coordinates:
  * time       (time) datetime64[ns] 2000-01-01T12:00:00 ... 2000-01-01T13:39:00
  * y          (y) float64 0.0 1e+03 2e+03 3e+03 ... 4.7e+04 4.8e+04 4.9e+04
  * x          (x) float64 0.0 1e+03 2e+03 3e+03 ... 9.7e+04 9.8e+04 9.9e+04
    latitude   (y, x) float64 ...
    longitude  (y, x) float64 ...
Attributes:
    units:    m s-1

As you can see our generated data describes a field called ‘w’ with the unit m/s at 100, 50 and 100 datapoints of time, x and y. Additionally, the data contains the latitude and longitude coordinates of the field values. To access the values of ‘w’ in the first timeframe, we can use

[5]:
test_data.data[0]
[5]:
array([[3.67879441e+00, 4.04541885e+00, 4.40431655e+00, ...,
        2.22319774e-16, 9.26766698e-17, 3.82489752e-17],
       [4.04541885e+00, 4.44858066e+00, 4.84324569e+00, ...,
        2.44475908e-16, 1.01912721e-16, 4.20608242e-17],
       [4.40431655e+00, 4.84324569e+00, 5.27292424e+00, ...,
        2.66165093e-16, 1.10954118e-16, 4.57923372e-17],
       ...,
       [6.45813368e-03, 7.10174389e-03, 7.73178977e-03, ...,
        3.90282972e-19, 1.62694148e-19, 6.71461808e-20],
       [4.43860604e-03, 4.88095244e-03, 5.31397622e-03, ...,
        2.68237303e-19, 1.11817944e-19, 4.61488502e-20],
       [3.02025230e-03, 3.32124719e-03, 3.61589850e-03, ...,
        1.82522243e-19, 7.60865909e-20, 3.14020145e-20]])

which is then just an array of numbers of the described shape.

To visualize the data, we can plot individual time frames using matplotlib per imshow:

[6]:
fig, axs = plt.subplots(ncols=1, nrows=3, figsize=(12, 16), sharey=True)
plt.subplots_adjust(hspace=0.5)

for i, itime in enumerate([0, 20, 40]):

    # plot the 2D blob field in colors
    test_data.isel(time=itime).plot(ax=axs[i])

    axs[i].set_title(f"timeframe = {itime}")
../../_images/tutorials_Basics_Idealized-Case-1_Tracking-of-a-Test-Blob-in-2D_15_0.png

This tells as that our data is a single moving blob, which is what we are going the analyze with tobac now.

2. Feature Detection

The first step of the general working routine of tobac is the identification of features. This essentially means finding the maxima or minima of the data.

To use the according functions of tobac we need to specify:

  • the thresholds below/above the features are detected

  • the spacing of our data

The spacing of the temporal and spatial dimension can be extracted from the data using a build-in utility:

[7]:
dxy, dt = tobac.utils.get_spacings(test_data)

To get an idea of which order of magnitude our thresholds should be, we check the maximum of our data:

[8]:
test_data.max()
[8]:
<xarray.DataArray 'w' ()>
array(10.)

Since we know that our data will only have one maximum it is reasoable to choose 9 as our threshold, but keep in mind that we could also add multiple values here if our data would be more complex.

[9]:
threshold = 9

Now we are ready to apply the feature detection algorithm. Notice that this is a minimal input. The function has several other option we will cover in later tutorials.

[10]:
%%capture
features = tobac.themes.tobac_v1.feature_detection_multithreshold(
    test_data, dxy, threshold
)

Let’s inspect the resulting object:

[11]:
features
[11]:
<xarray.Dataset>
Dimensions:                  (index: 51)
Coordinates:
  * index                    (index) int64 0 1 2 3 4 5 6 ... 45 46 47 48 49 50
Data variables: (12/13)
    frame                    (index) int64 0 1 2 3 4 5 6 ... 45 46 47 48 49 50
    idx                      (index) int64 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    hdim_1                   (index) float64 10.0 10.94 11.71 ... 48.15 48.6
    hdim_2                   (index) float64 10.0 11.85 13.66 ... 97.23 98.2
    num                      (index) int64 69 66 65 65 65 65 ... 39 34 23 13 5
    threshold_value          (index) int64 9 9 9 9 9 9 9 9 9 ... 9 9 9 9 9 9 9 9
    ...                       ...
    time                     (index) object 2000-01-01 12:00:00 ... 2000-01-0...
    timestr                  (index) object '2000-01-01 12:00:00' ... '2000-0...
    projection_y_coordinate  (index) float64 1e+04 1.094e+04 ... 4.86e+04
    projection_x_coordinate  (index) float64 1e+04 1.185e+04 ... 9.82e+04
    latitude                 (index) object 24.1 24.12 24.14 ... 24.97 24.98
    longitude                (index) object 150.1 150.1 150.1 ... 150.5 150.5

The ouputs tells us that features were found in 51 frames (index 0 to 50) of our data. The variable idx is 1 for every frames, which means that only 1 feature was found in every time step, as we expected. hdim_1 and hdim_2 are the position of this feature with respect to the y and x-indices.

We can plot the detected feature positions on top of the colored blob.

[12]:
fig, axs = plt.subplots(ncols=1, nrows=3, figsize=(12, 16), sharey=True)
plt.subplots_adjust(hspace=0.5)

for i, itime in enumerate([0, 20, 40]):

    # plot the 2D blob field in colors
    test_data.isel(time=itime).plot(ax=axs[i])

    # plot the detected feature as black cross
    f = features.sel(index=[itime])
    f.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=axs[i],
        color="black",
        marker="x",
    )

    axs[i].set_title(f"timeframe = {itime}")
../../_images/tutorials_Basics_Idealized-Case-1_Tracking-of-a-Test-Blob-in-2D_30_0.png

The function has succesfully detected the maximum of our data in the individual timeframes.

3. Trajectory Linking

After we are done finding the features and associated segments for each frame it is necessary for further analysis to keep track of those elements troughout time. Linking is the tool for that. It connects the features of the timesteps which belong together. We are going to use the linking_trackpy() function here. The required inputs are the features, the two spacings and a maximum velocity of the features.

[13]:
trajectories = tobac.themes.tobac_v1.linking_trackpy(
    features, test_data, dt=dt, dxy=dxy, v_max=100
)
Frame 50: 1 trajectories present.

fails without v_max

Unsurprisingly, one trajectory was found. The returned object is another Dataset:

[14]:
trajectories
[14]:
<xarray.Dataset>
Dimensions:                  (index: 51)
Coordinates:
  * index                    (index) int64 0 1 2 3 4 5 6 ... 45 46 47 48 49 50
Data variables: (12/15)
    frame                    (index) int64 0 1 2 3 4 5 6 ... 45 46 47 48 49 50
    idx                      (index) int64 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    hdim_1                   (index) float64 10.0 10.94 11.71 ... 48.15 48.6
    hdim_2                   (index) float64 10.0 11.85 13.66 ... 97.23 98.2
    num                      (index) int64 69 66 65 65 65 65 ... 39 34 23 13 5
    threshold_value          (index) int64 9 9 9 9 9 9 9 9 9 ... 9 9 9 9 9 9 9 9
    ...                       ...
    projection_y_coordinate  (index) float64 1e+04 1.094e+04 ... 4.86e+04
    projection_x_coordinate  (index) float64 1e+04 1.185e+04 ... 9.82e+04
    latitude                 (index) object 24.1 24.12 24.14 ... 24.97 24.98
    longitude                (index) object 150.1 150.1 150.1 ... 150.5 150.5
    cell                     (index) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    time_cell                (index) timedelta64[ns] 00:00:00 ... 00:50:00

The new variable cell now indexes the features in the different time steps. Therefore we can use it to create a mask for our moving feature:

[15]:
track_mask = trajectories["cell"] == 1.0

This mask can then be used - to select the track:

[16]:
track = trajectories.where(track_mask).dropna("index")
  • and to show the track in our plots:

[17]:
fig, axs = plt.subplots(ncols=1, nrows=3, figsize=(12, 16), sharey=True)
plt.subplots_adjust(hspace=0.5)
# fig.tight_layout()


for i, itime in enumerate([0, 20, 40]):

    # plot the 2D blob field in colors
    test_data.isel(time=itime).plot(ax=axs[i])

    # plot the track
    track.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=axs[i],
        color="red",
        marker="o",
        alpha=0.2,
    )

    # plot the detected feature as black cross
    f = features.sel(index=[itime])
    f.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=axs[i],
        color="black",
        marker="x",
    )

    axs[i].set_title(f"timeframe = {itime}")
../../_images/tutorials_Basics_Idealized-Case-1_Tracking-of-a-Test-Blob-in-2D_42_0.png

4. Segmentation

Another step after the feature detection and tracking is the segmentation of the data. This means we find the area surrounding our features belonging to the same cluster. Logically, we now need the already detected features as an additional input.

Just a small thought here:

We often introduce the different steps as 1.feature detection, 2.segmentation and 3. tracking, so having the segmenation step actually before the trajectory linking. The order actually does not matter since the current segmentation approach is based on the feature detection output only and can be done before or after the linking.

On further extensions:

  • Also, one could actually use the output of the segmentation (features_test Dataset in the example below) as input for the tracking with the advantage that information on the area (ncells) is added in the dataframe.

  • One could also use the output of the tracking in the segmentation (trajectories Dataset from above) with the advantage that mask will contain only the features that are also linked to trajectories.

These possibilities exist and could be utlized by you …

[18]:
%%capture
mask, features_test = tobac.themes.tobac_v1.segmentation(
    features, test_data, dxy, threshold=9
)

As the name implies, the first object returned is a Boolean mask that is true for all segments belonging to features. The second output is again the features of the field.

[19]:
mask
[19]:
<xarray.DataArray 'segmentation_mask' (time: 100, y: 50, x: 100)>
[500000 values with dtype=int32]
Coordinates:
  * time       (time) datetime64[ns] 2000-01-01T12:00:00 ... 2000-01-01T13:39:00
  * y          (y) float64 0.0 1e+03 2e+03 3e+03 ... 4.7e+04 4.8e+04 4.9e+04
  * x          (x) float64 0.0 1e+03 2e+03 3e+03 ... 9.7e+04 9.8e+04 9.9e+04
    latitude   (y, x) float64 ...
    longitude  (y, x) float64 ...
Attributes:
    long_name:  segmentation_mask

Since True/False corresponds to 1/0 in python, we can visualize the segments with a simple contour plot:

[20]:
fig, axs = plt.subplots(ncols=1, nrows=3, figsize=(12, 16), sharey=True)
plt.subplots_adjust(hspace=0.5)


for i, itime in enumerate([0, 20, 40]):

    # plot the 2D blob field in colors
    test_data.isel(time=itime).plot(ax=axs[i])

    # plot the mask outline
    mask.isel(time=itime).plot.contour(levels=[0.5], ax=axs[i], colors="k")

    # plot the detected feature as black cross
    f = features.sel(index=[itime])
    f.plot.scatter(
        x="projection_x_coordinate",
        y="projection_y_coordinate",
        ax=axs[i],
        color="black",
        marker="x",
    )

    axs[i].set_title(f"timeframe = {itime}")
../../_images/tutorials_Basics_Idealized-Case-1_Tracking-of-a-Test-Blob-in-2D_48_0.png

Keep in mind that the area of the resulting segments crucially depends on the defined thresholds.

5. Statistical Analysis

Blob Velocity / Motion Speed

Now different functions of tobacs analysis module can be used to calculate or plot properties of the tracks and the features. For example the velocity of the feature along the track can be calulated via:

[21]:
vel = tobac.analysis.calculate_velocity(track)

The expected velocity (hard-coded in the test function) is:

[22]:
expected_velocity = np.sqrt(30**2 + 14**2)

Plotting the velocity vs the timeframe will give us

[23]:
plt.figure(figsize=(10, 5))
plt.tight_layout()
plt.plot(vel["frame"], vel["v"])

plt.xlabel("timeframe")
plt.ylabel("velocity [m/s]")
plt.grid()

plt.axhline(expected_velocity, color="darkgreen", lw=5, alpha=0.5)
[23]:
<matplotlib.lines.Line2D at 0x7fe2abdd2890>
../../_images/tutorials_Basics_Idealized-Case-1_Tracking-of-a-Test-Blob-in-2D_57_1.png

We can also create an histogram of the detected velocities:

[24]:
hist, edges = tobac.analysis.velocity_histogram(
    track,
    bin_edges=np.arange(14, 43, 3),
)
[25]:
width = 0.9 * (edges[1] - edges[0])
center = (edges[:-1] + edges[1:]) / 2

plt.tight_layout()
plt.bar(center, hist, width=width)
plt.ylabel("count")
plt.xlabel("velocity [m/s]")

plt.axvline(expected_velocity, color="darkgreen", lw=5, alpha=0.5)
[25]:
<matplotlib.lines.Line2D at 0x7fe2abcf2e00>
../../_images/tutorials_Basics_Idealized-Case-1_Tracking-of-a-Test-Blob-in-2D_60_1.png

Area

The area of the features can also be calculated and plotted throughout time:

[26]:
area = tobac.analysis.calculate_area(features, mask)
[27]:
plt.figure(figsize=(10, 5))
plt.tight_layout()
plt.plot(area["frame"], area["area"])
plt.xlabel("timeframe")
plt.ylabel(r"area [$m^2$]")
plt.grid()
../../_images/tutorials_Basics_Idealized-Case-1_Tracking-of-a-Test-Blob-in-2D_63_0.png

Lifetime

Another interesting property for real data are the lifetimes of our features. Tobac can also produce a histogram of this:

[28]:
hist, bins, centers = tobac.analysis.lifetime_histogram(
    track, bin_edges=np.arange(0, 200, 20)
)
[29]:
plt.tight_layout()
plt.bar(centers, hist, width=width)
plt.ylabel("count")
plt.xlabel("lifetime")
plt.show()
../../_images/tutorials_Basics_Idealized-Case-1_Tracking-of-a-Test-Blob-in-2D_66_0.png

We can deduce, that our singular feature in the data has a lifetime of 50.