Exploratory Analysis in Python

Use Python to explore and analyze data in Synnax

This guide will walk you through basic analysis on an example data set using Synnax. We’ll import the data set, explore it using the Synnax Console, run analysis, and attach post-processed results.

Prerequisites

Start a local cluster

This guide assumes you’ve started a local cluster on localhost:9090 using the instructions here. You’re welcome to use a remote cluster instead, but keep in mind that its connection parameters may differ from the ones used in this guide.

Install the Synnax Console

We’re going to use the Synnax Console to plot our data. To install it, follow the instructions here.

Install and authenticate the Python client

Finally, you’ll need to make sure you have the synnax Python client installed and authenticated with your cluster. A guide for doing so is available here.

Importing a data set

Our first step is to import a data set into Synnax. We’ll be using a sample CSV file we’ve generated just for this guide. To download it, run the following command:

curl -O https://raw.githubusercontent.com/synnaxlabs/synnax/main/docs/examples/april_9_wetdress.csv

Next, we’ll use the Synnax CLI to import the data set. To do so, run:

synnax ingest april_9_wetdress.csv

This command will begin an interactive import process that will prompt us with a few questions:

Welcome to the Synnax Ingestion CLI! Let's get started.
Using saved credentials.
Connection successful!
Would you like to ingest all cahnnels? [y/n] (y):

This question asks whether we want to import data for all columns in the file. We do, so we’ll hit enter. Next, Synnax will check whether each column in the CSV corresponds to an existing channel in our cluster:

Validating that channels exist...
The following channels were not found in the database:
┏━━━━━━━━━━━━━━━━━┓
┃ name            ┃
┡━━━━━━━━━━━━━━━━━┩
│ ec_pressure_5   │
│ ec_pressure_7   │
│ ec_pressure_9   │
│ ec_pressure_11  │
│ ec_pressure_12  │
│ ec_pressure_14  │
│ ec_pressure_19  │
│ ec_tc_0         │
│ ec_tc_1         │
│ Time            │
└─────────────────┘
Would you like to create them? [y/n] (y):

We do want to create them, so we’ll just hit enter.
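Conceptually, the validation step above compares the CSV's header row against the channel names the cluster already knows about. The sketch below illustrates that comparison with Python's standard `csv` module. The helper `find_missing_channels` and the sample CSV text are hypothetical. This is not how the Synnax CLI is implemented.

```python
import csv
import io


def find_missing_channels(csv_text: str, existing: set[str]) -> list[str]:
    """Return header columns with no matching channel (hypothetical helper)."""
    # Only the header row matters for this check.
    header = next(csv.reader(io.StringIO(csv_text)))
    # Strip stray whitespace and keep the file's column order.
    return [name.strip() for name in header if name.strip() not in existing]


# Illustrative CSV text using column names from this guide's data set.
sample = "Time,ec_pressure_5,ec_pressure_12,ec_tc_0\n0,1.0,2.0,3.0\n"
missing = find_missing_channels(sample, existing={"ec_tc_0"})
print(missing)  # ['Time', 'ec_pressure_5', 'ec_pressure_12']
```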

Any any channels indexes (e.g. timestamps)? [y/n] (y):

Synnax is asking whether any of the columns in the file contain timestamps that record when samples were taken. These columns are called “indexes,” and Synnax uses them to execute queries by time range. If you’d like to read more about the different channel types in Synnax, see this page.
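Index timestamps are assumed here to increase monotonically. Under that assumption, a time-range query reduces to two binary searches over the index, and the resulting slice applies to every channel that shares it. The sketch below illustrates that idea with Python's `bisect` module. The helper name and the sample values are hypothetical, and this is not Synnax's internal query engine.

```python
from bisect import bisect_left


def time_range_slice(index: list[int], start: int, end: int) -> slice:
    """Return the slice of samples whose timestamps fall in [start, end).

    Assumes `index` is sorted in ascending order.
    """
    lo = bisect_left(index, start)  # first sample at or after start
    hi = bisect_left(index, end)    # first sample at or after end (exclusive)
    return slice(lo, hi)


# Hypothetical timestamps in nanoseconds, one sample per second,
# and a data channel aligned with them.
timestamps = [0, 1_000_000_000, 2_000_000_000, 3_000_000_000, 4_000_000_000]
pressures = [0.0, 4.2, 6.1, 7.8, 5.5]

window = time_range_slice(timestamps, 1_000_000_000, 3_000_000_000)
print(pressures[window])  # [4.2, 6.1]
```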

In our case, we have a ‘Time’ column that contains timestamps, so we’ll enter y and hit enter. Synnax will then ask us to select the name of our index column:

You can enter 'all' for all channels or a comma-separated list of:
    1) Names (e.g. 'channel1, channel2, channel3')
    2) Channel indices (e.g. '1, 2, 3')
    3) A pattern to match (e.g. 'channel*, sensor*')
    
channels:

We’ll enter Time and move on to the next step:

Do all non-indexed channels have the same data rate or index? [y/n] (y):

This question asks us if our Time column represents the timestamps for all of the other columns in the data set. In our case, it does, so we’ll enter y and continue. Synnax will then ask us to enter the name of our time column:

Enter the name of an index or a data rate:
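The prompt accepts either an index channel or a data rate. With a fixed data rate, a timestamp column isn't strictly needed, because each sample's time can be derived from a start time and the sample's position, t_i = t_0 + i / rate. The hypothetical helper below shows that arithmetic in integer nanoseconds. It is a conceptual illustration, not Synnax code.

```python
def timestamps_from_rate(start_ns: int, rate_hz: int, count: int) -> list[int]:
    """Derive timestamps for `count` evenly spaced samples (hypothetical helper)."""
    # Compute each offset directly from the sample position so that rounding
    # error does not accumulate across samples.
    return [start_ns + (i * 1_000_000_000) // rate_hz for i in range(count)]


print(timestamps_from_rate(start_ns=0, rate_hz=4, count=4))
# [0, 250000000, 500000000, 750000000]
```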

Again, we’ll enter Time and continue. Synnax will then ask us how we’d like to assign data types to our channels:

Please select an option for assigning data types:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ value                                                         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Guess data types from file                                    │
│ Assign the same data type to all channels (excluding indexes) │
│ Group channels by data type                                   │
└───────────────────────────────────────────────────────────────┘
Select an option # [0/1/2] (0):

We’ll select the first option (0), which guesses each channel’s data type from the values in the file. We’ve now successfully created all of the channels in our data set.
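The guide doesn't describe how the CLI guesses types. A common heuristic is to try the narrowest type first and fall back to wider ones when any value fails to parse, and the sketch below shows that technique. The function and the returned type labels are hypothetical, and the CLI's actual rules may differ.

```python
def guess_type(values: list[str]) -> str:
    """Guess a column's type from its string values (hypothetical heuristic)."""

    def all_parse(parser) -> bool:
        # True only if every value in the column parses with `parser`.
        try:
            for v in values:
                parser(v)
            return True
        except ValueError:
            return False

    # Narrowest type first, then progressively wider fallbacks.
    if all_parse(int):
        return "int64"
    if all_parse(float):
        return "float64"
    return "string"


print(guess_type(["1", "2", "3"]))        # int64
print(guess_type(["1.5", "2", "-0.25"]))  # float64
print(guess_type(["open", "closed"]))     # string
```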

The next time we import a data set with the same channels, Synnax will automatically detect the channels and skip this process.

Now, Synnax will ask us to confirm the starting timestamp for our data set.

Identified start timestamp for file as 2023-04-10T12:07:23.662716-04:00.
Is this correct? [y/n] (y):

This looks right, so we’ll hit enter. Finally, Synnax will ask us to name our data set, which we’ll use to reference it in future steps.

Please enter a name for the data set
Name (4_9_wetdress_data_cleaned.csv):

We’ll enter April 9 Wetdress and hit enter. Synnax will then begin importing the data set. When it’s done, we’ll see the following message:

━━━━━━━━━━━━━━━━━━ 100% 85740 out of 85570 samples - 7477108.2235981515 samples/s
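Later in this guide, the Time channel is treated as Unix epoch nanoseconds. For orientation, the start timestamp the CLI confirmed, 2023-04-10T12:07:23.662716-04:00, can be converted to that representation with only the standard library. The sketch below is not part of the ingest workflow. It uses integer `timedelta` arithmetic so that no precision is lost to floating point.

```python
from datetime import datetime, timedelta, timezone

# Start timestamp reported by the ingest CLI for this file.
start = datetime.fromisoformat("2023-04-10T12:07:23.662716-04:00")

# Count whole microseconds since the Unix epoch (exact integer division),
# then scale to nanoseconds. fromisoformat keeps microsecond precision.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
start_ns = (start - epoch) // timedelta(microseconds=1) * 1_000

print(start_ns)  # 1681142843662716000
```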

Plotting our data set

Now that we’ve imported our data set, we can use Python to analyze it. We’ll create a new Python file and instantiate the Synnax client:

import synnax as sy

client = sy.Synnax()

Next, we’ll retrieve the range referencing the data we just imported:

data = client.ranges.retrieve("April 9 Wetdress")

The ec_pressure_12 channel in our data set contains the flowmeter’s differential pressure readings in psi. To make it easier to work with, we’ll give ec_pressure_12 the alias flow_dp:

data.ec_pressure_12.set_alias("flow_dp")
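Conceptually, an alias is a second name that resolves to the same underlying channel data. The toy class below illustrates that idea with attribute-style access like the snippet above. `AliasedChannels` and its methods are hypothetical and only mimic the behavior. They are not the Synnax client's API or implementation.

```python
class AliasedChannels:
    """Toy stand-in for a range's channel lookup (hypothetical, not Synnax)."""

    def __init__(self, data: dict[str, list[float]]):
        self._data = data
        self._aliases: dict[str, str] = {}

    def set_alias(self, channel: str, alias: str) -> None:
        # Record the alias. The underlying channel keeps its original name.
        self._aliases[alias] = channel

    def __getattr__(self, name: str) -> list[float]:
        # Called only when normal attribute lookup fails. Resolve an alias
        # first, then fall back to the raw channel name.
        target = self._aliases.get(name, name)
        try:
            return self._data[target]
        except KeyError:
            raise AttributeError(name) from None


rng = AliasedChannels({"ec_pressure_12": [4.1, 6.3, 7.0]})
rng.set_alias("ec_pressure_12", "flow_dp")
print(rng.flow_dp)                        # [4.1, 6.3, 7.0]
print(rng.flow_dp is rng.ec_pressure_12)  # True
```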

Next, we’ll filter our data set to include only samples where flow_dp is greater than 5 psi:

pressure_mask = data.flow_dp > 5
filtered_dp = data.flow_dp[pressure_mask]
filtered_time = data.Time[pressure_mask]
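The filter above relies on boolean masking. The comparison produces one True/False value per sample, and indexing with that mask keeps only the matching samples. Applying the same mask to Time keeps each timestamp paired with its reading. The NumPy sketch below uses made-up values. It assumes the channel data behaves like NumPy arrays, which is how the snippet above treats it.

```python
import numpy as np

# Hypothetical stand-ins for the flow_dp and Time channels (same length, aligned).
flow_dp = np.array([3.2, 5.4, 6.8, 4.9, 7.1])
time_ns = np.arange(5, dtype=np.int64) * 1_000_000_000

# Element-wise comparison yields one boolean per sample.
pressure_mask = flow_dp > 5

# The same mask applied to both arrays keeps readings and timestamps aligned.
filtered_dp = flow_dp[pressure_mask]
filtered_time = time_ns[pressure_mask]

print(pressure_mask.tolist())  # [False, True, True, False, True]
print(filtered_dp.tolist())    # [5.4, 6.8, 7.1]
print(filtered_time.tolist())  # [1000000000, 2000000000, 4000000000]
```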

We’ll also clean up our Time channel to use elapsed seconds instead of unix epoch nanoseconds. We can use the sy.elapsed_seconds utility to accomplish this:

elapsed_time = sy.elapsed_seconds(filtered_time)
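The guide doesn't say which reference point sy.elapsed_seconds uses. The sketch below assumes elapsed time is measured from the first timestamp in the array. `to_elapsed_seconds` is a hypothetical stand-in, not the Synnax utility. It subtracts in integer nanoseconds before dividing, because epoch values around 1.7e18 lose sub-microsecond precision if converted to floats first.

```python
import numpy as np


def to_elapsed_seconds(time_ns: np.ndarray) -> np.ndarray:
    """Convert epoch-nanosecond timestamps to seconds since the first sample."""
    # Subtract in int64 first, then convert the small offsets to seconds.
    offsets_ns = time_ns.astype(np.int64) - np.int64(time_ns[0])
    return offsets_ns / 1e9


# Hypothetical samples starting at this data set's start timestamp, 0.5 s apart.
start_ns = 1_681_142_843_662_716_000
time_ns = start_ns + np.arange(4, dtype=np.int64) * 500_000_000
print(to_elapsed_seconds(time_ns).tolist())  # [0.0, 0.5, 1.0, 1.5]
```

Under this assumption, applying the conversion to filtered_time measures time from the first sample above 5 psi, not from the start of the file.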

Finally, we’ll plot our data with matplotlib:

import matplotlib.pyplot as plt

plt.plot(elapsed_time, filtered_dp)
plt.xlabel("Elapsed Time (s)")
plt.ylabel("Flow DP (psi)")
plt.show()

We’ll save this file as plot.py and run it with the following command:

python plot.py

This will open a window with the following plot: