InfluxDB Data Model

InfluxDB Data Elements
Series and Points
Assumptions and Conventions
1. Conventions Used In This Book
2. Meta Syntax for Examples
Introducing the IOx data model

The goal of this section is to provide the reader with a firm foundation in the InfluxDB data model, specifically:

Understanding the InfluxDB input format (line protocol).
Understanding the InfluxDB output format (annotated CSV).
The relationships between these two formats.
Understanding how the Influxdb storage engine persists data in a table on disk

InfluxDB Data Elements

Buckets

All data in InfluxDB gets written to a bucket. A bucket is a container that can hold points for as many measurements as you like. Buckets have some important properties:

They can be named whatever you want (within reason).
You can create tokens that can control read and write permissions for a bucket, scoped only to a specific bucket.
You must set a retention period on a bucket, upon creation. A retention period determines how long InfluxDB will store your time series data within that bucket. Retention periods are critical for time series database management. They provide users with a convenient solution for automatic expiration of old, useless data which enables them to focus on the recent, valuable data instead while reducing their storage bills.

These topics will be covered in detail in a later section, for now, it is enough to know that measurements are stored in and read from a bucket.

Measurements

A measurement is the highest level of data structure within a bucket. InfluxDB accepts one measurement per point. Use a measurement to organize similar data. In some ways, you can think of it as analogous to a table in a traditional database. Measurements are also indexed, which enables you to query data within a measurement more quickly when you filter for a specific measurement. Measurements must be a string type. Measurement names cannot begin with an underscore.

To further understand measurements, let’s imagine you’re building a weather app and writing temperature data across multiple cities to InfluxDB. For this time series use case, you might create a measurement named “air_temperature”.

Tag Sets

A tag set consists of key-value pairs, where the values are always strings. Tags are essentially metadata, typically encoding information about the source of the data.

Imagine you’re building a weather app and writing temperature data across multiple cities to InfluxDB. For this time series use case, you might add a tag key to your data called “location” where the “location” tag key contains tag values for the cities weather you’re monitoring. This way you can easily query for all of the temperature data for “Los Angeles”, for example, by filtering for the “location” tag key with the “Los Angeles” tag value.

Critically, tags are indexed, so they are an important element of designing for query performance. However, tags are also optional. Tag keys cannot begin with an underscore, because InfluxDB reserves leading underscores for its own purposes.

Field Sets

A field set consists of key-value pairs, where the values can be strings, integers, or floats. Fields are the actual data to store, visualize, use for computations, etc…

Building on the weather app example, the temperature readings would be an example of a field.

Field sets are required for InfluxDB. This is where you store the value of your time series data. Field values can be of either an integer, float, or string type. Since field values are almost always different, field keys are frequently referred to as just fields. Fields are not indexed. Field also cannot begin with an underscore.

Series and Points

A series is defined by the unique combination of measurement, tag set(s), and fields. A point is a datapoint from a series at a specific timestamp. If you rewrite a duplicate point with identical measurement, tag set, field, and timestamp values, this write will overwrite your previous point. If you try to write a new point to the same series but you change the field type, this write will fail.

Assumptions and Conventions

Before diving into more nuanced and technical topics around the InfluxDB Data Model, let’s take a moment to establish some baseline assumptions and conventions.

Conventions Used In This Book

Chapters in this book will generally introduce concepts using abstract and simplified examples, and then follow with detailed examples using real world data.

Meta Syntax for Examples

For the abstract and simplified examples, we will use names in the form of:

attributeorvaluen

Where “attributeorvalue” refers a column name of a table or a value in a point, and “n” is a number simply differentiating multiple identifiers of the same role. Roles can comprise any of the following:

Measurement
Tag
Tag Value
Field

Samples will also generally include:

Field Values
Timestamps

Field values will be represented by actual values. Timestamps are in the following timestamp formats:

Unix: The unix timestamp is a way to track time as a running total of seconds–i.e. 1465839830100400200
RFC3339: A standard for date and time representation from a Request For Comment document by the Internet Engineering Task Force (IETF)–i.e. 2019-08-28T22:00:000000000Z
Relative Duration: -1h
Duration: 1h

Timestamps will be represented by actual values or by unixtime1 or rfc3339time1 . In case we want to refer to another timestamp in the same example, we will use unixtime2 or rfc3339time2.

To refer to a measurement in an example, we will use: measurement1. In case we want to refer to another measurement in the same example, we will use measurement2, and so forth.

An example of line protocol (explained in depth later) then, may look like:

measurement1,tag1=tagvalule1,tag2=tagvalue2 field1=1i,field2=2 1628858104

From time to time an example may be focused on understanding a type. In such cases, we will use the form “atype” where “type” is the data type under focus. For example, if we are discussing that field names are always strings. we may say:

r._field == "astring"

Or if we are discussing type conflicts, we may say:

aint == afloat

Instead of an example with specific values such as:

1i == 1.0

Introducing the IOx data model

InfluxDB is getting a major upgrade with the new IOx data model.

Read this section to gain a firm understanding of the new InfluxDB data model using an IOx bucket, with a particular emphasis on how the IOx data model differs from the TSM data model.

Note: IOx is still in testing. Stay tuned on influxdata.com to hear when we officially roll out the new IOx data model to all InfluxDB Cloud production clusters.

Similarities Between the TSM and IOx Data Model

Data elements
Flux compatibility
Line protocol compatibility

Data Elements

IOx retains the same data elements that users are accustomed to in InfluxDB, namely:

Buckets for storing data.
Measurements for grouping like data together.
Tag sets, in conjunction with a timestamp, identify a unique row in a table.
Fields for storing actual data.
Timestamps, of course, for ordering data by the time dimension.

Current documentation for these elements is available here, and will be updated to reflect any changes related to IOx.

Flux Compatibility

Users who have existing Flux scripts for their application that are working well for them can be reassured that those scripts will continue working unmodified. Throughout the development of IOx such backward compatibility was a consistent focus. However, users should note that with some slight changes to their Flux, to be described later, they will be able to achieve significant performance improvements using IOx.

Line Protocol Compatibility

Similar to Flux, significant effort has been invested to ensure that InfluxDB Line Protocol compatibility is retained. Therefore, the documentation on line protocol available here remains relevant with the caveat that users should think about the disk persistence and the output format using Table Flux differently if they want to take full advantage of IOx. This document will pick up the story there, the model for persisting data to disk.

From Line Protocol to Tables on Disk

In previous versions of InfluxDB, it was most useful to envision the database storing data as a series, with each series in a separate table. The IOx data model is arguably more intuitive, because it builds tables that are returned by Table Flux.

IOx, like TSM before it, is a “schema on write” database engine. This means you are free to write data with new measurements, tags, tag values, and fields after deploying your application, and IOx will accept those writes and persist them as tables. Note that there are some important caveats regarding schema on write, for example, you cannot change the type of a field by merely writing a value with a new type.

A table in IOx is defined by a measurement name. Columns in the table include:

Tag names
Field names
A single time column

Therefore, rows contain:

Tag values
Field values
A single timestamp

Rows are identified by their tag values and time stamp. This becomes relevant to understanding when looking at Upserts.

In the following examples, we’ll explore how tables are created on writes in IOx.

Line Protocol, Fields, and Tables

The simplest line protocol is one measurement and one field with a value. In lieu of providing a timestamp, we can allow the database to add the timestamp by omitting it from the line protocol:

measurement1 field1=1i

This will be persisted by IOx as a table:

Name: measurement1
field1	time
1i	timestamp1

As you can see in the above example, the table is defined by the measurement name, and contains a single column, called “field1” and a single row of data. Writing more similar data will grow the table as expected. If we write another line of line protocol:

measurement1 field1=2i

Name: measurement1
field1	time
1i	timestamp1
2i	timestamp2

If we write a third line of line protocol, this time with a different field name, IOx will add the field to the table, and null the previous writes.

measurement1 field2=3i

Name: measurement1
field1	field2	time
1i	null	timestamp1
2i	null	timestamp2
null	3i	timestamp3

IOx still supports writing multiple fields in a single line of line protocol, of course, so we can write some line protocol like this:

measurement1 field1=4i,field2=4i

Name: measurement1
field1	field2	time
1i	null	timestamp1
2i	null	timestamp2
null	3i	timestamp3
4i	4i	timestamp4

Adding Tags

On the surface, it may appear that tags and fields are equivalent in IOx. For example, a simple bit of line protocol with a single tag and field will result in a table with 3 columns, one for the tag, one for the field, and one for the time. Consider the following line protocol and the ensuing table.

measurement1,tag1=tagvalue1 field1=1i

Name: measurement1
field1	tag1	time
1i	tagvalue1	timestamp1

In a departure from the previous data model, adding a new tag value gets added to the same table:

measurement1,tag1=tagvalue2 field1=2i

Name: measurement1
field1	tag1	time
1i	tagvalue1	timestamp1
2i	tagvalue2	timestamp2

Now, if we do a new write, but with a different tag name, similar to adding a field, this will update the table with a new column for the tag, and missing tag values will be set to null.

measurement1,tag2=tagvalue3 field1=3i

Name: measurement1
field1	tag1	tag2	time
1i	tagvalue1	null	timestamp1
2i	tagvalue2	null	timestamp2
3i	null	tagvalue3	timestamp3

We can continue adding to the measurement1 table in this manner by introducing new tags and fields as needed.

measurement1,tag1=tagvalue1,tag2=tagvalue3,tag3=tagvalue4 field1=4i,field2=true

Name: measurement1
field1	field2	tag1	tag2	tag3	time
1i	null	tagvalue1	null	null	timestamp1
2i	null	tagvalue2	null	null	timestamp2
3i	null	null	tagvalue3	null	timestamp3
4i	true	tagvalue1	tagvalue3	tagvalue4	timestamp4

It’s still possible to write a minimal line of line protocol, and add that to the table:

measurement1 field1=1i

Name: measurement1
field1	field2	tag1	tag2	tag3	time
1i	null	tagvalue1	null	null	timestamp1
2i	null	tagvalue2	null	null	timestamp2
3i	null	null	tagvalue3	null	timestamp3
4i	true	tagvalue1	tagvalue3	tagvalue4	timestamp4
1i	null	null	null	null	timestamp5

Timestamps

As discussed above, a row is identified by a combination of timestamps and tag values. As such, duplicate timestamps are valid, so long as the tag values are different. For example, we can add multiple rows with timestamp5 by varying the tag values. Consider this line protocol, which has a duplicate timestamp.

measurement1,tag1=tagvalue1 field1=1i timestamp5

Remembering that a row is defined by its tag values, despite the timestamp and the field being identical, this still represents a new row:

Name: measurement1
field1	field2	tag1	tag2	tag3	time
1i	null	tagvalue1	null	null	timestamp1
2i	null	tagvalue2	null	null	timestamp2
3i	null	null	tagvalue3	null	timestamp3
4i	true	tagvalue1	tagvalue3	tagvalue4	timestamp4
1i	null	null	null	null	timestamp5
1i	null	tagvalue1	null	null	timestamp5

When you send a line of line protocol, the write takes into account the entire set of tag values. So long as a single tag value is different, even if all the fields and the timestamp are identical, the write will still result in a new row.

For example, the following row is identical to an existing row, except that the value for tag3 is tagvalue5 instead of tagvalue4. Therefore, this will result in a new row being added.

measurement1,tag1=tagvalue1,tag2=tagvalue3,tag3=tagvalue5 field1=4i,field2=true timestamp4

Name: measurement1
field1	field2	tag1	tag2	tag3	time
1i	null	tagvalue1	null	null	timestamp1
2i	null	tagvalue2	null	null	timestamp2
3i	null	null	tagvalue3	null	timestamp3
4i	true	tagvalue1	tagvalue3	tagvalue4	timestamp4
4i	true	tagvalue1	tagvalue3	tagvalue5	timestamp4
1i	null	null	null	null	timestamp5
1i	null	tagvalue1	null	null	timestamp5

Upserts

It is useful to think of all writes to InfluxDB as Upserts. The term “Upsert” means that the write will “Update or Insert” on write. It will update if it matches an existing record, or will insert a new record if no existing records match.

Because Upserts match on the timestamp and the tag values, it is not possible to update a tag value! Only field values can be updated.

This line of line protocol includes no tags, a duplicate timestamp, and a duplicate field name, but a different field value. So, this matches a timestamp and the tag set (which happens to be empty). Therefore, this will result in an update to the table.

measurement1 field1=2i timestamp5

Name: measurement1
field1	field2	tag1	tag2	tag3	time
1i	null	tagvalue1	null	null	timestamp1
2i	null	tagvalue2	null	null	timestamp2
3i	null	null	tagvalue3	null	timestamp3
4i	true	tagvalue1	tagvalue3	tagvalue4	timestamp4
4i	true	tagvalue1	tagvalue3	tagvalue5	timestamp4
2i	null	null	null	null	timestamp5
1i	null	tagvalue1	null	null	timestamp5

The following line protocol also matches an existing timestamp, and a tag set, so it will result in an update. While only one of the fields is present in the line protocol, the row is still matched, so the field will be updated.

measurement1,tag1=tagvalue1,tag2=tagvalue3,tag3=tagvalue5 field2=false timestamp4

Name: measurement1
field1	field2	tag1	tag2	tag3	time
1i	null	tagvalue1	null	null	timestamp1
2i	null	tagvalue2	null	null	timestamp2
3i	null	null	tagvalue3	null	timestamp3
4i	true	tagvalue1	tagvalue3	tagvalue4	timestamp4
4i	false	tagvalue1	tagvalue3	tagvalue5	timestamp4
2i	null	null	null	null	timestamp5
1i	null	tagvalue1	null	null	timestamp5

A new field can be introduced similarly. The following line protocol matches the timestamp and all tag values for the existing row, so the row will be updated, along with the new field value. As expected, all existing rows without the new field will be set to null for that field.

measurement1,tag1=tagvalue1,tag2=tagvalue3,tag3=tagvalue5 field3=0.0 timestamp4

Name: measurement1
field1	field2	field3	tag1	tag2	tag3	time
1i	null	null	tagvalue1	null	null	timestamp1
2i	null	null	tagvalue2	null	null	timestamp2
3i	null	null	null	tagvalue3	null	timestamp3
4i	true	null	tagvalue1	tagvalue3	tagvalue4	timestamp4
4i	false	0.0	tagvalue1	tagvalue3	tagvalue5	timestamp4
2i	null	null	null	null	null	timestamp5
1i	null	null	tagvalue1	null	null	timestamp5

Next Section