Item data storage
The data values for each item are reordered according to the selected spatial index tree, so that values that are close in space are stored close together. This makes data retrieval less chunky: with grids, for example, a spatial query can read a much smaller amount of data than it would by reading full rows.
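The effect on grids can be illustrated with a small sketch. It assumes a Z-order (Morton) layout, which is only one possible spatial index ordering and is not confirmed as the one used here; the function names and the chunk size of 64 values are hypothetical.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y into a Z-order (Morton) index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def values_read_full_rows(width, y0, y1):
    """Row-major layout: every row touched by the window is read in full."""
    return width * (y1 - y0 + 1)


def values_read_morton_chunks(x0, y0, x1, y1, chunk_size=64):
    """Z-order layout: only chunks containing a cell of the window are read.

    chunk_size is an assumed number of values per contiguous chunk.
    """
    chunks = {
        morton_index(x, y) // chunk_size
        for y in range(y0, y1 + 1)
        for x in range(x0, x1 + 1)
    }
    return len(chunks) * chunk_size


# A 16 x 16 window inside a grid that is 1024 cells wide.
print(values_read_full_rows(1024, 200, 215))            # 16384
print(values_read_morton_chunks(100, 200, 115, 215))    # 384
```

With 64 values per chunk, each chunk covers an aligned 8 x 8 tile, so the window touches 6 tiles instead of 16 full rows.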
Data storage strategies/options
Based on the dataset definition and the query/usage requirements, item data can be stored in one or more blobs. With regard to the temporal domain there are two typical query types, time step oriented and time series oriented, which are served by three storage strategies:
- Time step oriented single time step storage - returns the data of one time step per request; the spatial domain can be large
- Time step oriented multi time step storage - returns the data of one time step per request; the spatial domain should be small (e.g. <50000)
- Time series oriented storage - returns the data of many time steps per request; the spatial domain should be smaller to avoid extra large results
Timestep oriented single timestep storage
- Optimized for Timestep retrieval.
- Each Timestep is stored in its own blob.
- The typical use case is rendering the data of a single time step on a map.
Timestep oriented multi timestep storage
Used for internal storage efficiency in case of small spatial domains.
- Optimized for Timestep retrieval.
- The temporal domain is divided into groups. Each timestep group is stored in one file to avoid a large number of small files.
- This storage strategy is used for smaller spatial domains with a larger number of timesteps.
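As a minimal sketch of how a single timestep could be located inside a grouped file, the following assumes that every timestep occupies the same number of bytes and that groups have a constant size. The names `TimestepLocation` and `locate_timestep`, the group size and the 4-byte value size are hypothetical and not taken from the actual storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimestepLocation:
    group: int        # index of the timestep group (one file per group)
    byte_offset: int  # where the timestep starts inside that file
    byte_length: int  # how many bytes the timestep occupies


def locate_timestep(t, n_elements, group_size, value_size=4):
    """Map a global timestep index to its group file and byte range.

    Assumes fixed-size values (value_size bytes each) and n_elements
    values per timestep.
    """
    group, position_in_group = divmod(t, group_size)
    length = n_elements * value_size
    return TimestepLocation(group, position_in_group * length, length)


# Timestep 250 of a dataset with 1000 elements, 100 timesteps per group.
print(locate_timestep(250, n_elements=1000, group_size=100))
```

A reader then needs one ranged request against one file per timestep, while the number of files is reduced by a factor of `group_size`.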
Timeseries oriented storage
- Optimized for Timeseries retrieval.
- This storage strategy is used for a larger number of timesteps.
- Both the temporal domain and the spatial domain are divided into groups; each spatio-temporal group is stored in one file. The byte array is ordered first by time, then by spatial position.
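The addressing implied by this layout can be sketched as follows. The sketch reads "first by time, then by spatial position" as time being the primary sort key inside a block, and it assumes constant group sizes and 4-byte values; `block_offset` and its parameters are hypothetical names used for illustration only.

```python
def block_offset(t, s, time_group, space_group, value_size=4):
    """Locate the value for timestep t and spatial position s.

    Returns the (time block, space block) pair that identifies the file
    and the byte offset of the value inside that file. Inside a block,
    values are laid out time-major: all positions of the first timestep,
    then all positions of the next one, and so on.
    """
    block = (t // time_group, s // space_group)
    local_t = t % time_group
    local_s = s % space_group
    offset = (local_t * space_group + local_s) * value_size
    return block, offset


# Timestep 130 at spatial position 2050, with 100 timesteps and
# 1000 spatial positions per block.
print(block_offset(130, 2050, time_group=100, space_group=1000))
# ((1, 2), 120200)
```

Because both domains are grouped, a time series request for a few positions touches only the files of one spatial group across the requested time groups.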
Timeseries oriented storage supports 2 block types:
Fixed block timeseries storage
Pros | Cons |
---|---|
Flexible updates, even a single timestep can be added | Always allocates the fixed block size |
Allows gaps | Uses more space |
Designed for incremental forecasts | Slower import/update operation |
Variable block timeseries storage
Pros | Cons |
---|---|
Smaller storage size, only imported data is stored | Only full blocks can be updated |
Faster import/update operation | Gaps cannot be easily filled later |
Designed for historical data load | |
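The trade-off between the two block types can be shown with a simplified in-memory model. It stores one value per timestep instead of a full spatial group, and the classes `FixedBlock` and `VariableBlock` are hypothetical illustrations, not the actual writer implementation.

```python
import math


class FixedBlock:
    """Always allocates the full block; any single timestep can be written."""

    def __init__(self, block_size):
        # Unwritten timesteps stay as NaN, which is how gaps are allowed.
        self.values = [math.nan] * block_size

    def write(self, index, value):
        self.values[index] = value

    def stored_count(self):
        return len(self.values)


class VariableBlock:
    """Stores only the imported values; updates replace the whole block."""

    def __init__(self, values):
        self.values = list(values)

    def replace(self, values):
        self.values = list(values)

    def stored_count(self):
        return len(self.values)


# Incremental forecast: two timesteps arrive out of order, with a gap.
fixed = FixedBlock(block_size=8)
fixed.write(0, 1.0)
fixed.write(5, 2.0)

# Historical load: the block holds exactly what was imported.
variable = VariableBlock([1.0, 2.0])

print(fixed.stored_count(), variable.stored_count())  # 8 2
```

The fixed block pays for its flexibility with space that is reserved whether or not it is filled, while the variable block stays compact but can only be rewritten as a whole.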
When importing data using Timeseries oriented storage, it is necessary to specify some extra writer parameters. See Create timeseries oriented dataset.