Working with Datasets
This page provides basic examples of how to work with datasets. For a full explanation of all options, refer to the conceptual part of the documentation.
Create a dataset
Typically, creating a dataset means importing data using the transfer and conversion pipeline. See Upload files and Download files. Additionally, the Timeseries service allows you to create a Timeseries dataset directly; see Create timeseries through the REST API and SDK timeseries.
List datasets
The DatasetClient has a simple method to list datasets directly under a specific project.
var datasets = await _datasetClient.GetDatasetListAsync(projectId);
The method also has an overload to filter by dataset name. The properties of the TextFilter are combined in conjunction (AND), while the strings within each property are combined in disjunction (OR).
// Each populated property must match (AND); any one string within a property may match (OR).
var validFilters = new[] {
    new TextFilter() { Contains = new string[] { "one" } },
    new TextFilter() { EndsWith = new string[] { "one" } },
    new TextFilter() { StartsWith = new string[] { "one" } },
    new TextFilter() { Contains = new string[] { "one", "two" } },
    new TextFilter() { Contains = new string[] { "one", "two" }, EndsWith = new string[] { "one" } },
    new TextFilter()
};
foreach (var nameFilter in validFilters) {
    var filteredDatasets = await _datasetClient.GetDatasetListAsync(projectId, nameFilter, false, default);
    Console.WriteLine(filteredDatasets.Count());
}
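The AND/OR rule can be easier to see when evaluated locally. The following is a minimal sketch that does not call the SDK: the `Matches` function and its parameters are hypothetical stand-ins for the three `TextFilter` properties, and the ordinal, case-sensitive comparison is an assumption rather than documented service behavior.

```csharp
using System;
using System.Linq;

// Hypothetical local model of the filter rule described above; not part of the SDK.
// Every populated property must match (AND); within a property, one matching string is enough (OR).
// Assumption: ordinal, case-sensitive comparison; a null or empty property adds no constraint.
static bool Matches(string name, string[]? contains = null, string[]? startsWith = null, string[]? endsWith = null)
{
    bool AnyMatch(string[]? values, Func<string, bool> test) =>
        values == null || values.Length == 0 || values.Any(test);

    return AnyMatch(contains, v => name.Contains(v, StringComparison.Ordinal))
        && AnyMatch(startsWith, v => name.StartsWith(v, StringComparison.Ordinal))
        && AnyMatch(endsWith, v => name.EndsWith(v, StringComparison.Ordinal));
}

// Contains = { "one", "two" } AND EndsWith = { "one" }
var both = new[] { "one", "two" };
var tail = new[] { "one" };
Console.WriteLine(Matches("two-in-one", contains: both, endsWith: tail));  // True
Console.WriteLine(Matches("one-and-two", contains: both, endsWith: tail)); // False
```

The second name contains both strings but does not end with "one", so the EndsWith property rejects it even though the Contains property is satisfied.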
To list datasets recursively (i.e. datasets within a project and within the projects nested in that project), use the HierarchyClient.
For instance, this example defines a method and uses it to fetch all datasets of type DatasetType.files under a specific project.
// Requires System.Runtime.CompilerServices for [EnumeratorCancellation].
private async IAsyncEnumerable<IEnumerable<DatasetRecursiveListOutput>> _GetAllDatasets(
    Guid projectId, DatasetType? datasetType,
    [EnumeratorCancellation] CancellationToken cancellationToken = default
)
{
    var page = await _hierarchyClient.GetRecursiveDatasetListAsync(projectId, datasetType, generateSasTokens: false, pageSizeLimit: 500, cancellationToken: cancellationToken);
    while (page != default)
    {
        // Stop as soon as a page comes back empty.
        if (!page.Items.Any())
            yield break;
        yield return page.Items;
        page = await page.NextAsync(cancellationToken);
    }
}

// An IAsyncEnumerable cannot be awaited directly; iterate it page by page with 'await foreach'.
await foreach (var datasets in _GetAllDatasets(projectId, DatasetType.files))
{
    Console.WriteLine(datasets.Count());
}
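Callers often want individual items rather than pages. The sketch below shows one way to flatten a page-of-pages async stream; it assumes only the shape returned by `_GetAllDatasets`. The `FakePages` source and the `Flatten` helper are hypothetical and stand in for the SDK call so that the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a paged SDK call: yields pages of at most pageSize items.
static async IAsyncEnumerable<IEnumerable<string>> FakePages(
    IReadOnlyList<string> all, int pageSize,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    for (var offset = 0; offset < all.Count; offset += pageSize)
    {
        cancellationToken.ThrowIfCancellationRequested();
        await Task.Yield(); // simulate an asynchronous round trip
        yield return all.Skip(offset).Take(pageSize).ToList();
    }
}

// Flattens pages into single items so callers need only one 'await foreach'.
static async IAsyncEnumerable<T> Flatten<T>(
    IAsyncEnumerable<IEnumerable<T>> pages,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    await foreach (var page in pages.WithCancellation(cancellationToken))
        foreach (var item in page)
            yield return item;
}

var names = new List<string>();
await foreach (var name in Flatten(FakePages(new[] { "a", "b", "c", "d", "e" }, pageSize: 2)))
    names.Add(name);

Console.WriteLine(string.Join(",", names)); // a,b,c,d,e
```

Because the flattening is lazy, the next page is requested only when the consumer has finished the current one, which keeps memory use bounded for large hierarchies.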
Get dataset details
var dataset = await _datasetClient.GetDatasetAsync(datasetId);
Console.WriteLine(dataset.Id);
Console.WriteLine(dataset.Name);
Console.WriteLine(dataset.Description);
Update dataset description and metadata
To update a dataset, you must include the dataset's RowVersion so that potential edits from other users are respected.
Therefore, you typically need to get the dataset first. Just like with projects, the dataset Metadata property can serve many purposes; see How to work with projects.
This sample updates a dataset's Name by appending extra text to its original name.
var dataset = await _datasetClient.GetDatasetAsync(datasetId);
var updatedName = dataset.Name + " Expired at 2024/05/02";
var edit = new EditDatasetInput() { Id = dataset.Id, Name = updatedName, RowVersion = dataset.RowVersion };
await _datasetClient.UpdateDatasetAsync(edit, projectId);
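If another user edits the dataset between your get and your update, the RowVersion you hold is stale and the sequence may have to be repeated. The sketch below illustrates that get-modify-update retry pattern against a small in-memory record instead of the SDK. `TryUpdate`, `RenameWithRetry`, the tuple store, and the integer version are all hypothetical; how the real client reports a version conflict is not described on this page and should be checked in the API reference.

```csharp
using System;

// Hypothetical in-memory record standing in for a dataset; not the SDK.
var stored = (Name: "Rainfall 2023", RowVersion: 1);

// Succeeds only when the caller's version matches the stored one, then bumps the version.
bool TryUpdate(string newName, int rowVersion)
{
    if (rowVersion != stored.RowVersion) return false; // someone else edited in the meantime
    stored = (newName, stored.RowVersion + 1);
    return true;
}

// Get-modify-update with a bounded number of retries when the version is stale.
bool RenameWithRetry(Func<string, string> edit, int maxAttempts, Action? beforeUpdate = null)
{
    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        var current = stored;   // "get" the latest name and RowVersion
        beforeUpdate?.Invoke(); // hook used below to simulate a concurrent edit
        beforeUpdate = null;    // the simulated edit happens only once
        if (TryUpdate(edit(current.Name), current.RowVersion)) return true;
    }
    return false;
}

// Another user renames the dataset between our get and our update on the first attempt.
var ok = RenameWithRetry(
    name => name + " (archived)",
    maxAttempts: 3,
    beforeUpdate: () => TryUpdate("Rainfall 2023 v2", stored.RowVersion));

Console.WriteLine($"{ok}: {stored.Name}, version {stored.RowVersion}");
// True: Rainfall 2023 v2 (archived), version 3
```

The second attempt re-reads the record, so the other user's rename is preserved and the suffix is appended to the newer name rather than overwriting it.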
Get dataset data
There is a big difference between working with a dataset and working with dataset data. Datasets are just records with descriptive information about the actual data. In other words, a dataset is a "metadata" record for the actual data. The ways to get or modify the actual data differ based on the dataset type. Please refer primarily to the "Concepts" chapters of this documentation for further details.
Another important difference is between "Raw file storage" and "dedicated storage". Raw file storage is handled by the "File service". It allows upload and download of arbitrary data, but the Platform generally has no understanding of the internal structure of such data. Also, data in the Raw file storage cannot be updated. Dedicated storage, on the other hand, such as the GIS Service or the Multidimensional Service, understands the internal structure of the data and allows for more diverse queries or even updates.
Some basic examples of SDK usage are below. For further details about a certain dataset type, refer to the corresponding "Data storage services" section of this documentation.
Raw File Storage
In C#, and in Python too, the simplest way to get data from the Raw file storage is to use the TransferClient. You can download to a specific path or use a stream. This sample will work in many situations. However, if you encounter issues or if your use case is complex (e.g. a different programming language, uploading a large number of small or big files in close succession, etc.), also explore the RawClient and refer to the full File service documentation section.
As mentioned above, data in Raw file storage cannot be updated. For instance, if you upload a long CSV file to the Raw file storage and then you want to upload a new version, you need to upload the new version as a new dataset and delete the dataset that represents the now old version. This means the new CSV data will have a different dataset ID. If the dataset ID change creates an issue for your application, consider using a different identifier for data in your application, such as dataset.Metadata["myapp:dataset-id"] = 12345, or use dataset.Name = 12345 and dataset.Metadata["myapp:dataset-name"] = "The name". Also, see Sharing for possible alternatives.
// Option 1: download directly to a file path.
await _transferClient.DownloadAsync(projectId, dataset.Id, $"c:/temp/{dataset.Name}");
// Option 2: download into a stream that you control.
await using (var fileStream = new FileStream($"c:/temp/{dataset.Name}", FileMode.Create))
{
    await _transferClient.DownloadAsync(projectId, dataset.Id, fileStream);
}
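The replace-by-new-dataset workflow described above means the platform dataset ID changes with every new version. The following sketch shows how an application-level identifier stored in metadata could still resolve to the current dataset. It runs against hypothetical local tuples rather than SDK types; only the `myapp:dataset-id` key is taken from the text, and in practice the records would come from listing the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical local stand-ins for dataset records; not SDK types.
var datasets = new List<(Guid Id, string Name, Dictionary<string, object> Metadata)>
{
    (Guid.NewGuid(), "stations.csv", new Dictionary<string, object> { ["myapp:dataset-id"] = 12345 }),
    (Guid.NewGuid(), "rainfall.csv", new Dictionary<string, object> { ["myapp:dataset-id"] = 67890 }),
};

// Resolves the platform dataset ID from the application's own stable identifier.
Guid? FindByAppId(int appId) =>
    datasets
        .Where(d => d.Metadata.TryGetValue("myapp:dataset-id", out var v) && v is int id && id == appId)
        .Select(d => (Guid?)d.Id)
        .FirstOrDefault();

var before = FindByAppId(12345);

// "Updating" the CSV: add the new version as a new dataset with the same app ID, then drop the old one.
var replacement = (Id: Guid.NewGuid(), Name: "stations.csv",
    Metadata: new Dictionary<string, object> { ["myapp:dataset-id"] = 12345 });
datasets.RemoveAll(d => d.Id == before);
datasets.Add(replacement);

var after = FindByAppId(12345);
Console.WriteLine(before != after); // True: the platform ID changed, the app ID still resolves
```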
TimeSeries Service
Basic work with TimeSeries data using the TimeSeriesClient:
// Create dataset
var dataset = await _tsClient.CreateDatasetAsync(projectId, "My Time Series Dataset");
var item = new ItemDefinition
{
Name = "Temperature",
DataType = AttributeDataType.Single,
Item = Generic.MikeZero.eumItem.eumITemperature,
Unit = Generic.MikeZero.eumUnit.eumUdegreeCelsius,
TimeSeriesType = TimeSeriesDataType.MeanStepBackward
};
// Add timeseries
var ts = await _tsClient.AddTimeSeriesAsync(projectId, dataset.Id, item);
// Add time series values
var data = new TimeSeriesData<float>(
new[] { DateTime.Today.ToUniversalTime(), DateTime.Today.AddDays(1).ToUniversalTime() },
new float?[] { 12.34f, 56.78f }
);
await _tsClient.AddTimeSeriesValuesAsync<float>(projectId, dataset.Id, ts.Id, data);
// Get timeseries values
var tsvResult = await _tsClient.GetTimeSeriesValuesAsync<float>(projectId, dataset.Id, ts.Id);
// Add additional timeseries values
var moreData = new TimeSeriesData<float>(
new[] { DateTime.Today.AddDays(2).ToUniversalTime() },
new float?[] { 90.12f }
);
await _tsClient.AddTimeSeriesValuesAsync<float>(projectId, dataset.Id, ts.Id, moreData);
// If your dataset contains many time series with various properties, you can get a subset of those like this:
var queryInput = new QueryFilter
{
Conditions = new List<QueryCondition>()
{
new AttributeQueryCondition() { Name = "Item", Operator = AttributeOperator.Equal, Value = "AirPressure" }
}
};
var timeSeriesDefinitions = await _tsClient.QueryTimeSeriesAsync(projectId, dataset.Id, queryInput);
foreach (var definition in timeSeriesDefinitions){
var fields = string.Join(',', definition.DataFields.Select(f => f.Name));
Console.WriteLine($"{definition.Id}: {fields}");
}
// Finally, it is possible to get time series values from multiple time series at once. The returned data are in the order of the input time series IDs.
var values = await _tsClient.GetMultipleTimeSeriesValuesAsync<float>(projectId, dataset.Id, new string[] { "TS98765", "TS12345" });
foreach (var timeSeriesData in values) {
var sum = timeSeriesData.Values.Sum(v => v ?? 0.0f); // sum the time series values where value is not null
var nan = timeSeriesData.Values.Count(v => !v.HasValue); // count number of missing time series values
Console.WriteLine($"{timeSeriesData.Count} time steps with {nan} missing values and total sum = {sum}");
}
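The samples above convert local midnight (`DateTime.Today`) to UTC, so the resulting timestamps depend on the time zone of the machine running the code. If the time steps should fall on UTC midnight wherever the code runs, they can be constructed with `DateTimeKind.Utc` directly. This is a general .NET sketch with a hypothetical `DailyUtcSteps` helper; it does not use the SDK, and whether the service expects UTC timestamps is an assumption here.

```csharp
using System;
using System.Linq;

// Builds 'count' daily time steps starting at UTC midnight of the given date.
static DateTime[] DailyUtcSteps(int year, int month, int day, int count)
{
    var start = new DateTime(year, month, day, 0, 0, 0, DateTimeKind.Utc);
    return Enumerable.Range(0, count).Select(i => start.AddDays(i)).ToArray();
}

var steps = DailyUtcSteps(2024, 5, 2, 3);
foreach (var step in steps)
    Console.WriteLine(step.ToString("O")); // round-trip format; UTC values end with 'Z'
```

An array built this way can be passed wherever the samples pass a `DateTime[]` of time steps.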
Multidimensional Service
For querying Multidimensional data, refer to Query timeseries, Query timestep, and other topics in the Multidimensional service description.
GIS Service
The GIS Service provides a registry of all coordinate systems recognized by the Platform. This includes most EPSG codes, all DHI projections, and a few custom coordinate systems.
var crsList = await _gisClient.GetCoordinateSystemListAsync();
// You can try to find a coordinate system by name, but this may return multiple records because several coordinate systems can have similar names.
var nameContains = "WGS 84 / Pseudo-Mercator";
foreach (var crs in crsList.Where(c => c.Name.Contains(nameContains)))
{
    Console.WriteLine(crs.Wkt);
}
// Getting coordinate system by SRID:
var wgs84 = await _gisClient.GetCoordinateSystemAsync(4326);
Console.WriteLine($"{wgs84.Authority}, {wgs84.Name}, {wgs84.Wkt}, {wgs84.Proj4String}");
The GIS Service returns data in GeoJSON format. A dataset in the GIS service refers to a FeatureClass, and getting data from a FeatureClass in C# is straightforward:
var fc1 = await _gisClient.GetFeatureClassAsync(projectId, datasetId);
// The `GetFeatureClassAsync` method returns all features, which may fail for large FeatureClasses. The `QueryFeatureClassAsync` method allows you to get a more manageable subset of features from a large FeatureClass.
// Here we select all features called 'Austria' that are completely within a given polygon.
var queryGeometryWkt = "POLYGON((7.59 53.21,23.76 53.21,23.76 44.70,7.59 44.70,7.59 53.21))";
var wktReader = new WKTReader();
var queryGeometry = wktReader.Read(queryGeometryWkt);
var conditions = new QueryCondition[] {
new AttributeQueryCondition { Name="name", Operator = AttributeOperator.Equal, Value = "Austria" },
new SpatialQueryCondition { Geometry = queryGeometry, Operator = SpatialOperator.Within }
};
var fc = await _gisClient.QueryFeatureClassAsync(projectId, datasetId, conditions);
// Write out all attribute names and types
Console.WriteLine(string.Join(';', fc.Attributes.Select(a => $"{a.Name}:{a.DataType}")));
// Find the 10 largest values of an attribute "myAttribute", given that myAttribute is a `float`
var largest = fc.Features.Select(f => (float?)f.Attributes["myAttribute"])
.Order().Reverse().Take(10).ToArray();
// Count all features with an area smaller than 1000.0 unit^2, given that the geometry type is Polygon and you know the area units of the coordinate system.
var smallOnes = fc.Features.Count(f => f.Geometry.Area < 1000.0);
The GeoJSON format is suitable for many JavaScript mapping libraries, including Open Layers. A sample illustrating how to add a GIS Feature Class to an Open Layers map is here.
Tiles service
Raster tiles as WMTS or XYZ services can be rendered in a map. See Consuming tiles in Open Layers.
Move a dataset to a different project
The DatasetClient has a method to move a dataset from one project to another. The dataset is simply assigned to the new project and keeps its original dataset ID.
await _datasetClient.MoveDatasetAsync(sourceDatasetId, targetProjectId);
Delete a dataset
To delete a dataset completely, set the parameter permanently to true. The default value of the parameter is false, which means the dataset is "soft deleted" and ends up in the Recycle Bin, where you still pay for the storage consumption. Note, however, that you cannot restore a permanently deleted dataset. Deleting a dataset also deletes its data.
await projectClient.DeleteProjectAsync(subProject.Id, permanently: true);