Do not fetch details in loops
Suppose you want to make a List View (or a Table) of Datasets. You use the /api/metadata/project/{projectId}/dataset/list-summaries endpoint to fetch a list of basic dataset information. The response looks like this:
{
  "data": [
    {
      "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "name": "A Dataset",
      "datasetType": "file",
      "dataPath": "/foo/bar/spam"
    }
  ]
}
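To make the gap concrete, here is a minimal C# sketch that deserializes a list-summaries response with System.Text.Json. The record types DatasetSummary and ListSummariesResponse are hypothetical and written only for this illustration; they are not Platform SDK types. Note that there is simply no field that could hold createdAt.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical types mirroring the list-summaries JSON; not types from the Platform SDK.
// There is no CreatedAt here because the summary response does not carry it.
public record DatasetSummary(Guid Id, string Name, string DatasetType, string DataPath);

public record ListSummariesResponse(List<DatasetSummary> Data);

public static class SummaryParser
{
    // JsonSerializerDefaults.Web: camelCase names and case-insensitive property matching.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static ListSummariesResponse Parse(string json) =>
        JsonSerializer.Deserialize<ListSummariesResponse>(json, Options)
        ?? throw new InvalidOperationException("Empty response body");
}
```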
You notice that the list-summaries response does not include the createdAt property that you want to show in your List View. You decide to loop over the list-summaries result and fetch each dataset's details with the GET /api/metadata/dataset/{id} endpoint. This sends one extra request per dataset, which puts considerable load on the Platform and also slows down your application, because it has to wait for all of those responses. A better approach is to use the GET /api/metadata/project/{projectId}/dataset/list endpoint, whose response includes the createdAt property for all datasets in a project.
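The cost difference is easy to see by counting requests. The sketch below uses a hypothetical in-memory CountingFakeApi whose method names are illustrative stand-ins for the endpoints above, not the SDK's API. Fetching details in a loop costs 1 + N requests for N datasets, while the full list endpoint needs a single request, assuming all datasets fit in one response.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the Platform API that only counts requests.
public sealed class CountingFakeApi
{
    private readonly Dictionary<Guid, DateTime> _createdAt;
    public int RequestCount { get; private set; }

    public CountingFakeApi(int datasetCount) =>
        _createdAt = Enumerable.Range(0, datasetCount)
            .ToDictionary(_ => Guid.NewGuid(), i => new DateTime(2024, 1, 1).AddDays(i));

    // Models .../dataset/list-summaries: ids only, no createdAt.
    public Task<List<Guid>> ListSummariesAsync()
    {
        RequestCount++;
        return Task.FromResult(_createdAt.Keys.ToList());
    }

    // Models GET /api/metadata/dataset/{id}: one dataset per request.
    public Task<DateTime> GetCreatedAtAsync(Guid id)
    {
        RequestCount++;
        return Task.FromResult(_createdAt[id]);
    }

    // Models GET .../dataset/list: every dataset with createdAt in one response.
    public Task<Dictionary<Guid, DateTime>> ListAsync()
    {
        RequestCount++;
        return Task.FromResult(new Dictionary<Guid, DateTime>(_createdAt));
    }
}

public static class Strategies
{
    // Anti-pattern: one summary request plus one detail request per dataset.
    public static async Task<Dictionary<Guid, DateTime>> DetailsInLoopAsync(CountingFakeApi api)
    {
        var result = new Dictionary<Guid, DateTime>();
        foreach (var id in await api.ListSummariesAsync())
            result[id] = await api.GetCreatedAtAsync(id);
        return result;
    }

    // Preferred: a single list request that already carries createdAt.
    public static Task<Dictionary<Guid, DateTime>> SingleListAsync(CountingFakeApi api) =>
        api.ListAsync();
}
```

Both strategies return the same data; only the number of round trips differs.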
A more complicated variant of this situation can occur when you use the /api/metadata/project/{projectId}/dataset/recursive-list endpoint. It is a very efficient way to get information about all datasets under a given project and its subprojects. The following snippet collects all pages of its response:
var allDatasets = new List<DatasetRecursiveListOutput>();
var page = await _hierarchyClient.GetRecursiveDatasetListAsync(projectId, datasetType, generateSasTokens: false, pageSizeLimit: 500, cancellationToken: cancellationToken);
while (page != default)
{
    if (!page.Items.Any())
        break;
    // AddRange adds the items in place; LINQ's Append would return a new sequence and discard it.
    allDatasets.AddRange(page.Items); // Even better would be wrapping this loop in a method and using `yield return page.Items;` and `yield break`.
    page = await page.NextAsync(cancellationToken);
}
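The comment in the snippet suggests wrapping the paging loop in an iterator method. A minimal sketch of that idea with C# async streams (IAsyncEnumerable<T>) follows. It assumes a page type that exposes Items and a NextAsync method returning null after the last page; the IPage<T> interface and the in-memory ListPage<T> are hypothetical, and the SDK's real page type may need an adapter. This version yields individual items rather than whole pages.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical page shape: Items plus NextAsync, which returns null after the last page.
public interface IPage<T>
{
    IReadOnlyList<T> Items { get; }
    Task<IPage<T>?> NextAsync(CancellationToken cancellationToken);
}

public static class Paging
{
    // Streams items from every page so callers can process them as they arrive.
    public static async IAsyncEnumerable<T> ReadAllAsync<T>(
        IPage<T>? page,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        while (page != null)
        {
            if (page.Items.Count == 0)
                yield break; // an empty page ends the sequence, like the break in the loop
            foreach (var item in page.Items)
                yield return item;
            page = await page.NextAsync(cancellationToken);
        }
    }
}

// In-memory page used only to demonstrate the iterator.
public sealed class ListPage<T> : IPage<T>
{
    private readonly List<T> _all;
    private readonly int _offset;
    private readonly int _size;

    public ListPage(List<T> all, int offset, int size)
    {
        _all = all;
        _offset = offset;
        _size = size;
    }

    public IReadOnlyList<T> Items => _all.Skip(_offset).Take(_size).ToList();

    public Task<IPage<T>?> NextAsync(CancellationToken cancellationToken) =>
        Task.FromResult<IPage<T>?>(_offset + _size < _all.Count
            ? new ListPage<T>(_all, _offset + _size, _size)
            : null);
}
```

Assuming the SDK pages can be adapted to IPage<T>, the collecting loop shrinks to `await foreach (var d in Paging.ReadAllAsync(page, cancellationToken)) allDatasets.Add(d);`, and callers that only stream items never need to buffer them all.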
However, the recursive-list response does not include all dataset properties. For instance, if you want to display the extents of all datasets on a map, you will find that there is no spatialInformation.location property in this response. Consider the following option first: list the projects recursively, then call the full GET /api/metadata/project/{projectId}/dataset/list endpoint once per project to get its datasets with all properties:
var allProjects = new List<ProjectRecursiveListOutput>();
var allDatasets = new List<DatasetOutput>();
var page = await _hierarchyClient.GetRecursiveProjectListAsync(projectId, 500, cancellationToken: cancellationToken);
while (page != default)
{
    if (!page.Items.Any())
        break;
    allProjects.AddRange(page.Items); // Even better would be wrapping this loop in a method and using `yield return page.Items;` and `yield break`.
    page = await page.NextAsync(cancellationToken);
}
// One full list request per project (not per dataset); each response includes all dataset properties.
foreach (var project in allProjects)
{
    var dsets = await _datasetClient.GetDatasetListAsync(project.Id);
    allDatasets.AddRange(dsets);
}
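A rough way to compare the options is to count requests. This sketch assumes that each paged call returns up to pageSize items, that paging stops after the last non-empty page, and that GET .../dataset/list returns all of a project's datasets in one request; none of this is guaranteed by the Platform, so treat the results as estimates. RequestEstimate is a hypothetical helper.

```csharp
using System;

// Back-of-the-envelope request counts; assumptions as stated in the lead-in.
public static class RequestEstimate
{
    // Number of paged requests needed for a given item count (at least one).
    static int Pages(int items, int pageSize) =>
        Math.Max(1, (items + pageSize - 1) / pageSize);

    // Recursive project list, then one full dataset list per project.
    public static int PerProject(int projects, int pageSize) =>
        Pages(projects, pageSize) + projects;

    // Recursive dataset list, then GET /api/metadata/dataset/{id} for every dataset.
    public static int PerDataset(int datasets, int pageSize) =>
        Pages(datasets, pageSize) + datasets;
}
```

For example, with 20 projects holding 5,000 datasets and a page size of 500, the per-project approach needs about 1 + 20 = 21 requests, while looping over dataset details needs about 10 + 5,000 = 5,010.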
If the per-project approach does not lead to satisfactory performance, we recommend reaching out to the Platform team and discussing your specific use case with them before you resort to looping over the datasets in the recursive-list response and calling GET /api/metadata/dataset/{id} for each one:
var datasetsWithSpatialInfo = new List<DatasetOutput>();
// Last resort: this sends one request per dataset.
foreach (var dset in allDatasets)
{
    var dataset = await _datasetClient.GetDatasetAsync(dset.Id);
    datasetsWithSpatialInfo.Add(dataset);
}
These examples use Datasets, but you may find similar situations with other resources too. There are trade-offs to consider. For instance, are fewer requests better if they lead to a much higher volume of data transferred? To understand how to use the Platform most efficiently, we recommend studying the conceptual part of the documentation.