Timing considerations when requesting views or exports via the QSDA API

Summary: A QSDA Collection should be in Status:Ready before fetching any View data.

A QSDA “Analysis” consists of two phases:

  1. “Collection” – The raw metadata – Fields, Expressions, etc.
  2. “Analysis” – Enrichment to metadata – Field usage counts, Flags, etc.

In the API, the entity is a “Collection” regardless of phase or status.

When a new Collection is started, whether through the UI or the API, a Collection is created and populated with the raw metadata in the collection phase. The collected metadata is stored in the repository. Then the analysis begins. Data added during the analysis phase is not stored in the repository.

When an analysis is unloaded from memory, only the raw collected data remains in the repository. If you later load that collection into memory, the analysis is started automatically. The API equivalent of loading from the repository is POST /collectors/id/load.

In the UI, the “View” button is not available until the analysis completes, indicated by the Collection Status:Ready.

If you use the API to request view data (e.g., GET /collectors/id/views/flags) before the analysis is complete, you may receive incomplete or missing data.

Before requesting any views from the API, you must verify the Collection Status is “Ready”.

You can retrieve Collection Status with a GET /collectors/id call. You may either loop on this call or use the replyEndpoint parameter of POST /collectors or POST /collectors/id/load to be notified when the Analysis is complete. You should still make at least one call to check status, as the completed status may be “Error”.
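As a rough illustration of the polling approach, here is a minimal Python sketch. The names wait_until_ready and get_status are hypothetical and not part of the QSDA API. get_status stands in for whatever code you write to call GET /collectors/id and pull the status string out of the reply; the shape of that reply is not described here, so the extraction is left to the caller. The loop stops on either "Ready" or "Error" and gives up after a timeout.

```python
import time
from typing import Callable

# Statuses after which no further progress is expected.
TERMINAL_STATUSES = {"Ready", "Error"}


def wait_until_ready(
    get_status: Callable[[], str],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns "Ready" or "Error".

    get_status is a caller-supplied function (an assumption of this
    sketch) that performs GET /collectors/id and returns the status string.
    Returns the terminal status; raises TimeoutError if none is reached.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Collection still in status {status!r}")
        sleep(poll_seconds)
```

The caller should still inspect the returned value, since "Error" is a completed status too. Request views only when the result is "Ready".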

If you make a /views/ request to a collection that is not currently loaded, a /load request is automatically performed so the view can be returned. The request does not wait for the analysis to complete, so the view is highly likely to be incomplete. I regret this design choice.

I am proposing to modify all /views/* endpoints to check the Collection Status. If the Status is not equal to Ready, the HTTP Response Code will be 409 Conflict and the response message will be:

{
  "isSuccessful": false,
  "message": "Collection not ready",
  "data": []
}

Do you see any issue with this change? You will still be responsible for checking that a Collection is ready, but the new behavior should guard against inadvertently getting incomplete results.
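If the proposed change goes ahead, client code could treat the 409 as a distinct "not ready" condition rather than a generic failure. The sketch below is only an illustration: parse_view_response and CollectionNotReady are hypothetical names, and it assumes that successful /views/ responses use the same isSuccessful / message / data envelope as the 409 body shown above, which may not match the actual API.

```python
import json


class CollectionNotReady(Exception):
    """Raised when a /views/ request returns 409 Conflict (proposed behavior)."""


def parse_view_response(status_code: int, body: str) -> list:
    """Return the "data" list from a /views/ response body.

    Assumes (not confirmed) that every response is a JSON object with
    "isSuccessful", "message" and "data" keys.
    """
    payload = json.loads(body)
    if status_code == 409:
        # Proposed behavior: the collection exists but is not in Status:Ready.
        raise CollectionNotReady(payload.get("message", "Collection not ready"))
    if not payload.get("isSuccessful", False):
        raise RuntimeError(payload.get("message", "Request failed"))
    return payload.get("data", [])
```

A caller might catch CollectionNotReady, check the Collection Status again, and retry the view request once the status is "Ready".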

Here are the possible values for Collection Status in process order.

Created
Queued
CollectingProperties
CollectingCalctime
Collected
Analyzing
Ready
Error

-Rob