AEM Forms Data Workflow (Surviving The Purge!)

Understanding the whereabouts of submitted form data is key to debugging your OSGi workflow applications. Unlike the JEE Workbench environment, we don’t have the ability to record and play back steps in real time, but OSGi provides a great deal of data for debugging as long as you understand where and how that data is stored. Additionally, the APIs available to access OSGi workflows and their data are far easier to use than their JEE counterparts, which makes developing your own interfaces on top of OSGi workflows much more approachable than it ever was with JEE.

From Publish to Author

The first step in the data trail is the submission from Publish to Author.

A typical form submit on Publish will call one of two servlets to perform the submit:

http://[server]:[port]/content/forms/af/[your_form_name_and_location]/jcr:content/guideContainer.af.submit.jsp – standard submit.

or

http://[server]:[port]/content/forms/af/[your_form_name_and_location]/jcr:content/guideContainer.af.internalsubmit.jsp – submit with attachments and/or server-side validations.

then

http://[server]:[port]/content/forms/af/[your_form_name_and_location]/jcr:content/guideContainer.guideThankYouPage.html?owner=admin&status=Submitted – redirection to the thank you page.

If you’re using Lazy Loaded fragments, you’ll also notice a call to load each fragment as the user reaches it:

http://[server]:[port]/content/forms/af/[your_form_name_and_location]/jcr:content/guideContainer.jsonhtmlemitter?templateId=[the_html_id_of_your_lazy_loaded_panel]&fetchJSON=true&fetchHTML=true

Note that if any of these calls return a 404 error while you’re testing your published forms, your dispatcher needs to be adjusted to allow them through. This is a common problem for forms authors, as administrators rarely provision the dispatcher to properly allow for forms services after the forms package is installed. A sketch of the relevant filter rules follows.
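
As a minimal sketch, assuming the default filter section of dispatcher.any (rule numbers are placeholders; renumber and tighten the globs to fit your existing farm), the allow rules might look like:

```
# Illustrative dispatcher filter rules for AEM Forms submission calls.
# Renumber to fit your farm and scope the globs to your forms paths.
/0100 { /type "allow" /method "POST" /url "/content/forms/af/*.af.submit.jsp" }
/0101 { /type "allow" /method "POST" /url "/content/forms/af/*.af.internalsubmit.jsp" }
/0102 { /type "allow" /url "/content/forms/af/*.guideThankYouPage.html" }
/0103 { /type "allow" /url "/content/forms/af/*.jsonhtmlemitter" }
```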

If the form is able to submit, the Publish server will then immediately attempt to forward the form payload to the Author instance. In this workflow we sometimes refer to the Author instance as the “Processing instance”. A “Processing instance” is no different from any other Author instance; it may simply be a dedicated Author instance used to accept and process form submissions, as opposed to authoring page and form content.

The AEM DS Settings Service configuration is one of the most important pieces of the forms submission workflow. The processing server URL must be prefixed with the protocol (http/https) and reachable from the Publish server (if you’re using an Author instance on an embedded JEE server, it should be suffixed with “/lc”). The username and password provided should belong to a non-expiring user or service account with appropriate rights. Note that failure to provide a reachable server name and non-expiring credentials for the Author instance can result in lost submitted data: form submissions may still be accepted (and lost) by the Publish instance if this configuration is not maintained. A rough configuration sketch follows.
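
For illustration only, the settings could be captured as an OSGi configuration file. The PID and property names below are hypothetical placeholders; confirm the actual ones under “AEM DS Settings Service” in /system/console/configMgr before deploying:

```
# Hypothetical sketch of the AEM DS Settings Service configuration.
# PID and property names are placeholders -- verify the real ones
# in the Web Console before relying on this file.
# com.adobe.aemds.guide.service.DSSettingsService.cfg (hypothetical)

processingServerUrl=https://author.example.com
processingServerUserId=forms-submission-service
processingServerPassword=********
```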

Form Payload Configuration

The Submit Action of the Adaptive Form drives the structure of the submitted payload.

In this example, the form will be submitted to an AEM workflow (Workflow 1). The submitted form data is named “formdata.xml”, the attachments will be stored in a folder named “attachments”, and the PDF generated on submit (if any) will be named “documentOfRecord.pdf”. These names form a typical naming convention that should be used for all of your forms; the only variation should be the form data extension, “.xml” or “.json”, depending on your data or schema type. Ensure developers across your organization use the same standard and do not deviate from it. The resulting payload layout is sketched below.
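
Under that convention, each submission’s payload folder (covered in detail later on) ends up looking roughly like this:

```
/var/fd/dashboard/payload/server0/[date]/[payload_id]/
├── attachments/           one file per submitted attachment
├── documentOfRecord.pdf   PDF render of the Adaptive Form, if enabled
└── formdata.xml           submitted data (.json for JSON-based forms)
```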

In the above example we have a simple form with 5 text fields and 5 attachments. Following a successful submission, let’s take a look at how the workflow and payload entries match up with the form configuration.

Workflow Instance and Payload

To view the submitted workflow instances that are not yet completed on the Author instance select Adobe Experience Manager, Tools, Workflow, then Instances.

To view the submitted workflow instances that have successfully completed on the Author instance select Adobe Experience Manager, Tools, Workflow, then Archive. In the above example, our form submission runs a very simple single-step workflow that completes almost immediately and should therefore be displayed in the Archive. Regardless of status, the workflow details and “payload” remain present until a purge occurs (discussed later on).

In the above display we have two completed submissions.

Each of the submitted workflows will have a “header” entry under the /var/workflow/instances folder. Underneath that folder there is a series of date-named folders. These rotate on a regular basis depending on the number of workflow instances saved in each folder, or when AEM is restarted. In this case, the entries are under the 2021-01-18 folder because the server was restarted on that date. The folder structure does not mean that workflows submitted the next day will be created in a 2021-01-19 folder; if the number of workflow instances remains low and the server is not restarted, new instances will likely land under the same 2021-01-18 folder. The workflow properties startTime and endTime are the only indicators that should be used for actual submission times (as provided by date-related APIs). A sketch of reading those times through the Workflow API follows.
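
As a minimal sketch, assuming a ResourceResolver for a suitably privileged service user, the Granite Workflow API exposes those times directly, so you never need to trust the date-named folders:

```java
import java.text.SimpleDateFormat;

import com.adobe.granite.workflow.WorkflowException;
import com.adobe.granite.workflow.WorkflowSession;
import com.adobe.granite.workflow.exec.Workflow;
import org.apache.sling.api.resource.ResourceResolver;

// Sketch: list workflow instances with their authoritative
// start/end times instead of relying on the folder names.
public class InstanceTimesReport {

    public void report(ResourceResolver resolver) throws WorkflowException {
        WorkflowSession wfSession = resolver.adaptTo(WorkflowSession.class);
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        // "RUNNING" covers active instances; "COMPLETED" covers archived ones.
        for (Workflow wf : wfSession.getWorkflows(new String[] { "RUNNING", "COMPLETED" })) {
            String started = fmt.format(wf.getTimeStarted());
            String ended = (wf.getTimeEnded() == null) ? "-" : fmt.format(wf.getTimeEnded());
            System.out.printf("%s [%s] start=%s end=%s%n",
                    wf.getId(), wf.getState(), started, ended);
        }
    }
}
```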

Under the instance data folder, we find a payload folder with a path property. This property value directs us to the payload folder where submission data and attachments may be found. In this case, our payload can be found under /var/fd/dashboard/payload/server0/2021-01-18/EKBB2URNV5ILZJHNBTKJLLDVOY_8. This can be thought of as the “details” record for the workflow instance “header”.

The saved payload of the submission matches our Submit Action configuration exactly: each of the attachments is placed under the “attachments” folder, “documentOfRecord.pdf” holds the PDF representation of the Adaptive Form, and the form data is stored as formdata.xml. A sketch of reading these files programmatically follows.
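
As a minimal sketch, assuming the payload path has already been read from the instance’s payload node, the submitted data can be streamed straight out of the JCR:

```java
import java.io.InputStream;

import javax.jcr.Node;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;

// Sketch: resolve the payload folder recorded on the workflow
// instance and read the submitted data file. File names mirror
// the Submit Action configuration discussed above.
public class PayloadReader {

    public InputStream readFormData(ResourceResolver resolver, String payloadPath) throws Exception {
        // e.g. payloadPath = "/var/fd/dashboard/payload/server0/2021-01-18/EKBB2URNV5ILZJHNBTKJLLDVOY_8"
        Resource dataFile = resolver.getResource(payloadPath + "/formdata.xml");
        if (dataFile == null) {
            throw new IllegalStateException("No formdata.xml under " + payloadPath);
        }
        // Payload files are nt:file nodes; content lives under jcr:content/jcr:data.
        Node fileNode = dataFile.adaptTo(Node.class);
        return fileNode.getNode("jcr:content")
                       .getProperty("jcr:data")
                       .getBinary()
                       .getStream();
    }
}
```

The same pattern applies to the attachments folder: iterate its child resources and stream each nt:file the same way.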

Workflow Variable Data

In the below example we have a workflow with two variables created, “variableA” and “variableB”. These are string variables that can be manipulated by custom services or by the Set Variable step.

As these variables are updated, their values are stored in the /var/workflow/instances/server[incrementer]/[date_node]/[workflow_id][incremental_counter]/data/metaData node. The important note here is that there is only a single instance of each variable: as a variable is updated throughout the workflow, you cannot retrieve its history, only its current value. For example, in this workflow we have two Set Variable steps. The first updates variableA with “Hello World” and the second updates the same variable with “Hello World Step 2”. This may be acceptable if we only need one instance of the variable and the history of its value doesn’t matter. However, if we need to keep a history of variable values throughout the workflow, we must ensure the values are stored separately or that different variables are used.

A prime example of the reuse of variables is with the Assign Task step. If you have multiple Assign Task steps in a workflow and the Route Variable name is reused in each step, you will only be able to obtain the last value of this variable at the end of the workflow; i.e., you will not be able to see what decision was made by any previous user. Ensure that your workflow design and variable selection allow for all aspects of reporting you may need later on. In the below case we select “actionTaken” as our Route Variable for two separate Assign Task steps. The first user selects “Reject” and the second user selects “Approve”. With the variable reused and no additional steps to record the data externally, we cannot ascertain which choice our first user selected. A sketch of preserving such history from a custom step follows.
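
As a minimal sketch of a custom Process Step (the process.label and the “actionTaken_step1” key are illustrative, not part of the product), variables can be read and copied to step-specific keys so their history survives later overwrites:

```java
import com.adobe.granite.workflow.WorkflowException;
import com.adobe.granite.workflow.WorkflowSession;
import com.adobe.granite.workflow.exec.WorkItem;
import com.adobe.granite.workflow.exec.WorkflowProcess;
import com.adobe.granite.workflow.metadata.MetaDataMap;
import org.osgi.service.component.annotations.Component;

// Sketch: preserve variable history by writing each update to its
// own key instead of letting a shared variable be overwritten.
@Component(service = WorkflowProcess.class,
           property = { "process.label=Record Variable History (sketch)" })
public class RecordVariableHistoryStep implements WorkflowProcess {

    @Override
    public void execute(WorkItem item, WorkflowSession session, MetaDataMap args)
            throws WorkflowException {
        MetaDataMap wfData = item.getWorkflowData().getMetaDataMap();

        // Reading a variable yields only its latest value -- no history.
        String current = wfData.get("variableA", String.class);

        // Copy the routed decision into a step-specific key before the
        // next Assign Task overwrites it. Key name is illustrative.
        String decision = wfData.get("actionTaken", String.class);
        wfData.put("actionTaken_step1", decision);
    }
}
```

Placing a step like this after each Assign Task would let you reconstruct every user’s decision at the end of the workflow.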

It should be noted that the workflow instances do contain history nodes for each of the steps executed (including a Node0 entry for the start of the workflow); however, these nodes do not include the values of updated variables. They can still be read programmatically, as sketched below.
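
As a minimal sketch, the same history is reachable through the Granite Workflow API; note that it exposes who completed each step and any comment, but not variable values:

```java
import java.util.List;

import com.adobe.granite.workflow.WorkflowException;
import com.adobe.granite.workflow.WorkflowSession;
import com.adobe.granite.workflow.exec.HistoryItem;
import com.adobe.granite.workflow.exec.Workflow;

// Sketch: walk the step history of a workflow instance.
public class StepHistoryReport {

    public void print(WorkflowSession session, Workflow instance) throws WorkflowException {
        List<HistoryItem> history = session.getHistory(instance);
        for (HistoryItem item : history) {
            System.out.printf("%s by %s: %s%n",
                    item.getWorkItem().getNode().getTitle(), // step title
                    item.getUserId(),                        // who completed it
                    item.getComment());                      // comment, if any
        }
    }
}
```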

The Purge

Why the reference to the horror movie? Well, that’s the world your data will be living in if you’re not familiar with this process.

The Adobe Granite Workflow Purge Configuration determines when workflow data will be completely purged from AEM. The most common configuration of the purge service is the deletion of all COMPLETED workflows after 30 days. In the forms world this would be considered a fairly aggressive purge strategy, as we are used to accessing the history of processes inside Workspace going back years in some cases. A sample configuration sketch follows.
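
As a rough sketch, a 30-day purge of completed instances might be captured as the OSGi factory configuration below. The factory PID is the Granite workflow purge scheduler; verify the exact property names in the Web Console for your AEM version before relying on them:

```
# Illustrative Adobe Granite Workflow Purge Configuration
# Factory PID: com.adobe.granite.workflow.purge.Scheduler
# (confirm property names in /system/console/configMgr for your version)
scheduledpurge.name=Completed forms workflows
scheduledpurge.workflowStatus=COMPLETED
scheduledpurge.daysold=30
scheduledpurge.modelIds=
```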

In the above example, when the purge service executes, workflow instances that have been completed for 30 days will be removed from the JCR. This includes the workflow instance data as well as the payload. From a reporting perspective this means we can no longer access the original form submission data, attachments, or document of record; no data related to the workflow of any kind will be accessible.

It is paramount that you create a long-term storage architecture for your workflow and payload data outside of the embedded JCR. Ideally, during the provisioning of AEM you also provision access to your own RDBMS or NoSQL database that AEM will be able to communicate with.

One of the fastest ways to move workflow data to an outside service or RDBMS is the Form Data Model Service step. Alternatively, the entire JSON structure of the payload and workflow instance can be assembled by accessing the JCR APIs (a sketch follows the link below). There are a large number of options and services to aid in the export and long-term storage of your workflow assets.

Using Form Data Model Service as Step in AEM 6.5 Workflow | Adobe Experience Manager
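
As a minimal sketch of the JCR-API route, assuming Gson (bundled with AEM) for JSON assembly, an instance’s metaData node can be flattened like this; pushing the result to your RDBMS or REST endpoint is left to your target system:

```java
import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;

import com.google.gson.JsonObject;

// Sketch: flatten a workflow instance's metaData node into JSON
// for export to an external store before the purge removes it.
public class MetaDataExporter {

    public JsonObject export(Node metaDataNode) throws Exception {
        JsonObject json = new JsonObject();
        PropertyIterator props = metaDataNode.getProperties();
        while (props.hasNext()) {
            Property p = props.nextProperty();
            // Simple string conversion; handle binaries and
            // multi-valued properties as your schema requires.
            if (!p.isMultiple()) {
                json.addProperty(p.getName(), p.getString());
            }
        }
        return json;
    }
}
```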

An additional option you can look to for long-term document of record storage is Amazon S3 Glacier. As assets age, they can be pushed into colder Glacier storage tiers, making long-term storage extremely inexpensive compared with traditional storage options. A minimal upload sketch follows the link below.

https://aws.amazon.com/glacier/
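
As a minimal sketch using the AWS SDK for Java v2 (the bucket and key names are placeholders), an aged document of record could be pushed to a Glacier storage class like so:

```java
import java.nio.file.Paths;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.StorageClass;

// Sketch: archive an exported document of record to S3 under a
// Glacier storage class for inexpensive long-term retention.
public class GlacierArchiver {

    public void archive(S3Client s3, String payloadId, String localPdfPath) {
        PutObjectRequest request = PutObjectRequest.builder()
                .bucket("forms-archive")                     // placeholder bucket
                .key("dor/" + payloadId + "/documentOfRecord.pdf")
                .storageClass(StorageClass.GLACIER)          // or DEEP_ARCHIVE for colder tiers
                .build();
        s3.putObject(request, Paths.get(localPdfPath));
    }
}
```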
