Usefulness: 5/5
Frequency: 2/5
Power: 5/5
Unintuitiveness: 5/5
Complexity: 5/5

Long-running Workflows

Execute sub-processes that take an unknown amount of time

last changed at 2021-01-11 18:47:58 (UTC)
Advanced
Process Design
Splitting Things Up
UiPath

Splitting Things Up

This pattern is part of a series showcasing different techniques to split your business process into smaller pieces. Make sure to check out Splitting Things Up for an overview.

Situation

You have a business process where some steps involve waiting for an unknown amount of time on an external trigger or human action, but don’t want to idle a robot while it waits.

Compare

Make sure you follow the other parts of Splitting Things Up to understand the case study. In particular, you should know how to split your process into sub-processes and be familiar with queues.

Common examples

  • Human interaction — most often validation or approval — is required (via Action Center)
  • Handle non-time-sensitive tasks in batches to improve overall throughput by using queues
  • Parts of a process can only run at specific times (e.g. during/outside business hours)
  • Different unattended robots have to work on different parts of a process due to organizational alignment
  • An external system with an unknown service level has to be integrated
  • Unattended → attended hand-off

Long-running workflows

Long-running workflows (LRWF), also known as Orchestration processes in the documentation, allow you to put a job into a suspended state at a certain point. This special state is entered when the workflow hits a Wait for… activity; at that point, all variables currently in scope are stored in the Orchestrator database.

Once a certain condition (see below) is met, Orchestrator instructs a robot to resume process execution at the point where the workflow was suspended. (This could be the same robot or another robot assigned to the same process.)

The general idea is fairly simple, but designing LRWFs well is daunting — not least because of the impressive list of gotchas you have to keep in mind for them to work correctly. These are for the most part non-obvious and may even appear to work in your test cases, but fail in other circumstances.

The most important one of these is that you shouldn’t use a regular For Each activity with the Wait… activities, or each iteration will only be able to execute once the previous one has finished its long-running part. As this is usually not what you want, you will typically replace it with a Parallel For Each, with all the potential problems that entails. (The most problematic of these is undoubtedly race conditions on common resources, so avoid sharing resources between the different iterations, especially for writing, if at all possible.)
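If you prefer to think in code, here is a rough .NET analogy of the difference. This is only a sketch, not how the activities are implemented: StartLongRunningThing is a hypothetical stand-in for a Start Job/Wait pair, and you would need System.Threading.Tasks plus a collection called items for it to compile.

    ' Regular For Each: nothing new starts until the current wait finishes
    For Each item In items
        Dim t As Task = StartLongRunningThing(item)
        t.Wait()
    Next

    ' Parallel For Each: start everything first, then wait for all of it
    Dim tasks As New List(Of Task)
    For Each item In items
        tasks.Add(StartLongRunningThing(item))
    Next
    Task.WaitAll(tasks.ToArray())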

To use LRWF, you can either start with the Orchestration Process template in Studio (when creating a new process) or include UiPath.Persistence.Activities and change the Supports Persistence project setting to true. To get the Wait for Validation Action activity for human validation in the context of Document Understanding, you also need UiPath.IntelligentOCR.Activities.

Persistence toggle in Project Settings

In the following, we will first look at the different suspension options and then return to our case study to see the pattern in action. Note that I will abbreviate the LRWF-related activities, which have absurdly long names.

Solutions

Start Job…

Start Job And Get Reference

This pair of activities allows you to start a job asynchronously on an unattended robot and then suspend the calling workflow until that job is finished. You can use job arguments as usual, but remember that output arguments will, of course, only receive a value once the Wait for Job and Resume activity has finished executing.

When to use

  • The LRWF is a chain of multiple sub-processes that you want to execute on any of a number of robots (but also see Invoke Process)
  • The orchestration robot (that is executing the LRWF) is not allowed to execute a sub-process due to your organization’s permission design
  • It’s appropriate to execute a separate job for each of a number of items

Comments

  • You can use this as a replacement for Start Job to start jobs with arguments on a different robot (e.g. attended → unattended handoff) — this is somewhat of a hack, so make sure you don’t use the Wait activity, which doesn’t work with attended robots.

Exercises

  1. Create a simple workflow to add two numbers. Call that process in sequence from a LRWF to add 4 numbers together.
  2. Modify the process from 1. to calculate a=(1+2), then b=(3+4) in parallel and then a+b after that.
  3. Assume you start 3 jobs from inside a Parallel For Each, with a Log Message after the Wait in each branch. Job 1 takes 10 s, job 2 takes 5 min and job 3 takes 30 min. Which timestamps do you expect for the 3 log messages, assuming the jobs start at 8:00?
  4. Why can you not use the Delay or Retry Scope activity inside LRWF?
  5. Can you put Wait… activities into sub-workflows?

Solutions

  3. About 8:30 for all three messages. This is because the robot will not leave the Parallel For Each until all branches have finished by completing all their activities (including the Wait)
  4. This is due to an implementation detail of how the suspension works: it uses the same mechanism as the Delay activity. Thus, if you use a Delay in a LRWF, it will fool the robot into thinking it should suspend, but without a Wait activity, Orchestrator will never resume it. Retry Scope uses Delay under the hood, so the same issue applies.
  5. No, this will cause problems with not finding the right entry point again. LRWF Wait activities are only allowed in the main entry point (usually Main.xaml).

Add Queue Item…

Add Queue Item And Get Reference

These let you add an item to a queue and wait until it’s finished. This is arguably the most common of the LRWF activities, as LRWFs are often used in processes that involve a large number of transactional items.

When to use

  • Coming from queues: you have to do something sequentially in the producer after a transaction has finished
  • Coming from other workflows: you have a large number of similar items or want to use any of the advanced queue features such as SLAs
  • Also see the list here, which also applies to LRWF queues

Variants

If your LRWF contains more than one queue (or the same queue multiple times), there are two different patterns you might observe, which I call the bead and the tree pattern, respectively. The same patterns may occur for Start Job…, but they are more common when working with queues.

Beads and Tree patterns for LRWFs

In the bead pattern, each set of Wait… activities sits in its own Parallel For Each loop. When you draw a diagram of this, it looks a little like a piece of string with beads woven into it: the strands split at each bead and recombine right after it.

In the tree pattern, on the other hand, we have nested Parallel For Each loops. All strands are only recombined at the very end of the workflow. A special case is when you only split once at the start and subsequent Waits stay sequential within the same Parallel For Each branches (which I call the mop pattern, for hopefully obvious reasons).

Of course these are just archetypes: useful concepts that help you reason about the design. Real processes sometimes transition between them or combine them in interesting ways. But don’t get too carried away.

Comments

  • Add Queue Item… has more design space than Start Job…
  • Queues naturally lead to better separation and better scaling behavior than separate jobs (e.g. only preparing the environment once per batch)
  • Properly designing the points where the different strands of your process recombine is crucial. Make sure you understand this deeply.
  • The LRWF parts go into the producer, not the consumer. In fact, making the consumer a LRWF is strongly discouraged because queue items automatically time out after 24 hours.

Exercises

  1. Create a simple workflow to add two numbers. Use a LRWF and a queue to calculate the sum of the first 10 integers. Hint: I would start adding 1 + 10, 2 + 9, and so on. (Feel free to start counting at 0 if you prefer.)
  2. Why shouldn’t you put the Wait… activities in a regular For Each loop?
  3. Name 3 advantages of using Add Queue Item over Start Job…
  4. Try to explain the concept of LRWF to a colleague. (Try is the operative word here, but don’t underestimate how useful it is to test your knowledge in this way)

Solutions

  2. Because it will lead to processing the loop iterations sequentially. That is possible, but usually not what you want in a LRWF.
  3. There are many possible answers; here are some of mine:
  • Queues make the process faster on average
  • You can inspect the queue items to see how the process is progressing
  • You can use priorities and the other more advanced queue features
  • The intention of running the queue items in parallel feels clearer to me than with separate jobs
  • Having a lot of pending jobs can interfere with your ability to run other scheduled jobs (a queue consumer can exit and resume later if the number of transactions is very large)
  • Retrying queue items is easy and can be done automatically — it’s far more complicated with jobs

Create Form Task

Create Form Task

Create Form Task is used in so-called human-in-the-loop designs. The idea is that the robot needs some input from a person to make a decision it cannot (or is not allowed to) make on its own. But we still want the robot to run unattended. And we also don’t want to block the robot for what could be hours or days until a human reacts.

To solve this, we can use Action Center to create a form (or document validation task) while the robot goes into suspended mode. Once an operator notices the task and finishes it, the robot will automatically resume its job with the information the human provided.

When to use

  • The robot needs additional information that it cannot get automatically
  • A judgment call has to be made
  • Approval must be granted
  • Document validation
  • As part of a learning loop where humans are asked for help on unknown items and the result is subsequently used by the robot to learn (most often in the context of machine learning tasks, but it can also be a humble lookup table)

Variants

One of the most common use cases for this is document validation. The activities for this are called Create Document Validation Action and Wait For Document Validation Action And Resume. They are part of the IntelligentOCR package. Similar activities are available for document classification.

Comments

  • Action Center requires a license for each user, so make sure you check the licensing requirements
  • You can use Assign Tasks to automatically assign a task to a specific user
  • Building the forms is not exactly straightforward and there are many hidden features for them that use magic attribute names
  • The regular Form Activities work slightly differently from the Action Center-based ones

Exercises

  1. Create a simple approval task with two read-only text fields containing an invoice amount and a vendor name. It should only go ahead if approval was granted and throw a BusinessRuleException otherwise.
  2. Add an automatic assignment of the task to yourself
  3. Create a form with In/Out arguments and inspect how the data is changed when you handle the task
  4. Create a bead-like workflow with multiple form tasks being dispatched and execution only continuing after they have all been handled.

Create External Task

Create External Task

Create External Task allows you to use the Orchestrator API to build tasks handled by external applications into a LRWF.

The idea is that you create an action from the robot, your external program polls Orchestrator regularly to see if there are new actions and picks them up if that is the case. Once the external program is finished, it will return that information back to Orchestrator, which will resume the LRWF.
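To make this more concrete, here is a minimal sketch of what the external application’s polling loop could look like. Treat it purely as an illustration: the base URL is a placeholder, authentication is reduced to an environment variable, and the exact endpoints and payloads for listing and completing actions should be taken from the Orchestrator API documentation, not from this snippet.

    Imports System.Net.Http
    Imports System.Threading

    Module ExternalActionPoller
        Sub Main()
            Using client As New HttpClient()
                client.BaseAddress = New Uri("https://orchestrator.example.com/")   ' placeholder
                client.DefaultRequestHeaders.Add("Authorization", "Bearer " & Environment.GetEnvironmentVariable("ORCH_TOKEN"))

                Do
                    ' 1. Ask Orchestrator for pending external actions (check the API docs for the exact route/filter)
                    Dim pendingJson As String = client.GetStringAsync("odata/Tasks").Result

                    ' 2. For each pending action: do the external work, then post the completion back
                    '    to Orchestrator so the suspended LRWF is resumed (again, route per the API docs)

                    ' 3. Sleep and poll again
                    Thread.Sleep(TimeSpan.FromMinutes(5))
                Loop
            End Using
        End Sub
    End Module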

Comments

  • While this set of activities sounds interesting in the abstract, it is a rather niche use case that requires extensive setup and some programming knowledge

Resume After Delay

This activity will put the workflow into a suspended state and wake it up again after a pre-determined amount of time. It is useful for polling external systems regularly for updates when you can’t or don’t want to use Create External Task. Just run it, say, every 5 minutes to see if something changed; the robot can do other jobs in the meantime.

Comments

  • Resume After Delay is more useful than it might seem: it often allows you to use a simple polling loop where you would otherwise either have to block a robot or set up a complex operation with several queues, etc. to track the process’ progress
  • It is also a partial replacement for Delay, which you cannot use in LRWF

Case study

We will spice things up a little by making some changes to Maggie’s use case:

First, the company instituted a new policy that every new supplier has to be validated by a second person to prevent human error and fraud — so we will add a validation step using Action Center.

Second, we want to provide a monthly report to the head of accounting listing which suppliers were added.

Incorporating both of these, our process now looks roughly like this:

Supplier creation process

The Folder watcher will do nothing but continuously watch the hot folder for new files and start the LRWF once for each of them. It can run attended or unattended with slightly different designs: we will use an attended design as only a few people will ever add suppliers and all of them already have attended robots.

As a reporting DB we will use a shared Excel file, but in principle this could be anything. In case you’re wondering why the Reporter is a separate process, that is because of our… questionable choice of reporting DB. Also, I do like to separate concerns, so I might have done the reporting in its own process anyway.

If we had a real DB and didn’t have to worry about race conditions, we could just run the reporting in the LRWF, but given that Excel files are blocked by the robot during read/write, we have to make sure the Reporter only executes on one robot at any given time.

In total, we will have 5 different UiPath processes that comprise this business process: Folder watcher, LRWF (Manager), Worker Wrapper, Worker and Reporter.

LRWF

Let’s start with the heart of the matter: the long-running workflow. We will create a new Orchestration Process (using the Studio template) that will accept the path to a file in the hot folder as an argument.

The first step is to read that file and create a form for each supplier creation or change. Then, the LRWF will suspend until the change is either approved or rejected.

Orchestration process template

Note that the template uses a Flowchart by default. I prefer Sequences, so I switched it out. I also added the Excel and Form activities from the package manager, which we will need later.

As the Wait… activities only work inside Main, we will have more stuff in the main workflow this time, unfortunately.

Validate me! Please!

First, read the Workbook into a data table. Then we need to loop through it and use Create Form Task to create a form for each creation/change.

Now is the point where it gets tricky, so pay attention:

In the loop, we cannot directly use the Wait for Form Task activity. The reason is that the For Each Row runs sequentially, which would mean that, if we put the wait activity here, we only add one task and wait until it is finished. Then we continue to the next task. And so on.

But we want to create all tasks at once! So instead, we add the FormTaskData object returned by the Create Form Task activity to a collection (a generic List<FormTaskData>, which is perfect for this scenario). This allows us to loop through the collection with a Parallel For Each and handle all of the tasks as soon as they are finished.

Make sure you internalize the list of gotchas for LRWFs, which is quite important.

Supplier form data collection

Note that we need to initialize this collection, which you can do either with a default value or with Assign: new List(Of FormTaskData). This is largely a matter of taste, but I prefer default values if it’s an empty initialization and Assign otherwise.

To add the FormTaskData object to our list, we can use the Add to Collection activity. Make sure you change the TypeArgument to FormTaskData.
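If it helps, here is the whole construction as a compact sketch. Each commented line corresponds to an activity; the variable names (supplierChanges, formTasks, taskObject) are just the ones I use here, nothing mandates them.

    ' collection filled inside the For Each Row loop (Default value: New List(Of FormTaskData))
    Dim formTasks As New List(Of FormTaskData)

    ' For Each Row row In supplierChanges:
    '     Create Form Task               -> output taskObject (FormTaskData), one open action per row
    '     Add To Collection              -> formTasks.Add(taskObject)
    '
    ' Parallel For Each task In formTasks:
    '     Wait For Form Task And Resume  -> suspends here; all actions stay open at the same time
    '     ... handle the approved or rejected change ...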

In the end, it will look like this:

Parallel For Each task

Formative years

So far, our form has nothing on it — which is suboptimal and will surely attract Maggie’s ire if we leave it like that. To forestall that, let’s add some action: we want to copy the variables and the Assign Variables sequence from the old Manager project and add them to the FormDataCollection on the Create Form Task activity.

Unfortunately, there is no way to bulk add form data items, so this once again requires some XAML hacking. Luckily, we can lift the arguments from Invoke Process, as the signature is identical.

XAML hacking the argument collection

Also change all the arguments to In/Out, so our approver can make changes to the supplier creation/change job if so desired. You can do that by finding & replacing InArgument with InOutArgument and VisualBasicValue with VisualBasicReference — but make sure you only do it in the selection.

In to In/Out argument change

Now when you open the Form Designer, you will find that UiPath was kind enough to automatically create fields for all our arguments. Neat! Some rearrangement is necessary, but this still saves us a ton of time. You might want to change the date field back to a DateTime to allow using a date picker, but I will leave that for the exercises.

Form designer automatically generated fields

Resuming the job

After approval is finished, we have to get our variables back out of the form data. Luckily, UiPath does that automatically for us as long as a variable with the same name is in scope — so copy our variables from the For Each Row body into the body of the Parallel For Each.

Variables in the Parallel For Each

This is a good opportunity to test our workflow so far — put an empty Write Line as a dummy activity below the Wait… activity and put a breakpoint on it. Then hit Debug. The process should run until the Wait… and then be suspended.

This is a particular feature of LRWF debugging: you have to manually hit Resume after the task is finished. You can hit it multiple times if you want; it won’t do anything. (In test and production scenarios, LRWFs run on an unattended robot and resume automatically.)

If everything works, you should have an action waiting for you in Action Center that looks similar to the following:

Orchestrator Action Form

Do the changes

If you go back to the process graph, the next step (assuming approval was granted) is to create a queue item in the Supply_Drama_Q we created in Queues. As this is a LRWF — and we want to perform some tasks after the Worker finished — we will stay inside the Parallel For Each and suspend the workflow again.

This is analogous to the above, so we won’t go into too much detail here. You can XAML hack the arguments by copying them from Invoke Process in our previous Manager implementation. The main difference is that, for the rest of the workflow, we stay inside the body of the Parallel For Each. Astute readers will notice that this is a very simple example of the mop pattern.

LRWF with worker queue items for approved changes

Reporting

The next step is to create the reporting workflow. As we are using a shared Excel file as a reporting “database” and it’s always a good practice to be wary of race conditions, we will ensure that no conflicts can arise by offloading the reporting entry into its own queue.

We will ensure in Orchestrator that the performer for this queue runs at most one instance at any given time.

  1. Create a new queue called Supply_Rep_Q — if you need help with this, check out Queues first
  2. Create a consumer workflow based on the RE Framework to handle these queue items

Note that I would generally recommend using something other than Excel for this purpose, but it is common enough in practice that I want to at least give you a way to make it workable.

The Excel file we’re going to use should be located in a shared location. It is recommended to only give the robot user write access to it and restrict human users who might want to view the report to read-only permissions, or, better yet, only send a copy to them.

In terms of format, we will keep it simple: one Excel file for all changes, only listing the internal name, date and the ID of the Drama_Q item (which we will use for the Reference in Supply_Rep_Q). You can find the template in the GitHub repository in the Data/Input folder.

Excel reporting template

Reporters report reports

As this process will be quite simple, we won’t bother with subprocesses this time and just put everything in Process.xaml and sub-workflows.

In addition to the normal config changes for RE Framework, I added an asset in Orchestrator to hold the reporting folder configuration. An alternative would be putting this into a process argument, but that is easier to forget and you cannot reuse it in different processes as a single source of truth. (Putting such configuration in assets/arguments instead of the config file makes it easy to change later without republishing the workflow. Your mileage may vary.)

Report assets

Since changes to the structure of the workbook require re-publishing the process anyway (the template is stored with the process), I only added the other settings to the config file. Theoretically, it would make sense to store the template in a storage bucket instead, but that is more complicated and beyond the scope of this article.

Report constants

I whimsically decided to do boilerplate for once, just to show that I can, and added some code to make sure the reporting directory actually exists and copy the report file to it if necessary. This will be called in InitAllApplications — see the repo for more details.

Ensuring the report exists
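For reference, the gist of that boilerplate is a couple of System.IO calls. A minimal sketch (e.g. for an Invoke Code activity) could look like the following; reportFolder and templatePath are my names for the asset value and the packaged template path, not anything prescribed by the framework.

    ' Make sure the reporting directory exists and the report file is present,
    ' copying the packaged template there on first run.
    Dim reportPath As String = System.IO.Path.Combine(reportFolder, System.IO.Path.GetFileName(templatePath))

    If Not System.IO.Directory.Exists(reportFolder) Then
        System.IO.Directory.CreateDirectory(reportFolder)
    End If

    If Not System.IO.File.Exists(reportPath) Then
        System.IO.File.Copy(templatePath, reportPath)
    End If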

This is what Process.xaml looks like in the end: we assign the variables, read the table header, add a single row to the data table, then append the result.

Process.xaml

In case you are struggling with what to put in Add Data Row and how to get the Queue ID: set DataTable to the output from Read Range and ArrayRow to

{Now, InternalName, CreationDate, QueueID}

where QueueID = in_TransactionItem.Reference.
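In plain VB terms, this is roughly what the activities do for one transaction. The names are assumptions: reportTable stands for the Read Range output, and I assume the specific content keys of the Supply_Rep_Q item match the variable names used above.

    Dim QueueID As String = in_TransactionItem.Reference
    Dim InternalName As String = in_TransactionItem.SpecificContent("InternalName").ToString
    Dim CreationDate As String = in_TransactionItem.SpecificContent("CreationDate").ToString

    ' Add Data Row with DataTable = reportTable and ArrayRow = {Now, InternalName, CreationDate, QueueID}
    reportTable.Rows.Add(Now, InternalName, CreationDate, QueueID)
    ' Append Range then writes this row below the existing report rows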

You may be wondering why we read the Excel header for every queue item instead of creating an empty table as a template in InitAllApplications and copying that. While it’s true that this could be optimized, it would make the code a bit more complicated because we’d have to pass the empty table around. In practice, the Workbook version of ReadRange is fast enough for our purposes here, so I decided to leave it as it is.

Queue triggered

How can you ensure that the reporter runs only once (that was the whole point, after all)? The easiest and clearest way is, in my opinion, to use a queue trigger for specifying this. Queue triggers allow you to automatically spawn a job for a certain process whenever new queue items arrive in a given queue.

Queue trigger for the Reporter

The highlighted setting allows us to limit the number of concurrent executions: by setting it to 1, we ensure that only one simultaneous job will ever run for our reporting queue.

There is one minor drawback to this: Orchestrator generates error messages when new queue items arrive while the trigger is already at its maximum, but that’s merely an annoyance and doesn’t affect functionality.

While we’re here, also add a queue trigger for the Worker Wrapper so we can execute everything automatically later on.

Please note that we’re now firmly in the realm of unattended automation, i.e. you have to set up an unattended robot for this to work. Covering how to do that is beyond the scope of this article, so please refer to the official documentation for more information.

Back to the LRWF

Now that the reporting process is implemented, we need to ensure that the reporter is fed with queue items correctly. There are two ways to do this: we can create queue items from the LRWF, or we can create them directly from the worker whenever it finishes an item.

Which of these options is preferable depends on what the reporting is supposed to achieve: if you want to ensure that a log is always written, no matter where the Worker is invoked from, putting it in the Worker is the only way to guarantee that. If you only want a log for this specific entry point, you should put it in the parent workflow.

In our case, I decided to put it into the LRWF, as I only want reports generated when the process is triggered by the hot-folder-approval-worker flow. Admittedly, in our case this is a merely academic discussion because we only have the one entry point.

Referring back to the process diagram, we want to create the queue item after the Drama_Q item has finished processing. We need three values: CreationDate and InternalName (which come from the previous queue item and go into the specific content), and QueueID, a.k.a. queueResult.ItemKey.ToString (which will go into the Reference).

Dusty Tomes

The last step in our LRWF is, after the reporting line has been created, to archive the hot-folder file that was used to generate the Worker entries.

For that, we add an Asset in Orchestrator called SupplierChange_ArchiveFolder. While we’re at it, we might as well create SupplierChange_HotFolder for the hot folder, too.

Then, once all reporting entries have been made, all we have to do is retrieve the asset and use the Move File activity to move the file to it.

Move to Archive

Note that the ability to wait until all queue items (or sub-processes) that belong under a single umbrella are finished, synchronizing the different strands of the process before proceeding, is a very powerful capability of LRWFs that is hard to replicate otherwise. Astute readers will notice the bead pattern at work here.

Scope it early, scope it right

One pet peeve I have about LRWFs is the way they serialize everything in scope and store it in Orchestrator, with no manual override being offered.

In a longer workflow like this, if you want to prevent unnecessary information from being serialized and deserialized repeatedly, you have to make sure it is deleted once it is no longer needed. (Not to mention the headaches a non-serializable variable, such as an IEnumerable, can cause.)

This leads to adding either a lot of sequences to force variables to go out of scope, or to manually setting things back to Nothing. I tend to use a combination of both and don’t bother doing it at all for small value type variables, but it’s always frustrating.
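As a small illustration, the corresponding Assigns (or an Invoke Code block) right before the next Wait… could look like this; the variable names are the ones from the sketches above and are not required by anything.

    ' release big objects before the job is suspended and its state serialized again
    supplierChanges = Nothing     ' the DataTable read from the hot-folder file
    formTasks = Nothing           ' the List(Of FormTaskData), once every task has been handled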

In the end, this is what it looks like:

Final state of the LRWF

Managers manage the managed

The last step is to test and connect everything. Once you’ve verified that the LRWF works, published it to Orchestrator, and created an Orchestrator process for it, let’s modify the Manager so we start the LRWF rather than creating queue items directly. Copy the Queue Manager project as a starting point.

In theory we could use Start Job for this, but as the LRWF takes the path to the Excel file as an argument, this won’t do. Luckily there is a workaround: the Start Job And Get Reference activity allows passing in arguments and — if you don’t use the Wait… activity — works just like Start Job otherwise.

So the first step is to add the Persistence activities from the package manager.

Next, we remove the ReadSupplierChanges.xaml, EnterSupplierChanges.xaml and the Delete File at the end of Main.xaml, which are handled by the LRWF now. All we have to do in the Manager is watch the hot folder and pass the file path to the LRWF.

Long-running Workflow Manager
Arguments for Start Job...

Final Result

If everything works correctly, you should now be able to test the following:

  1. Start the Manager
  2. Drop a Supplier file into the hot folder
  3. The LRWF will pick it up and create approval tasks
  4. Approve the tasks, possibly with some changes, or reject them
  5. For every approved task, the LRWF will create a queue item
  6. The WorkerWrapper should automatically start as soon as queue items are added and execute the Worker once for each
  7. The LRWF will create queue items for the Reporter
  8. The Reporter will run automatically and append the lines to the reporting file
  9. The LRWF will move the supplier file to the archive folder

Note that this solution is quite flexible, as we baked in only minimal assumptions about where the process will execute. Depending on volume and how quickly approvals are granted, the different parts of the process (except for the Reporter, which should be pretty quick) can run on different robots in parallel if necessary. That’s not terribly likely for this use case, as suppliers are rarely changed in huge volumes, but it can be quite important for other things such as document processing.

By using queues, we can also benefit from some load-balancing by setting the queue trigger for the WorkerWrapper to execute multiple parallel processes on different robots.

That’s all, folks!

I hope you enjoyed this extended trip to long-running-workflow land. I know this one is significantly longer and more difficult than the other patterns, but that comes with the territory. LRWFs (any concurrent solution, really) are a power tool: when used correctly, they allow you to do much more than you might otherwise be able to — but it’s also easy to hurt yourself if you’re not careful. Perhaps this pattern will help you avoid that.

As always, you can find my version of the final result in this guide’s GitHub repository: Main LRWF, Manager, and Reporter.

Exercises

  1. Change the date field to a date time and use a date picker to edit it in the form
  2. Modify the Reporter to only instantiate the DataTable once and only write it back when all transactions are finished
  3. What are some of the trade-offs between Assets, process arguments and config files for workflow settings? When would you choose which?
  4. How do you make sure that only relevant information is stored in Orchestrator when using a Wait… activity?

Solutions

  3. This goes a bit beyond the scope of this article, but I was the one who brought it up, right?
  • Config files are set by the developer and only provide a single value per setting, period. There are two ways to do this: a common file in a shared location, or a project file. Changing project files requires re-publishing the workflow
  • Assets reside in Orchestrator; they either have a global value (per folder) or one per robot.
  • Arguments are the most volatile: they must be provided whenever the process is started. They can be set by a trigger, the invoking process, or by users via Apps, Assistant, or the Processes part of Action Center

So the most important distinction is whether the setting’s value depends on the environment. If so, this immediately rules out config files and may rule out assets unless it’s a robot-specific setting.

The next question is how often a value has to change, and who should do the change. I’m sure you can figure out the details. There are some more minor considerations, but this answer is already long enough.

  4. By resetting variable values or making sure variables go out of scope before you hit the Wait… activity

© 2021, Stefan Reutter