AEM Bulk Asset Upload with ACS AEM Commons

January 31, 2017
Developer

When a new eCommerce solution goes live, there can be a large initial bulk upload of assets before the site is ready. This initial bulk upload can be tricky, and its potential pitfalls are often overlooked because pulling over legacy assets is usually seen as a throwaway operation. This can cause roadblocks and production headaches. We're going to examine a particular case in Adobe Experience Manager (AEM), an enterprise-level content management system. Our solution not only makes the bulk upload relatively quick and easy, but also has little to no impact on end users. Along the way, we'll see the strength of pairing AEM with ACS AEM Commons, a set of tools from Adobe Consulting Services (ACS).

Developers and content authors can take advantage of a plethora of tools when managing day-to-day operations with AEM’s Digital Asset Management (DAM). One particular feature that many users enjoy is the automatic generation of renditions. You can set up your AEM instance so that any time an image is uploaded, it is automatically resized and processed for a variety of uses. Often you need at least a high-quality rendition of an image for desktop, and a lower-resolution version that loads faster and uses less data on mobile devices. AEM can use these renditions to blend seamlessly with responsive website designs, meaning content authors only have to place one piece of content without having to worry about choosing separate mobile and desktop versions. While this feature can be very powerful, it can also strain the system during the initial bulk upload phase, when developers, authors, and potentially even end users compete for resources that are already busy processing images.

AEM does provide tools out of the box that allow developers to manage this bulk upload process. There is rich support on the backend to monitor system performance and to queue and efficiently throttle the upload. However, this must all be coded manually, so it can be complex to design and implement. The initial bulk upload phase involved in deploying a new website often happens only once, at the beginning. Unless your site is planning on drawing in vast quantities of uploaded media every day, it is unlikely anyone wants to invest in a bunch of complex code for a one-time operation. We therefore need to come up with a less work-intensive solution.

The largest share of the initial processing is caused by Workflows, the AEM name for operations such as generating renditions for your images. Workflows should be treated as heavyweight operations. If you have tens or hundreds of thousands of assets, then Workflows that each take only a fraction of a second per image can add up to days of processing time. Imagine multiplying every single asset to be uploaded by all the Workflows that run on each asset and you start to get an idea of how much processing needs to happen.
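To make that multiplication concrete, here is a rough back-of-the-envelope sketch. The asset count, the number of Workflows per asset, and the per-Workflow duration are illustrative assumptions, not measurements from a real AEM instance, and the estimate assumes the work runs serially.

```python
def estimate_processing_hours(asset_count, workflows_per_asset, seconds_per_workflow):
    """Rough serial estimate: every Workflow on every asset, one after another."""
    total_seconds = asset_count * workflows_per_asset * seconds_per_workflow
    return total_seconds / 3600


# Hypothetical numbers: 200,000 assets, 4 Workflows each, half a second per Workflow.
hours = estimate_processing_hours(200_000, 4, 0.5)
print(f"{hours:.1f} hours (~{hours / 24:.1f} days)")  # 111.1 hours (~4.6 days)
```

Real servers process some of this work in parallel, so treat the result as an order-of-magnitude guide rather than a prediction.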

To efficiently manage Workflows, we'll need to incorporate tools from ACS. ACS is a first-party service that provides support for developers using Adobe products. For AEM, ACS offers the Commons, or ACS AEM Commons in full: a suite of tools to supercharge your productivity in AEM, provided free of charge for existing customers. We are making use of two tools to assist our bulk upload process: the CSV Asset Importer and the Bulk Workflow Manager.

CSV asset importer image

The CSV Asset Importer Tool makes it easy to process large numbers of uploads to the DAM. First, you should make the raw assets available to AEM, which might involve copying them to a storage area on the server. This is an important step, because we would like to have minimal latency once we begin importing the assets into AEM. You then provide a CSV file explaining how those assets should be handled. You can go to the ACS AEM Commons for detailed information on how the CSV is generated, but the crucial portion we want to look at is the option to throttle the upload.
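As a rough illustration of preparing that CSV, the sketch below walks a folder of staged assets and writes one row per file. The column names `relSrcPath`, `absTargetPath`, and `mimeType`, the header row, and the function names are assumptions made for this example; confirm the exact columns your version of the importer expects in the ACS AEM Commons documentation before relying on them.

```python
import csv
import mimetypes
from pathlib import Path

# Assumed column names; confirm them against the importer's own settings.
COLUMNS = ["relSrcPath", "absTargetPath", "mimeType"]


def build_rows(source_root, dam_root):
    """Return one row per file found under source_root, in sorted order."""
    root = Path(source_root)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        mime, _ = mimetypes.guess_type(path.name)
        rows.append({
            "relSrcPath": rel,
            "absTargetPath": f"{dam_root.rstrip('/')}/{rel}",
            "mimeType": mime or "application/octet-stream",
        })
    return rows


def write_manifest(rows, csv_path):
    """Write the rows to csv_path with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_manifest(build_rows("/mnt/staged-assets", "/content/dam/legacy"), "assets.csv")` would produce the file to hand to the importer; both paths here are placeholders.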

CSV Asset Importer allows you to separate your upload into individual batches. Now instead of running Workflows for your entire batch operation at once, you can enter your desired batch size. It certainly makes a big difference to our server whether we’re uploading 100,000 images at once or only five, and this enables quite a bit of control in how we want to manage our upload. There is also an option to “Throttle in MS,” meaning to set how long to wait between batches (as a time given in milliseconds). Taking these options together, CSV Asset Importer allows us to manage the rate at which assets are processed by the system.
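Conceptually, batch size plus a fixed wait behaves like the loop below. This is a simplified model for reasoning about the two settings, not the importer's actual implementation; `import_batch` is a hypothetical stand-in for whatever hands a batch of rows to AEM.

```python
import time
from itertools import islice


def batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def import_with_static_throttle(rows, batch_size, throttle_ms, import_batch, sleep=time.sleep):
    """Send each batch, then wait a fixed time regardless of server load."""
    sent = 0
    for batch in batches(rows, batch_size):
        import_batch(batch)       # hypothetical hand-off to the server
        sent += len(batch)
        sleep(throttle_ms / 1000)  # the wait never adapts to how busy the server is
    return sent
```

The `sleep` parameter exists only so the loop can be exercised without real delays, for example by passing a function that records the requested waits.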

For small to medium uploads, perhaps a few thousand assets depending on the complexity of the Workflows you've set up, CSV Asset Importer should provide plenty of control by itself. Just experiment with a few small batches to get a feel for the timing, which can be done by cutting a few lines from your CSV and putting them into their own file. Watch the server logs to see how long it takes for all processing to complete, and you have an estimate for timing your batches.
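One way to carve out such a trial file and turn the measured time into a starting throttle value is sketched here. It assumes the CSV has a header row, copies rows rather than cutting them, and uses a hypothetical safety factor of 1.5; the function names are made up for illustration.

```python
import csv


def write_sample(source_csv, sample_csv, row_count):
    """Copy the header plus the first row_count data rows into sample_csv."""
    with open(source_csv, newline="", encoding="utf-8") as src, \
         open(sample_csv, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # header row (assumed to exist)
        written = 0
        for row in reader:
            if written == row_count:
                break
            writer.writerow(row)
            written += 1
    return written


def suggest_throttle_ms(sample_seconds, sample_size, batch_size, safety_factor=1.5):
    """Scale the measured per-asset time up to one batch, padded by a safety factor."""
    per_asset_seconds = sample_seconds / sample_size
    return int(per_asset_seconds * batch_size * safety_factor * 1000)


# Hypothetical measurement: a 20-asset trial took 40 seconds to finish processing.
print(suggest_throttle_ms(sample_seconds=40, sample_size=20, batch_size=5))  # 15000
```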

However, this isn’t sufficient for a larger bulk upload, certainly not anything over a few thousand assets in a typical case. The wary reader will note that this throttle is entirely static. Spikes in server load, or assets of varying sizes and processing requirements, may lead to drastic differences in timing. Once the server has “missed” one batch window, the problem can quickly compound as CSV Asset Importer continues to send in new batches. We could get around this if we could isolate the Workflows as a separate operation.
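A toy simulation shows why a missed window compounds. It models a server that can clear a fixed amount of work between batch arrivals; the per-batch processing times and the throttle value are invented for illustration.

```python
def simulate_backlog(processing_seconds, throttle_seconds):
    """Return the outstanding work (in seconds) right after each batch arrives."""
    backlog = 0
    history = []
    for work in processing_seconds:
        backlog += work          # a new batch lands on whatever is still unfinished
        history.append(backlog)
        # Until the next batch arrives, the server clears at most one window of work.
        backlog = max(0, backlog - throttle_seconds)
    return history


# Batches arrive every 10 seconds; the third batch spikes, the rest run slightly long.
print(simulate_backlog([8, 8, 25, 12, 12, 12], throttle_seconds=10))
# [8, 8, 25, 27, 29, 31]
```

The first two batches finish inside their window, but after the spike the outstanding work keeps growing because new batches arrive on a fixed schedule whether or not the server has caught up.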

In order to separate Workflows from the initial upload, we have to temporarily disable those Workflows which apply to newly uploaded assets on the Author instance. As of AEM 6.1, this was done by going to the main AEM page and clicking Tools, Workflow, and Models. Before performing this action on any higher environments, it's important to perform a few smaller uploads to get a sense of how long you're going to need to keep the Workflows disabled. Authors should be warned not to upload any assets during this time, although other functions do work as normal. The uploading of assets is the only portion of the authoring process that should be affected, since those are the only Workflows that are turned off during this time. With those Workflows disabled, CSV Asset Importer quickly gets all of your assets loaded into the DAM, albeit without renditions or tagging. CSV Asset Importer remains an ideal tool for this portion of the work regardless of batch size, as the initial upload is very fast and stable once the Workflows are turned off. Just remember to re-enable your Workflows when you are done! Unfortunately, re-enabling the Workflows won’t run them on existing assets, so we need one more tool: the Bulk Workflow Manager.

bulk workflow setup image

The Bulk Workflow Manager handles the last portion we need, efficiently managing the Workflows that need to be run on all of the assets we just uploaded with CSV Asset Importer. Remember that in stock AEM, the system attempts to run all of the Workflows for all new assets the moment they arrive. In contrast, Bulk Workflow Manager kicks off a batch of Workflows, then waits the specified period before checking to see if those Workflows are completed and queuing up more.
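The start, wait, check pattern described above can be modeled as follows. This is a simplified sketch of the idea, not Bulk Workflow Manager's code; `start_workflow` and `is_complete` are hypothetical callbacks standing in for the real Workflow engine.

```python
import time


def run_in_managed_batches(payloads, batch_size, interval_seconds,
                           start_workflow, is_complete, sleep=time.sleep):
    """Start one batch, then poll until it finishes before starting the next."""
    batches_run = 0
    for offset in range(0, len(payloads), batch_size):
        batch = payloads[offset:offset + batch_size]
        for payload in batch:
            start_workflow(payload)
        # Wait the configured interval, then check; repeat until the batch is done.
        while True:
            sleep(interval_seconds)
            if all(is_complete(payload) for payload in batch):
                break
        batches_run += 1
    return batches_run
```

Because the next batch only starts after the previous one reports completion, a slow batch delays the schedule instead of piling more work on top of it.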

The latest version of Bulk Workflow Manager allows you to specify any of several query languages to find your assets. It also allows manual control of batch sizes and throttling in addition to the smart throttling inherent to the managed Workflow process. For example, if you choose a batch size of five, Bulk Workflow Manager only ever runs five Workflows at once. That’s a vast improvement over the potentially millions of Workflows you might accidentally queue up if you feed in all your assets at once. Just as with CSV Asset Importer, you’ll want to try a few small batches to ensure you’re selecting a reasonable batch size and timing. Unlike with CSV Asset Importer, a sudden spike in system utilization only delays the Workflows currently in process rather than compounding issues for the rest of the bulk Workflow process.

Attempting to publish all of your assets at once may run into the same issue as any other mass Workflow operation, so consider the Bulk Workflow Manager any time you need to perform operations on a large number of assets. If the assets are needed on the Publish server, it is strongly suggested that you run another Bulk Workflow Manager process, combined with a custom publish Workflow, to handle publishing all of them.

We have accomplished our goal. We now have a way to bulk upload the vast quantities of assets needed during a go-live. By leveraging ACS AEM Commons, we can bulk upload our assets with minimal setup and minimal downtime. Content authors only need to be restricted during the initial CSV Asset Import phase, and the only restriction is that they cannot upload new assets because those Workflows are off during this step. Developers, authors, and even end users shouldn’t experience any other noticeable slowdown or interruption of service, even in a system that is fully live during the entire process.