Tag Archives: performance

Sneaky tricks for developing on small devices – Bitmap ‘folding’

One of the most problematic constraints when developing applications for mobile or Set Top Box is video memory (AKA VRAM). You often will not have control over how much video memory is allocated to your application, or what the fallback behaviour is when your application uses too much.

This can be a pain, especially when you wish to create some off-screen surfaces for caching or compositing to improve performance.

If your application runs too slowly, that’s an ISSUE; if your application crashes due to excessive memory usage, that’s a PROBLEM.

I recently built a feature into an application which required a bunch of external images to be loaded into a Set Top Box device for in-process caching. Since all Bitmap surfaces are allocated on the platform as ARGB, but the images were monochrome, I could store the images efficiently AND make them available for hardware-accelerated compositing by storing just the single monochrome channel of each image in a separate channel of an in-process cache Bitmap surface.

You can see in the attached image what a mish-mash of logos is created. For debugging purposes, you can also see the same surface viewed with each of the other channels turned off. When a logo is requested, a dictionary finds the relevant Bitmap slot it exists in (given by a Rectangle and BitmapDataChannel number). When a new image is loaded, its single channel is copied into the next available slot in the cache, in FIFO fashion.

The alpha channel of the cache surface wasn’t used, due to the pre-multiplication problems you’ll get – though this can be worked around if you can ensure there are no zero-alpha pixels. The result is an in-process cache requiring no image decoding to composite images, storing 3 times as many images as a regular Bitmap FIFO. #WIN
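
As a rough illustration of the idea (not the code from the actual application), the class below packs monochrome images into the red, green and blue channels of one cache surface with BitmapData.copyChannel. The class name ChannelCache, the fixed slot grid and the assumption that each source image carries its data in the red channel are all hypothetical choices made for this sketch.

```actionscript
package {
    import flash.display.BitmapData;
    import flash.display.BitmapDataChannel;
    import flash.geom.Rectangle;

    // Hypothetical sketch: a FIFO cache that stores one monochrome image
    // per colour channel of each grid cell. The alpha channel is unused.
    public class ChannelCache {
        private static const CHANNELS:Array = [
            BitmapDataChannel.RED,
            BitmapDataChannel.GREEN,
            BitmapDataChannel.BLUE
        ];

        private var _surface:BitmapData;
        private var _slotWidth:int;
        private var _slotHeight:int;
        private var _cols:int;
        private var _slotCount:int;
        private var _next:int = 0;         // next slot to overwrite (FIFO)
        private var _entryByKey:Object = {}; // key -> {rect, channel}
        private var _keyBySlot:Array = [];   // slot index -> key

        public function ChannelCache(slotWidth:int, slotHeight:int, cols:int, rows:int) {
            _slotWidth = slotWidth;
            _slotHeight = slotHeight;
            _cols = cols;
            _slotCount = cols * rows * CHANNELS.length;
            _surface = new BitmapData(cols * slotWidth, rows * slotHeight, false, 0x000000);
        }

        public function get surface():BitmapData {
            return _surface;
        }

        // Copies the red channel of 'mono' into the next free slot.
        // Assumes 'mono' is no larger than one slot.
        public function store(key:String, mono:BitmapData):void {
            if (_entryByKey[key] != null) return; // already cached

            var slot:int = _next;
            _next = (_next + 1) % _slotCount;

            // Evict whatever previously lived in this slot.
            var oldKey:String = _keyBySlot[slot];
            if (oldKey != null) delete _entryByKey[oldKey];

            var cell:int = int(slot / CHANNELS.length);
            var channel:uint = CHANNELS[slot % CHANNELS.length];
            var rect:Rectangle = new Rectangle(
                (cell % _cols) * _slotWidth,
                int(cell / _cols) * _slotHeight,
                _slotWidth,
                _slotHeight
            );

            _surface.copyChannel(
                mono,
                new Rectangle(0, 0, _slotWidth, _slotHeight),
                rect.topLeft,
                BitmapDataChannel.RED,
                channel
            );

            _keyBySlot[slot] = key;
            _entryByKey[key] = { rect: rect, channel: channel };
        }

        // Returns {rect, channel} for a cached image, or null if absent.
        public function lookup(key:String):Object {
            return _entryByKey[key];
        }
    }
}
```

How the selected channel is then composited on screen depends on what the target platform accelerates, so that step is left out here.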

IPTV development with AIR for TV

Having just finished building the UI for the YouView set top box, I thought I’d share some of my insights into best practices when building applications for such resource constrained devices. The YouView UI is AIR based, written in AS3 and runs in Stagecraft 2, also known as ‘AIR for TV’. As the name suggests, AIR for TV is a special version of the Flash player for embedded systems, such as set top boxes. The first incarnation of the YouView UI (back when it was just codenamed ‘canvas’) was for Stagecraft version 1, which means coding in AS2 and suffering the abysmal performance that comes with running on AVM1 (ActionScript Virtual Machine 1).

Despite the delays and the need to code the UI from scratch in AS3, I think it was ultimately the right decision. Stagecraft 2 is a much better platform – Stagecraft 2.5.1 to be precise. It was a great opportunity to learn how to write optimal code and use hardware acceleration effectively on resource constrained devices. I’ll be doing some tutorials on this in the near future, but here are the key points to observe when developing for such platforms:

  • Limit the complexity of your display list hierarchy
    This may sound obvious, but ensure you nest as few things as possible, keeping the display list as shallow as possible. Stagecraft needs to traverse through the display list, working out which areas of the screen to redraw. This is similar to how the desktop Flash Player handles redraws, but with some key differences to how it decides what needs redrawing, how it tackles moving display objects and how it delegates the work of updating the frame buffer – a subject for another time. Most importantly, if you’re developing for a resource constrained device (such as mobile or set top box), you’ll have very limited CPU power, even if the device’s GPU (graphics processing unit) affords you great hardware acceleration capabilities. So, before Stagecraft can delegate any work to hardware, it enumerates changes in the display list in software. Complex display list hierarchies are a headache for some of the low-powered CPUs found in mobiles and set top boxes and this’ll show up as rocketing CPU usage, low framerates and few spare ‘DoPlays’ in Stagecraft (spare work cycles). By keeping your display list shallow, with only the bare minimum of display objects on stage at any one time, you’ll be making life easier for Stagecraft by doing less work on the CPU – whether graphics are drawn in software or hardware.
  • Benchmark everything
    When building an application for a resource constrained device, you should be able to run each component in isolation, to assess its drain on CPU and system/video memory. There’s no point optimising the hell out of one component, when it’s actually another one that is the source of your performance bottleneck.
  • Know thine hardware acceleration capabilities
    There’s no point blindly using cacheAsBitmap and cacheAsBitmapMatrix everywhere, if it’s not going to speed things up on the target device. Worse still, too many cacheAsBitmaps and you may be just wasting valuable video memory, or causing unnecessary redraws (again, the subject of a future article). A lot of platforms will accelerate bitmaps, even if stretched, but not necessarily if flipped or rotated. Alpha on bitmaps (or anything cached as bitmap) will usually be accelerated too, but this is not necessarily the case with all colour transforms. Benchmarking any component you’re building will quickly tell you where you might have pushed it too far, but you should also have a way of verifying that a particular set of transforms is indeed hardware accelerated. Stagecraft provides this when using its –showblit command line parameter. I’ll be going into more detail about this in another post.
  • Mind your memory
    When using various hardware acceleration tricks, especially on resource constrained devices, video memory is at a premium and usually in limited supply. You will need to know the limits and have a way of seeing how much video memory your application is using at any one time – ensuring you dispose and dereference any bitmaps you’re finished with too. If your platform uses DirectFB for its rendering, as YouView does, the executable ‘dfdump’ can show you just where your video memory is going. This is something else I’ll get into in another article.
  • Blit blit blit
    This refers to blitting, where blocks of pixels are copied from one bitmap to another. The technique is used a lot in games, where graphics performance is critical. You should arm yourself with the basics of how old video games blitted multiple things to a single bitmap for performance and video memory efficiency.
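
To make the blitting point concrete, here is a minimal sketch, assuming it runs as a frame script or inside a document class (so addChild is available). The sprite sheet contents, tile size and positions are made up for illustration. Several tiles are copied onto one canvas BitmapData with copyPixels, so only a single Bitmap sits on the display list.

```actionscript
import flash.display.Bitmap;
import flash.display.BitmapData;
import flash.geom.Point;
import flash.geom.Rectangle;

// Hypothetical sprite sheet: four 32x32 tiles in one row (filled red here).
var sheet:BitmapData = new BitmapData(128, 32, true, 0xFFFF0000);

// One canvas on the display list instead of one display object per tile.
var canvas:BitmapData = new BitmapData(320, 240, false, 0x000000);
addChild(new Bitmap(canvas));

// Reuse the same Rectangle and Point to avoid allocating in the loop.
var tile:Rectangle = new Rectangle(0, 0, 32, 32);
var dest:Point = new Point(0, 100);

canvas.lock(); // defer display updates until all copies are done
for (var i:int = 0; i < 4; i++) {
    tile.x = i * 32;      // which tile to read from the sheet
    dest.x = 20 + i * 40; // where to draw it on the canvas
    canvas.copyPixels(sheet, tile, dest, null, null, true);
}
canvas.unlock();
```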

I’ll probably go into more depth on each of these things in forthcoming posts. Stay tuned.

CODING WRONGS – Where do I start with the bad?

It gets scary out there sometimes. During my freelance career I’ve worked at a lot of different companies and have seen such coding horrors as you cannot imagine. So I thought I’d start immortalising some of them – so that we can all learn better coding practices, by looking at the bad.

Starter for 10 – What’s wrong with this picture?

Did you spot the fubar? It’s not an obvious one.

This code potentially replaces a Bitmap’s BitmapData, without first explicitly disposing any existing BitmapData.

I see this kind of thing quite often and it’s the source of many a memory leak. AVM2 isn’t that great at dealing with this kind of situation and there’s a crucial difference between GC cleaning up out-of-scope objects for you and things like BitmapData: GC will reclaim the memory associated with objects ‘when it feels like it’, whereas explicitly calling the ‘dispose’ method of a BitmapData will immediately give you back that memory.

In the case of platforms with hardware accelerated graphics (such as mobile or set top box), the memory associated with the pixel data itself (video memory) will be reclaimed immediately.
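
The original picture is not reproduced here, so the following is only a sketch of the safer pattern, with a hypothetical helper name (swapBitmapData). It disposes the outgoing BitmapData explicitly when a Bitmap is given new pixel data, on the assumption that no other Bitmap still shares the old data.

```actionscript
import flash.display.Bitmap;
import flash.display.BitmapData;

// Hypothetical helper: replace a Bitmap's pixel data and free the old data.
// Only safe if nothing else still references the old BitmapData.
function swapBitmapData(target:Bitmap, replacement:BitmapData):void {
    var old:BitmapData = target.bitmapData;
    target.bitmapData = replacement;
    if (old != null && old != replacement) {
        old.dispose(); // releases the pixel memory straight away
    }
}
```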

The lesson?

Don’t make more work for the Garbage Collector when you can avoid it.

Loan Shark – fast object pooling utility

A couple of years ago, I created an object pooling utility for a games project I was building in AS3. Since then, I’ve used it quite a few times, in order to speed up apps and improve resource management, easing the load on the garbage collector by reusing objects instead of recreating them.

While object pooling isn’t a magic bullet to speed up every use case, it works especially well on things that are heavy to continually construct and destroy. A good example is my History of the World project, which uses an object pool for item renderers, instead of creating and destroying them as you navigate around – press ALT+CTRL to bring up the resource debugger, which shows a little information on its usage.
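
For readers new to the technique, a bare-bones pool might look like the sketch below. This is not LoanShark’s API; the class name SimplePool and its methods are invented purely to show the borrow-and-return idea.

```actionscript
package {
    // Hypothetical minimal object pool, for illustration only.
    public class SimplePool {
        private var _type:Class;
        private var _free:Array = [];

        public function SimplePool(type:Class) {
            _type = type;
        }

        // Hand out a pooled instance if one is free, otherwise create one.
        public function borrow():Object {
            return _free.length > 0 ? _free.pop() : new _type();
        }

        // Return an instance so a later borrow() can reuse it.
        // The caller is responsible for resetting its state first.
        public function giveBack(item:Object):void {
            _free.push(item);
        }
    }
}
```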

I recently updated the utility, improving its performance, adding features and putting loads of unit tests around it. It’s now hosted over at GitHub. Using it is as simple as:

Fastest way to add multiple elements to an Array / Vector

In a simple situation, where you wish to add many elements to an Array or Vector, you might just do:
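
The snippet is missing from this copy. A plausible form, assuming two plain Arrays named input and output (illustrative names), moves elements across one at a time:

```actionscript
var input:Array = [4, 5, 6];
var output:Array = [1, 2, 3];

// Each pass shrinks 'input' and grows 'output'.
while (input.length > 0) {
    output.push(input.shift());
}
```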

However, the sizes of both Arrays are manipulated for each loop, which will have an adverse impact on speed and memory usage. So, we could cache the length of the input Array and not manipulate it:
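
The code for this step is not preserved here. One way it could look, using the same assumed input and output Arrays:

```actionscript
var input:Array = [4, 5, 6];
var output:Array = [1, 2, 3];

// Read the length once and leave 'input' untouched.
var n:int = input.length;
for (var i:int = 0; i < n; i++) {
    output.push(input[i]);
}
```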

But we’re still growing the size of the output Array incrementally, which is very bad. Since we know input.length in advance, we could grow the output Array to its new size just once, before the loop:
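
A sketch of that pre-sizing step, under the same assumptions (the original code is not shown):

```actionscript
var input:Array = [4, 5, 6];
var output:Array = [1, 2, 3];

var n:int = input.length;
var offset:int = output.length;

// Grow 'output' once, then fill the new indices directly.
output.length = offset + n;
for (var i:int = 0; i < n; i++) {
    output[offset + i] = input[i];
}
```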

This is OK, but still involves a loop. If only we could push multiple elements into the push method in one go. Well, we can – enter the apply method. Since Array.push accepts multiple arguments (something rarely used) and apply allows us to pass an Array of arguments to any Function, one line and we’re done:
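
The one-liner itself is not included in this copy. With the assumed input and output Arrays, it would be along these lines:

```actionscript
var input:Array = [4, 5, 6];
var output:Array = [1, 2, 3];

// apply() spreads 'input' into push()'s argument list,
// with 'output' as the object push() operates on.
output.push.apply(output, input);
```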

This works out faster and more memory efficient than the other methods. It works nicely for Vectors, too. If anyone has a faster method of doing this, do let me know.

SWFIdle – simple flash idling utility

If you’re still churning out Flash banners, please use this!

I created this simple utility, called SWFIdle, to enable the Flash Player to lower its CPU usage while the user is not interacting with it. Since it’s possible to have multiple Flash instances embedded in one page (for example, a game and a couple of banners), I recommend that everyone uses this in their projects, so that players needn’t fight for CPU and give Flash a worse name than it already has.
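
The general idea can be sketched as follows. This is not SWFIdle’s actual implementation or API. The class name, frame rates and timeout are assumptions chosen for illustration: the frame rate drops when the mouse leaves the stage or stops moving for a while, and is restored on the next mouse movement.

```actionscript
package {
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.events.MouseEvent;
    import flash.events.TimerEvent;
    import flash.utils.Timer;

    // Hypothetical document class showing idle frame-rate throttling.
    public class IdleThrottleExample extends Sprite {
        private static const ACTIVE_FPS:Number = 30;
        private static const IDLE_FPS:Number = 4;
        private static const IDLE_AFTER_MS:Number = 3000;

        private var _idleTimer:Timer = new Timer(IDLE_AFTER_MS, 1);

        public function IdleThrottleExample() {
            if (stage) init();
            else addEventListener(Event.ADDED_TO_STAGE, init);
        }

        private function init(event:Event = null):void {
            removeEventListener(Event.ADDED_TO_STAGE, init);
            _idleTimer.addEventListener(TimerEvent.TIMER_COMPLETE, onIdle);
            stage.addEventListener(MouseEvent.MOUSE_MOVE, onActivity);
            stage.addEventListener(Event.MOUSE_LEAVE, onIdle);
            onActivity();
        }

        // Any mouse movement restores the full frame rate and restarts the countdown.
        private function onActivity(event:Event = null):void {
            stage.frameRate = ACTIVE_FPS;
            _idleTimer.reset();
            _idleTimer.start();
        }

        // No interaction for a while, or the mouse left the stage: slow down.
        private function onIdle(event:Event = null):void {
            _idleTimer.reset();
            stage.frameRate = IDLE_FPS;
        }
    }
}
```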

I know there’s the hasPriority embed attribute now. But:

  • That assumes you have access to the HTML that embeds your SWF
  • If no other players are present, it has no effect
  • There’s still usually little reason to be running your SWF at a high framerate if the user isn’t interacting with it
  • Flash banners with wastefully unoptimised drawing routines are probably one of the key reasons that Flash got poo-pooed off of mobile platforms and disabled on everyone’s laptops – CPU usage = battery usage!

BBC – A History of the World in 100 Objects

I just finished a new project called ‘A History of the World in 100 Objects’, a joint venture between BBC Radio 4 and the British Museum to chart human history in a new way. I developed the concept for the 3-D object explorer with the guys at GT/VML and built it using Flash 10’s native 3-D capabilities. Users are able to explore objects from throughout human history in a potentially infinitely expanding 3-D time tunnel and even make history by uploading their own objects.

Here’s some footage of the 3-D explorer:

The main challenge facing the development of the 3-D explorer was to build something capable of handling up to 10,000 objects, loading in their images and displaying it all in glorious 3-D… all without crashing the user’s browser. Every object or filter set accessible within the explorer can be bookmarked, shared, or navigated with the browser back/forward buttons. For added accessibility, the explorer’s 3-D view itself can be navigated with the keyboard, mouse wheel or the on-screen controls.

I built the application strictly to optimise performance and memory management, while ensuring maximum stability. Coding techniques such as object pooling, typed arrays, load queueing, render deferral and the flyweight design pattern were used to maximise performance and minimise memory usage.

ActionScript performance tips

Here are a few simple tricks that may help the performance of your code/graphics intensive Flash movies. This is not an exhaustive list, by any means, rather some of the more effective performance tweaks to try out on your projects. There are the usual sensible coding tricks, like using local variables for oft-used references within functions, or planning your code loops carefully and breaking out of loops whenever feasible – but you should be doing these already. I’ll be adding to this post as and when I feel it necessary, but will generally avoid the more granular tricks, such as bytecode optimisation. Some of those methods are too complex to explain in simple terms here and generally have a poor benefit-to-effort ratio anyway:

Use scrollRect in conjunction with cacheAsBitmap. The cacheAsBitmap parameter of a movieclip can improve performance dramatically, but will cause problems if the render area of the movieclip gets too large (e.g. larger than 2880 in width or height), regardless of whether it is cropped by the viewable area of the Flash Player. The solution is to use scrollRect to constrain the rendering area to desired limits, in this example, the stage width/height:
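
The original example is not included here. A minimal AS3 sketch, assuming it runs as a frame script or in a document class already on the stage, with a hypothetical Sprite named content:

```actionscript
import flash.display.Sprite;
import flash.geom.Rectangle;

// Hypothetical container holding artwork much larger than the stage.
var content:Sprite = new Sprite();
addChild(content);

content.cacheAsBitmap = true;

// Constrain the rendered (and cached) area to the stage dimensions.
content.scrollRect = new Rectangle(0, 0, stage.stageWidth, stage.stageHeight);
```

To scroll the content, assign a new Rectangle with a different x or y to scrollRect rather than moving the clip itself.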

Create bitmap snapshots of complex movieclips. Where you may have a movieclip full of layered graphical effects that isn’t animated, you can save a lot of rendering time by creating a snapshot of the movie. Similar to the cacheAsBitmap parameter, but will improve performance further if your movieclip comprises many lines or alphas. The following function shows a quick and dirty way of duplicating a movieclip as a snapshot of the original:
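
The function from the original post is missing. The sketch below shows one way to do it in AS3, with a hypothetical function name (createSnapshot), and it assumes the source clip is not scaled or rotated.

```actionscript
import flash.display.Bitmap;
import flash.display.BitmapData;
import flash.display.DisplayObject;
import flash.geom.Matrix;
import flash.geom.Rectangle;

// Hypothetical helper: draws 'source' into a transparent BitmapData and
// returns a Bitmap positioned over the original's visible area.
function createSnapshot(source:DisplayObject):Bitmap {
    // Bounds in the source's own coordinate space (may start at negative x/y).
    var bounds:Rectangle = source.getBounds(source);

    var data:BitmapData = new BitmapData(
        Math.max(1, Math.ceil(bounds.width)),
        Math.max(1, Math.ceil(bounds.height)),
        true,
        0x00000000
    );

    // Shift the drawing so the top-left of the bounds lands at (0, 0).
    var offset:Matrix = new Matrix();
    offset.translate(-bounds.x, -bounds.y);
    data.draw(source, offset);

    var snapshot:Bitmap = new Bitmap(data);
    snapshot.x = source.x + bounds.x;
    snapshot.y = source.y + bounds.y;
    return snapshot;
}
```

After adding the returned Bitmap to the display list, the original clip can be hidden or removed. Remember to dispose the snapshot’s BitmapData when it is no longer needed.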

Use the opaque window mode trick – sparingly! Setting the wmode=opaque HTML parameter of your Flash object can improve rendering performance, but at a potential cost. Not only does it make the rendering order of movieclips and frames more ‘lazy’, but it will also affect keyboard interaction adversely in some browsers (especially Firefox). Use with caution.

I’ll be updating and tracking back to this post occasionally, so stay tuned…