Something I hear often is how difficult TouchDesigner blob tracking can be. “It’s too slow!” or “the tracking isn’t working!” or “I don’t know how to use it!” I’m sure you’ve been there and experienced something similar, so in this post I wanted to share some tricks for working with the Blob Track TOP and getting your TouchDesigner blob tracking setups working better.
What is blob tracking?
If you’re unfamiliar with blob tracking, it’s a computer vision (CV) technique where you detect moving regions in an image or video stream, draw a bounding box around them, and then identify as much info as possible about that moving bounding box. This usually involves the computer trying to differentiate between the “background” and the “foreground” using different techniques. TouchDesigner implements blob tracking from openCV, which is an open source computer vision library. It is pretty robust and has recently seen major performance improvements in TouchDesigner since a lot of the processing has been moved to the GPU. But why’s it still difficult?
Thou shalt process inputs!
Blob tracking is not a new technique or discovery. In typical computer vision workflows, it’s a time-tested process that works quite well. One of the main problems in TouchDesigner blob tracking setups is that we often don’t know the fundamentals of blob tracking workflows. We assume we can just plug in our input and be good to go! But that’s not quite the case. One piece of this fundamental knowledge that many TouchDesigner developers haven’t been exposed to is the art of input processing. A good example of this is our concept of textures. We think about textures in a lot of different ways, but we don’t often think about them as matrices of data, which is a critical concept in computer vision. Here’s an image example of how computer vision folks would perceive an image as data:
As an artist, your first reaction might be “yeesh!” but these kinds of thought processes lay the foundations of computer vision. So one of the largest improvements we can make in our blob tracking workflows to make them fast and usable is to process the input. There are a few easy ways to approach that.
This is uncomfortable for a lot of folks at first. The idea of scaling down the resolution of your image by a lot sounds counterintuitive. “We want all the data, don’t we?” Yes and no. We want the data, but in most cases, especially for real-time applications, a lot of the data ends up being non-useful and eats computation time for no extra benefit. This is compounded by the fact that in most cases, the blob tracking algorithms don’t need the extra data we’re giving them. Whether the algorithm relies on background subtraction or on detecting movement across a series of frames, it won’t need millions and millions of pixels.
Here’s a conceptual test for you. Can you quickly identify the statue in this image?
The answer is almost certainly yes. How about this one?
Probably also a yes for the second. The interesting thing is that the first image was a 1200×1900 pixel source file while the second was…200×299 pixels. So the first image was 6x higher resolution, but our eyes were still able to discern enough detail to identify the statue against the background. And guess what: modern computer vision has the same ability! So why waste all that extra processing on pixels that aren’t necessary for our purposes?
Reducing the resolution of your input image before it goes to blob tracking is a huge way to increase TouchDesigner blob tracking performance.
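If you want to see the idea outside of a TouchDesigner network, here’s a minimal pure-Python sketch of nearest-neighbour downsampling on a toy grayscale image represented as nested lists. (In TouchDesigner you’d simply lower the resolution on a TOP’s Common page; this just illustrates what that throws away and what it keeps.)

```python
def downsample(image, factor):
    """Keep every `factor`-th pixel in each dimension (nearest neighbour)."""
    return [row[::factor] for row in image[::factor]]

# A toy 8x8 "grayscale image": 0 = dark background, 255 = a bright subject.
image = [[255 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
         for r in range(8)]

small = downsample(image, 2)          # 8x8 -> 4x4: a quarter of the pixels
print(len(small), len(small[0]))      # -> 4 4
# The bright subject is still clearly present in the smaller image.
print(any(255 in row for row in small))  # -> True
```

Even after dropping three quarters of the pixels, the subject is still unmistakably there — which is all the tracker needs.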
Smooth out your differences
On top of down-resing, one further trick you can use is applying a blur to the image. This usually allows you to go to even lower resolutions and still have passable image quality. For example, I took the above photo and dropped it down by more than half again, so it’s now 80×120 pixels (the original image was 15x higher resolution):
It’s starting to look a bit crunchy, and artifacts/noise are appearing. In this case, a slight bit of blur may be useful to taper the edges down and remove some of the extra noise in the image, because things like random patches of heavy noise can produce false positive detections. This is common when working with noisy depth sensors as well, so it’s a good trick to keep in mind. Let’s add a slight bit of blur.
Now you might be thinking “well, it’s becoming a bit harder for me to distinguish the statue now,” which is true. Like anything in our field, you’ll have to use your tools with care, in the sense that some will be useful on some occasions and detrimental on others. In this case a blur may not be appropriate. But if the image were noisier (like from a Kinect depth sensor), then this would be a life-saver. The important thing to know is that it’s a useful tool in your tool kit.
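To make the blur step concrete, here’s a rough pure-Python sketch of a 3×3 box blur — the same idea as a pass through a Blur TOP, just spelled out by hand on a toy image. Edge pixels simply average whatever neighbours exist.

```python
def box_blur(image):
    """Average each pixel with its available 3x3 neighbourhood."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            neighbours = [image[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(sum(neighbours) // len(neighbours))
        out.append(row)
    return out

# A single isolated bright speck: exactly the kind of sensor noise that
# could read as a false blob.
noisy = [[0, 0, 0],
         [0, 255, 0],
         [0, 0, 0]]
blurred = box_blur(noisy)
print(blurred[1][1])  # -> 28: the speck is smeared down well below 255
```

After one pass, the lone speck drops from full white to a faint smudge that a threshold or the tracker’s minimum blob size can easily ignore.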
Simplify the matrix
As we saw in the image at the beginning of this post, computer vision folks perceive image data as matrices. Usually you’ll have three matrices, one for each of the RGB channels. It’s a very common practice to simplify this down to a grayscale image and manipulate it in different ways. This is because most types of computer vision (outside of colour tracking) don’t really care about “colours.” So at that point, there are two avenues we can take. The first is that we can grayscale an image and then bump up contrast and other levels to try to make our subjects much more apparent, like the below:
Although it looks uglier and uglier to us, as long as your subject is apparent in the image in an obvious way, it will work for blob tracking. If this were a video, the statue would always remain a bright white area that the blob tracking would determine to be the “foreground.”
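As a sketch of what that grayscale-and-contrast pass does per pixel, here’s a small pure-Python example using the standard luminance weights. The midpoint and gain values are arbitrary numbers you’d tune, not anything prescribed by TouchDesigner (where you’d reach for a Monochrome TOP and a Level TOP instead).

```python
def to_gray(pixel):
    """Collapse an (R, G, B) pixel to one channel with luminance weights."""
    r, g, b = pixel
    return int(0.299 * r + 0.587 * g + 0.114 * b)

def boost_contrast(value, midpoint=128, gain=2.0):
    """Push values away from the midpoint, clamped to the 0..255 range."""
    return max(0, min(255, int(midpoint + (value - midpoint) * gain)))

pixel = (200, 180, 60)         # a warm, brightly lit subject pixel
gray = to_gray(pixel)
print(gray)                    # -> 172
print(boost_contrast(gray))    # -> 216: brighter, further from the background
```

A dim background pixel gets pushed the other way, toward black, so the subject stands out even more starkly after the pass.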
Another option is to threshold the pixels. This is especially useful if you have a situation where maybe you have a dark background and subjects will be lit up brightly, for example if you’re using IR lights to illuminate the subjects and you’re using an IR camera to capture that input. The threshold process would ignore any pixels below a certain brightness and leave big moving white areas that the blob tracking will detect with ease.
If that was all that was moving in the image, the blob tracking algorithm would draw a firm bounding box around it and hold it solidly. And nothing stops you from continuing to process the input, such as adding another blur pass to smooth things out (and maybe smooth over some of the gaps created by thresholding):
Again, not aesthetically pleasing to look at, but if you were to imagine this is a video stream, the statue would be a big white blob on a black background that the blob tracking would have no problem at all detecting and putting a bounding box around the whole thing.
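The thresholding step itself is simple enough to sketch in a few lines of pure Python. The cutoff of 180 here is an arbitrary value you’d tune to your lighting — with IR-lit subjects on a dark background, anything below it reads as background.

```python
def threshold(image, cutoff=180):
    """Pixels at or above the cutoff become white; everything else black."""
    return [[255 if v >= cutoff else 0 for v in row] for row in image]

# A dim background with a brightly (e.g. IR-) lit subject in one corner.
frame = [[30, 40, 35],
         [45, 220, 210],
         [50, 230, 40]]
binary = threshold(frame)
print(binary)
# -> [[0, 0, 0], [0, 255, 255], [0, 255, 0]]
```

What’s left is exactly the kind of big moving white area on black that the Blob Track TOP detects with ease.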
If you’ve tried TouchDesigner blob tracking before and had a bad experience, I challenge you to give it another try. While much of this is rudimentary, even without diving too deep into the parameters of the Blob Track TOP or anything else, you’ll get orders of magnitude better results if you spend time processing your inputs with diligence. That’s the main point of this piece: to get you thinking about it. Drop the resolution, maybe add a bit of blur, increase contrast or threshold pixels, and watch the Blob Track TOP take over from there. If you’re finding success with that, then there are tons of openCV blob tracking tutorials out there that can take you even deeper down the road of input processing. Enjoy!