What is Pixel Packing?
There are many strange terms in interactive media and pixel packing probably takes the cake for me. Pixel packing is a technique for taking data values and transferring them into pixel values in an image or frame of a video. Imagine a switchboard of LEDs flickering as 0s and 1s…but turned up in complexity quite a bit. There are a ton of benefits to using this method but first let’s talk a bit about how it works.
How it works
Pixel packing works by taking arbitrary values and “packing” them into arbitrary pixel channels. This can be simple or difficult depending on the type of data and image or movie format you plan to use.
We don’t ever really spend time thinking about our data types. We get a value and use it. Sometimes we cast it to another data type. Rarely do we wonder about the number of bits we need to hold that data and the limitations around that.
Most of the data points we use today are 16-bit or 32-bit. There are a small number of sources that fit within just 8 bits of data, such as MIDI signals. An 8-bit data source has 256 steps of data (0 – 255), which fits the 128 steps of MIDI data with room to spare.
256 steps may sound like a lot of steps, but your range gets used up quickly. You wouldn’t be able to accurately represent a pixel position on a 1024 pixel LED strip if you only had 256 steps to use. Every step you took in your 8-bit value would correspond to 4 real world pixels.
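To make the LED strip example concrete, here is a minimal sketch of what happens when a 1024-pixel position is squeezed into 8 bits. The function names `to_8bit` and `from_8bit` are made up for this illustration.

```python
STRIP_LENGTH = 1024  # pixels on the LED strip from the example above
STEPS_8BIT = 256     # values available in a single 8-bit channel


def to_8bit(position):
    """Quantize a strip position (0-1023) down to an 8-bit step (0-255)."""
    return position * STEPS_8BIT // STRIP_LENGTH


def from_8bit(step):
    """Return the first strip position that maps to the given 8-bit step."""
    return step * STRIP_LENGTH // STEPS_8BIT


# Positions 512 through 515 collapse onto one step; 516 starts the next.
for position in (512, 513, 514, 515, 516):
    step = to_8bit(position)
    print(position, "->", step, "->", from_8bit(step))
```

Four neighbouring positions come back as the same pixel, which is the precision loss described above.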
A 16-bit data source, on the other hand, will have 65536 data steps (0 – 65535). This starts to get into more usable ranges of values. A 32-bit value will have even more steps, 4,294,967,296 to be exact!
The first step in pixel packing is figuring out how many bits your data needs. Are you using MIDI? Then each of your data points will fit in an 8-bit value. Are you measuring the forces created when 2 atoms smash together? You’ll probably want 32-bit precision.
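As a rough sketch of that first step, the helper below picks the smallest common bit depth that can hold a given range. It assumes your data is (or can be mapped to) unsigned whole numbers starting at 0; floating-point data needs a different treatment. The name `bits_needed` is hypothetical.

```python
def bits_needed(max_value):
    """Return the smallest of 8, 16 or 32 bits that can hold 0..max_value."""
    for depth in (8, 16, 32):
        if max_value < 2 ** depth:
            return depth
    raise ValueError("value does not fit in 32 bits")


print(bits_needed(127))    # MIDI data (0-127)
print(bits_needed(1023))   # a position on a 1024-pixel LED strip
print(bits_needed(65536))  # one step past the 16-bit range
```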
The second step in pixel packing is figuring out the type of image or movie format you’ll be using. This is your output or transfer specifications. Pixel packing can be quite easy in some cases.
For example, if you had 10 MIDI channels and an 8-bit RGB image or video format, you could put each MIDI channel in a single channel of your texture. At 3 channels per pixel, all your data fits in 4 pixels, with 2 channels of the last pixel left unused. This is a nice 1:1 scenario, but more often than not you won’t be this lucky.
Dealing with 16-bit and 32-bit values is much more common. Where output formats really start influencing your process is when you need to decide how to break your data value apart. Take a variation on the example above: we have ten 32-bit channels of data and we’re using an 8-bit RGBA texture format. We can’t map our channels directly to pixel channels because there’s more data than there is room for that data in a single channel. What we have to do is spread our 32-bit value over more than one 8-bit value using bit shifting (more on that later).
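Here is a minimal sketch of that 1:1 case in plain Python, with pixels represented as tuples rather than a real image. The function names and the sample MIDI values are made up for the illustration.

```python
def pack_8bit_values(values, channels_per_pixel=3):
    """Group 8-bit values into pixels, one value per colour channel."""
    for value in values:
        if not 0 <= value <= 255:
            raise ValueError(f"{value} does not fit in an 8-bit channel")
    pixels = []
    for start in range(0, len(values), channels_per_pixel):
        chunk = list(values[start:start + channels_per_pixel])
        # Pad the final pixel with zeros when the values run out.
        chunk += [0] * (channels_per_pixel - len(chunk))
        pixels.append(tuple(chunk))
    return pixels


def unpack_8bit_values(pixels, count):
    """Flatten the pixels back into the first `count` values."""
    flat = [channel for pixel in pixels for channel in pixel]
    return flat[:count]


midi_values = [0, 12, 25, 38, 51, 64, 76, 89, 102, 127]  # made-up sample data
pixels = pack_8bit_values(midi_values)
print(len(pixels), pixels)
```

The receiving side has to know how many values were packed (10 here), because the padding zeros in the last pixel look just like real data.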
This case is still quite manageable because it turns out that it takes four 8-bit values to hold a 32-bit value, and our texture format has 4 channels per pixel. Instead of a 1:1 between data channel:pixel channel, we can create a 1:1 relation between data channel:pixel, where each data channel uses all 4 channels of a pixel. Our ten 32-bit values would need 10 RGBA pixels in this setup.
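A sketch of this one-value-per-pixel layout might look like the following. It uses bit shifting to split each unsigned 32-bit value into four bytes. Putting the most significant byte in R is an assumption made for this example; any order works as long as the packing and unpacking sides agree. The function names are hypothetical.

```python
def pack_u32(value):
    """Split an unsigned 32-bit value into an (R, G, B, A) tuple of bytes."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{value} does not fit in 32 bits")
    r = (value >> 24) & 0xFF  # most significant byte
    g = (value >> 16) & 0xFF
    b = (value >> 8) & 0xFF
    a = value & 0xFF          # least significant byte
    return (r, g, b, a)


def unpack_u32(pixel):
    """Rebuild the 32-bit value from an (R, G, B, A) tuple."""
    r, g, b, a = pixel
    return (r << 24) | (g << 16) | (b << 8) | a


# Ten made-up 32-bit values become ten RGBA pixels.
values = [0, 1, 255, 256, 65535, 65536,
          0x12345678, 2147483648, 4294967294, 4294967295]
pixels = [pack_u32(value) for value in values]
print(pixels[6])
print([unpack_u32(pixel) for pixel in pixels] == values)
```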
Deciding your output format will really depend on your software usages. There are a number of 16-bit and 32-bit texture formats that you can use that might help create an easier data:pixel relation, but using bit shifting to move your data between different bit depths isn’t a taxing process.
We mentioned bit shifting a few times above. It’s really the magic glue of pixel packing that helps you take whatever bit depth of data you have and allows you to pack it into whatever bit depth of texture format you’re going to use.
Bit shifting is a handful of a topic worthy of its own post, so I recommend reading about it on the web. The overall idea is that you’re performing operations on the bit representation of your value, and not on the value itself.
0001 is the 4-bit binary representation of the number 1. If you were to bit shift this one position to the left, you would have 0010, which is the binary representation of the number 2. If you were to shift it again, you would have 0100, or the number 4.
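The same shifts can be tried directly in Python, where `<<` is the left-shift operator. The helper `as_4bit` is only there to print the values as 4-digit binary patterns.

```python
def as_4bit(value):
    """Format an integer as a 4-digit binary string."""
    return format(value, "04b")


value = 0b0001
for shift in range(3):
    shifted = value << shift  # move every bit `shift` positions to the left
    print(as_4bit(shifted), "=", shifted)
```

Each shift to the left doubles the value, which is why the pattern goes 1, 2, 4.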
In binary, you only have 0s and 1s. All your large numbers and values just become patterns of 0s and 1s.
Think of this visualization to understand the use of bit shifting for pixel packing:
We have a 16-bit data value below:
0101 0101 1010 1010
And we want to put it in two 8-bit channels (R and G channel):
R: 0000 0000
G: 0000 0000
Smart people somewhere thought to themselves, “Why not just split up the 16-bit binary pattern and put each half in an 8-bit value, then combine them again later as a 16-bit value?” Ding, ding, ding, that is pixel packing.
So we would take the left 8 bits of our 16-bit value (0101 0101) and put them in our R channel and then hold the right 8 bits in our G channel (1010 1010).
If you imagine that the computer could only transfer the rightmost 8 binary digits (since you technically read binary from right to left), it’s easy to move those 8 bits into our G channel. How would we get the 8 bits on the left? Bit shifting! We shift the whole pattern 8 positions to the right, which pushes the original rightmost 8 digits out and moves the left 8 bits into their place, giving us:
0000 0000 0101 0101
Now the rightmost 8 bits hold what was originally the left half (0101 0101), and they can be stored in our R channel. Make sense?
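The walkthrough above can be sketched in a few lines of Python. The mask `0xFF` keeps only the rightmost 8 bits, and `>> 8` shifts the pattern 8 positions to the right. The function names `split_u16` and `join_u16` are made up for this example.

```python
def split_u16(value):
    """Split a 16-bit value into (R, G): left 8 bits in R, right 8 bits in G."""
    g = value & 0xFF         # keep only the rightmost 8 bits
    r = (value >> 8) & 0xFF  # shift the left 8 bits into the rightmost position
    return r, g


def join_u16(r, g):
    """Recombine the two 8-bit channels into the original 16-bit value."""
    return (r << 8) | g


value = 0b0101010110101010
r, g = split_u16(value)
print("R:", format(r, "08b"))
print("G:", format(g, "08b"))
print(join_u16(r, g) == value)
```

Recombining is the same idea in reverse: shift R back to the left by 8 positions and drop G into the space that opens up.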
Disclaimer: This is an oversimplification and purposefully leaves out some details just to try and make a bit of a clearer image of what’s going on. Many developers, including myself, find bit shifting and bitwise operations to be really confusing, so this example is meant as more of an easy-to-grasp introduction.
You can do a web search for how you would create bit shifting code for your specific use case. There are many references on pixel packing and pixel packing algorithms in many languages.
Benefits of pixel packing
I know your brain might be slightly numb by now, but here’s what we’ve been waiting for. Why even bother with all this input/output and bit shifting nonsense? It seems complicated and hard. Why not just make some CSV files or large JSONs or XML or anything?
Data sizes and transmission ability
You would not believe the amount of data you can pack into a regular old image or movie file. Let’s do some quick math.
A standard 1920×1080 8-bit RGBA image has 2,073,600 pixels. With 4 channels per pixel, this means it has 8,294,400 channels that can hold:
– 8,294,400 channels of 8-bit data, or
– 4,147,200 channels of 16-bit data, or
– 2,073,600 channels of 32-bit data
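The arithmetic behind those numbers is short enough to check yourself. This is a small sketch with a hypothetical `capacity` helper that counts how many values of each bit depth fit in a texture.

```python
def capacity(width, height, channels=4, channel_bits=8):
    """Return how many 8-, 16- and 32-bit values fit in one image."""
    total_bits = width * height * channels * channel_bits
    return {bits: total_bits // bits for bits in (8, 16, 32)}


for bits, count in capacity(1920, 1080).items():
    print(f"{count:,} channels of {bits}-bit data")
```

Calling `capacity(3840, 2160)` runs the same calculation for a 4K RGBA texture.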
Just take a moment to think about how much data that is. Imagine the JSON or CSV file that would need to store that. It may be workable, but it could quickly become a bit unruly. A single image is something we’re used to dealing with and our media consumption on the internet has built an infrastructure based around sharing media. Which leads to the next question.
Have you ever tried to send 2 million channels of data in real time somewhere? Maybe, but it was probably hard and needed quite a bit of infrastructure. Have you ever tried to send someone an image or stream a 1920×1080 RGBA video feed somewhere? Almost certainly you have, and it’s such an easy thing to do now.
Image and video formats have been quickly changing to support higher resolutions and faster frame rates. This means that they’re becoming better and better at storing more and more data.
Take the 1920×1080 example above and compare it to your normal data needs. In cases where you need more data, such as trying to save the point positions of a particle system simulation, you can quickly just increase the size of your texture. Up the resolution of your texture to 4K and you’re now able to hold 8,294,400 channels of 32-bit data. Images of that resolution are quite normal in photography and are the new norm in video.
Video codecs like HAP allow you to read files of even bigger size by pushing the grunt work of the decoding to the GPU. We’ve had projects where we were reading multiple layers of 10,000×3240 videos at 30fps in real-time using some affordable PCIe solid state drives. That is over 32 million pixels being read in real-time at 30fps for about $500 worth of hardware that you can buy over the counter.
It should be noted that HAP codecs are compressed, which can introduce artifacts into your data. Lossless codecs like the Animation codec, or image sequences created from lossless image formats, are preferred for storage purposes.
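To see why artifacts matter for packed data, consider a 16-bit value split across an R and a G channel. The sketch below simulates an artifact as a one-step error in a single channel. This is a stand-in chosen for illustration, not a model of how HAP or any other codec actually behaves, and `join_u16` is a made-up helper.

```python
def join_u16(r, g):
    """Recombine two 8-bit channels (R = left 8 bits) into a 16-bit value."""
    return (r << 8) | g


original = join_u16(0x55, 0xAA)
low_byte_error = join_u16(0x55, 0xAB)   # G channel off by one step
high_byte_error = join_u16(0x56, 0xAA)  # R channel off by one step

print(low_byte_error - original)   # error in the recovered value
print(high_byte_error - original)  # same channel error, much larger effect
```

An error that would be invisible in a picture shifts the recovered value by 256 when it lands in the channel holding the left 8 bits.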
I talked about how to think like a GPU in a previous post. All the benefits and workflows in that post are easy to apply to data that is in a texture format.
This is because once that data is uploaded to the GPU for display, it’s all just sitting on the GPU ready to be worked with. You’ve already performed the key step of finding the best approach to getting it there, and now you can access the data easily from within vertex or fragment shaders.
How to get started
We created a set of examples available on our GitHub to help developers get started with the concepts I covered in this post.
Head over to our GitHub repo to download the examples.
There are 2 example sets that demonstrate doing a simulation of some sort in one app and then sending the data to another app for drawing. We created an example of bouncing balls in both openFrameworks and Processing 3 that send the ball positions and sizes for drawing in TouchDesigner. There is a readme on the GitHub that should help get you up and running with the examples.
Side note: TouchDesigner has built-in functionality to help you with pixel packing in the Pack TOP operator. You can experiment with it within TouchDesigner to help you get more familiar with the process before moving to different frameworks.
Experiment with pixels as data. There’s a little conceptual hurdle to getting started, but once you’re comfortable with the workflow, you’ll soon find yourself able to harness the specialities of a number of different frameworks and have them all communicate large amounts of data in real time.