NeoPixel dithering with Pico

In the extra special Raspberry Pi Pico launch issue of HackSpace magazine, editor Ben Everard shows you how to get extra levels of brightness out of your LEDs with our new board.

WS2812B LEDs, commonly known as NeoPixels, are cheap and widely available LEDs. They have red, green, and blue LEDs in a single package with a microcontroller that lets you control a whole string of them using just one pin on your microcontroller.

The three connections may be in a different order on your LED strip, so check the labels to make sure they’re connected correctly

However, they do have a couple of disadvantages:

1) The protocol needed to control them is timing-dependent and often has to be bit-banged.

2) Each colour has 8 bits, so has 256 levels of brightness (0 to 255). However, these aren’t gamma-corrected, so the low levels of brightness have large steps between them. For small projects, we often find ourselves only using the lower levels of brightness, so often only have 10 or 20 usable levels of brightness.

There will usually be wires already connected to your strip, but if you cut it, you’ll need to solder new wires on

We’re going to look at how two features of Pico help solve these problems. Firstly, Programmable I/O (PIO) lets us implement the control protocol on a state machine rather than the main processing cores. This means that we don’t have to dedicate any processor time to sending the data out. Secondly, having two cores means we can use one of the processing cores to dither the NeoPixels. This means shifting them rapidly between different brightness levels to make pseudo-levels of brightness.

For example, if we wanted a brightness level halfway between levels 3 and 4, we’d flick the brightness back and forth between 3 and 4. If we can do this fast enough, our eyes blur this into a single brightness level and we don’t see the flicker. By varying the amount of time at levels 3 and 4, we can make many virtual levels of brightness. While one core is doing this, we still have a processing core completely free to manipulate the data we want to display.

First, we’ll need a PIO program to communicate with the WS2812B LEDs. The Pico development team have provided an example PIO program to work with. The full details are in that example, but we’ll cover the essentials here. The PIO code is:

.program ws2812
.side_set 1
.define public T1 2
.define public T2 5
.define public T3 3
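bitloop:
    out x, 1       side 0 [T3 - 1] ; shift the next bit into x, line low
    jmp !x do_zero side 1 [T1 - 1] ; start the pulse, branch on the bit
do_one:
    jmp bitloop    side 1 [T2 - 1] ; stay high: long pulse for a 1
do_zero:
    nop            side 0 [T2 - 1] ; go low: short pulse for a 0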

We looked at the PIO syntax in the main cover feature, but it’s basically an assembly language for the PIO state machine. The WS2812B protocol uses pulses at a rate of 800kHz, but the length of the pulse determines if a 1 or a 0 is being sent. This code uses jumps to move through the loop to set the timings depending on whether the bit (stored in the register x) is 0 or 1. The T1, T2, and T3 variables hold the timings, so are used to calculate the delays (with 1 taken off as the instruction itself takes one clock cycle). There’s also a section in the pio file that links the PIO code and the C code:

% c-sdk {
#include "hardware/clocks.h"

static inline void ws2812_program_init(PIO pio, uint sm, uint offset, uint pin, float freq, bool rgbw) {

    // Hand the pin over to the PIO and make it an output
    pio_gpio_init(pio, pin);
    pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true);

    pio_sm_config c = ws2812_program_get_default_config(offset);
    sm_config_set_sideset_pins(&c, pin);
    sm_config_set_out_shift(&c, false, true, rgbw ? 32 : 24);
    sm_config_set_fifo_join(&c, PIO_FIFO_JOIN_TX);

    // Slow the system clock down so one bit takes 1 / freq seconds
    int cycles_per_bit = ws2812_T1 + ws2812_T2 + ws2812_T3;
    float div = clock_get_hz(clk_sys) / (freq * cycles_per_bit);
    sm_config_set_clkdiv(&c, div);

    pio_sm_init(pio, sm, offset, &c);
    pio_sm_set_enabled(pio, sm, true);
}
%}

Most of this is setting the various PIO options – the full range is detailed in the Raspberry Pi Pico C/C++ SDK document.

    sm_config_set_out_shift(&c, false, true, rgbw ? 32 : 24);

This line sets up the output shift register, which holds each 32-bit word of data before it’s moved bit by bit into the PIO state machine. The parameters are the config (that we’re setting up and will use to initialise the state machine); a Boolean value for shifting right or left (false being left); and a Boolean value for autopull, which we have set to true. This means that once the number of bits shifted out of the output shift register reaches a threshold (set in the next parameter), the PIO will automatically pull in the next 32 bits of data.

Using a text editor with programmer’s features such as syntax highlighting will make the job a lot easier

The final parameter is set using the expression rgbw ? 32 : 24. This means that if the variable rgbw is true, the value 32 is passed, otherwise 24 is passed. The rgbw variable is passed into this function when we create the PIO program from our C program and is used to specify whether we’re using an LED strip with four LEDs in each package (one red, one green, one blue, and one white) or three (red, green, and blue).

The PIO hardware works on 32-bit words, so each chunk of data we write with the values we want to send to the LEDs has to be 32 bits long. However, if we’re using RGB LED strips, we actually want to work in 24-bit lengths. By setting the autopull threshold to 24, we still pull in 32 bits each time, but once 24 bits have been read, another 32 bits are pulled in which overwrite the remaining 8 bits.

sm_config_set_fifo_join(&c, PIO_FIFO_JOIN_TX);

Each state machine has two four-word FIFOs attached to it. These can be used for one going in and one coming out. However, as we only have data going into our state machine, we can join them together to form a single eight-word FIFO using the above line. This gives us a small buffer, so we have a little more time to write data before the state machine runs out of data and stalls. The following three lines are used to set the speed the state machine runs at:

    int cycles_per_bit = ws2812_T1 + ws2812_T2 + ws2812_T3;
    float div = clock_get_hz(clk_sys) / (freq * cycles_per_bit);
    sm_config_set_clkdiv(&c, div);

The WS2812B protocol demands that data is sent out at a rate of 800kHz. However, each bit of data requires a number of state machine cycles. In this case, they’re defined in the variables T1, T2, and T3. If you look back at the original PIO program, you’ll see that these are used in the delays (always with 1 taken off the value because the initial instruction takes one cycle before the delay kicks in). Every loop of the PIO program will take T1 + T2 + T3 cycles. We use these values to calculate the speed we want the state machine to run at, and from there we can work out the divider we need to slow the system clock down to the right speed for the state machine. The final two lines just initialise and enable the state machine.

The main processor

That’s the code that’s running on the state machine, so let’s now look at the code that’s running on our main processor cores. The full code is on GitHub. Let’s first look at the code running on the second core (we’ll look at how to start this code running shortly), as this controls the light levels of the LEDs.

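#include "pico/stdlib.h"
#include "pico/multicore.h"
#include "hardware/pio.h"
#include "ws2812.pio.h"   // generated from the .pio file (name assumes it is called ws2812.pio)

// STRING_LEN, the number of LEDs in the strip, is defined in the full source

int bit_depth = 12;       // virtual bits per colour channel
const int PIN_TX = 0;     // GPIO pin driving the strip

// One array of target values and one of accumulated errors per colour channel
uint pixelsr[STRING_LEN], pixelsg[STRING_LEN], pixelsb[STRING_LEN];
uint errorsr[STRING_LEN], errorsg[STRING_LEN], errorsb[STRING_LEN];

static inline void put_pixel(uint32_t pixel_grb) {
    // Shift left so the 24 colour bits sit at the top of the 32-bit word
    pio_sm_put_blocking(pio0, 0, pixel_grb << 8u);
}

static inline uint32_t urgb_u32(uint8_t r, uint8_t g, uint8_t b) {
    // Pack as green, red, blue from the most significant byte down
    return
            ((uint32_t) (r) << 8) |
            ((uint32_t) (g) << 16) |
            (uint32_t) (b);
}

void ws2812b_core() {
    int valuer, valueg, valueb;
    int shift = bit_depth - 8;   // virtual bits below the 8 real ones

    while (1) {
        for (int i = 0; i < STRING_LEN; i++) {
            // 8-bit level to show on this pass: target plus carried-over error
            valueb = (pixelsb[i] + errorsb[i]) >> shift;
            valuer = (pixelsr[i] + errorsr[i]) >> shift;
            valueg = (pixelsg[i] + errorsg[i]) >> shift;
            put_pixel(urgb_u32(valuer, valueg, valueb));
            // Carry forward whatever this pass could not display
            errorsb[i] = (pixelsb[i] + errorsb[i]) - (valueb << shift);
            errorsr[i] = (pixelsr[i] + errorsr[i]) - (valuer << shift);
            errorsg[i] = (pixelsg[i] + errorsg[i]) - (valueg << shift);
        }
        sleep_us(400);   // pause so the strip resets to the first LED
    }
}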

We start by defining a virtual bit depth. This is how many bits per pixel you can use. Our code will then attempt to create the necessary additional brightness levels. It will run as fast as it can drive the LED strip, but if you try to do too many brightness levels, you’ll start to notice flickering.

We found twelve to be about the best with strings up to around 100 LEDs, but you can experiment with others. Our code works with two arrays – pixels which holds the values that we want to display, and errors which holds the error in what we’ve displayed so far (there are three of each for the different colour channels).

If you just want to see this in action, you can download the UF2 file from hsmag.cc/orfgBD and flash it straight to your Pico

To explain that latter point, let’s take a look at the algorithm for determining how to light the LED. We borrowed this from the source code of Fadecandy by Micah Scott, but it’s a well-used technique: keep track of the error between what you wanted to display and what you actually displayed, and correct for it on later passes. We have an outer while loop that just keeps pushing out data to the LEDs as fast as possible. We don’t care about precise timings and just want as much speed as possible. We then go through each pixel.

The corresponding item in the errors array holds the cumulative amount our LED has been underlit so far compared to what we want it to be. Initially, this will be zero, but with each loop (if there’s a difference between what we want to light the LED and what we can light the LED) this error value will increase. These two numbers (the level we want and the error) added together give the brightness at the pseudo-level, so we need to bit-shift this by the difference between our virtual bit depth and the 8 bits of brightness that are actually available.

This gives us the value for this pixel which we write out. We then need to calculate the new error level. Let’s take a look at what this means in practice. Suppose we want a brightness level halfway between 1 and 2 in the 8-bit levels. To simplify things, we’ll use nine virtual bits. 1 and 2 in 8-bit is 2 and 4 in 9 bits (adding an extra 0 to the end multiplies everything by 2), so halfway between these two is a 9-bit value of 3 (or 11 in binary, which we’ll use from now on).

In the first iteration of our loop, pixels is 11, errors is 0, and shift is 1.

value = 11 >> 1 = 1
errors = 11 - 10 = 1

So this time, the brightness level of 1 is written out. In the second iteration, pixels plus errors is 11 + 1 = 100, so we have:

value = 100 >> 1 = 10
errors = 100 - 100 = 0

So this time, the brightness level of 10 (in binary, or 2 in base 10) is written out. This time, the errors go back to 0, so we’re in the same position as at the start of the first loop. In this case, the LED will flick between the two brightness levels each loop so you’ll have a brightness halfway between the two.

Using this simple algorithm, we can experiment with different virtual bit-depths. The algorithm will always handle the calculations for us, but we just have to see what creates the most pleasing visual effect for the eye. The larger the virtual bit depth, the more potential iterations you have to go through before the error accumulates enough to create a correction, so the more likely you are to see flicker. The biggest blocker to increasing the virtual bit depth is the sleep_us(400). This is needed to reset the LED strip.

NeoPixels come in many different shapes and sizes

Essentially, we throw out bits at 800kHz, and each block of 24 bits is sent, in turn, to the next LED. However, once there’s a long enough pause, everything resets and it goes back to the first LED. How big that pause is can vary. The truth is that a huge proportion of WS2812B LEDs are clones rather than official parts – and even for official parts, the length of the pause needed to reset has changed over the years.

400 microseconds is conservative and should work, but you may be able to get away with less (possibly even as low as 50 microseconds for some LEDs). The urgb_u32 method simply amalgamates the red, blue, and green values into a single 32-bit string (well, a 24-bit string that’s held inside a 32-bit string), and put_pixel sends this to the state machine. The bit shift there is to make sure the data is in the right place so the state machine reads the correct 24 bits from the output shift register.

Getting it running

We’ve now dealt with all the mechanics of the code. The only bit left is to stitch it all together.

int main() {

    PIO pio = pio0;
    int sm = 0;
    uint offset = pio_add_program(pio, &ws2812_program);
    ws2812_program_init(pio, sm, offset, PIN_TX, 1000000, false);
    multicore_launch_core1(ws2812b_core);

    // Animate the red channel; pixelsg and pixelsb work the same way
    while (1) {
        for (int i = 0; i < 30; ++i) {
            pixelsr[i] = i;

            // Fade the first pixel up...
            for (int j = 0; j < 30; ++j) {
                pixelsr[0] = j;
                if (j % 8 == 0) { pixelsr[1] = j; }
                sleep_ms(50);
            }
            // ...and back down again
            for (int j = 30; j > 0; --j) {
                pixelsr[0] = j;
                if (j % 8 == 0) { pixelsr[1] = j; }
                sleep_ms(50);
            }
        }
    }
}

The call to ws2812_program_init runs the function defined in the PIO file to set everything up. To launch the algorithm creating the virtual bit depth, we just have to use multicore_launch_core1 to set a function running on the other core. Once that’s done, whatever we put in the pixels arrays will be reflected as accurately as possible in the WS2812B LEDs. In this case, we simply fade it in and out, but you could do any animation you like.

14 comments

Great article. Looking forward to a steady stream of these. Just waiting for Pimoroni to ship my unit.

Are you powering the strip with 3.3v and therefore undervolting them as they require 5v? Or are the pico outputs 5v? How much current can the pico supply?

I believe the Pico has one output that is level-shifted up to 5V intended for this very purpose (the rest of the GPIO being 3.3V), and it also has a power pin that is whatever the USB voltage input is, typically near 5V.

I can see no reference in the specifications to a level shifted pin. As the pin used for the Neopixels is also used as one of the SPI0 pins it is very unlikely. A 3.3v pin driving is below that stated in the specification for most Neopixels when running at 5v, but it looks like they get away with it! I have always used a level shifter with Rpi and other 3.3v micro boards.

Sorry, I was remembering a different board from Adafruit, using the same chip, mentioned in the official intro article: “Itsy Bitsy RP 2040 is positively tiny, but it still has lots of GPIO, 4MB of QSPI flash, boot and reset buttons, a built-in RGB NeoPixel, and even a 5V output logic pin…”

You can get away with triggering ws2811 from 3.3v TTL whilst powering the strip with 5v if the cable runs are short. Otherwise power the strip from 3.3v and take the brightness hit.

I add one WS2812 to my boards which is powered by 5v lowered by a diode (so near 4.3v) then a 3.3volt input data signal is working robustly. The buffered logical output of 4.3v is then high enough for the first in externally 5V powered LED chains.

I have used these on a Pi, but found they could not be controlled by 3.3V logic, and a level converter is needed.

This is a great article i must say.. very informative

thank you

Because that depends if you have WS2812B Eco which accepts 3.3V TTL. You can search WS2812B-V5 for further details

I’m not sure if this kind of dithering makes sense for ws2812 based RGB LEDs: According to https://cdn-learn.adafruit.com/downloads/pdf/adafruit-neopixel-uberguide.pdf, the refresh rate of those is only 400Hz, so if you update more often, some of the updates will get lost, missing the point of the error accumulation algorithm.
If you do one update every 2.5ms (ie. at 400Hz), you could indeed get 4 additional bits of resolution if you accept LEDs blinking with 25Hz. But I guess that would require somehow syncing the refresh to the WS2812 PWM cycle.
Trying to increase the update rate by decreasing the reset time seems pointless.

Cool stuff! How many strips of WS2812 can be addressed by the state machines on separate GPIOs? Eight or two? Is there a library for using this yet?

I have the same question. I’ve found no examples using more than one pin, and some posts suggest it isn’t possible. Although I assume this to mean it isn’t possible with current libraries.
