### Running on Average

Large increases in cost with only questionable increases in performance can be tolerated only in race horses and women.
Lord Kelvin

September 3, 2014
This morning I was browsing LMR and ran across this post from janson. In it he creates an averaging filter.  An averaging filter is a simple filter for smoothing out variations ("noise") in data, especially from real-world sensors.  There are many types of filters, some very simple, some very complex, some in between.  The advantages of the averaging filter are it is very simple and (can be) very fast.  It doesn't filter all that well, but often is all that is needed.

Considering that it could be very handy, and also that some of the techniques could be useful for lots of other purposes, I decided to post an "improved" version. Some of the techniques I will use don't seem to be very common in hobby projects, so I will explain what the code does, how it does it, and the special techniques I am using. Fair warning, I haven't tested this code so there may be bugs. I'm sure they will be pointed out if anyone finds any :-).

``` // Running average function // unsigned int running_average( unsigned int latest); // bdk6 September 3, 2014 // // Use like this: // unsigned int smoothed; // smoothed = running_average( latest_reading);   unsigned int running_average( unsigned int latest) { static unsigned int storage = {0,0,0,0,0,0,0,0}; // keeps the old values to subtract static unsigned int next = 0; // points to location to insert latest value static unsigned int total = 0; // the running total unsigned int ret = 0; //the value that will be returned // Here is the heart of it // subtract the oldest value from the total total -= storage[ next ]; // oldest is in the same place as next total += latest; // add in the new value storage[next] = latest; // and save for later next = (next +1) & 7; // increment next and wrap around if > 7 ret = total >> 3; // shifting right by 3 is dividing by 8 return ret; } ```

The first thing to note about this function is just that: it is a function.  When you call it it returns a value.  So every time you feed it a new sample, it returns a filtered sample to you.  That is important.  It means you don't have to collect a bunch of samples then process them all at once before getting a value to use.

The function is written with the intent of using it with 16 bit integers as used on Arduino and many others. It should work with 32 bit integers as well, though. The code is standard C so it should work with any C compiler. It takes a single unsigned int as input and returns an unsigned int. It is always best to use unsigned numbers when they are appropriate. In this case some of the math might not be done right if I used signed int's.

Next, notice the first three lines of code inside the function. There are three variables declared as "static unsigned int". Global variables are bad. Computer scientists have known that for over 60 years. There have even been languages that did not allow global variables at all. In Arduino (or any other C) when you declare a variable at the top of the file, outside any function, it is global. That means any part of your program can see it. More importantly it means any part of your program can change it. That is a magnet for bugs. The normal alternative is "local" variables, that are declared and only accessible within a function. (They can also be local within any block inside a function, but that is another story for another time.) The three variables above are local to the function, but the keyword "static" makes them different still. A "normal" local variable is also called an "automatic" variable. In fact, you could put the word automatic where the word static is in the declarations above and it would compile just fine. The keyword "automatic" is default. An automatic variable "automatically" gets created whenever the function gets called, and destroyed when it exits. That means a new one is created each time, and any old values are lost. The keyword "static" changes that. The static variable is local, but it lives forever just like a global variable. It retains its value between calls. It can be initialized as I have done, and that happens only once when it compiles (so the initial value must be constant). When the function runs it can read and change the value, and when it exits that value will remain. So the static variables are just what we need in this situation. They "hang around" between calls to the function, but their "scope" (where they are visible) is limited to within the function.

The first static variable is the array "storage". This is used to hold our previous samples. Notice it is a power of 2 (2, 4, 8, 16, etc). The variable "next" holds the index of the next location to insert data. "total" will keep the total of all the values in the array so that we don't have to add them up every time the functions is called. After all the static variables is the "normal" (automatic) variable "ret" which is simply used to hold the return value. It is a good practice to define a variable like that and initialize it rather than return the calculated value on the spot. There are several reasons, including clarity and easier debugging.

Now we reach the heart of the function, the filtering process. Notice it is quite short and simple, with no loops and a minimum of math and processing. It should run very quickly. The actual processing part only takes 5 lines. When we enter the function, the array "storage" is holding the last 8 samples and the variable "total" is holding the sum of those 8. We start by subtracting the oldest value from the total, leaving the total of the last 7 samples. Notice that since we put the samples in the array in a circular pattern (circular buffer) the oldest value is in the same place as we will store the newest.

We then add the latest sample to the total, giving us the new total for the most recent 8 samples, and then store it into the array. We update the "next" index to point to the location to insert the next sample. The method used to update "next" is a bit tricky and one of the neat speedups we can use. We want the "next" pointer to "wrap around" when it reaches the end of the array and normally we would use the "modulo" operator (%) to achieve that. But division is almost always slower than most other arithmetic and we try to avoid it whenever possible. The "modulo" operator is a division, returnring the remainder instead of the quotient. Instead, we use a power of two as the size of the array and use the properties of how the numbers are stored. A power of two is stored in binary by having a single bit set in the memroy. In this case, the 8 is stored as 0b00001000. That means that as we count, we don't ever want to count past 0b00000111 (7). When we add one to the variable "next", it will stay in the range 0 to 7 (0b00000000 to 0b00000111) until it reaches 8 (0b00001000). We always "and" it with the value 0b00000111 which has no effect until it reaches 8, at which point it "masks" the bit that is set to 1 back to 0 (0b00001000 & 0b00000111 = 0b00000000) giving us a nice wraparound, just what we need. That only works if the size of the array is a power of two. You can change the size, but the size of the "mask" value (7 here) must be changed to one less than the size ( 3 for size 4, 15 for size 16, etc). We avoided a complex and slow division by using one of the fastest operations the processor is capable of, a logical "and".

The next line calculates the average value to return. We already have the total of all the samples stored in the "total" variable, so there is no need to loop through and sum them up. We only need to divide by the number of samples. But here again, we use the properties of the binary number system to speed things up. Since in binary, "shifting" the bits to the right is equivalent to division, we use shifting instead of division. The same holds in decimal math. If we take the number 1000 decimal, and "shift" the digits right by one, losing the last, we get 100, a division by 10. The "right shift" operator (>>) tells the processor to shift all the bits of the left operand to the right by the number of places in the second operand. In our case, the "total" variable gets shifted right 3 places, which is the same as dividing by 8. ( 2^3 = 8). Again, this only works if the size of the array is a power of 2. If you change the size, the number of bits to shift must be changed as well ( >>2 = /4, >>4 = /16, etc).

The last thing to do is simply return the calculated value. And there we have it: a nice, simple, and very fast way to calculate a "running average" filter. We only explicitly used two adds, one subtract, one "and" and one right shift. All very simple and fast operations. Under the hood, the compiler has to do a couple of multiplies and adds to access the array, but those are mostly unavoidable anyway. This filter should be extremely fast.

There is one last point to make about this function. Because it stores the data within itself, it cannot be used to process two different streams of data simultaneously. If you need to process two different streams, you will need to use two separate copies of the code (with different names). Otherwise the data will become mixed together giving invalid results for both streams.

I hope you find this useful and find some nice tips to help you write better, simpler, and faster code.