Category Archives: 闲话创客

Lessons in Algorithms


Earlier this year, Nathan Seidle, founder of SparkFun, created the Crowdsourcing Algorithms Challenge (aka, the Speed Bag Challenge). After numerous fantastic entries, one was chosen. The winner, Barry Hannigan, was asked to write up his process involved in solving this problem. This article is Barry Hannigan’s winning approach to solving real-world problems, even when the problem is not tangibly in front of you.

Firmware Resources

You can view Barry’s code by clicking the link below.



As the winner of Nate’s Speed Bag Challenge, I had the wonderful opportunity to meet with Nate at SparkFun’s headquarters in Boulder, CO. During our discussions, we thought it would be a good idea to create a tutorial describing how to go about solving a complex problem in an extremely short amount of time. While I’ll discuss specifics to this project, my hope is that you’ll be able to apply the thought process to your future projects—big or small.

Where to Start

In full-fledged software projects, from an Engineer’s perspective, you have four major phases:

  • Requirements
  • Design
  • Implementation
  • Test

Let’s face it; the design and coding is what everyone sees as interesting, where their creative juices can flow and the majority of the fun can be had. Naturally, there is the tendency to fixate on a certain aspect of the problem being solved and jump right in designing and coding. However, I will argue that the first and last phase can be the most important in any successful project, be it large or small. If you doubt that, consider this: my solution to the speed bag problem was designed wickedly fast, and I didn’t have a bag to test it on. But, with the right fixes applied in the end, the functionality was tested to verify that it produced the correct results. Conversely, a beautiful design and elegant implementation that doesn’t produce the required functionality will surely be considered a failure.

I didn’t mention prototype as a phase, because depending on the project it can happen in different phases or multiple phases. For instance, if the problem isn’t fully understood, a prototype can help figure out the requirements, or it can provide a proof of concept, or it can verify the use of a new technology. While important, prototyping is really an activity in one or more phases.

Getting back to the Speed Bag Challenge, in this particular case, even though it is a very small project, I suggest that you spend a little time in each of the four areas, or you will have a high probability of missing something important. To get a full understanding of what’s required, let’s survey everything we had available as inputs. The web post for the challenge listed five explicit requirements, which you can find here. Next, there was a link to Nate’s Github repository that had information on the recorded data format and a very brief explanation of how the speed bag device would work.

In this case, I would categorize what Nate did with the first speed bag counter implementation as a prototype to help reveal additional requirements. From Nate’s write-up on how he built the system, we know it used an accelerometer attached to the base of a speed bag and that the vibration data samples about every 2ms are to be used to count punches. We also now know that applying a polynomial smoothing function and looking for peaks above a threshold doesn’t accurately detect punches.

While trying not to be too formal for a small project, I kept these objectives (requirements) in mind while working the problem:

  • The algorithm shall be able to produce the correct number of hits from the recorded data sets
  • The solution shall be able to run on 8-bit and 32-bit micros
  • Produce documentation and help others learn from the solution put forth
  • Put code and documents in a public repository or website
  • Disclose the punch count and the solution produced for the Mystery data sets
  • Accelerometer attached to top of speed bag base, orientation unknown except +Z is up -Z is down
  • Complex data patterns will need more than polynomial filtering; you need to adjust to incoming data amplitude variations—as Nate suspects, resonance is the likely culprit
  • You have 15 days to complete (Yikes!)

Creating the Solution

As it goes in all projects, now that you know what should be done, the realization that there isn’t enough time sets in. Since I didn’t have the real hardware and needed to be able to visually see the output of my algorithm, I started working it out quickly in Java on my PC. I built in a way to plot the results of the waveforms on my screen. I’ve been using NetBeans for years to do Java development, so I started a new speed bag project. I always use JFreeChart library to plot data, so I added it to my project. Netbeans has a really good IDE and built-in GUI designer. All I had to do was create a GUI layout with a blank panel where I want the JFreeChart to display and then, at run time, create the JFreeChart object and add it to the panel. All the oscilloscope diagrams in this article were created by the JFreeChart display. Here is an image from my quick and dirty oscilloscope GUI design page.

NetBeans IDE

This algorithm was needed in a hurry, so my first pass is to be very object oriented and use every shortcut afforded me by using Java. Then, I’ll make it more C like in nature as I nail down the algorithm sequence. I jumped right in and plotted the X, Y and Z wave forms as they came from the recorded results. Once I got a look at the raw data, I decided to remove any biases first (i.e., gravity) and then sum the square of each waveform and take the square root. I added some smoothing by way of averaging a small number of values and employed a minimum time between threshold crossings to help filter out spikes. All in all, this seemed to make the data even worse on the plot. I decided to throw away X and Y, since I didn’t know in what orientation it was mounted and if it would be mounted the same on different speed bag platforms anyway. To my horror, even with just the Z axis, it still just looked like a mess of noise! I’m seeing peaks in the data way too close together. Only my minimum time between thresholds gate is helping make some sense of the punch count, but there really isn’t anything concrete in the data. Something’s not adding up. What am I missing?

Below is an image of the runF1 waveform. The blue signal is the filtered z axis, and the red line is a threshold for counting punches. As I mentioned, if it weren’t for my 250ms minimum between punch detections, my counter would be going crazy. Notice the way I have introduced two 5 millisecond delays in my runF1() processing so thresholding would be a little better if the red line were moved to the right by 10 milliseconds. I’ll talk more about aligning signals later in this article, but you can see in this image how time aligning signals is crucial for getting accurate results.

First filter with many peaks

The blue signal is the filtered z axis, and the red line is a threshold for counting punches.

If you look at the virtual oscilloscope output, you can see that between millisecond 25,000 and 26,000, which is 1 second in time, there are around nine distinct acceleration events. No way Nate is throwing nine punches in a second. Exactly how many punches should I expect to see per second? Back to the drawing board. I need another approach. Remember humility is your friend; if you just rush in on your high horse you usually will be knocked off it in a hurry.

Understand the Domain

Typically the requirements are drafted in the context of the domain of the problem that’s being solved, or some design aspects are developed from a requirement with domain knowledge applied. I don’t know the first thing about boxing speed bags, so time to do some Googling.

The real nugget I unearthed was that a boxer hits a speed bag, and it makes three contacts with the base: once forward (in punch direction), then it comes all the way back (opposite of punch direction) and strikes the base, and then it goes all the way forward again striking the base (in punch direction). Then the boxer punches it on its way back toward the boxer. This actually gives four opportunities to generate movement to the base, once from the shock of the boxer contacting the bag, and then three impacts with the base.

Now, what I see on the waveforms makes more sense. There isn’t a shock of the bag hitting the base once per punch. My second thought was how many punches can a boxer throw at a speed bag per second. Try as I might, I could not find a straight answer to this question. I found lots of websites with maximum shadow boxing punches and actual punches being thrown maximums but not a maximum for a speed bag. Time to derive my own conclusion: I thought about how far the speed bag must travel per punch and concluded that there must be a minimum amount of force to make the bag travel the distance it needs to impact the base three times. Since I’m not a boxer, all I could do is visualize hitting the bag as slowly as possible and it making three contacts. I concluded from the video in my mind’s eye that it would be difficult to hit a bag less than twice per second. OK, that’s a minimum; how about a maximum? Again, I summoned my mind’s eye video and this time moved my fist to strike the imaginary bag. I concluded with the distance the bag needed to travel and the amount of time to move a fist in and out of the path of the bag that about four per second is all that is possible, even with a skilled boxer. OK, it’s settled. I need to find events in the data that are happening between 2 and 4 hertz. Time to get back to coding and developing!

Build a little, Test a little, Learn a lot

While everyone’s brain works a little differently, I suggest that you try an iterative strategy, especially when you are solving a problem that does not have a clearly defined methodology going into it. I also suggest that when you feel you are ready to make a major tweak to an algorithm, you make a copy of the algorithm before starting to modify the copy, or start with an empty function and start pulling in pieces of the previous iteration. You can use source control to preserve your previous iteration, but I like having the previous iteration(s) in the code so I can easily reference it when working on the next iteration. I usually don’t like to write more than 10 or 20 lines of code without at minimum verifying it complies, but I really want to run it and print something out as confirmation that my logic and assumptions are correct. I’ve done this my entire career and will usually complain if I don’t have target hardware available to actually run what I’m coding. Around 2006, I heard a saying from a former Rear Admiral:


Build a little, Test a little, Learn a lot.

-Wayne Meyers, Rear Admiral, U.S. Navy

I really identify with that statement, as it succinctly states why I always want to keep running and testing what I’m writing. It either allows you to confirm your assumptions or reveals you are heading down the wrong path, allowing you to quickly get on the right path without throwing away a lot of work. This was yet another reason that I chose Java as my prototype platform, as I could quickly start running and testing code plus graph it out visually, in spite of not having the actual speed bag hardware.

Additionally, you will see in the middle of all six runFx() functions there is code that keeps track of the current time in milliseconds and verifies that the time stamp delta in milliseconds has elapsed or it sleeps for 1 millisecond. This allowed me to watch the data scroll by in my Java plotting window and see how the filtering output looks. I passed in X, Y and Z acceleration data along with an X, Y and Z average value. Since I only used Z data in most algorithms, I started cheating and sending in other values to be plotted, so it’s a little confusing when looking at the graphs of one through five since they don’t match the legend. However, plotting in real time allowed me to see the data and watch the hit counter increment. I could actually see and feel a sense of the rhythm into which the punches were settling and how the acceleration data was being affected by the resonance at prolonged constant rhythm. In addition to the visual output using the Java System.out.println() function, I can output data to a window in the NetBeans IDE.

If you look in the Java subdirectory in my GitHub repository, there is a file named In that file, I have a few functions named run1() through run6(). These were my six major iterations of the speed bag algorithm code.

Here are some highlights for each of the six iterations.


runF1() used only the Z axis, and employed weak bias removal using a sliding window and fixed amplification of the filtered Z data. I created an element called delay, which is a way to delay input data so it could be aligned later with output of averaged results. This allowed the sliding window average to be subtracted from Z axis data based on surrounding values, not by previous values. Punch detection used straight comparison of amplified filter data being greater than average of five samples with a minimum of 250 milliseconds between detections.


runF2() used only Z axis, and employed weak bias removal via a sliding window but added dynamic beta amplification of the filtered Z data based on the average amplitude above the bias that was removed when the last punch was detected. Also, a dynamic minimum time between punches of 225ms to 270ms was calculated based on delta time since last punch was detected. I called the amount of bias removed noise floor. I added a button to stop and resume the simulation so I could examine the debug output and the waveforms. This allowed me to see the beta amplification being used as the simulation went along.


runF3() used X and Z axis data. My theory was that there might be a jolt of movement from the punching action that could be additive to the Z axis data to help pinpoint the actual punch. It was basically the same algorithm as RunF2 but added in the X axis. It actually worked pretty well, and I thought I might be onto something here by correlating X movement and Z. I tried various tweaks and gyrations as you can see in the code lots of commented out experiments. I started playing around with what I call a compressor, which took the sum of five samples to see if it would detect bunches of energy around when punches occur. I didn’t use it in the algorithm but printed out how many times it crossed a threshold to see if it had any potential as a filtering element. In the end, this algorithm started to implode on itself, and it was time to take what I learned and start a new algorithm.


In runF4(), I increased the bias removal average to 50 samples. It started to work in attenuation and sample compression along with a fixed point LSB to preserve some decimal precision to the integer attenuate data. Since one of the requirements was this should be able to run on 8-bit microcontrollers, I wanted to avoid using floating point and time consuming math functions in the final C/C++ code. I’ll speak more to this in the components section, but, for now, know that I’m starting to work this in. I’ve convinced myself that finding bursts of acceleration is the way to go. At this point, I am removing the bias from both Z and X axis then squaring. I then attenuate each, adding the results together but scaling X axis value by 10. I added a second stage of averaging 11 filtered values to start smoothing the bursts of acceleration. Next, when the smoothed value gets above a fixed threshold of 100, the unsmoothed combination of Z and X squared starts getting loaded into the compressor until 100 samples have been added. If the compressor output of the 100 samples is greater than 5000, it is recorded as a hit. A variable time between punches gate is employed, but it is much smaller since the compressor is using 100 samples to encapsulate the punch detection. This lowers the gate time to between 125 and 275 milliseconds. While showing some promise, it was still too sensitive. While one data set would be spot on another would be off by 10 or more punches. After many tweaks and experiments, this algorithm began to implode on itself, and it was once again time to take what I’ve learned and start anew. I should mention that at this tim I’m starting to think there might not be a satisfactory solution to this problem. The resonant vibrations that seem to be out of phase with the contacts of the bag just seems to wreak havoc on the acceleration seen when the boxer gets into a good rhythm. Could this all just be a waste of time?


runF5()’s algorithm started out with the notion that a more formal high pass filter needed to be introduced rather than an average subtracted from the signal. The basic premise of the high pass filter was to use 99% of the value of new samples added to 1% of the value of average. An important concept added towards the end of runF5’s evolution was to try to simplify the algorithm by removing the first stage of processing into its own file to isolate it from later stages. Divide and Conquer; it’s been around forever, and it really holds true time and time again. I tried many experiments as you can see from the many commented out lines in the algorithm and in the file. In the end, it was time to carry forward the new Front End Processor concept and start anew with divide and conquer and a need for a more formal high pass filter.


With time running out, it’s time to pull together all that has been learned up to now, get the Java code ready to port to C/C++ and implement real filters as opposed to using running averages. In runF6(), I had been pulling together the theory that I need to filter out the bias on the front end with a high pass filter and then try to use a low pass filter on the remaining signal to find bursts of acceleration that occur at a 2 to 4 Hertz frequency. No way was I going to learn how to calculate my own filter tap values to implement the high and low pass filters in the small amount of time left before the deadline. Luckily, I discovered the t-filter web site. Talk about a triple play. Not only was I able to put in my parameters and get filter tap values, I was also able to leverage the C code it generated with a few tweaks in my Java code. Plus, it converted the tap values to fixed point for me! Fully employing the divide and conquer concept, this final version of the algorithm introduced isolated sub algorithms for both Front End Processor and Detection Processing. This allowed me to isolate the two functions from each other except for the output signal of one becoming the input to the other, which enabled me to focus easily on the task at hand rather than sift through a large group of variables where some might be shared between the two stages.

With this division of responsibility, it is now easy to focus on making the clear task of the Front End Processor to remove the bias values and output at a level that is readily acceptable for input into the Detection Processor. Now the Detection processor can clearly focus on filtering and implementing a state machine that can pick out the punch events that should occur between 2 and 4 times per second.

One thing to note is that this final algorithm is much smaller and simpler than some of the previous algorithms. Even though its software, at some point in the process you should still do a technique called Muntzing. Muntzing is a technique to go back and look at what can be removed without breaking the functionality. Every line of code that is removed is one less line of code that can have a bug. You can Google Earl “Madman” Muntz to get a better understanding and feel for the spirit of Muntzing.

Final output of DET

Final output of DET

Above is the visual output from runF6. The Green line is 45 samples delayed of the output of the low pass filter, and the yellow line is an average of 99 values of the output of the low pass filter. The Detection Processor includes a detection algorithm that detects punches by tracking min and max crossings of the Green signal using the Yellow signal as a template for dynamic thresholding. Each minimum is a Red spike, and each maximum is a Blue spike, which is also a punch detection. The timescale is in milliseconds. Notice there are about three blue spikes per second inside the 2 to 4Hz range predicted. And the rest is history!

Algorithm Components

Here is a brief look at each type of component I used in the various algorithms.


This is used to buffer a signal so you can time align it to some other operation. For example, if you average nine samples and you want to subtract the average from the original signal, you can use a delay of five samples of the original signal so you can use values that are itself plus the four samples before and four samples after.


Attenuation is a simple but useful operation that can scale a signal down before it is amplified in some fashion with filtering or some other operation that adds gain to the signal. Typically attenuation is measured in decibels (dB). You can attenuate power or amplitude depending on your application. If you cut the amplitude by half, you are reducing it by -6 dB. If you want to attenuate by other dB values, you can check the dB scale here. As it relates to the Speedbag algorithm, I’m basically trying to create clear gaps in the signal, for instance squelching or squishing smaller values closer to zero so that squaring values later can really push the peaks higher but not having as much effect on the values pushed down towards zero. I used this technique to help accentuate the bursts of acceleration versus background vibrations of the speed bag platform.

Sliding Window Average

Sliding Window Average is a technique of calculating a continuous average of the incoming signal over a given window of samples. The number of samples to be averaged is known as the window size. The way I like to implement a sliding window is to keep a running total of the samples and a ring buffer to keep track of the values. Once the ring buffer is full, the oldest value is removed and replaced with the next incoming value, and the value removed from the ring buffer is subtracted from the new value. That result is added to the running tally. Then simply divide the running total by the window size to get the current average whenever needed.


This is a very simple concept which is to change the sign of the values to all positive or all negative so they are additive. In this case, I used rectification to change all values to positive. As with rectification, you can use a full wave or half wave method. You can easily do full wave by using the abs() math function that returns the value as positive. You can square values to turn them positive, but you are changing the amplitude. A simple rectify can turn them positive without any other effects. To perform half wave rectification, you can just set any value less than zero to zero.


In the DSP world Compression is typically defined as compressing the amplitudes to keep them in a close range. My compression technique here is to sum up the values in a window of samples. This is a form of down-sampling as you only get one sample out each time the window is filled, but no values are being thrown away. It’s a pure total of the window, or optionally an average of the window. This was employed in a few of the algorithms to try to identify bursts of acceleration from quieter times. I didn’t actually use it in the final algorithm.

FIR Filter

Finite Impulse Response (FIR) is a digital filter that is implemented via a number of taps, each with its assigned polynomial coefficient. The number of taps is known as the filter’s order. One strength of the FIR is that it does not use any feedback, so any rounding errors are not cumulative and will not grow larger over time. A finite impulse response simply means that if you input a stream of samples that consisted of a one followed by all zeros, the output of the filter would go to zero within at most the order +1 amount of 0 value samples being fed in. So, the response to that single sample of one lives for a finite amount of samples and is gone. This is essentially achieved by the fact there isn’t any feedback employed. I’ve seen DSP articles claim calculating filter tap size and coefficients is simple, but not to me. I ended up finding an online app called tFilter that saved me a lot of time and aggravation. You pick the type of filter (low, high, bandpass, bandstop, etc) and then setup your frequency ranges and sampling frequency of your input data. You can even pick your coefficients to be produced in fixed point to avoid using floating point math. If you’re not sure how to use fixed point or never heard of it, I’ll talk about that in the Embedded Optimization Techniques section.

Embedded Optimization Techniques

Magnitude Squared

Mag Square is a technique that can save computing power of calculating square roots. For example, if you want to calculate the vector for X and Z axis, normally you would do the following: val = sqr((X * X) + (Y * Y)). However, you can simply leave the value in (X * X) + (Y * Y), unless you really need the exact vector value, the Mag Square gives you a usable ratio compared to other vectors calculated on subsequent samples. The numbers will be much larger, and you may want to use attenuation to make them smaller to avoid overflow from additional computation downstream.

I used this technique in the final algorithm to help accentuate the bursts of acceleration from the background vibrations. I only used Z * Z in my calculation, but I then attenuated all the values by half or -6dB to bring them back down to reasonable levels for further processing. For example, after removing the bias if I had some samples around 2 and then some around 10, when I squared those values I now have 4 and 100, a 25 to 1 ratio. Now, if I attenuate by .5, I have 2 and 50, still a 25 to 1 ratio but now with smaller numbers to work with.

Fixed Point

Using fixed point numbers is another way to stretch performance, especially on microcontrollers. Fixed point is basically integer math, but it can keep precision via an implied fixed decimal point at a particular bit position in all integers. In the case of my FIR filter, I instructed tFilter to generate polynomial values in 16-bit fixed point values. My motivation for this was to ensure I don’t use more than 32-bit integers, which would especially hurt performance on an 8-bit microcontroller.

Rather than go into the FIR filter code to explain how fixed point works, let me first use a simple example. While the FIR filter algorithm does complex filtering with many polynomials, we could implement a simple filter that outputs the same input signal but -6dB down or half its amplitude. In floating point terms, this would be a simple one tap filter to multiply each incoming sample by 0.5. To do this in fixed point with 16 bit precision, we would need to convert 0.5 into its 16-bit fixed point representation. A value of 1.0 is represented by 1 * (216) or 65,536. Anything less than 65536 is a value less than 1. To create a fixed point integer of 0.5, we simply use the same formula 0.5 * (216), which equals 32,768. Now we can use that value to lower the amplitude by .5 of every sample input. For example, say we input into our simple filter a sample with the value of 10. The filter would calculate 10 * 32768 = 327,680, which is the fixed point representation. If we no longer care about preserving the precision after the calculations are performed, it can easily be turned back into a non-fixed point integer by simply right shifting by the number of bits of precision being used. Thus, 327680 >> 16 = 5. As you can see, our filter changed 10 into 5 which of course is the one half or -6dB we wanted out. I know 0.5 was pretty simple, but if you had wanted 1/8 the amplitude, the same process would be used, 65536 * .125 = 8192. If we input a sample of 16, then 16 * 8192 = 131072, now change it back to an integer 131072 >> 16 = 2. Just to demonstrate how you lose the precision when turning back to integer (the same as going float to integer) if we input 10 into the 1/8th filter it would yield the following, 10 * 8192 = 81920 and then turning it back to integer would be 81920 >> 16 = 1, notice it was 1.25 in fixed point representation.

Getting back to the FIR filters, I picked 16 bits of precision, so I could have a fair amount of precision but balanced with a reasonable amount of whole numbers. Normally, a signed 32-bit integer can have a range of – 2,147,483,648 to +2,147,483,647, however there now are only 16 bits of whole numbers allowed which is a range of -32,768 to +32,767. Since you are now limited in the range of numbers you can use, you need to be cognizant of the values being fed in. If you look at the FEPFilter_get function, you will see there is an accumulator variable accZ which sums the values from each of the taps. Usually if your tap history values are 32 bit, you make your accumulator 64-bit to be sure you can hold the sum of all tap values. However, you can use a 32 bit value if you ensure that your input values are all less than some maximum. One way to calculate your maximum input value is to sum up the absolute values of the coefficients and divide by the maximum integer portion of the fixed point scheme. In the case of the FEP FIR filter, the sum of coefficients was 131646, so if the numbers can be 15 bits of positive whole numbers + 16 bits of fractional numbers, I can use the formula (231)/131646 which gives the FEP maximum input value of + or – 16,312. In this case, another optimization can be realized which is not to have a microcontroller do 64-bit calculations.

Walking the Signal Processing Chain

Delays Due to Filtering

Before walking through the processing chain, we should discuss delays caused by filtering. Many types of filtering add delays to the signal being processed. If you do a lot of filtering work, you are probably well aware of this fact, but, if you are not all that experienced with filtering signals, it’s something of which you should be aware. What do I mean by delay? This simply means that if I put in a value X and I get out a value Y, how long it takes for the most impact of X to show up in Y is the delay. In the case of a FIR filter, it can be easily seen by the filter’s Impulse response plot, which, if you remember from my description of FIR filters, is a stream of 0’s with a single 1 inserted. T-Filter shows the impulse response, so you can see how X impacts Y’s output. Below is an image of the FEP’s high pass filter Impulse Response taken from the T-Filter website. Notice in the image that the maximum impact on X is exactly in the middle, and there is a point for each tap in the filter.

Impulse response from T-Filter

Below is a diagram of a few of the FEP’s high pass filter signals. The red signal is the input from the accelerometer or the newest sample going into the filter, the blue signal is the oldest sample in the filter’s ring buffer. There are 19 taps in the FIR filter so they represent a plot of the first and last samples in the filter window. The green signal is the value coming out of the high pass filter. So to relate to my X and Y analogy above, the red signal is X and the green signal is Y. The blue signal is delayed by 36 milliseconds in relation to the red input signal which is exactly 18 samples at 2 milliseconds, this is the window of data that the filter works on and is the Finite amount of time X affects Y.

Delayed Signal Example

Notice the output of the high pass filter (green signal) seems to track changes from the input at a delay of 18 milliseconds, which is 9 samples at 2 milliseconds each. So, the most impact from the input signal is seen in the middle of the filter window, which also coincides with the Impulse Response plot where the strongest effects of the 1 value input are seen at the center of the filter window.

It’s not only a FIR that adds delay. Usually, any filtering that is done on a window of samples will cause a delay, and, typically, it will be half the window length. Depending on your application, this delay may or may not have to be accounted for in your design. However, if you want to line this signal up with another unfiltered or less filtered signal, you are going to have to account for it and align it with the use of a delay component.

Front End Processor

I’ve talked at length about how to get to a final solution and all the components that made up the solution, so now let’s walk through the processing chain and see how the signal is transformed into one that reveals the punches. The FEP’s main goal is to remove bias and create an output signal that smears across the bursts of acceleration to create a wave that is higher in amplitude during increased acceleration and lower amplitude during times of less acceleration. There are four serial components to the FEP: a High Pass FIR, Attenuator, Rectifier and Smoothing via Sliding Window Average.

The first image is the input and output of the High Pass FIR. Since they are offset by the amount of bias, they don’t overlay very much. The red signal is the input from the accelerometer, and the blue is the output from the FIR. Notice the 1g of acceleration due to gravity is removed and slower changes in the signal are filtered out. If you look between 24,750 and 25,000 milliseconds, you can see the blue signal is more like a straight line with spikes and a slight ringing on it, while the original input has those spikes but meandering on some slow ripple.

FEP Highpass In Out

Next is the output of the attenuator. While this component works on the entire signal, it lowers the peak values of the signal, but its most important job is to squish the quieter parts of the signal closer to zero values. The image below shows the output of the attenuator, and the input was the output of the High Pass FIR. As expected, peaks are much lower but so is the quieter time. This makes it a little easier to see the acceleration bursts.

FEP Atten Out

Next is the rectifier component. Its job is to turn all the acceleration energy in the positive direction so that it can be used in averaging. For example, an acceleration causing a positive spike of 1000 followed by a negative spike of 990 would yield an average of 5, while a 1000 followed by a positive of 990 would yield an average of 995, a huge difference. Below is an image of the Rectifier output. The bursts of acceleration are slightly more visually apparent, but not easily discernable. In fact, this image shows exactly why this problem is such a tough one to solve; you can clearly see how resonant shaking of the base causes the pattern to change during punch energy being added. The left side is lower and more frequent peaks, the right side has higher but less frequent peaks.

FEP Rectifier Out

The 49 value sliding window is the final step in the FEP. While we have done subtle changes to the signal that haven’t exactly made the punches jump out in the images, this final stage makes it visually apparent that the signal is well on its way of yielding the hidden punch information. The fruits of the previous signal processing magically show up at this stage. Below is an image of the Sliding Window average. The blue signal is its input or the output of the Rectifier, and the red signal is the output of the sliding window. The red signal is also the final output of the FEP stage of processing. Since it is a window, it has a delay associated with it. Its approximately 22 samples or 44 milliseconds on average. It doesn’t always look that way because sometimes the input signal spikes are suddenly tall with smaller ringing afterwards. Other times there are some small spikes leading up to the tall spikes and that makes the sliding window average output appear inconsistent in its delay based on where the peak of the output shows up. Although these bumps are small, they are now representing where new acceleration energy is being introduced due to punches.

FEP Final Out

Detection Processor

Now it’s time to move on to the Detection Processor (DET). The FEP outputs a signal that is starting to show where the bursts of acceleration are occurring. The DET’s job will be to enhance this signal and employ an algorithm to detect where the punches are occurring.

The first stage of the DET is an attenuator. Eventually, I want to add exponential gain to the signal to really pull up the peaks, but, before doing that, it is important to once again squish down the lower values towards zero and lower the peaks to keep from generating values too large to process in the rest of the DET chain. Below is an image of the output from the attenuator stage, it looks just like the signal output from the FEP, however notice the signal level peaks were above 100 from the FEP, and now peaks are barely over 50. The vertical scale is zoomed in with the max amplitude set to 500 so you can see that there is a viable signal with punch information.


With the signal sufficiently attenuated, it’s time to create the magic. The Magnitude Square function is where it all comes together. The attenuated signal carries the tiny seeds from which I’ll grow towering Redwoods. Below is an image of the Mag Square output, the red signal is the attenuated input, and the blue signal is the mag square output. I’ve had to zoom out to a 3,000 max vertical, and, as you can see, the input signal almost looks flat, yet the mag square was able to pull out unmistakable peaks that will aid the detection algorithm to pick out punches. You might ask why not just use these giant peaks to detect punches. One of the reasons I’ve picked this area of the signal to analyze is to show you how the amount of acceleration can vary greatly as you can see the peak between 25,000 and 25,250 is much smaller than the surrounding peaks, which makes pure thresholding a tough chore.

DET Mag Square

Next, I decided to put a Low Pass filter to try to remove any fast changing parts of the signal since I’m looking for events that occur in the 2 to 4 Hz range. It was tough on T-Filter to create a tight low pass filter with a 0 to 5 Hz band pass as it was generating filters with over 100 taps, and I didn’t want to take that processing hit, not to mention I would then need a 64-bit accumulator to hold the sum. I relaxed the band pass with a 0 to 19 Hz range and the band stop at 100 to 250 Hz. Below is an image of the low pass filter output. The blue signal is the input, and the red signal is the delayed output. I used this image because it allows the input and output signal to be seen without interfering with each other. The delay is due to 6 sample delay of the low pass FIR, but I have also introduced a 49 sample delay to this signal so that it is aligned in the center of the 99 sample sliding window average that follows in the processing chain. So it is delayed by a total of 55 samples or 110 milliseconds. In this image, you can see the slight amplification of the slow peaks by their height and how it is smoothed as the faster changing elements are attenuated. Not a lot going on here but the signal is a little cleaner, Earl Muntz might suggest I cut the low pass filter out of the circuit, and it might very well work without it.

Low pass delayed DET

The final stage of the signal processing is a 99 sample sliding window average. I built into the sliding window average the ability to return the sample in the middle of the window each time a new value is added and that is how I produced the 49 sample delayed signal in the previous image. This is important because the detection algorithm is going to have 2 parallel signals passed into it, the output of the 99 sliding window average and the 49 sample delayed input into the sliding window average. This will perfectly align the un-averaged signal in the middle of the sliding window average. The averaged signal is used as a dynamic threshold for the detection algorithm to use in its detection processing. Here, once again, is the image of the final output from the DET.

DET Final Out

In the image, the green and yellow signals are inputs to the detection algorithm, and the blue and red are outputs. As you can see, the green signal, which is a 49 samples delayed, is aligned perfectly with the yellow 99 sliding window average peaks. The detection algorithm monitors the crossing of the yellow by the green signal. This is accomplished by both maximum and minimum start guard state that verifies the signal has moved enough in the minimum or maximum direction in relation to the yellow signal and then switches to a state that monitors the green signal for enough change in direction to declare a maximum or minimum. When the peak start occurs and it’s been at least 260ms since the last detected peak, the state switches to monitor for a new peak in the green signal and also makes the blue spike seen in the image. This is when a punch count is registered. Once a new peak has been detected, the state changes to look for the start of a new minimum. Now, if the green signal falls below the yellow by a delta of 50, the state changes to look for a new minimum of the green signal. Once the green signal minimum is declared, the state changes to start looking for the start of a new peak of the green signal, and a red spike is shown on the image when this occurs.

Again, I’ve picked this time in the recorded data because it shows how the algorithm can track the punches even during big swings in peak amplitude. What’s interesting here is if you look between the 24,750 and 25,000 time frame, you can see the red spike detected a minimum due to the little spike upward of the green signal, which means the state machine started to look for the next start of peak at that point. However, the green signal never crossed the yellow line, so the start of peak state rode the signal all the way down to the floor and waited until the cross of the yellow line just before the 25,250 mark to declare the next start of peak. Additionally, the peak at the 25,250 mark is much lower than the surrounding peaks, but it was still easily detected. Thus, the dynamic thresholding and the state machine logic allows the speed bag punch detector algorithm to “Roll with the Punches”, so to speak.

Final Thoughts

To sum up, we’ve covered a lot of ground in this article. First, the importance of fully understanding the problem as it relates to the required end item along with the domain knowledge needed to get there. Second, for a problem of this nature creating a scaffold environment to build the algorithm was imperative, and in this instance, it was the Java prototype with visual display of the signals. Third, was implement for the target environment, on a PC you have wonderful optimizing compilers for powerful CPUs with tons of cache, for a microcontroller the optimization is really left to you. Use every optimization trick you know to keep processing as quick as possible. Fourth, iterative development can help you on problems like this. Keep reworking the problem while folding in the knowledge you are learning during the development process.

When I look back on this project and think about what ultimately made me successful, I can think of two main things. Creating the right tools for the job was invaluable. Being able to see how my processing components were affecting the signal was really invaluable. Not only plotting the output signal, but having it plot in realtime, allowed me to fully understand the acceleration being generated. It was as if Nate was in the corner punching the bag, and I was watching the waveform roll in on my screen. However, the biggest factor was realizing that in the end I am looking for something that happens 2 to 4 times per second. I latched on to that and relentlessly pursued how to translate the raw incoming signal into something that would show those events. There was nothing for me to Google to find that answer. Remember knowledge doesn’t really come from books, it gets recorded in books. First, someone had to go off script and discover something and then it becomes knowledge. Apply the knowledge you have and can find, but don’t be afraid to use your imagination to try what hasn’t been tried before to solve an unsolved problem. So remember in the future, metaphorically when you come to the end of the paved road. Will you turn around looking for a road already paved ,or will you lock in the hubs and keep plowing ahead to make your own discovery. I wasn’t able to just Google how to count punches with an accelerometer, but now someone can.





Strawbees是近年Maker Faire 的常客,采用吸管骨架的“小卡子”。看完视频你就会惊叹他们的创造力,在此之前,我们也想不到小小的吸管配合开源硬件,竟然能做出这么多有趣好玩的东西。玩家还可以用这些“小卡子”制造房子、风筝,等等等等。Strawbees 为每个人提供了一个支点,至于怎么撬动地球就凭你的想像了。





















有个同学说到:“我们似乎善于引进别人的东西,而对自己的好东西却不谙发散之道。” 我想为什么我之前拒绝把logo展示出来,恐怕是源于对自己的不信任。


这是极端抽象化的”创“字,说实话并没多少人在第一眼就看出来。当时很郁闷,很纠结。到底设计一个logo是真的要让人一眼就看懂,还是。。。已经混乱了。记得 周俨老师 说了,“设计是一种语言,只有自己能看懂的那就是艺术了。”或许,自己一直都没走出自我的包围圈。






















2 3 
































































北京景山学校吴俊杰 2014424日 于自缚居




不过这次在2014深圳制汇节(Make Faire)遇到了杨总,让我幡然悔悟,其实中国应该有很多潜藏创客,只是一直没有露脸,甚至不知道创客这个称呼而已,而他们在做创客做的事情,已经不是一两年了。







在刚刚过去的Maker Faire ShenZhen 2014上,小编Ted被意念控制飞行器深深吸引,戳图->2
















  1. 低噪放大器很关键:人脑电信号的微弱是的放大过程中极易被噪声所淹没,低噪放的性能影响着“神念”的初始样貌;
  2. 处理单元的算法设计是核心:得到的数字化0101看不出什么,这需要设计者设计出一种可靠的方法去理解“神念”的意志;

最后跟大家侃一下大山,原来神念公司的CEO是华人Stanley Yang(杨士玉),UCB的高材生喔!


人性化的科技時代:楊士玉 (Stanley Yang) at TEDxTaipei 2013首先你得有个翻墙工具···)



后来Stanley给他们带来了“神念”,当然“神念”只能读懂儿子的Yes or No。







然后Stanley问他:Mike,Do U Love your Mother?

儿子:Yes ! Yes !  Yes ! Yes !


科技创客不仅仅是technology and make something awsome,我们更加关注的是源于人类内在的人性,more human insight! 谢谢!











现代专利制度制定的初衷是提供给发明者在一定期限内保护自己想法的法律措施,将聪明才智纳入实用轨道的各种机制,这样做被认为可以鼓励创新。这一点在专利 制度诞生的初期确实让这些保护知识产权的国家吸引了大批优秀的人才,并且使社会形成了尊重知识的风气。最为典型的例子当属发明万能蒸汽机的詹姆斯瓦特,瓦 特早年生活非常艰难,还抚养了数个孩子,但晚年的瓦特非常富庶,凭借的就是被到大量转让万能蒸汽机的使用权。



2、 科技研究者个人,匆匆将自己的想法申请专利。为了防止在申请过程中出现可能的竞争者,往往在申请专利的一年半载期间,将技术或者想法对外保密,这阻止了技 术人员之间正常的交流,造成人为的技术封锁,在技术高速发展的时代,很可能当你获得专利权的时候,你所申请专利的技术已经失去了使用价值。

3、 专利的经济学色彩非常浓厚,趋利与垄断是专利的本质属性。很多的大公司利用专利制度来巩固自己的垄断地位,以获得高额而稳定的利润,破坏了市场竞争的公 平。那些拥有相当经济价值的专利要么被大公司抢先申请,要么被他们通过高价在申请者个人手上购买后,就成为推动自己垄断地位形成的工具。


早在上个世纪60、70年代的时候,计算机还是各个大学、研究所、大公司才能拥有的庞然大物时,这些机器上面普遍运行着的是一个被称为Unix的操作系统。Unix是由AT&T公司开发的一款优秀的操作系统,你虽然在使用的时候需要向AT&T公 司支付许可费用,但那个时候的代码从技术上来说是开源的,因为你可以获得它,甚至修改它。在那个时代,使用计算机的往往极少数由于工作等原因而接触到计算 机的工作者,他们都是熟知计算机工作原理的骇客。软件在一些计算机爱好者组成的社团之间相互分发,而且往往是免费的,这样他们就可以在别人的软件上面略作 改动以开发出适应自己需要的软件。





在爱好者中当前最为严峻的问题是,缺乏好的软件和教程,如果不懂编程的计算机用户没有好的软件,他的计算机资源就是一种浪费。业余爱好者难以写出高质量的 软件,软件为什么不能出售牟利,因为你们大多数人盗用了别人的劳动成果,硬件需要付费而软件却可以免费共享。那些为软件而开发工作的人却没有应得的回报, 这公平吗?谁会拿没有收入的工作作为自己的职业?这种社团里相互对内对外派发免费软件的方式不仅让劳动者无法获得回报,还阻碍了优秀软件的诞生。欢迎任何 愿意购买软件的人提建议和我联系。

众所周知,此后的微软渐渐走上了以闭源方式开发和出售软件的道路,并支配着当今大多数个人电脑的操作系统。如今的软件公司也大多追随微软,采用同样的盈利 模式来开发应用软件,刻录在光盘上并且出售给普通消费者。他们认为软件的算法和源代码是公司赖以生存的机密,是公司财富的一部分,为了保持自己的领先地 位,决不允许不必要的人能够获得,即使是自己的开发人员,也可能只能看到和自己开发工作相关的一部分。



查理·斯托曼是一位计算机爱好者,上个世纪70年 代他在麻省理工的人工智能实验室工作。他加入了一个计算机骇客社团,这个社团保持着和“佳酿电脑俱乐部”类似的运作模式。但后来他渐渐接触到有产权的商业 软件,闭源软件令他感到不安,这让继续开发和完善软件的愿望被限制,人们无法从软件公司获得源代码以便做些修改来适应自己的需求,即使这样的修改是有益 的。让我们来听听他的一些阐述:

那(商业软件)把我推到了一个道德上两难的境地,当你得到了一个授权的操作系统,你就必须和开发者签署一个所谓的协议,你不能和其他人共享软件,相反用户 却受到软件开发公司的支配和限制。对我来说,这个协议的本质就是要我去做坏蛋,去背叛世界上的其他人。把我从社会,从一个合作的团体中分割出来。我体验过 这种待遇,当别人拒绝和我们分享时候的感受,因为他们也签署了类似的协议。这在阻挠我们做某些有益的工作,所以软件版权的理念是错误的,我们要的不是这样 的生活方式。


被称为开源软件之父的查理斯托曼辞掉了当时在麻省理工的工作,创立了自由软件协会(FSF/Free Software Foundation),并公开发表声明,号召有志之士参与联合开发GUN项目。这个项目的目的就是编写一个又一个新的程序以替代Unix系统上应用程序。只有一点是不同的,这些软件都是开源的,你可以轻易的从互联网获得他们的源代码并且做出修改。当他们把文本编辑器、C编译器、调试器、E-mail收发程序等一系列必备软件都开发完成以后,他们开始尝试项目中最难的一个,开发一个类Unix的操作系统,以完全替代Unix,让之前的程序不需要再在版权软件的操作系统上运行,以构建完整的开源软件计算机软件系统。不过在这一过程中,据查理·斯托曼自己后来回顾,他们刚开始的思路可能不是很恰当,导致了后来调试的困难,进度缓慢。而在大洋彼岸的芬兰,在赫尔辛基大学的学生林纳斯·托瓦兹却也在做同样的事情,不过因为他的进度更快,当他把自己所写的操作系统的源代码释放到网上供大家免费下载的时候,许多电脑爱好者才发现,这正好是GUN项目中空白的操作系统。这样,一个完整的开源软件系统就完成了,它就是我们今天熟知的Linux操作系统。




开源社区一般由拥有共同兴趣爱好的人所组成,通常以类似论坛的形式出现,根据相应的开源软件许可证协议公布软件源代码,同时提供一个自由学习交流的空间。 由于开放源码软件由这些分散在世界各地的程序员类似于蚁工一般的点滴贡献来开发新程序,因此开源社区在推动开源软件发展的过程中起着巨大的作用。


大教堂模式(The Cathedral model):由专门的公司组织专门人员开发的闭源软件或者源代码在软件发行后公开的项目,软件的每个版本在开发过程中是由一个专属的团队所控管的。

市集模式(The Bazaar model):源代码在开发过程中即在互联网上公开,供人检视及开发。以Linux核心的创始者林纳斯·托瓦兹带领Linux核心的开发为例

“让够多人看到源代码,错误将无所遁形”(Given enough eyeballs, all bugs are shallow)。作者表示大教堂模式的软件开发让程序开发过程中调试debug的时间大幅增加,因为只有少数的开发者可参与修改工作。市集模式则相反。


为了保护开源软件不被闭源模式的企业或个人用以牟利,保证共享精神的延续,GUN计划中提出了“Copyleft”一说。开源软件其实是有版权的,只不过大家会选择一些协议,来保证在你赋予用户获得、修改和重新发布你作品的同时,也必须保持你给他同样的权利来发布他改进后的作品,不多也不少,这样才能保证分享的劳动成果不会被私吞及派生作品的延续。有人也将其译为“著佐权”,以彰显Copyleft是补足著作权(Copyright, 版权)不足的意义。另有译为“反版权”、“版权属左”、“脱离版权”、“版权所无”、“版权左派”、“公共版权”或“版责”,但这些译名的其中几个在意义上有所偏差。Copyleft许可协议不反对著作权的基本体制,却是通过利用著作权法来进一步地促进创作自由。


对于开源模式开发软件的商业化进程,可谓好事多磨。不仅在开源社区诞生之初,就遭到了微软等传统闭源软件公司的敌视、魔化甚至企图给开源运动者安上某些罪 名后将他们送上法庭。时至今日,许多对开源运动全貌并不了解的人也会时常像宣扬世界末日论那样宣称开源软件的商业模式行将就木。





不过,摆在走开源模式的工程师、商人、企业家、服务商面前的第一节事就是,如何赚钱。虽然开源运动创立之初是由不少乐于分享和贡献的开发者所带动的,但开 源的软件开发模式本身也应该寻求自己的盈利模式,否则开源运动就难以回答比尔盖茨给“佳酿电脑俱乐部”的公开信中所提到的问题————“谁会以没有报酬的 事情作为自己的职业?”,而开源运动也难以持续扩大。就好比我拿起一把扫把自发的去扫大街,确确实实为社会做出了贡献,但是如果不能从中获得合理的回报, 这样的行为又能发动多少人去做,长期不懈的去做并且做好。

首先我们来探讨传统软件的盈利模式,闭源软件的盈利模式显而易见,自己封闭起来开发,然后把软件产品放出去卖钱,如果你知道程序设计的原理,你就会明白经 过编译以后的程序几乎是不可能让人看懂的,只能给机器执行,这样的软件你很难改进或者学习研究。他们的理由是这些代码或者算法是公司的商业机密,公司赖以 生存的摇钱树和镇山之宝。这样的软件,升级和漏洞问题等售后服务只能由他们独家提供,也难怪这些公司的服务会如此的差了。

那开源软件呢?开源软件由社区协力开发,谁都可以免费下载获得并装载运行在自己的计算机上。这些开发者如何从中牟利?事实上当软件用于商业用途,那么就会 有许多问题需要人来维护支持,甚至是在原有的软件上修改或者二次开发,以适应自己的需求,那么我们的生意就来了。而且,对开源软件贡献越多,在开源社区声 望越高的开发者,雇用他们进行商业维护和开发的费用也越高,又或者他们可以成立咨询服务公司,开源社区事实上为开发人员提供了一个完全开放的展现自己实力 的舞台,正所谓是骡子是马,拉出来遛一遛就知道。开源可以让整个社会都重用别人开发过的优秀代码,如果可以重用的部分就用上,不需要自己再从头写一遍,而 只需要专注尚未有解决方案的新功能、新需求,这让整个社会都提高了资源的利用率。



Massimo Banzi是意大利一所设计学校的老师。他的学生们经常抱怨找不到便宜好用的微控制器来发挥他们在设计上的创意。2005年冬天,Massimo Banzi 跟David Cuartielles 讨论了这个问题。David Cuartielles 是一个西班牙的芯片工程师,当时在这所学校做访问学者。两人决定设计自己的电路板,并请Banzi 的学生 David Mellis 为电路板设计编程语言。数天后,电路板和开发环境都已完成,并被命名为Arduino。这个名字的灵感来源于意大利历史上一位著名的国王。

很快,他们发现这块板子交给即使完全不懂电子和程序设计的学生,只要稍加指点,他们就能用 Arduino 迅速实现他们的创意,做出很酷的东西。之后,三位作者把设计放到了互联网上,数月后,他们的设计作品就在网上得到了迅速的传播。

现在,Arduino论坛和社区不仅有了大量的访问者和支持者,诸如Seeed Studio和Sparkfun这样为Arduino提供扩展电子元器件的公司也越来越火热。通过这些资源,即使没有电子基础的人也很快能够上手,实现自己在电子设计方面的想法,而且论坛和社区上也不乏乐意分享和指导初学者的高手们。





我们生活在真实的物理世界中,我们需要食物、饮水还有各种各样实在的物品来实现我们生活质量的提高。然而在过去数十年计算机技术迅速发展的影响下,电脑、 手机、互联网等深深影响着我们的生活,似乎这些存在于虚拟的“比特”世界中的东西似乎对我们更有吸引力。因为它们发展得很快,而我们真实生活中各种产品的 进步却远远滞后了。




工程项目在开发的不同阶段有着不同的特点,当上述的发现需求的阶段过后,我们就进入了一个称之为头脑风暴的阶段,这个阶段是要在创意和风险之间寻找到一个 平衡点,不仅创意可能多种多样,实现的途径也可能有多种方案。正所谓送欲善其事,必先利其器,这个时候,柔性化的产品开发工具就是我们必不可少的利器。通 常的产品往往并非纯机械、纯电子或者纯软件的产品,而绝大多数是这三者的有机结合。

综上所述,我们在软件和电子领域已经有了许多不错的柔性工具可以使用,那机械方面呢?也许有人会提出使用乐高,创客中也有结合Arduino和 乐高积木来制作机电一体化产品的提议。乐高已经是在机械灵活度上最好的一个,但它的价格太贵而且是塑料材质,乐高本身的电子模块也不多,可以扩展使用的各 类传感器、电机非常有限。此外,乐高是完全封闭的平台,乐高官方几乎没有为人们在扩展其零件和功能上给出足够的支持,也难怪乐高的电子模块如此有限。如果 你基于乐高制作新零件并发放到市场,弄不好还会因为无意中触犯专利的法律条文而被乐高公司告上法庭。







1、    平民级的3D打印机多使用的是有限的几种塑胶材料,某些场合塑胶材料不能满足需求。

2、    大部分3D打印机的精度仍然较低,而且打印耗材较贵,显然不适合大批量生产。

3、    3D打印较为费时,对操作者也需要一定的经验和三维建模能力,并未能做到快速上手。