New STM32F4 core - how powerful is it?

TheSlowGrowth · May 2, 2014

Hi everybody!

I'm thinking about building a hybrid synth with the new STM32F4 core. By hybrid, I mean something like the Mutable Instruments Ambika: digital oscillators and modulators (envelopes, LFOs) but analog filters.

I'm thinking about an 8-voice synthesizer with 3 envelopes, 3 LFOs, and 3 oscillators + noise per voice.

Now I'm wondering how powerful the new STM32F4 core really is. If I wanted to run a total of 24 wavetable oscillators, 24 envelopes, 24 LFOs and additional things like ring modulators, noise, ... on it at a sample rate of at least 44.1 kHz, would it manage the load?

As a comparison:

The Ambika uses a separate ATmega at 20 MHz for each voice. Its signal processing is lo-fi, with mostly 8-bit resolution, but some parts are done at 12 or 16 bits. A voice on the Ambika has 2 oscillators, 2 envelopes and 2 LFOs and runs at 39 kHz (if I read it right). So if I take that as a guide and scale it up, I would need at least 160 MHz (8 x 20 MHz) to run 8 voices in parallel. The DMA of the STM32 takes some of the data-transfer work off the CPU, and the 32-bit architecture makes 16-bit calculations easy. So it could even fit a third oscillator per voice...
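The scaling argument above can be written out as a quick back-of-envelope calculation. This is a sketch under stated assumptions: the STM32F4 is assumed to run at its maximum 168 MHz core clock, all voices share one 44.1 kHz sample clock, and the result ignores interrupt, DMA and control-rate overhead. It also treats an ATmega cycle and a Cortex-M4 cycle as equal work, which understates the 32-bit core.

```c
/* Rough CPU budget for the 8-voice design discussed above.
 * Assumptions (not from the original post): 168 MHz STM32F4 core clock,
 * one shared 44.1 kHz sample clock, no overhead accounted for. */
#include <assert.h>
#include <stdio.h>

#define AMBIKA_MHZ_PER_VOICE 20.0   /* one ATmega @ 20 MHz per Ambika voice */
#define NUM_VOICES           8
#define STM32F4_CLOCK_HZ     168e6  /* assumed core clock */
#define SAMPLE_RATE_HZ       44100.0

/* "ATmega-equivalent" MHz needed if the work scaled linearly per voice. */
double naive_required_mhz(int voices)
{
    return AMBIKA_MHZ_PER_VOICE * voices;
}

/* CPU cycles available for each output sample. */
double cycles_per_sample(double clock_hz, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return clock_hz / sample_rate_hz;
}

void print_budget(void)
{
    double per_sample = cycles_per_sample(STM32F4_CLOCK_HZ, SAMPLE_RATE_HZ);
    printf("naive requirement: %.0f MHz\n", naive_required_mhz(NUM_VOICES));
    printf("cycles per sample: %.1f\n", per_sample);
    printf("cycles per voice per sample: %.1f\n", per_sample / NUM_VOICES);
}
```

Calling `print_budget()` from `main` gives roughly 3800 cycles per sample, or about 476 cycles per voice per sample, to spend on 3 oscillators, noise, 3 envelopes and 3 LFOs.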

I know the comparison to the 8-bit ATmega is rubbish. But at least it gives an idea of the workload that comes with such a synthesizer design. I find it really hard to even estimate the power of the STM32F4.

What do you think?

Hawkeye · May 2, 2014

Hola,

with a bit of optimized code, it should be able to handle your wishes. I built a software-mixing 8-channel mod player (basically an eight-channel, single-oscillator wavetable player with bigger wavetables) back in the nineties, running on sub-20 MHz 16-bit DOS machines, which were far inferior to the processing power of the new 32-bit MIDIbox cores :smile:.
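The core of such a software-mixing wavetable player fits in a few lines. The sketch below is not Peter's code; it assumes 256-entry signed 16-bit wavetables, a 32-bit phase accumulator per channel whose top 8 bits index the table, no interpolation, a Q15 volume per channel, and a plain sum clipped to 16 bits. All names are hypothetical.

```c
/* Minimal 8-channel software-mixing wavetable player (illustrative sketch).
 * Assumes arithmetic right shift of negative values (true for GCC on ARM). */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CHANNELS 8
#define TABLE_BITS   8
#define TABLE_SIZE   (1u << TABLE_BITS)

typedef struct {
    const int16_t *table;   /* TABLE_SIZE samples of one cycle; NULL = off */
    uint32_t phase;         /* wraps at 2^32 = one full waveform cycle */
    uint32_t increment;     /* phase step per output sample */
    int16_t  volume;        /* Q15 gain: 32767 is roughly 1.0 */
} channel_t;

/* Phase step for a frequency: freq / sample_rate * 2^32. */
uint32_t phase_increment(double freq_hz, double sample_rate_hz)
{
    assert(freq_hz >= 0.0 && freq_hz < sample_rate_hz / 2.0);
    return (uint32_t)(freq_hz / sample_rate_hz * 4294967296.0);
}

/* Render n samples by summing all active channels into out[]. */
void mix_block(channel_t ch[NUM_CHANNELS], int16_t *out, int n)
{
    for (int i = 0; i < n; ++i) {
        int32_t acc = 0;
        for (int c = 0; c < NUM_CHANNELS; ++c) {
            if (ch[c].table == NULL)
                continue;
            /* Top TABLE_BITS bits of the phase select the table entry. */
            int16_t s = ch[c].table[ch[c].phase >> (32 - TABLE_BITS)];
            acc += ((int32_t)s * ch[c].volume) >> 15;  /* apply Q15 gain */
            ch[c].phase += ch[c].increment;
        }
        /* Clip the sum back into the 16-bit output range. */
        if (acc > INT16_MAX) acc = INT16_MAX;
        if (acc < INT16_MIN) acc = INT16_MIN;
        out[i] = (int16_t)acc;
    }
}
```

Per channel and sample, the inner loop is one table load, one multiply, one shift, one add and one phase update, which is why this kind of mixer already ran on 16-bit DOS machines.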

Here is a graphics demo I converted from the old DOS days, running on the old STM32 core, which is more than 60% slower than the new STM32F4.

So, it is doable, go ahead and enjoy! :-)

Many greets,

Peter

Edited May 2, 2014 by Hawkeye

TK. · May 2, 2014

Hi,

from my point of view, this is possible, but it mainly depends on the effort you want to spend on the implementation.

Performance-wise, the STM32F4 is a beast; several MIOS32 benchmarks show this.

Typically, the STM32F4 is 2 times faster than an LPC1769 and 3.5 times faster than an STM32F1.

For a comparison against the ATmega, see for example the EEMBC benchmark comparison charts: http://www.eembc.org/benchmark/index.php

Typically, the STM32F4 is 5..6 times faster than an ATmega644.

You are right that additional performance can be expected from moving to a 32-bit architecture, and you can take advantage of it to increase the sample rate and accuracy.

If performance is still not sufficient, consider replacing timing-critical routines with hand-optimized assembly code.

This is what Xavier did in his PreenFM2 project: http://ixox.fr/preenfm2/

Best Regards, Thorsten.

TheSlowGrowth · May 3, 2014

Hi Peter, hi Thorsten,

I'd really like to avoid assembly code - I want to keep portability as high as possible, because (if I ever finish it) this is going to be an open source synth and I want it to be open to modification.

I'd rather use a second core and deal with some complex inter-chip-communication than sacrifice portability.

It's good to see those benchmarks. You wrote it's "5..6 times" faster. But where did you find those numbers?

For the ATmega644, it states a CoreMark (if that's the right number to look for) of 10.81. For the STM32F417IGT6 (which should be from the same series as the one on the core), it states 469.17 to 565.73, depending on the compiler. That means the STM32F4 is roughly 43..52 times faster than an ATmega644! Am I missing something?
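Both figures in this exchange fall out of the same CoreMark scores, depending on whether you compare absolute scores or scores per MHz. CoreMark is an absolute score, so a chip clocked much higher scores higher even at equal per-cycle efficiency. The clocks used below (168 MHz for the STM32F4, 20 MHz for the ATmega644) are assumptions, not values taken from the EEMBC page, and whether Thorsten normalized this way is not confirmed in the thread.

```c
/* Compare the CoreMark scores quoted above, absolute and per MHz.
 * The clock frequencies passed to score_per_mhz are assumptions. */
#include <assert.h>

/* Ratio of two benchmark scores (higher score = faster). */
double score_ratio(double fast_score, double slow_score)
{
    assert(slow_score > 0.0);
    return fast_score / slow_score;
}

/* Score normalized to clock frequency, for a per-cycle comparison. */
double score_per_mhz(double score, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return score / clock_mhz;
}
```

`score_ratio(469.17, 10.81)` and `score_ratio(565.73, 10.81)` give about 43.4 and 52.3. Comparing `score_per_mhz(469.17, 168.0)` and `score_per_mhz(565.73, 168.0)` against `score_per_mhz(10.81, 20.0)` gives about 5.2 and 6.2, close to the "5..6 times" quoted earlier.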

The Discovery board is just 11€ + tax at Mouser. I guess I'll just order a few and try them out :) (Shipping is 20€, pffff.)

Regards,

Johannes

TK. · May 3, 2014

Hi Johannes,

the numbers came from the EEMBC benchmarks linked above, but I wouldn't take this comparison too seriously for your application.

For example, if the STM32F4 only processed 8-bit values, the benefit wouldn't be as high.

Or if a benchmark just toggles a pin, the performance could be almost the same, depending on the peripheral bus architecture ;-)

The truth is typically in the middle...

Anyhow, as you already mentioned above, STM32F4 provides features such as 32bit calculation and DMA, which are advantageous for the use case, and if it turns out that it isn't possible to replace all Atmel CPUs, just take a cheap second one... :smile:

There are some apps and tutorials in the repository which give you a good entry point:

- Output audio via the I2S based DAC in 024_i2s_synth: http://svnmios.midibox.org/listing.php?repname=svn.mios32&path=%2Ftrunk%2Fapps%2Ftutorials%2F

- C++ based synth engine (w/o audio, only controllers like LFO, ENV, etc): http://svnmios.midibox.org/listing.php?repname=svn.mios32&path=%2Ftrunk%2Fapps%2Fsynthesizers%2Fmidibox_sid_v3%2F

- another C++ based CV engine: http://svnmios.midibox.org/listing.php?repname=svn.mios32&path=%2Ftrunk%2Fapps%2Fprocessing%2Fmidibox_cv_v2%2F

- simple project to output audio samples: http://svnmios.midibox.org/listing.php?repname=svn.mios32&path=%2Ftrunk%2Fapps%2Fsynthesizers%2FSD+card+sample+player%2F

We have no application yet that really makes use of the STM32F4's processing power for generating audio - you are in pole position! :smile:

Best Regards, Thorsten.

TheSlowGrowth · May 4, 2014

Hi Thorsten,

thanks for all those links - I'll take a look at them!

One more question: do you know how fast the floating-point unit is compared to the integer ALU? I mean: can I afford to process all my audio and control signals as 32-bit floating-point data? I couldn't find any numbers on the actual processing speed of integer/float instructions on the STM32F4.

Float obviously has the advantage of being less troublesome when it comes to modulation. For example, if the maximum value for audio is 1.0 and the maximum for a control signal (from an envelope) is 1.0, I can simply multiply the current audio value by the envelope value and I'm done. With 16-bit integers, full scale is 32768, so the product can reach 32768 * 32768 = 2^30, and I would have to add a right shift by 15 bits to get back into the desired range.
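The two approaches side by side, as a minimal sketch: the float version is a single multiply, while the Q15 version (full scale 2^15) multiplies into a 32-bit product and shifts right by 15. The function names are hypothetical, and the envelope is assumed to be non-negative (0..32767), which keeps the Q15 result inside the 16-bit range.

```c
/* Applying an envelope to an audio sample: float vs. Q15 fixed point.
 * Assumes arithmetic right shift of negative values (true for GCC on ARM). */
#include <assert.h>
#include <stdint.h>

/* Float: full scale is 1.0, so modulation is one multiply. */
float apply_env_float(float audio, float env)
{
    return audio * env;
}

/* Q15: full scale is 32768 (2^15). The product is in Q30,
 * so shifting right by 15 brings it back to Q15. */
int16_t apply_env_q15(int16_t audio, int16_t env)
{
    assert(env >= 0);                       /* envelope assumed 0..32767 */
    int32_t product = (int32_t)audio * env; /* Q30 */
    return (int16_t)(product >> 15);        /* back to Q15 */
}
```

For example, `apply_env_q15(16384, 16384)` (0.5 x 0.5) returns 8192, i.e. 0.25 in Q15.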

Regards,

Johannes

TK. · May 4, 2014

Hi Johannes,

I've no comparison available for integer vs. float.

But if the required shift operation is your only concern, I assume that fixed point is still faster.

A shift operation (regardless of how many bits) takes only 1 cycle.

See also: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html

I know that you don't like this tip, but anyhow: consider using the special Cortex-M4 DSP instructions for fixed-point operations (see also the bottom of the page that I linked above).

I guess that they can be embedded into C code by using intrinsics.

And in order to stay CPU-independent, you could write another set of intrinsics that perform the same operations in plain C.
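One way to do this, sketched for a saturating 32-bit add: the wrapper uses the Cortex-M4 QADD instruction via GCC inline assembly when the ACLE macro `__ARM_FEATURE_DSP` is defined, and falls back to plain C elsewhere. The wrapper name `sat_add_q31` is hypothetical; CMSIS headers also provide ready-made intrinsics for these DSP instructions.

```c
/* Portable saturating add with an optional Cortex-M4 DSP fast path. */
#include <assert.h>
#include <stdint.h>

static inline int32_t sat_add_q31(int32_t a, int32_t b)
{
#if defined(__ARM_FEATURE_DSP)
    /* Single instruction on cores with the DSP extension. */
    int32_t r;
    __asm__ ("qadd %0, %1, %2" : "=r" (r) : "r" (a), "r" (b));
    return r;
#else
    /* Plain C fallback: add in 64 bits, then clamp to the int32 range. */
    int64_t sum = (int64_t)a + b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
#endif
}
```

Both branches return the same results, so the synth code calls `sat_add_q31` everywhere and only the build target decides which implementation is used.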

Best Regards, Thorsten.

TheSlowGrowth · May 4, 2014

Hi Thorsten,

I just found this thread (in German). In short:

Computation time (in cycles) for

float f2 = f * 2.29f;

- without FPU (GCC): 41
- with FPU: 5

float f2 = f / 2.29f;

- without FPU (GCC): 155
- with FPU: 21

So yes, you are right: floating point is much slower than int + shift. For code readability, I could introduce simple macros like "MulNormalized(a,b)". I could later replace them with DSP assembly code if that helps.
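A sketch of the `MulNormalized(a,b)` idea from the post: one macro multiplies two values whose full scale is 1.0, and a compile-time switch picks the representation. The switch name `USE_FLOAT_SAMPLES` and the `sample_t` type are hypothetical; without the switch, Q31 fixed point is used (1.0 corresponds to 2^31, so the 64-bit product is in Q62 and a right shift by 31 returns it to Q31).

```c
/* MulNormalized: normalized multiply, float or Q31 selected at compile time.
 * Assumes arithmetic right shift of negative int64_t (true for GCC/Clang). */
#include <assert.h>
#include <stdint.h>

#ifdef USE_FLOAT_SAMPLES
typedef float sample_t;
#define MulNormalized(a, b) ((a) * (b))
#else
typedef int32_t sample_t;
/* Q31 x Q31 = Q62; >> 31 brings the product back to Q31.
 * Note: -1.0 * -1.0 overflows Q31 and must be avoided by the caller. */
#define MulNormalized(a, b) ((sample_t)(((int64_t)(a) * (b)) >> 31))
#endif
```

With Q31 samples, `MulNormalized(1 << 30, 1 << 30)` (0.5 x 0.5) yields `1 << 29`, i.e. 0.25. The macro body is the single place to swap in a DSP instruction later.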

Speaking of DSP: I looked at those DSP instructions. Pretty neat! SMMUL does my (a*b)>>32 in a single instruction, and SMMULR does the same with rounding. Cool stuff! Thanks for all the valuable links and information!
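For the portable fallback, SMMUL and SMMULR can be emulated in plain C: both return the most significant 32 bits of the signed 64-bit product, and SMMULR adds 0x80000000 first so the result is rounded instead of truncated. The function names below are hypothetical.

```c
/* Plain-C equivalents of the Cortex-M4 SMMUL / SMMULR instructions.
 * Assumes arithmetic right shift of negative int64_t (true for GCC/Clang). */
#include <assert.h>
#include <stdint.h>

/* Top 32 bits of a*b, truncated (rounds toward minus infinity). */
static inline int32_t smmul_c(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Top 32 bits of a*b, rounded to nearest. */
static inline int32_t smmulr_c(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b + 0x80000000LL) >> 32);
}
```

`smmul_c(7, 1 << 30)` returns 1 while `smmulr_c(7, 1 << 30)` returns 2, since 7 * 2^30 / 2^32 = 1.75. Applied to two Q31 values, (a*b)>>32 gives a Q30 result (half the Q31 scale), so one extra left shift is needed if the result has to stay in Q31.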

Regards,
Johannes

Edited May 4, 2014 by TheSlowGrowth
