SPETTRO(1) General Commands Manual SPETTRO(1)

spettro - Show a scrolling spectrogram of music as it plays

spettro [option]... file...

Most spectrograms have a linear frequency axis, which is no good for music because the top half represents the top octave of the sound, the octave below occupies the next quarter of the screen, the following one the next eighth and so on, leaving most of the musical information crushed into the bottom few millimeters of the graph.

Spettro's logarithmic frequency axis makes each octave the same height, much like a conventional musical score and closer to the response of our ears, revealing a greater amount of musically-interesting detail to our eyes.
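
As a rough illustration of the difference between the two kinds of axis, the following Python sketch maps a frequency to a pixel row both ways. It is not spettro's own code: the function names are made up for this example, and it assumes the default frequency range of 27.5Hz to 14080Hz and a graph 480 pixels high.

```python
import math

def freq_to_row(freq_hz, min_freq=27.5, max_freq=14080.0, height=480):
    """Pixel row (0 = bottom) of freq_hz on a logarithmic axis."""
    octaves_shown = math.log2(max_freq / min_freq)  # 9 octaves by default
    octaves_up = math.log2(freq_hz / min_freq)
    return octaves_up / octaves_shown * (height - 1)

def linear_row(freq_hz, max_freq=14080.0, height=480):
    """Pixel row of freq_hz on a linear axis running from 0Hz to max_freq."""
    return freq_hz / max_freq * (height - 1)

# Compare where three A's an octave apart land on each kind of axis.
for note, hz in [("A3", 220.0), ("A4", 440.0), ("A5", 880.0)]:
    print(note, round(freq_to_row(hz), 1), round(linear_row(hz), 1))
```

On the logarithmic axis each octave step moves up by the same number of rows, while on the linear axis everything below 7040Hz is squeezed into the bottom half of the graph.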

The playing position is the green line in the middle of the screen, so you can see musical events arriving, hear them, and contemplate them as they recede to the left.

You can see/hear it playing a short piece at https://www.youtube.com/watch?v=fGRsLX0Ec1E

At the moment, you have to start spettro from the command line prompt, telling it the name of the audio file(s) that it should play and display.

You then use the keyboard and mouse to make everything else happen.

Pressing the space bar makes spettro play the file, scrolling the graph to the left in time as it plays the file. Press it again and it pauses, press it again and it continues playing. When it gets to the end of the piece, it stops; press Space and it starts playing again from the beginning.
If your keyboard has Play/Pause and Stop media buttons, these may also work.
Usually it plays all the audio files immediately and quits when it has finished. This option makes it start up paused and pause at the end of each audio file; Space then replays the same file from the beginning, and it plays the next audio file only when you hit n.
The p key should also do this and P should turn pause mode off.
Makes it play the audio files in a random order instead of in sequence.
The n key makes it skip to the next file. The [>>|] media key may also do this.
Mute mode makes no sound.

[<-] [->]
Left Arrow skips back one tenth of a screenful in the audio file. If a Shift key is held down, it skips back by a screenful, while holding a Ctrl key scrolls by one pixel; with both, by one second.
Right Arrow works similarly but skips forward in time instead of back.
The Home key moves you to the start of the piece and the End key to the end.
[^] [v]
The Up Arrow key moves you up the frequency axis by a tenth of the height of the graph; similarly, Down Arrow pans down to reveal more of the lower frequencies. With Shift, the view pans by the height of the graph, with Ctrl, by one pixel, with both, by one semitone.
Move up or down the frequency axis by the height of the graph (the same as Shift-UpArrow and Shift-DownArrow).
This command-line option sets the initial playing position to time, which can be either a number of seconds into the piece, minutes:seconds or hours:minutes:seconds, optionally followed by a decimal point and a fraction of a second.
Move the green line one pixel left / right.
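
The time formats described above for the initial playing position (seconds, minutes:seconds or hours:minutes:seconds, with an optional fraction) can be turned into a number of seconds as in the following Python sketch. The function name parse_time is hypothetical and spettro's own parser may treat edge cases differently.

```python
def parse_time(text):
    """Convert 'S', 'M:S' or 'H:M:S' to seconds; S may have a fraction."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected S, M:S or H:M:S, got %r" % text)
    seconds = float(parts[-1])
    # Walk leftwards from the seconds field: minutes first, then hours.
    for scale, field in zip((60, 3600), reversed(parts[:-1])):
        seconds += scale * int(field)
    return seconds

print(parse_time("90"), parse_time("1:30.5"), parse_time("1:02:03"))
```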

Zoom out or in by a factor of two on the time axis, so x makes twice as much of the sound visible, and X expands the central half of the graph to the whole width.
If you zoom in enough while playing the audio, spettro will eventually be unable to calculate fast enough to keep the screen properly updated.
Similarly, zoom out or in on the frequency axis, showing twice the frequency range as before and compressing the graph in the vertical direction, or zooming in by a factor of two so that what was the central half of the graph now fills the whole height. If Ctrl is held down, it zooms out or in by two pixels, revealing or hiding one more row of pixels at the top and the bottom of the graph.
With Ctrl held, Plus zooms in on both axes, like X and Y at the same time, and Minus zooms out in both directions.

The command-line options to set the zoom are:

The pixels-per-second value sets the initial value of the horizontal zoom. Specifically, it says how many pixels of screen each second of audio occupies. By default it starts at 100 pixels per second; higher values stretch the image wider and lower values display more of the song.
-n f or --min-freq f
Sets the frequency represented by the bottom pixel row, by default 27.5Hz (A0).
Sets the frequency represented by the top pixel row, by default 14080Hz (A9).

The minimum and maximum frequencies can be specified in Hertz or as note names such as A0 or A9 as well as C4# or C4+, both of which mean "middle C sharp", and B5b or B5-, which mean "B flat".
Note that, to get the # past the Unix shell, you need to type \# or enclose the note name in single or double quotation marks.
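
To show how such note names relate to frequencies in Hertz, here is a Python sketch. It assumes twelve-tone equal temperament with A4 at 440Hz and octave numbers that change at C, which agrees with the A0 (27.5Hz) and A9 (14080Hz) defaults above; note_to_hz is a name made up for this example and is not part of spettro.

```python
import re

# Semitones above C for each natural note within an octave.
SEMITONES_FROM_C = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name):
    """Convert a note name such as A0, C4#, C4+, B5b or B5- to Hertz."""
    match = re.fullmatch(r"([A-G])(\d)([#+b-]?)", name)
    if match is None:
        raise ValueError("not a note name: %r" % name)
    letter, octave, accidental = match.groups()
    semitone = SEMITONES_FROM_C[letter]
    if accidental in ("#", "+"):      # sharp
        semitone += 1
    elif accidental in ("b", "-"):    # flat
        semitone -= 1
    # Distance from A4 in octaves, then scale from 440Hz.
    return 440.0 * 2.0 ** ((semitone - 9) / 12 + int(octave) - 4)

print(note_to_hz("A0"), note_to_hz("A9"), round(note_to_hz("C4#"), 1))
```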

Starts up with the whole audio track exactly filling the display and hence with the current playing position half way through the piece. It also fixes the FFT frequency (unless you specified it with -f) so that each pixel column reflects the whole fragment of sound that it covers, not just a tiny fraction of a second part way through it.
For long tracks, this can require a lot of memory as it has to read the whole audio file into memory, uncompressed. Showing an hour of CD-quality stereo uses just under 2GB of RAM.

Cycles through the alternate color maps: the default colored heat map running from blue through red and yellow to white, grayscale (white on black) and for printing (grayscale, black on white).
The -ch, -cg and -cp, or --heat, --gray and --print or --colormap map command-line options select a color scheme.
Adjust the contrast by changing the range of loudnesses that the color scale covers. Z increases the contrast by reducing the dynamic range by 6dB (by 1dB if Ctrl is held), darkening the darker areas of the graph to hide background noise and making the main sonic events stand out. z decreases the contrast by increasing the dynamic range by 6dB (by 1dB if Ctrl is held), brightening the darker areas to reveal detail.
By default, the dynamic range starts at 96dB and the minimum it goes down to is 1 decibel, which shows only the very loudest points.
The -z r or --dyn-range r command-line option sets the initial dynamic range to r decibels.
It's 'z' because the contrast is on the third axis of the graph, so the dynamic range is, in a sense, the length of the Z-axis.
Adjust the intensity (brightness) by changing the volume represented by the brightest pixel in the color map. i makes the graph 1dB darker, and I makes it 1dB brighter. With Ctrl, they adjust by 1/6th of a decibel.
When spettro encounters a pixel louder than the current maximum, it automatically lowers the brightness so that the loudest pixel is displayed at the brightest color (usually white). When this happens, all newly-painted pixel columns will be dimmer than the already-displayed ones, causing a color discontinuity when the new, louder pixels enter the displayed area. (It used to repaint previously-painted columns when this happens, but that turned out to be too slow.) To repaint the whole display at the new brightness, press Ctrl-L.
The -i dB or --maxdb dB flag lets you start with the brightest color representing a pixel energy of dB decibels. Values above 0 make the picture start out darker, and values below 0 make it start out brighter.
The current value of maxdb can be seen in the top right corner of the screen when the status line is being displayed (see "AXES AND OVERLAYS" below).

Spettro doesn't yet support changing the image's size while it is running (in fact, you can't "resize" its window with the mouse); instead, you can set the window's size with the command-line options:

Set the window's width to x pixels. The default is 640 and the maximum, limited by the SDL2 video driver that spettro uses, seems to be 4090.
Set the window's height to y pixels.
The default is 480 and the maximum, with SDL2, seems to be 4064.

Fullscreen mode fills the whole of the screen with the picture.

If spettro was compiled with SDL2, it switches to the full resolution of the desktop. EFL, instead, doesn't change the size of the underlying image and scales it to the size of the screen so the pixels and axes appear larger.

Switch between windowed and full-screen modes.
Makes it start up in full-screen mode.
Makes the window open minimized.

Spettro works by taking short samples of the audio file, centered on different moments in the audio file, and converting each sample to a column of pixels whose colors represent the energy in the sound at each frequency.

By default, the FFT frequency is 5Hz, which takes a fifth-of-a-second sample for each column, enabling it to distinguish frequencies that are 5Hz apart and to resolve about five distinct musical events per second.

A higher FFT frequency (using shorter samples) improves the graph's definition in the time direction but loses focus in the frequency direction, while a lower FFT frequency (using a longer sample for each column) improves detail in the frequency direction, but smears it in the time direction.
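
The trade-off can be put in numbers with a small Python sketch. The function name fft_window is made up for this example and the 44100Hz sample rate is an assumption (CD-quality audio); the sketch only restates the relationship described above and says nothing about how spettro computes its transforms internally.

```python
def fft_window(fft_freq_hz, sample_rate=44100):
    """Describe the slice of audio analysed for one pixel column."""
    return {
        "seconds": 1.0 / fft_freq_hz,                # length of each slice
        "samples": round(sample_rate / fft_freq_hz),  # audio samples in it
        "freq_resolution_hz": fft_freq_hz,            # closest separable tones
    }

# Halving or doubling the FFT frequency trades time detail for pitch detail.
for fft_freq in (2.5, 5.0, 10.0):
    print(fft_freq, fft_window(fft_freq))
```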

These keys change the size of the audio sample used for each column. f halves it and F doubles it.

The -f freq or --fft-freq freq command-line option tells it the FFT frequency to start with.

The other factor affecting the fine-grain quality of the image is the choice of window function used to smooth the edges of each sample before it is transformed into a spectrum.

Each window trades differently between frequency selectiveness and background noise, with the following list being approximately from the cleanest but least selective to the most selective but most noisy.

The five window functions can be selected by holding Ctrl down and pressing a letter, or with command line options.

Kaiser window, the default
Dolph window
Nuttall window
Blackman window
Hann window
Cycle forwards/backwards through the window functions in the order shown above.

For more info on the different window functions' peculiarities and merits, see Window function on Wikipedia or experiment with spettro and see for yourself.

Spettro can add a frequency scale to the left and right edges of the picture, and can overlay the graph with various sets of horizontal lines to show the positions of the 88 piano keys, the six open strings of a classical guitar or the staff lines of a conventional score.

Add a frequency axis on the left side of the graph and, on the right, the positions of the conventional notes from A0 to A9. Pressing a again makes them go away.
The -a or --frequency-axis command-line options make it start up showing the frequency scales.
When the zoom of the frequency axis only has space to show powers of ten in frequency, two unnumbered markers are displayed between them at the 2 and 5 positions (or 20 and 50, or 200 and 500 and so on).
Add a status line at the top of the display showing settings that determine what the graph looks like and, at the bottom, the time in seconds of the current playing time and, if they are on-screen, the start and end of the audio file and the left and right bar lines (see "BAR LINES" below) as well as a rectangle showing which portion of the whole audio file is being displayed.
The -A or --time-axis command-line options make it start up showing these.

The -T flag draws the axes with a tiny 3x5 font instead of the usual 7x9 one, leaving more space for the graph.

Overlay the spectrogram with 88 horizontal lines, some black, some white, showing where the keys of a grand piano are from A0(27.5Hz) to C8(4186Hz). Pressing k again removes them.
The -k or --piano command-line options make it start up showing these.
Overlay white lines showing the frequencies of the six open strings of a classical guitar, centered on:
E2(82.4Hz) A2(110Hz) D3(146.8Hz) G3(196.0Hz) B3(246.9Hz) E4(329.6Hz)
Pressing g again removes them.
The -g or --guitar command-line options make it start up showing the guitar string lines.
Overlay two five-line staves like those of a conventional musical score.
From top to bottom, the staff lines are centered on:
F5(698.5Hz) D5(587.3Hz) B4(493.9Hz) G4(392.0Hz) E4(329.6Hz)
A3(220.0Hz) F3(174.6Hz) D3(146.8Hz) B2(123.5Hz) G2(98.0Hz)
Pressing s again removes them.
The -s or --score command-line options make it start up showing the staff lines.

Turning the staff lines on turns the guitar lines off and vice versa (because they overlap) but you can display the piano keys at the same time as either of them, to help identify notes above the top or below the bottom staff lines, or to see the fretted positions between the open guitar strings.

To help you pick out the rhythm, spettro can overlay vertical lines onto the graph to show where each bar of the music starts and ends.

To do this, mark two pixel columns to say where one bar starts and ends, and spettro will add bar lines evenly spaced throughout the rest of the piece.

To get a useful result, precise positioning of the bar lines is crucial and there are four ways to set them:

1. With your eyes: While paused, you can position the graph so that the green line is at the start of a bar, press l, move the green line to the start of the next bar (probably with Ctrl-RightArrow) and press r.
2. With your ears: While the music is playing, you can hit l as one bar starts and r when the next one does.
3. With the mouse: You can click the left or right mouse button to position the left or right bar line where the tip of the mouse cursor is. You can also move the mouse while holding a button down to reposition the corresponding bar line.

For precise positioning with the first three methods, you probably want to work at a high time zoom (key X) as they are only accurate in time to one pixel.

4. With command-line options -l and -r (or --left and --right) which set the time of the left and right bar lines expressed as minutes:seconds or hours:minutes:seconds into the track, optionally followed by a decimal point and a fraction of a second.

If you set the left bar line marker to the right of the right one (the "wrong way round") it will still work.

Once you have both bar line markers set, you can make it show lines dividing each bar into 2 to 12 beats by pressing the number keys from 2 to 9 or the function keys F2 to F12.

The 1 and F1 keys remove the beat lines, leaving just the bar lines, and the 0 (zero) key removes both bar lines, but it remembers how many beat lines you had for the next time you position the bar lines.

The -b n (or --beats n) options set the initial number of beats per bar.
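
The even spacing of bar and beat lines described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than spettro's own algorithm: the function name and its arguments are hypothetical, times are in seconds, and the lines are assumed to extend over the whole piece from time 0 to its end.

```python
import math

def bar_and_beat_lines(left, right, duration, beats=1):
    """Return (time, is_bar_line) pairs for every line in the piece."""
    if left > right:            # markers set the "wrong way round"
        left, right = right, left
    if left == right:
        raise ValueError("the two bar line markers must differ")
    step = (right - left) / beats      # time between adjacent lines
    k = math.ceil(-left / step)        # first line at or after time 0
    lines = []
    while left + k * step <= duration:
        # Every beats-th line, counting from the left marker, is a bar line.
        lines.append((left + k * step, k % beats == 0))
        k += 1
    return lines

for time, is_bar in bar_and_beat_lines(2.0, 4.0, 8.0, beats=2):
    print(time, "bar" if is_bar else "beat")
```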

The Ctrl-O key dumps the sound between the left and right bar lines into a WAV file in the current directory, a simple way to crop segments out of larger files. The output file is named

spettro -l from -r to whatever.wav

where whatever is the name of the original audio file without its directory name or suffix and from and to are the time positions of the bar lines.

If either of the left or right bar lines is not set, it dumps from the beginning or until the end of the audio file respectively.

Make spettro quit/exit/close/terminate/finish/stop.

Ctrl-L redraws the display from the results already computed, while Ctrl-R recalculates every column from the original sound.

Ctrl-L is useful if spettro has adjusted the brightness level, to make the brightness of the left half of the screen match the rest of it, and Ctrl-R might be useful if the display gets garbled for some reason.

Spettro has a soft volume control. It multiplies the sound by this value before sending it to the audio-playing system, so you can increase the volume above 100%. There is no upper limit to the volume but, if the sound would have clipped, it will reduce the volume automatically if spettro was compiled with SDL2. With EFL it just distorts.

+ -
Plus and Minus are spettro's volume-up and volume-down keys. If your keyboard has volume-up and volume-down media buttons, they may also work.
Sets the initial soft volume level to n, where values above 1.0 make it louder than full volume and values less than 1.0 make it quieter than usual.
Makes it gradually increase softvol so that it doubles in n seconds.

So if you want it to play all your music in a random order, normalizing the volume of each track to the maximum without clipping and gradually increasing it during the quieter passages (a simple compander), you can say:

spettro -S -v 400 -D 10 audio/*
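
As an illustration of what "doubles in n seconds" means for the soft volume, the Python sketch below assumes that the increase is a steady exponential one, which this manual does not state outright. The function name is made up for this example, and the automatic reduction that happens when the sound would clip is left out.

```python
def softvol_after(initial, doubling_seconds, elapsed):
    """Soft volume multiplier after `elapsed` seconds of steady growth."""
    return initial * 2.0 ** (elapsed / doubling_seconds)

# With a doubling time of 10 seconds, the multiplier doubles every 10 seconds.
for seconds in (0, 5, 10, 20):
    print(seconds, round(softvol_after(1.0, 10, seconds), 3))
```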

Save ("output") a copy of the current image into a PNG file whose name is made up of the command-line options necessary to recreate the view that is being saved. Any visible axes are included but the green line is replaced by the appropriate spectral data.

The -o file.png (or --output) command-line option makes spettro start up, calculate the spectrogram, dump the image into a named file and quit without playing it or giving you the chance to press any buttons. The output can include the axes but doesn't include the green line.
With this option and, for example, -w 4000 -h 1000, you can create higher-definition images than your screen is capable of displaying.

To work round an as-yet unfathomed bug, it currently outputs everything except the very top row of pixels.

Print, on the console, information about the audio file and your current settings.

Set the number of FFT calculation threads (default: the same as the number of CPUs).
Scroll the graph, at most, n times per second, instead of the usual 25 frames per second.
Show which version of spettro you are using, and the libraries that it was compiled with.
Show a summary of which key presses do what.
Show a summary of the command-line option flags.

A space-separated list of -options to apply before the ones on the command line. Command-line options override those in SPETTROFLAGS. My favorite is to export SPETTROFLAGS="-v400 -D10"
Says which screen column the green line should be in, usually half the display's width.

Spettro is under development at https://gitlab.com/martinwguy/spettro. It is known to work when compiled on Debian GNU/Linux and should be portable to other Unices with SDL2 or Enlightenment and to Windows, Mac, Android, iOS and Tizen.

For a summary of known bugs, see the file BUGS in the source code.

Martin Guy <martinwguy@gmail.com>, January 2017 - January 2024.

20 January 2024