Human Voice
The most versatile instrument on earth is probably the human voice. It can produce almost any conceivable pitch, timbre and envelope. Sound designers rarely describe their work in terms of frequencies and envelopes; most often their vocabulary is onomatopoeic, i.e. it mimics the sounds they are working on. From a perceptual point of view, too, humans recognise the human voice in almost any cacophony. It is something we have learnt from, perhaps even before, birth!
What characterises the human voice is its formants. One way to describe formants is as a set of resonant filters whose centre frequencies and bandwidths are under continuous modification. A changing set of formants is typically what you hear when a human vocalises Ah-Eh-Ih-Oh-Uh at a constant pitch: the fundamental stays fixed while the formants move in relation to it.
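As a rough illustration of this filter description, the sketch below models each formant as a two-pole resonator and evaluates the combined gain at the harmonics of a fixed 110 Hz fundamental. The sample rate, the function names and the formant table are assumptions made for this example: the (centre frequency, bandwidth) pairs are only ballpark figures for an adult male voice, not values taken from this text.

```python
import cmath
import math

SAMPLE_RATE = 44100  # Hz; assumed for this sketch

# Hypothetical (centre frequency, bandwidth) pairs in Hz, ballpark values only.
VOWEL_FORMANTS = {
    "a": [(730, 90), (1090, 110), (2440, 170)],
    "i": [(270, 60), (2290, 100), (3010, 150)],
    "u": [(300, 60), (870, 90), (2240, 150)],
}


def resonator_response(freq_hz, centre_hz, bandwidth_hz, fs=SAMPLE_RATE):
    """Complex frequency response of one two-pole resonator at freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / fs)    # pole radius: narrower band, r closer to 1
    theta = 2.0 * math.pi * centre_hz / fs        # pole angle: sets the centre frequency
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 evaluated on the unit circle
    return (1.0 - r) / (1.0 - 2.0 * r * math.cos(theta) * z1 + r * r * z1 * z1)


def vowel_gain(freq_hz, formants, fs=SAMPLE_RATE):
    """Gain of a parallel bank of formant resonators at freq_hz."""
    return abs(sum(resonator_response(freq_hz, fc, bw, fs) for fc, bw in formants))


if __name__ == "__main__":
    fundamental = 110.0  # fixed pitch; only the formants change between vowels
    for vowel, formants in VOWEL_FORMANTS.items():
        gains = [vowel_gain(k * fundamental, formants) for k in range(1, 31)]
        loudest = 1 + gains.index(max(gains))
        print(f"{vowel}: strongest harmonic is number {loudest} "
              f"({loudest * fundamental:.0f} Hz)")
```

Only the formant table differs between the vowels; the fundamental, and with it the perceived pitch, stays the same.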
To understand these filter models, we have to look at the physiology of the human vocal tract.

The vocal cords produce a sawtooth-like waveform. All the other parts of the vocal tract act as a number of filters, each of which can contribute its own resonance and bandwidth. From these we get the formants.
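This source-filter picture can be sketched in a few lines: a sawtooth stands in for the vocal cords, and a parallel bank of two-pole resonators stands in for the rest of the vocal tract. It is a simplified sketch, not a physiological model. The sample rate, the naive (aliasing) sawtooth, the parallel topology, the function names and the formant values for an "Ah"-like vowel are all assumptions made for this example.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed for this sketch

# Hypothetical (centre frequency, bandwidth) pairs in Hz for an "Ah"-like vowel.
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]


def sawtooth(f0_hz, n_samples, fs=SAMPLE_RATE):
    """Naive, non-band-limited sawtooth in [-1, 1): a stand-in for the vocal cords."""
    return [2.0 * ((f0_hz * n / fs) % 1.0) - 1.0 for n in range(n_samples)]


def resonate(signal, centre_hz, bandwidth_hz, fs=SAMPLE_RATE):
    """Two-pole resonator: y[n] = g*x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / fs)
    a2 = -r * r
    g = 1.0 - r  # rough level normalisation
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = g * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vocal_tract(signal, formants, fs=SAMPLE_RATE):
    """Sum the outputs of one resonator per formant (parallel filter bank)."""
    bands = [resonate(signal, fc, bw, fs) for fc, bw in formants]
    return [sum(samples) for samples in zip(*bands)]


def level_at(signal, freq_hz, fs=SAMPLE_RATE):
    """Strength of a single frequency, measured by correlating with a sinusoid."""
    w = 2.0 * math.pi * freq_hz / fs
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return math.hypot(re, im) / len(signal)


if __name__ == "__main__":
    source = sawtooth(110.0, SAMPLE_RATE // 2)  # half a second at 110 Hz
    voiced = vocal_tract(source, AH_FORMANTS)
    for freq in (110.0, 770.0):                 # fundamental and 7th harmonic
        print(freq, level_at(source, freq), level_at(voiced, freq))
```

The harmonic closest to the first formant (770 Hz here) should come out stronger relative to the fundamental after filtering than it was in the raw sawtooth.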


The lips, tongue and pharynx shape the consonants: the plosives and fricatives, i.e. largely unpitched sounds such as k-t-p-g-[d-b]-s-h-sch-… (cf. the McGurk effect!).
One special category of vocal sounds is, of course, whispering. Here the vocal cords are not activated; the only source is the flow of air, so the vocal tract is filtering what is essentially pink noise.
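A whisper can be sketched by swapping the pitched source for noise while keeping the formant filters. The example below is only an approximation under stated assumptions: the pink noise is generated with a simple Voss-style scheme (white-noise rows refreshed at octave-spaced rates), and the sample rate, function names and "Ah"-like formant values are hypothetical choices for illustration.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; assumed for this sketch

# Hypothetical (centre frequency, bandwidth) pairs in Hz for an "Ah"-like vowel.
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]


def pink_noise(n_samples, n_rows=8, seed=0):
    """Approximately pink noise: average of rows refreshed every 2**i samples."""
    rng = random.Random(seed)
    rows = [0.0] * n_rows
    out = []
    for n in range(n_samples):
        for i in range(n_rows):
            if n % (1 << i) == 0:  # row i holds its value for 2**i samples
                rows[i] = rng.uniform(-1.0, 1.0)
        out.append(sum(rows) / n_rows)
    return out


def resonate(signal, centre_hz, bandwidth_hz, fs=SAMPLE_RATE):
    """Two-pole resonator: y[n] = g*x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / fs)
    a2 = -r * r
    g = 1.0 - r  # rough level normalisation
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = g * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def whisper(n_samples, formants, fs=SAMPLE_RATE, seed=0):
    """Unvoiced vowel: noise (the airflow) shaped by a parallel formant bank."""
    breath = pink_noise(n_samples, seed=seed)
    bands = [resonate(breath, fc, bw, fs) for fc, bw in formants]
    return [sum(samples) for samples in zip(*bands)]


if __name__ == "__main__":
    samples = whisper(SAMPLE_RATE // 2, AH_FORMANTS)  # half a second of whispered "Ah"
    print(len(samples), max(abs(x) for x in samples))
```

Because the source has no fundamental, the result carries no pitch; what remains audible as a vowel is the formant structure alone.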