One Voice Circuit Described

The switched voice networks of the big telecom providers run on either ESS or Nortel DMS250 voice switches, using SS7 (Signaling System 7) as the intelligence to control and route the calls.  All PSTN (Public Switched Telephone Network) calls run across 64 kbps switched connections (one DS0 of bandwidth).  The DS0 data rate allows a 4 kHz frequency range to be delivered.  It has long been known that more than one voice call can be sent over a DS0 connection, using various compression techniques and making better use of the silences that are part of voice conversations.

However, the legacy switches and infrastructure are very expensive, and it would be cost-prohibitive to upgrade at this point.  In addition, virtually all the end user equipment is made for 64 kbps connections per voice call, and therefore the methodology is entrenched.  Changes to this system are out of the question.  Instead, the movement underway is to move voice traffic onto other transports, such as ATM and Frame Relay, and onto the Internet.  This section will deal only with PSTN calls, by examining a single voice call.

The 4 kHz band for Analog Voice

The human ear can hear frequencies from 30 Hz up to 20 kHz, which is why most stereo systems are rated for frequencies from 30 Hz to 20 kHz. Voice frequencies, however, are generally confined between 300 and 3400 Hz, for a total range of 3100 Hz.

The public telephone system transmits from 0 to 4000 Hz, which includes "guardbands" on either side of the voice signal. The guardbands are from 0 to 300 Hz, and from 3400 to 4000 Hz. They contain little or no usable voice information. They are really only needed for older systems that use FDM (Frequency Division Multiplexing). FDM is an analog in - analog out multiplexing scheme, and needs the guardbands to isolate adjacent voice signals from one another and prevent bleedover between them. Whether or not the guardbands contain usable voice info, they are still part of the signal that is transmitted, for both older FDM systems and current TDM digital systems. Even though a TDM system could operate without guardbands, the PSTN infrastructure has been built on the assumption that guardbands are needed, and it is too large a system to upgrade.  Therefore, 900 Hz are wasted, and the total voice signal occupies 4 kHz of analog bandwidth, not 3.1 kHz.
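The band arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the text; the constant and function names are made up for illustration.

```python
# Band edges in Hz, taken from the figures quoted in the text.
CHANNEL_LOW, CHANNEL_HIGH = 0, 4000
VOICE_LOW, VOICE_HIGH = 300, 3400


def band_budget():
    """Return (voice_hz, guard_hz, channel_hz) for one analog voice channel."""
    voice_hz = VOICE_HIGH - VOICE_LOW
    # Lower guardband (0 to 300 Hz) plus upper guardband (3400 to 4000 Hz).
    guard_hz = (VOICE_LOW - CHANNEL_LOW) + (CHANNEL_HIGH - VOICE_HIGH)
    channel_hz = CHANNEL_HIGH - CHANNEL_LOW
    return voice_hz, guard_hz, channel_hz


print(band_budget())  # (3100, 900, 4000)
```

The 900 Hz of guardband is split unevenly: 300 Hz below the voice band and 600 Hz above it.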

Terminology Problem

"It's an 8 k signal" is a very common statement.  There is often confusion between analog bandwidth and digital data rate. People will often abbreviate, with no indication as to whether the figure is analog or digital. "8k" could mean 8 kHz (analog), or 8 kbps (digital). Hertz (Hz) is always analog, and bits per sec (bps) is always digital. The great majority of traffic is carried in digital form.  If you have no indication (such as written info where the author is no longer available to ask), assume digital.

Why Digital Transmission is Preferred

Voice is always digitized before it rides across the PSTN. When transmitted, both analog and digital signals lose amplitude and pick up noise, so they both must be amplified. However, with an analog signal it is difficult to differentiate the noise from the signal, and so both get amplified. With a digital signal, the signal levels are well defined, rectangular steps, and so instead of being simply amplified, they can be "regenerated" - restored exactly back into their original transmitted form.
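The difference between amplifying and regenerating can be illustrated with a toy model in Python. Everything here is an assumption made for the sketch: the pulse levels (0 and 1), the line loss of one half, the bounded noise, and the function names. It is not a model of any real repeater.

```python
import random


def attenuate_and_add_noise(bits, loss, max_noise, rng):
    """Toy line model: scale each pulse by `loss`, then add bounded noise."""
    return [loss * b + rng.uniform(-max_noise, max_noise) for b in bits]


def amplify(signal, gain):
    """Analog-style repeater: scales the signal and its noise alike."""
    return [gain * v for v in signal]


def regenerate(signal, threshold):
    """Digital-style repeater: decide 0 or 1 per pulse, emit a clean pulse."""
    return [1.0 if v > threshold else 0.0 for v in signal]


rng = random.Random(0)  # fixed seed so the run is repeatable
sent = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
received = attenuate_and_add_noise(sent, loss=0.5, max_noise=0.1, rng=rng)

amplified = amplify(received, gain=2.0)             # noise is doubled too
regenerated = regenerate(received, threshold=0.25)  # halfway between 0 and 0.5

print("regenerated matches sent:", regenerated == sent)
print("amplified matches sent:  ", amplified == sent)
```

In this sketch the regenerated pulses match the transmitted ones exactly because the noise (at most 0.1) never crosses the decision threshold, while the amplified copy still carries the noise, now at twice the size.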

NOTE: A stand-alone regenerator is rarely needed, since the multiplexers found in SLC 96s and remote terminals (located between the customer and CO) regenerate the signal. In addition, when the signal is received at the CO, the DXCs there also regenerate the signals. For digital signals riding on fiber, the only way to regenerate them is to convert them to electrical pulses, regenerate them, then convert them back to light pulses.

Digitizing the Voice Signal - PCM

At this point, the 4 kHz analog signal needs to be converted to a digital signal, in order to ride over the PSTN. The telephone industry uses PCM (Pulse Code Modulation) to do this. PCM is an analog to digital (A/D) conversion, and consists of 3 steps:

1) Sampling (PAM) - sample the analog waveform at a constant rate. A sample is taken by recording the voltage level of the signal at that instant. This is called PAM (Pulse Amplitude Modulation), where each sample is a PAM sample. In this case, we use "Nyquist's Theorem", which states that an analog signal must be sampled at a rate of at least twice its highest frequency, in order to retain its basic characteristics once it is reconstructed back to analog. Therefore we need to sample a 4 kHz analog signal at a rate of 2x4000, or 8000 times per second.

The graph shows a typical voice signal being sampled at 8000 times per sec.  Since digital systems can only transmit numbers, the sampling of an analog signal is the first step in assigning a series of numbers to a signal.  The samples record the amplitude of the signal at each time interval.  
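The sampling step can be sketched in Python as follows. A pure 1 kHz sine tone stands in for the voice waveform, which is an assumption made only for illustration; the names sample_tone and SAMPLE_INTERVAL_S are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000                   # 2 x 4000 Hz, per the rule above
SAMPLE_INTERVAL_S = 1 / SAMPLE_RATE_HZ  # time between samples (125 microseconds)


def sample_tone(freq_hz, n_samples, amplitude=1.0, rate_hz=SAMPLE_RATE_HZ):
    """Return n_samples PAM samples of a sine tone, one per sample interval."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz)
        for n in range(n_samples)
    ]


# One full cycle of a 1 kHz tone is covered by 8 samples at 8000 samples/sec.
samples = sample_tone(freq_hz=1000, n_samples=8)
for n, value in enumerate(samples):
    print(n, round(value, 3))
```

Each printed value is one PAM sample: the amplitude of the waveform at that instant, still an arbitrary real number that has not yet been rounded to a fixed level.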

2) Quantizing - "quantify each sample" - assign a numerical value to the amplitude of each sample. It was decided to represent the 256 possible signal levels as going from negative 128 to positive 128. This gives the line an average DC voltage of approximately 0 volts, which prevents DC current from flowing. Quantization has a "stairstep" effect on signals, since it has to round off sampled voltages to the nearest level. This is called "Quantizing Error", and it introduces a small amount of noise. This noise is compounded with each A/D conversion, and so the maximum number of A/D conversions should never exceed 10.

From the example above, the samples at time intervals 0 through 10 have the amplitude levels shown below.  The "quantizing" error we talked about occurs because we can only use whole numbers.  For example, if at time interval 2 the amplitude was 127.3, it must be rounded off to 127.  This level of error is unnoticeable by the listener of the transmitted voice signal.

Time Interval    Amplitude
0                8
1                64
2                127
3                0
4                -105
5                -74
6                31
7                64
8                -38
9                -28
10               0

The rounding off causes some error.  An even greater error is caused by the stairstep shape of the digital signal.  The stairstep effect looks to be extreme in the diagram.  However, keep in mind that the signal is being sampled 8000 times per second, which lessens the effect.  In addition, at the receiving end the signal is converted back to analog, and is smoothed out by circuitry called a "filter" (which primarily uses a capacitor to smooth out edges and cause the signal to flow).
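The rounding in the quantizing step can be sketched in Python. Two assumptions are made here and are not taken from the text: levels are clamped to the range -127 to +127, and Python's built-in round() is used, which rounds exact halves to the nearest even integer. The function names are hypothetical.

```python
def quantize(voltage, max_level=127):
    """Round a sampled amplitude to the nearest whole level, clamped to range."""
    level = round(voltage)  # nearest integer (exact .5 cases go to even)
    return max(-max_level, min(max_level, level))


def quantizing_error(voltage, max_level=127):
    """Difference between the true amplitude and the level that is sent."""
    return voltage - quantize(voltage, max_level)


for v in (63.7, -104.6, 30.9, 300.0):
    print(v, "->", quantize(v), "error", round(quantizing_error(v), 3))
```

For in-range inputs the error never exceeds half a level, which is why the text describes it as a small amount of noise. The out-of-range input (300.0) shows a different effect, clipping, which is a consequence of the clamp assumed in this sketch.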

3) Encoding - It was determined that all the signal levels should be representable with a single byte, or 8 bits. One byte of data can represent 256 levels, so any one of the 256 signal levels can be represented by a single byte of data, often called an "8-bit word". Since we are sampling the analog signal at 8000 times per second, we need 8000 bytes per second to digitally represent the signal. One byte is 8 bits, so we use the following data rate to represent one voice signal :

8000 bytes/sec x 8 bits/byte = 64,000 bits/sec = 64 kbps

The data rate, 64 kbps, is called a "DS-0", or DS0, which stands for "Digital Signal, level 0". DS0 is the worldwide standard for digitizing one voice conversation with PCM.
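The DS0 arithmetic can be written out as a short Python check. The constant and function names are made up for this sketch; the figures come from the text.

```python
SAMPLES_PER_SECOND = 8000  # sampling rate from the Sampling step
BITS_PER_SAMPLE = 8        # one byte per sample from the Encoding step


def ds0_rate_bps(samples_per_second=SAMPLES_PER_SECOND,
                 bits_per_sample=BITS_PER_SAMPLE):
    """Bits per second needed to carry one PCM voice channel."""
    return samples_per_second * bits_per_sample


rate = ds0_rate_bps()
print(rate, "bits/sec =", rate // 1000, "kbps")  # 64000 bits/sec = 64 kbps
```

Note the lowercase "b": the result is 64 kilobits per second, which is 8 kilobytes per second.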

NOTE: It becomes apparent that if you go from -128 to +128, there would actually be 257 quantized states, since 0 is included. However, one byte can only represent 256 levels, so apparently one of the levels is excluded (either +128 or -128).

Time Interval    Sample Amplitude Decimal    Sample Amplitude Binary
0                8                           00001000
1                64                          01000000
2                127                         01111111
3                0                           00000000
4                -105                        11101001
5                -74                         11001010
6                31                          00011111
7                64                          01000000
8                -38                         10100110
9                -28                         10011100
10               0                           00000000
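The binary column can be reproduced with a short Python sketch. It assumes the layout that the table's negative entries follow: a leading sign bit (1 for negative) followed by the 7-bit magnitude. That layout is inferred from the table, the function names are hypothetical, and this is only an illustration of the table's pattern, not of any deployed codec.

```python
def encode_sample(level):
    """Encode an integer level (-127..127) as sign bit + 7-bit magnitude."""
    if not -127 <= level <= 127:
        raise ValueError("level must be between -127 and 127")
    sign = "1" if level < 0 else "0"
    return sign + format(abs(level), "07b")  # zero-padded 7-bit magnitude


def decode_sample(bits):
    """Inverse of encode_sample: 8-character bit string back to an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


amplitudes = [8, 64, 127, 0, -105, -74, 31, 64, -38, -28, 0]
for interval, level in enumerate(amplitudes):
    print(interval, level, encode_sample(level))
```

Under this layout the range runs from -127 to +127, and zero has two possible codes (00000000 and 10000000), so only 255 distinct amplitudes are represented by the 256 bit patterns.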


NOTE: For an in-depth understanding of Binary (Bits and Bytes), see the Hardware Section.

In-Band Signaling

The term "In-Band" simply means that the bits used to perform various signaling functions are included in the band reserved for the data. Data refers to the digitized information that is being transmitted from one end to the other (voice or data), and signaling refers to bits that the switches examine for various states of the call (off-hook, answer supervision, etc.). AMI encoded circuits carry 56 kbps per channel, and are mostly used for voice or switched 56. AMI signals usually use D4 framing (also called SF, or Super Frame). AMI encoding was originally developed for switched voice circuits. It uses "bit robbing" (robs one bit from each byte for switch signaling), which requires 8 kbps of overhead for every 64 kbps. So for each 64 kbps channel there is only 56 kbps of data, which is enough for one voice circuit or switched 56 data. AMI is always in channels of 56 kbps.
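The bit-robbing arithmetic described above can be sketched in Python. Following the text, one bit is taken from each byte. Which bit position is taken is not stated in the text, so using the least significant bit here is an assumption of the sketch, as are the function names.

```python
BYTES_PER_SECOND = 8000  # one byte per sample, 8000 samples per second
BITS_PER_BYTE = 8


def rob_bit(byte, signaling_bit):
    """Overwrite the lowest bit of a data byte with one signaling bit."""
    return (byte & 0xFE) | (signaling_bit & 1)


def channel_rates_bps(robbed_bits_per_byte=1):
    """Return (data_bps, signaling_bps) for one 64 kbps channel."""
    data_bps = (BITS_PER_BYTE - robbed_bits_per_byte) * BYTES_PER_SECOND
    signaling_bps = robbed_bits_per_byte * BYTES_PER_SECOND
    return data_bps, signaling_bps


print(format(rob_bit(0b01000001, 0), "08b"))  # 01000000
print(format(rob_bit(0b01000000, 1), "08b"))  # 01000001
print(channel_rates_bps())                    # (56000, 8000)
```

The two rates always add back up to the 64 kbps of the full channel: 56 kbps left for data and 8 kbps used for signaling.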