EXCLUSIVE! The month-long Streamyx Crisis that rattled Zamzamzairani Mohd Isa — the TM Malaysia Business CEO who commands operations at TM Wholesale, TM Retail and TM Net combined, and reports to Super CEO DAWO — is coming to a formal closure this morning.

When the select group of TM senior officers meets at the Olympia Hall at Wisma TM Annex1 (Cygal) this morning, a Streamyx Crisis Management model is expected to be presented and adopted. Hopefully, your Streamyx woes caused by TM Wholesale, the TM subsidiary that controls and cross-sells the entire backhaul and network infrastructure to Streamyx broadband operator TM Net, will not recur.

This November 16 meeting, according to Little Birds, will also drive home TM Wholesale’s dire need for a Crisis Management Process, a Change Management Process and an Improvement Action Plan.

According to the Streamyx Crisis Management model, which was drawn up by the TM Wholesale team and finalised last week, the current month-long Streamyx snarl-up is classified as a Crisis, one notch below Disaster.

StreamyxCrisis.jpg
Note that the commanding officer to restore Streamyx’s business continuity
is the COO of TM Wholesale

Marconi, Juniper in spotlight

As the current Streamyx problem dragged on in the midst of nationwide DSLAM upgrades, which were coincidentally compounded by outages in several ATM (Asynchronous Transfer Mode) links, TM Wholesale has decided to haul up two key vendors, requiring them to rectify the problems with utmost urgency.

Screenshots was made to understand that TM’s General Manager in charge of the Secure Network Operations Centre (SNOC), Rukiah Ahmad, has been instructed to notify the ATM vendor, Marconi Malaysia, to explain fully the peculiar incidents related to abnormal ATM traffic utilisation and cell loss.

StreamyxCrisis_ATM_Peak.jpg

StreamyxCrisis_CPU_Peak.jpg
A comparison of peaks at ATM Trunk Utilisation and ATM CPU Utilisation
benchmarked against averages in October

At the same time, the SNOC GM has been instructed to notify Juniper Malaysia, requiring it to “explain and ascertain” the peculiar behaviour at two exchanges at Kelana Jaya (ERX KLJ03) and Cyberjaya (CBJ02) during the crisis.

StreamyxCrisis_CPU.jpg

Apart from the abnormal behaviour mentioned above, there was also an incident of High CPU Utilisation recorded in Cyberjaya (ASE CBJ10) on November 3, 2006.

Besides, there was an incident of SCP (Service Control Point) Switchover. Network security experts advising Screenshots say an undetermined continuous internal process is the key suspect for the High CPU Utilisation, which was followed by a switchover.

However, another view holds that the High CPU Utilisation on this switch did not affect BRAS (Broadband Remote Access Server) sessions until the switchover occurred.

Nevertheless, SNOC head Rukiah Ahmad has been asked to re-study the entire escalation process for Streamyx degradation.

Screenshots was also made to understand that, at the behest of Zamzamzairani, a meeting was called on November 8 by TM Wholesale CEO, Baharum Salleh, to map out the immediate action plan and a viable model to thwart future crises.

Meanwhile, Zamzamzairani appears to have been convinced that all upgrading of Streamyx networks should proceed as planned.

In the earlier phase of the Crisis, insiders say, Zamzamzairani had considered calling off the upgrading of localised networks currently being implemented by Juniper Malaysia, in which port speed at the localised exchanges and end-users’ sync speed were increased from 1Mbps to 1.5Mbps for those on DSLAM, and from 1.5Mbps to 2Mbps for those on RDSLAM and RTDSLAM.

It is understood that some 60% of the DSLAM had been upgraded from 1M to 1.5M between October 7 and October 18, 2006.

Meanwhile, TM Wholesale is expected to adhere to a more controlled approach for the remainder of the upgrading exercise, and all interested parties within TM Wholesale, TM Net and the in-sourced call centre, VADS, are expected to be kept informed of the schedule and implementation areas to avoid a similar crisis.

The new mindset includes the immediate setting-up of a crisis team at the TM Net Operations GM level upon notification of any abnormal customer complaints.

Besides, the head of TM ANOC (Alternate Network Operations Control) has been tasked to expedite the setting up of the Broadband Management Centre.

VADS at wits’ end

It is noted that VADS, the customer service operation supporting TM Net and Streamyx end-users, has been the party taking the full brunt of customer complaints throughout the crisis.

VADS was blamed for being unable to help Streamyx customers troubleshoot and rectify the various problems they faced from October through mid-November.

However, people in the know of call centre operations told Screenshots that VADS was not supported with sufficient resources to zero in on the customers’ in-situ connection details.

For example, ICOMS (the convergent voice, video and data billing and customer care solution used by VADS) does not have access to other critical databases that could tell more about a customer’s localised parameters for fault detection. Using the present version of ICOMS, VADS is unable to pinpoint the precise fault, which may lie at a particular exchange, DSLAM or port.

Sources told Screenshots that, learning from the current crisis, VADS will now be enabled to provide analysis of the complaints broken down by DSLAM and exchange area for faster troubleshooting and restoration of Streamyx services.
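As a rough illustration of what such a breakdown could look like, the sketch below tallies complaint records by exchange and DSLAM and ranks the worst-hit areas. It is only an assumption of how the analysis might be done, not a description of any VADS or ICOMS tool; the record fields (`exchange`, `dslam`) and the function name `rank_problem_areas` are hypothetical.

```python
from collections import Counter

def rank_problem_areas(complaints, top_n=3):
    """Count complaints per (exchange, DSLAM) pair and return the worst-hit areas.

    `complaints` is an iterable of dicts with hypothetical keys
    "exchange" and "dslam"; neither name comes from a real TM system.
    """
    counts = Counter((c["exchange"], c["dslam"]) for c in complaints)
    # most_common() sorts by complaint count, highest first
    return counts.most_common(top_n)

# Made-up records, purely for illustration
sample = [
    {"exchange": "EXCH-A", "dslam": "DSLAM-1"},
    {"exchange": "EXCH-A", "dslam": "DSLAM-1"},
    {"exchange": "EXCH-B", "dslam": "DSLAM-7"},
    {"exchange": "EXCH-A", "dslam": "DSLAM-1"},
]
for (exchange, dslam), total in rank_problem_areas(sample):
    print(exchange, dslam, total)
```

A cluster of complaints on one DSLAM would point to a localised fault, while complaints spread thinly across many exchanges would suggest a problem further upstream.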

In addition, VADS will also be given access to TM BMS (basic mappings of network installations) for faster identification of problematic DSLAM areas.

However, it is learnt that this additional access granted to VADS may be withdrawn once the crisis is overcome.

Screenshots to probe further

Since being alerted on October 18 about the current protracted Streamyx snarl-up, Screenshots has been talking to ISP industry vendors, consultants and specialists in network deployment, and even customer call centre operators, to probe the unrevealed.

When the information gathered in the research is analysed, the jigsaw puzzle takes shape. The most likely culprit is none other than TM Wholesale.

Several reliable sources told Screenshots that the current Streamyx problem was first detected as early as October 2, when there was a power trip at TM’s RADIUS (Remote Authentication Dial In User Service) at the Brickfields Data Centre.

The incident later developed into a complex state of problems at the end-user side — sporadic but nationwide — where two patterns of connection difficulty were detected: ( 1 ) DSL blinking; ( 2 ) DSL stable but cannot login.

Subsequently, the problems escalated from the Klang Valley to include areas classified by TM Wholesale as CBA (Critical Business Areas), namely Wilayah Persekutuan Kuala Lumpur, Wilayah Persekutuan Labuan (where Malaysia’s offshore banking is located), Putrajaya, Penang and Johor Selatan.

Screenshots was made to understand that the problem of “DSL Blinking” was most likely caused by the upgrading process currently being implemented by Juniper Malaysia, which has been noted to give rise to poor Signal-to-Noise Ratio (SNR) Margin.

Insiders in the network vendor industry told Screenshots that the recent port speed upgrading implementations have caused two major impacts on Streamyx Quality of Service (QoS):

  1. It affects Streamyx subscribers at boundary condition, where increasing the speed reduces the SNR margin and can cause permanent or intermittent DSL blinking;
  2. The average transport traffic, as a consequence of the port speed increase, was expected to go up by as much as 25%. This sudden increase can lead to network congestion, causing high latency between network elements and ATM cell loss.

    Under such a stressful environment, industry insiders say, any momentary failure of a network element can cause a sudden burst of authentication attempts that floods RADIUS (Remote Authentication Dial In User Service) and overwhelms the BRAS (Broadband Remote Access Server) ability to process Point-to-Point Protocol (PPP).

To rectify the problem, TM Wholesale has been advised to selectively downgrade, starting from October 19, subscribers at boundary condition whose lines do not meet the following conditions:
1 ) SNR margin greater than 12 dB
2 ) Attenuation less than 48 dB
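
The two conditions above amount to a simple per-line check. The following is a minimal sketch of that rule, assuming per-subscriber line statistics are available in some form; the `LineStats` record, its field names and the function names are hypothetical, and only the 12 dB and 48 dB thresholds come from the advice reported here.

```python
from dataclasses import dataclass

SNR_MARGIN_MIN_DB = 12.0   # reported condition: SNR margin greater than 12 dB
ATTENUATION_MAX_DB = 48.0  # reported condition: attenuation less than 48 dB

@dataclass
class LineStats:
    subscriber_id: str     # hypothetical identifier
    snr_margin_db: float
    attenuation_db: float

def meets_conditions(line: LineStats) -> bool:
    """True only if the line satisfies BOTH reported conditions."""
    return (line.snr_margin_db > SNR_MARGIN_MIN_DB
            and line.attenuation_db < ATTENUATION_MAX_DB)

def select_for_downgrade(lines):
    """Return the IDs of lines that fail either condition."""
    return [l.subscriber_id for l in lines if not meets_conditions(l)]

# Made-up readings, purely for illustration
lines = [
    LineStats("A", 15.0, 40.0),  # passes both conditions
    LineStats("B", 12.0, 40.0),  # margin is not greater than 12 dB
    LineStats("C", 20.0, 48.0),  # attenuation is not less than 48 dB
    LineStats("D", 6.0, 55.0),   # fails both conditions
]
print(select_for_downgrade(lines))
```

Both comparisons are strict, so a line sitting exactly on a threshold is treated as failing, which is the literal reading of "greater than" and "less than".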

On the other hand, there seems to be a variety of likely causes for the “DSL Stable but cannot login” problem, including:
1 ) RADIUS (Remote Authentication Dial In User Service) outage
2 ) RADIUS failed to authenticate users
3 ) High latency between network elements
4 ) BRAS (Broadband Remote Access Server) failure in processing Point-to-Point Protocol (PPP)
5 ) ATM (Asynchronous Transfer Mode) cell loss
6 ) Sudden burst of authentication attempts

All these have prompted caution letters to be issued to the two major TM vendors, namely Marconi and Juniper Malaysia, requiring them to revert with detailed clarification at full speed.

Expect more exposés if these two international-linked vendors and TM Wholesale don’t buck up soon.

And I particularly hate the fact that TM Malaysia Business, under CEO Zamzamzairani, has the audacity to keep us Streamyx subscribers in the dark for over one month throughout this Streamyx Crisis! If you don’t tell us why we have to pay full rate for your sub-standard service, we will find it out and tell the world — for ourselves and by ourselves.

TO BE CONTINUED…