Chapter One
Basics of Performance Measurement in UMTS Terrestrial Radio Access Network (UTRAN)
Performance measurement represents a new stage of monitoring data. In the past, monitoring networks meant decoding messages and filtering out those that belonged to the same call. Single calls were analysed and failures were often only found by chance. Performance measurement is an effective means of scanning the whole network at any time and systematically searching for errors, bottlenecks and suspicious behaviour.
Performance measurement procedures and appropriate equipment have already been introduced in GSM and 2.5G GSM/GPRS radio access networks as well as in core networks. However, compared to the performance measurement requirements of UTRAN, those legacy requirements were quite simple, and it was relatively easy to collect the necessary protocol data as well as to compute and aggregate appropriate measurement results.
Nowadays even Technical Specification 3GPP 32.403 (Telecommunication Management. Performance Management (PM). Performance Measurements - UMTS and Combined UMTS/GSM) contains only a minimum set of requirements that is not much more than the tip of an iceberg. The definitions and recommendations of 3GPP explained in this chapter do not cover a wide enough range of possible performance measurement procedures; some descriptions are not even good enough to base a software implementation on, and in some cases they lead to completely wrong measurement results. To put it in a nutshell, it looks like the specification of performance measurement requirements for UTRAN is still in an early phase. This first part of the book will explain what is already defined by 3GPP, which additional requirements are of interest and which prerequisites and conditions always have to be kept in mind, because they have an impact on many measurement results even if they are not especially highlighted.
By the way, in the author's humble opinion, the biggest error in performance measurement is the copy and paste error. This results from copying requirements instead of developing concepts and ideas of one's own. For this reason this book does not contain ready-to-use performance measurement definitions, but rather discusses different ideas and offers possible solutions for a number of problems without claiming to cover all possibilities or to have the only solutions.
1.1 GENERAL IDEAS OF PERFORMANCE MEASUREMENT
Performance measurement is fairly unique. There are many parameters and events that can be measured and many measurements that can be correlated to each other. The number of possible combinations is practically unlimited. Hence, the question is: what is the right choice?
There is no general answer except perhaps the following: A network operator will define business targets based on economic key performance indicators (KPIs). These business targets provide the guidance to define network optimisation targets. From the network optimisation targets, technical KPI targets can be derived, which describe the desired behaviour of the network. Based on this, step by step, services are offered by operators. At a very general level these are, for example, speech calls and packet calls. These services will be optimised and detected errors will be eliminated. All in all it is correct to say that the purpose of performance measurement is to troubleshoot and optimise the network (see Figure 1.1).
However, whatever network operators do, it is up to the subscriber to finally evaluate if a network has been optimised in a way that meets customers' expectations. A rising churn rate (i.e. the number of subscribers cancelling a contract and setting up a new one with a competing operator) is an indicator that there might also be something wrong in the technical field.
Fortunately there is very good news for all analysts and market experts who care about churn rates: it is very difficult to calculate a real churn rate. This is because most subscribers in mobile networks today are prepaid subscribers. Many of them are people staying abroad temporarily who, because prepaid tariffs are often significantly cheaper than roaming tariffs, become temporary customers, so to speak. Once they go back to their home countries their prepaid accounts remain active until their contracts expire. Therefore not every expired contract is a churn. The actual number of churns is expected to be much lower, but how much lower? Additional information is necessary to find this out.
The fact that additional information is necessary to compute non-technical key performance indicators based on measurement results (in this case based on a counter that counts the number of cancelled and expired contracts) also applies to the computation of technical KPIs and key quality indicators (KQIs). See Figure 1.2.
The general concept of these indicators is that network elements and probes, which are used as service resource instances, are placed at certain nodes of the network infrastructure to pick up performance-related data, e.g. cumulative counters of protocol events. At constant time intervals or in near real time this performance-related data is transferred to higher level service assurance and performance management systems. A typical example of such a solution is Vallent Corporation's WatchMark® software, which is fed with performance data sent by radio network controllers (RNCs), mobile switching centres (MSCs) and GPRS support nodes (GSNs). For this purpose an RNC, for example, writes the values of its predefined performance counters into a predefined XML report form every 15 minutes. This XML report file is sent via a so-called northbound interface that complies with the TeleManagement Forum (TMF) CORBA specification to WatchMark® or any other higher level network management system. Additional data such as traffic and tariff models are provided by other sources and finally a complete solution for business and service management is presented.
As pointed out on www.watchmark.com, the overall solution:
... provides benefits across a service provider's entire customer base including pre-paid, post-paid and enterprise customers:
Service quality management provides an end-to-end visibility of service quality on the network to ensure that each service (e.g. MMS, WiFi, iMode, SMS and GPRS etc.) is functioning correctly for each user on the network.
Internal and 3rd Party service level agreements (SLAs) allow Service Providers to test, evaluate and monitor service levels within the organization to ensure that optimum service quality is delivered to customers.
Corporate SLAs enable Service Providers to establish specific agreements with their corporate customers where they undertake to deliver customized end-to-end levels of service quality.
However, there is one major problem with this concept: network elements that feed higher level network management systems with data are basically designed to switch connections. It is not the primary job of an RNC to measure and report performance-related data. The most critical part of mobile networks is the radio interface, and the UTRAN controlled by RNCs is an excellent place to collect data giving an overview of radio interface quality. Drive tests can do the same job, but firstly they are expensive (at least it is necessary to pay two people per day and a car for a single drive test campaign), and secondly, performance data measured during drive tests cannot be reported frequently and directly to higher layer network management systems. Therefore a great deal of important performance measurement data that could be of high value for service quality management is simply not available. This triggers the need for a new generation of measurement equipment that is able to capture terabytes of data from UTRAN interfaces, perform highly sophisticated filtering and correlation processes, store key performance data results in databases, and display, export and import these measurement results using standard components and procedures.
Before starting to discuss the architecture of such systems it is beneficial to have a look at some definitions.
1.1.1 WHAT IS A KPI?
Key performance indicators can be found everywhere, not just in telecommunications. A KPI does not have to deal with technical matters only. There are dozens of economic KPIs that can be seen every day, for example the Dow Jones Index and exchange rates. The turnover of a company should not be called a KPI, because it is just a counter value; the gross margin, however, is a KPI. Hence, what makes the difference between performance-related data and a KPI is the fact that a KPI is computed using a formula.
There are different kinds of input for a KPI formula: cumulative counter values, constant values and timer values seem to be the most important ones. KPI values that have already been computed are also often seen in new KPI formulas.
Most KPI formulas are simple. The difficulties are usually not in the formula itself, but e.g. in the way that data is first filtered and then collected. This shall be demonstrated by using a simple example. Imagine a KPI called NBAP Success Rate. It indicates how many NBAP (Node B application part) procedures have been completed successfully and how many have failed.
NBAP is a protocol used for communication between Node B (the UMTS base station) and its CRNC (controlling radio network controller). To compute an NBAP Success Rate a formula needs to be defined. In the 3GPP 25.433 standard for the Node B Application Part (NBAP) protocol, or in technical books dealing with the explanation of UMTS signalling procedures (e.g. Kreher and Ruedebusch, 2005), it is described that in NBAP there are only three kinds of messages: Initiating Message, Successful Outcome and Unsuccessful Outcome (see Figure 1.3).
Following this, an NBAP Success Rate could be defined as shown in Equation (1.1):
\[ \text{NBAP Success Rate} = \frac{\sum \text{NBAP Successful Outcome}}{\sum \text{NBAP Initiating Message}} \times 100\,\% \tag{1.1} \]
This looks good, but will lead to incorrect measurement results, because an important fact is not considered. There are two different classes of NBAP messages. In class 1 NBAP procedures the Initiating Message is answered with a Successful Outcome or Unsuccessful Outcome message, which is known in common protocol theory as acknowledged or connection-oriented data transfer. Class 2 NBAP procedures are unacknowledged or connectionless. This means only an Initiating Message is sent, but no answer is expected from the peer entity.
Since most NBAP messages monitored on the Iub interface belong to unacknowledged class 2 procedures (this is especially true for all NBAP common/dedicated measurement reports) the NBAP Success Rate computed using the above defined formula could show a value of less than 10%, which is caused by a major KPI definition/implementation error.
Knowing the difference between NBAP class 1 and class 2 procedures, a filter criterion needs to be defined that could be expressed as follows:
\[ \text{NBAP Class 1 Success Rate} = \frac{\sum \text{NBAP Successful Outcome}}{\sum \text{NBAP Class 1 Initiating Message}} \times 100\,\% \tag{1.2} \]
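The difference between Equations (1.1) and (1.2) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the trace is invented, the message records are reduced to (procedure, message type) pairs, and the set of class 1 procedure names is a small, incomplete sample; the authoritative classification of procedures is the one given in 3GPP 25.433.

```python
from collections import Counter

# Hypothetical, incomplete sample of class 1 procedure labels (illustration only).
# The authoritative class 1 / class 2 lists are defined in 3GPP 25.433.
CLASS_1_PROCEDURES = {"RadioLinkSetup", "RadioLinkAddition", "RadioLinkDeletion"}


def nbap_success_rates(messages):
    """Return (naive_rate, class_1_rate) in percent.

    messages: iterable of (procedure, message_type) pairs, where message_type
    is "InitiatingMessage", "SuccessfulOutcome" or "UnsuccessfulOutcome".
    A rate is None when its denominator is zero.
    """
    counts = Counter()
    for procedure, message_type in messages:
        counts[message_type] += 1
        if message_type == "InitiatingMessage" and procedure in CLASS_1_PROCEDURES:
            counts["Class1InitiatingMessage"] += 1

    def rate(denominator):
        if denominator == 0:
            return None
        return 100.0 * counts["SuccessfulOutcome"] / denominator

    # Equation (1.1): every Initiating Message is in the denominator.
    naive = rate(counts["InitiatingMessage"])
    # Equation (1.2): only class 1 Initiating Messages are in the denominator.
    class_1 = rate(counts["Class1InitiatingMessage"])
    return naive, class_1


# Invented trace: two class 1 procedures (one succeeds, one fails)
# and 18 unacknowledged class 2 measurement reports.
trace = [
    ("RadioLinkSetup", "InitiatingMessage"),
    ("RadioLinkSetup", "SuccessfulOutcome"),
    ("RadioLinkSetup", "InitiatingMessage"),
    ("RadioLinkSetup", "UnsuccessfulOutcome"),
] + [("DedicatedMeasurementReport", "InitiatingMessage")] * 18

naive_rate, class_1_rate = nbap_success_rates(trace)
print(f"Equation (1.1): {naive_rate:.1f}%")    # 5.0%
print(f"Equation (1.2): {class_1_rate:.1f}%")  # 50.0%
```

With this invented trace the unfiltered formula reports 5%, although half of the acknowledged procedures succeeded. The only difference between the two results is the filter applied to the denominator.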
An exact definition is usually not expressed in formulas, but more often by fully explaining the KPI definition in writing. A couple of examples can be found in Chapter 2 of this book. The lesson learnt from the NBAP Success Rate example is that one cannot compare KPIs based on their names alone. KPIs cannot even be compared based on their formulas. When KPIs are compared it is necessary to know the exact definition, especially the filter criteria used to select input and - as explained in the next section - the aggregation levels and parameter correlations.
Never trust the apparently endless lists of names of supported KPIs that can be found in marketing documents of network and measurement equipment manufacturers. Often these lists consist of simple event counters. Therefore, it must be kept in mind that additional data is always necessary in addition to simple counter values to compute meaningful KPIs and KQIs.
1.1.2 KPI AGGREGATION LEVELS AND CORRELATIONS
KPIs can be correlated to each other or related to elements in the network topology. The correlation to a certain part of the network topology is often called the aggregation level.
Imagine a throughput measurement. The data for this measurement can be collected for instance on the Iub interface, but can then be aggregated on the cell level, which means that the measurement values are related to a certain cell. This is meaningful because several cells share the same Iub interface, and in the case of softer handover they also share the same data stream transported in the same Iub physical transport bearer, which is described by its AAL2 SVC address (VPI/VCI/CID). So it may happen that a single data stream on the Iub interface is transmitted using two or three radio links in as many different cells. If the previously mentioned throughput measurement is used to get an impression of the load in the cell, it is absolutely correct to correlate the single measurement result with all cells involved in this softer handover situation.
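A minimal Python sketch of this cell-level aggregation follows. The data model is an assumption made for illustration: each Iub data stream is identified by a (VPI, VCI, CID) tuple, the mapping from stream to the cells currently serving it is taken as given, and all addresses, cell identifiers and throughput values are invented.

```python
from collections import defaultdict


def aggregate_throughput_per_cell(stream_samples, cells_of_stream):
    """Attribute each Iub throughput sample to every cell serving the stream.

    stream_samples:  iterable of (stream_id, kbps) pairs measured on Iub
    cells_of_stream: dict mapping stream_id -> set of cell ids that currently
                     hold a radio link for this stream
    Returns a dict mapping cell id -> summed throughput in kbps.
    """
    per_cell = defaultdict(float)
    for stream_id, kbps in stream_samples:
        for cell_id in cells_of_stream.get(stream_id, ()):
            per_cell[cell_id] += kbps
    return dict(per_cell)


# Invented example: stream_a is in softer handover (cells 101 and 102),
# stream_b is served by cell 101 only.
stream_a = (1, 40, 8)   # hypothetical VPI/VCI/CID
stream_b = (1, 40, 9)   # hypothetical VPI/VCI/CID
cells_of_stream = {stream_a: {101, 102}, stream_b: {101}}
samples = [(stream_a, 64), (stream_b, 384)]

per_cell = aggregate_throughput_per_cell(samples, cells_of_stream)
print(per_cell[101], per_cell[102])  # 448.0 64.0
```

Note that the per-cell values add up to more than the 448 kbps actually observed on the Iub interface. This is intentional in a load-oriented view, because the softer handover stream occupies radio resources in both cells, but it also means that cell-level results of this kind must not simply be summed to obtain an interface-level figure.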
To demonstrate the correlation between mobile network KPIs an example of car KPIs shall be used (see Figure 1.4). The instruments of a car cockpit show the most important KPIs for the driver while driving. Other performance-relevant data can be read in the manual, e.g. volume of the fuel tank.
The first KPI is the speed, computed from the distance driven and the period of time taken. Another one is the maximum driving distance, which depends on the maximum volume of fuel in the tank. Maybe the car has an integrated computer that delivers more sophisticated KPIs, such as fuel usage depending on current speed. The more fuel is needed to drive a certain distance, the shorter the maximum driving distance becomes. In other words, there is a correlation between fuel usage and the maximum driving distance.
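The car example can be written down as a short worked calculation. The symbols and all numerical values below are assumptions chosen for illustration; they are not taken from any car manual.

```latex
% Assumed symbols: s = distance driven, t = time taken,
% V_tank = tank volume, c(v) = fuel usage per distance at speed v.
\begin{align*}
  v &= \frac{s}{t}, &
  d_{\max}(v) &= \frac{V_{\text{tank}}}{c(v)} \\[4pt]
  % Assumed values: a 50 l tank, 5 l/100 km at moderate speed, 8 l/100 km at high speed.
  d_{\max} &= \frac{50\,\text{l}}{5\,\text{l}/100\,\text{km}} = 1000\,\text{km}, &
  d_{\max} &= \frac{50\,\text{l}}{8\,\text{l}/100\,\text{km}} = 625\,\text{km}
\end{align*}
```

The maximum driving distance is therefore not a constant from the manual but a KPI that depends on a second KPI, the fuel usage, which in turn depends on the current speed.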
Regarding mobile telecommunication networks like UTRAN similar questions are raised. A standard question is: How many calls can one UTRA cell serve?
Network equipment manufacturers' fact sheets give an average number used for traffic planning processes, e.g. 120 voice calls (AMR 12.2 kbps). There are more or fewer calls if different services such as 384 kbps data calls or different AMR codecs with lower data transmission rates are used. The capacity of a cell depends on the type of active services and the conditions on the radio interface, especially on the level of interference. Hence, it makes sense to correlate interference measurements with the number of active calls shown per service. Combining RF measurements with call counters in this way requires sophisticated KPI definitions and measurement applications. The first step could start with the following approach: Count the number of active connections per cell and the number of services running on those active connections in the cell.
Before continuing with this example it is necessary to explain the general conditions of this measurement, looking at where these counters can be pegged, under which conditions, and how data can be filtered to display counter subsets per cell and per service.
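The counting approach just described can be sketched in a few lines of Python. The record layout is an assumption made for illustration: each active connection is reduced to a connection id, one cell id and a list of service labels, and all values are invented. Where and under which conditions such counters can actually be pegged is a separate question.

```python
from collections import Counter


def count_active(connections):
    """Count active connections per cell and active services per cell.

    connections: iterable of (connection_id, cell_id, services) tuples, where
    services is a list of service labels running on that connection.
    Returns (connections_per_cell, services_per_cell); the second counter is
    keyed by (cell_id, service).
    """
    connections_per_cell = Counter()
    services_per_cell = Counter()
    for _connection_id, cell_id, services in connections:
        connections_per_cell[cell_id] += 1
        for service in services:
            services_per_cell[(cell_id, service)] += 1
    return connections_per_cell, services_per_cell


# Invented snapshot: connection 2 runs a voice and a data service in parallel.
snapshot = [
    (1, 101, ["AMR 12.2"]),
    (2, 101, ["AMR 12.2", "PS 384"]),
    (3, 102, ["PS 384"]),
]

connections_per_cell, services_per_cell = count_active(snapshot)
print(connections_per_cell[101])                # 2
print(services_per_cell[(101, "AMR 12.2")])     # 2
print(services_per_cell[(101, "PS 384")])       # 1
```

The two counters are kept separate on purpose: one connection can carry several services, so the number of services in a cell can exceed the number of connections.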
Excerpted from UMTS Performance Measurement by Ralf Kreher. Copyright © 2006 by Ralf Kreher. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.