Performance and Redundancy Configuration
The EngageIP platform is designed for scalability. As LogiSense customers grow their business, they expect their back office system to handle the workloads and usage that come with a larger volume of customers, data and transactions. This document is intended to provide an overview of how EngageIP is designed from a scalability perspective to address the needs of LogiSense customers.
Deployment
An EngageIP deployment has three key components:
Component | Description |
EngageIP Web Server | Runs on Windows IIS and hosts the EngageIP web portal, the administrator portal through which administrators configure the EngageIP system. |
EngageIP App Server | Hosts the application services that power the EngageIP system, such as subscription billing and real-time rating. The App Server runs multiple services; each service executes the processing for one key EngageIP processing block. |
EngageIP Database Server | Hosts the Microsoft SQL Server database, which contains the configuration data, tables and business data that need to be persisted. |
There are various deployment topologies that can be supported based on the scalability, performance and redundancy needs of the end customer. The sections below list some of the permutations that are available along with an analysis of the pros and cons.
Basic Non-Redundant Configuration
The figure below illustrates a basic deployment. The EngageIP Web and App servers are installed as virtual machines on a physical host running Windows Server 2012. Microsoft SQL Server 2012 or 2014 is installed on the same hardware.
Minimum hardware requirements for a single EngageIP installation are listed in the Requirements section under "Basic Configuration". With the non-redundant deployment topology shown in Figure 1 and the hardware in the table below, the system should scale to 100 million UDRs per month and up to 200,000 records.
Enhanced Configuration
As service providers scale, they need to process more UDRs and larger numbers of user packages. In this scenario it is desirable to place the web server, app server and database on separate Hyper-V hosts. Provided processing power is adequate, the web server and app server can each be placed on their own virtual host. It is strongly recommended that the database server run on its own physical box.
Minimum hardware requirements for an Enhanced Multi-System EngageIP installation are listed in the Requirements section under "Enhanced Configuration". These specifications should allow a deployment to scale up to 750 million records per month and up to one million user packages per month.
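To put the monthly volumes above in more tangible terms, the sketch below converts them into an average processing rate. It assumes a 30-day month and load spread evenly across it, which real traffic never is, so treat the result as a floor rather than a sizing target.

```python
# Average ingest rate implied by a monthly record volume.
# Assumption: a 30-day month with load spread evenly (no peaks).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds

def avg_records_per_second(records_per_month: int) -> float:
    """Average records per second needed to keep up with a monthly volume."""
    return records_per_month / SECONDS_PER_MONTH

tiers = {"Basic": 100_000_000, "Enhanced": 750_000_000}
for name, monthly in tiers.items():
    print(f"{name}: ~{avg_records_per_second(monthly):.1f} records/s")
# Basic: ~38.6 records/s
# Enhanced: ~289.4 records/s
```

Peak periods (for example, bill runs or busy hours) will run well above these averages, which is why the hardware tiers include headroom.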
Enhanced Configuration with Active-Standby Redundancy
While the deployments described above scale well, they lack redundancy. If the virtual machine hosting the app server were to crash, the EngageIP system would go offline until the VM recovers. To support deployment redundancy, it is recommended to run multiple redundant app server, web server and database instances. This is illustrated in Figure 3 below.
As illustrated in the figure, multiple redundant app and web server VMs are deployed on the same box. If App Server A crashes, App Server B takes over; similarly, if Web Server A crashes, Web Server B takes over. If desired, up to ‘N’ redundant nodes can be set up depending on the desired level of uptime and availability. However, each additional node increases complexity and overhead, especially when the nodes share the same machine. Please consult your LogiSense solutions consultant for guidelines on how many redundant nodes are sufficient for your needs. For database-level redundancy, multiple databases are set up as part of a SQL Server cluster. SQL Server clusters can be virtualized; please consult the Microsoft SQL Server documentation for best practices on SQL Server clustering and virtualization. Because this is an active-passive configuration, the processing requirements are equivalent to those of the Enhanced Configuration without redundancy.
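The takeover behaviour described above can be illustrated with a heartbeat-based active-standby pair. This is a minimal sketch of the general technique, not EngageIP's actual failover mechanism (which this document does not describe); the `Node` and `ActiveStandbyPair` names and the 30-second timeout are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0  # time of the most recent heartbeat, in seconds

@dataclass
class ActiveStandbyPair:
    active: Node
    standby: Node
    timeout_s: float = 30.0  # hypothetical heartbeat timeout

    def current(self, now: float) -> Node:
        """Return the node that should serve traffic, failing over if needed."""
        active_down = now - self.active.last_heartbeat > self.timeout_s
        standby_up = now - self.standby.last_heartbeat <= self.timeout_s
        if active_down and standby_up:
            # Promote the standby; the failed node becomes the new standby.
            self.active, self.standby = self.standby, self.active
        return self.active

a, b = Node("App Server A"), Node("App Server B")
pair = ActiveStandbyPair(a, b)
b.last_heartbeat = 40.0             # A has sent nothing since t=0
print(pair.current(now=45.0).name)  # App Server B
```

Only one node serves traffic at a time, which is why the processing requirements match the non-redundant configuration.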
Inter-Node Redundancy (Active-Standby)
There are two potential drawbacks to the system in Figure 3. First, the app and web servers are virtualized environments on a single physical box; if the box were to crash, the entire system would go offline. Second, there may be contention for CPU and memory resources on the machine. If either is a concern in your environment, it is recommended to split the redundant nodes across multiple physical boxes as shown in Figure 4 below. This setup avoids downtime when a single physical box fails. A variation of this approach can be used to create a georedundant deployment. Larger service providers may require georedundant nodes to handle failures that affect an entire physical location, such as a large power outage. In that scenario, the second node is placed at a different location.
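One way to check a planned topology against the drawbacks above is to flag any role whose redundant nodes all land on the same physical host (or, for georedundancy, the same site). The sketch assumes a simple inventory of dicts with hypothetical `role`, `host` and `site` keys; it is a planning aid, not an EngageIP tool.

```python
from collections import defaultdict

def single_points_of_failure(nodes, scope="host"):
    """Return the roles whose redundant nodes all share one host (or site).

    nodes: iterable of dicts with "role", "host" and "site" keys.
    scope: "host" checks inter-node redundancy; "site" checks georedundancy.
    """
    placements = defaultdict(set)
    for node in nodes:
        placements[node["role"]].add(node[scope])
    return sorted(role for role, places in placements.items() if len(places) < 2)

# Figure 3 style: redundant VMs share one physical box (hypothetical names).
same_box = [
    {"role": "app", "host": "hv1", "site": "dc1"},
    {"role": "app", "host": "hv1", "site": "dc1"},
    {"role": "web", "host": "hv1", "site": "dc1"},
    {"role": "web", "host": "hv1", "site": "dc1"},
]
print(single_points_of_failure(same_box))  # ['app', 'web']
```

Running the same check with `scope="site"` shows whether a Figure 4 style split also survives the loss of a whole location.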
Multi-Node Redundancy with Load Balancing (Active-Active)
Multiple servers can also be set up as a load-balanced system in which web traffic is split across multiple hosts. This is typically done by placing a load balancer between the clients and the app servers to distribute application traffic across the hosts. Load balancing on EngageIP is session based. Load balancers improve overall system performance by reducing the burden on each individual web and app server. A load balancer continuously monitors the back-end servers and directs traffic to the server with the most available capacity. It ensures reliability and availability by sending requests only to servers that can respond in a timely manner.
Commercially available load balancers can be deployed for this purpose. LogiSense does not recommend specific load balancer software or hardware, because load balancers are typically agnostic to the EngageIP implementation. The LogiSense solutions department can be consulted for guidance and recommendations on load balancer options.
While load balancing reduces contention across server instances, it does not address contention within a single server. Most EngageIP services have relatively low overhead, but the billing and rating services can be performance intensive depending on the data processing needs of the deployment. The rating service can typically handle millions of UDRs per day, but at this level of processing there is a risk of starving the other services and creating bottlenecks on the app server. If this is a concern, the rating service can be moved onto its own virtual machine, which may or may not be on a separate physical box. This allows CPU cores to be dedicated to the rating engine. The figure below illustrates this scenario.
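Session-based load balancing with health checks can be sketched as follows: a session stays pinned to its server while that server is healthy, and new or orphaned sessions go to the healthy server with the fewest sessions. This is a generic illustration, not the behaviour of any particular commercial load balancer; "fewest sessions" is an assumed stand-in for "most available capacity", and the server names are hypothetical.

```python
class SessionLoadBalancer:
    """Session-affinity ("sticky") balancer with health-check awareness."""

    def __init__(self, servers):
        self.healthy = {s: True for s in servers}
        self.load = {s: 0 for s in servers}  # sessions currently pinned to each server
        self.affinity = {}                   # session id -> server

    def set_health(self, server, healthy):
        """Record the outcome of a health check (for example, a timed HTTP probe)."""
        self.healthy[server] = healthy

    def route(self, session_id):
        server = self.affinity.get(session_id)
        if server is not None and self.healthy[server]:
            return server  # keep the session on its current server
        candidates = [s for s, ok in self.healthy.items() if ok]
        if not candidates:
            raise RuntimeError("no healthy back-end servers")
        target = min(candidates, key=lambda s: self.load[s])  # fewest sessions
        if server is not None:
            self.load[server] -= 1  # session leaves its failed server
        self.affinity[session_id] = target
        self.load[target] += 1
        return target

lb = SessionLoadBalancer(["web-a", "web-b"])
print(lb.route("s1"), lb.route("s2"), lb.route("s1"))  # web-a web-b web-a
lb.set_health("web-a", False)
print(lb.route("s1"))  # web-b
```

Keeping sessions sticky avoids having to share session state between servers, at the cost of moving users when their server fails.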
Backup and Restore
Backups can be initiated while the system is online. Note, however, that automatic transaction log backup on the database is not supported. Transaction log backups can instead be performed by a timer-based script, scheduled at intervals of 15 minutes. It is also recommended that backups run over a dedicated LAN so that backup traffic is carried on a separate network.
Full backups can be scheduled during off-peak hours (for example, nightly) so that the backup does not affect performance during working hours. However, if backups are taken only once per day, the system could lose up to 24 hours of data.
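A timer-based log backup script mostly needs to emit a `BACKUP LOG` statement with a unique file name on each run. The sketch below builds that statement and the 15-minute run times; the database name `EngageIP` and the `\\backup01\sql` share on the backup LAN are hypothetical placeholders. The statement could be executed by a SQL Server Agent job or by Windows Task Scheduler calling `sqlcmd`. Log backups also require the database to use the full (or bulk-logged) recovery model.

```python
from datetime import datetime, timedelta

def log_backup_sql(database: str, backup_dir: str, ts: datetime) -> str:
    """Build a T-SQL BACKUP LOG statement with a timestamped file name."""
    path = f"{backup_dir}\\{database}_log_{ts:%Y%m%d_%H%M}.trn"
    return f"BACKUP LOG [{database}] TO DISK = N'{path}' WITH CHECKSUM;"

def run_times(start: datetime, interval_min: int = 15, count: int = 4):
    """Return `count` run times spaced `interval_min` minutes apart."""
    return [start + timedelta(minutes=interval_min * i) for i in range(count)]

# Hypothetical database name and backup share on the dedicated backup LAN.
for t in run_times(datetime(2024, 1, 1, 0, 0), count=2):
    print(log_backup_sql("EngageIP", r"\\backup01\sql", t))
# BACKUP LOG [EngageIP] TO DISK = N'\\backup01\sql\EngageIP_log_20240101_0000.trn' WITH CHECKSUM;
# BACKUP LOG [EngageIP] TO DISK = N'\\backup01\sql\EngageIP_log_20240101_0015.trn' WITH CHECKSUM;
```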
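Combining nightly full backups with frequent log backups is what shrinks the data-loss window: to restore to a point in time, you restore the latest full backup taken before that point and then every log backup taken after it, up to the one that covers the target. The sketch below selects that chain. It is simplified (no differential backups, no tail-log backup), and the backup times are made up for illustration.

```python
from datetime import datetime, timedelta

def restore_chain(full_backups, log_backups, target):
    """Select the backups needed to restore to `target` (simplified: no differentials).

    Returns the latest full backup taken at or before `target` plus the log backups
    taken after it, up to and including the first log backup at or after `target`.
    """
    full = max(t for t in full_backups if t <= target)
    logs = []
    for t in sorted(log_backups):
        if t <= full:
            continue  # log backups before this full backup are not needed
        logs.append(t)
        if t >= target:
            break  # this log backup covers the target; restore it WITH STOPAT
    return full, logs

fulls = [datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 2, 2, 0)]
logs = [datetime(2024, 1, 2, 2, 0) + timedelta(minutes=15 * i) for i in range(1, 9)]
full, chain = restore_chain(fulls, logs, target=datetime(2024, 1, 2, 3, 5))
print(full, len(chain))  # 2024-01-02 02:00:00 5
```

With log backups every 15 minutes, the worst-case loss is roughly the time since the last log backup rather than the time since the last full backup.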
Horizontal Scaling of the Rating Engine
The EngageIP Rating Engine is architected to scale horizontally to meet high throughput requirements, with load distributed across an elastically scalable number of rating nodes. Where appropriate, multiple records can be aggregated by summing their data volumes and occurrence counts. For example, if a device generates N records per day with Y kb per record, those N records can be reduced to a single record with a total count of N and a total volume of N × Y kb. The benefits of this approach are greatest when the aggregation logic runs on separate processing units from the rating logic.
There are various mechanisms for accomplishing this. One method aggregates records into an intermediate database that is flushed at a preconfigured interval; the reduced data is then fed into the EngageIP Rating Engine and processed accordingly.
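The N-to-1 reduction described above, plus one way of spreading the reduced records across rating nodes, can be sketched as follows. The document does not say how records are assigned to nodes; the stable hash partitioning here is an assumption chosen so that every record for a given device is handled by the same node. Device IDs are hypothetical.

```python
import hashlib
from collections import defaultdict

def aggregate(records):
    """Reduce raw records to one entry per device: (count, total_kb).

    records: iterable of (device_id, kb) tuples.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for device_id, kb in records:
        totals[device_id][0] += 1
        totals[device_id][1] += kb
    return {device: (count, kb) for device, (count, kb) in totals.items()}

def rating_node_for(device_id: str, node_count: int) -> int:
    """Stable hash partitioning: a device always maps to the same rating node."""
    digest = hashlib.sha256(device_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count

# N = 1000 records of Y = 2.5 kb each collapse into one record of 1000 / 2500 kb.
records = [("dev-1", 2.5)] * 1000 + [("dev-2", 1.0)] * 3
summary = aggregate(records)
print(summary["dev-1"])  # (1000, 2500.0)
for device in summary:
    print(device, "-> rating node", rating_node_for(device, 4))
```

A cryptographic hash is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would send the same device to different nodes after a restart.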
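The interval-flush mechanism can be sketched with an in-memory buffer standing in for the intermediate database: records accumulate per key, and once the configured interval has elapsed the aggregated totals are handed to a `sink` callback representing the rating engine. `AggregationBuffer` and `sink` are hypothetical names for illustration only.

```python
from collections import defaultdict

class AggregationBuffer:
    """In-memory stand-in for the intermediate aggregation store."""

    def __init__(self, sink, flush_interval_s: float, now: float = 0.0):
        self.sink = sink                          # called as sink(key, count, total_kb)
        self.flush_interval_s = flush_interval_s
        self.last_flush = now
        self.pending = defaultdict(lambda: [0, 0.0])  # key -> [count, total_kb]

    def add(self, key, kb: float, now: float) -> None:
        entry = self.pending[key]
        entry[0] += 1
        entry[1] += kb
        if now - self.last_flush >= self.flush_interval_s:
            self.flush(now)

    def flush(self, now: float) -> None:
        """Send every aggregated entry to the rating step and start a new interval."""
        for key, (count, total_kb) in self.pending.items():
            self.sink(key, count, total_kb)
        self.pending.clear()
        self.last_flush = now

rated = []
buf = AggregationBuffer(lambda k, c, kb: rated.append((k, c, kb)), flush_interval_s=300)
buf.add("dev-1", 2.0, now=10)
buf.add("dev-1", 3.0, now=20)
buf.add("dev-2", 1.0, now=305)  # interval elapsed, so the buffer flushes
print(rated)  # [('dev-1', 2, 5.0), ('dev-2', 1, 1.0)]
```

This sketch only flushes when a record arrives; a real implementation would also flush on a timer so that quiet periods do not delay rating.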
EngageIP Web Service
The EngageIP Web Service runs on Internet Information Services (IIS) for Windows Server. IIS has a scalable, modular architecture designed to handle demanding workloads. The web service exposes an API layer that is used by the EngageIP Admin Portal and by third-party portals (e.g. a self-service portal). Configuration changes initiated through these portals pass through the API layer into the web service DLLs, where further processing is done. The web service frequently accesses the database to read or write configuration data.
While IIS is designed to be scalable, it is important to keep the following considerations in mind before optimizing performance:
Host IIS on a dedicated virtual machine so that it gets the resources it needs. Otherwise the app and web services compete for the same resources, and high-load transactions on the app service can starve the web service (and vice versa).
Consider increasing the number of CPU cores dedicated to the web server virtual machine to scale performance.
Be aware that contention may occur due to database operations. If a large number of transactions are batched to the database, bottlenecks can occur while waiting for database operations to complete. This can be partly mitigated on the web service side by avoiding bulk calls and updates where applicable, but in many cases some contention is unavoidable. For these cases, refer to the following section on the EngageIP SQL database for additional details.
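One common way to avoid a single bulk call is to split the work into smaller batches, so that each write is a short transaction and locks are released sooner. The sketch below shows the batching pattern only; `apply_batch` is a hypothetical callback standing in for whatever API call or database transaction an integration actually makes, and the batch size is a tuning assumption.

```python
from itertools import islice

def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

def apply_in_batches(rows, size, apply_batch):
    """Call `apply_batch` once per chunk so each write is a short transaction."""
    applied = 0
    for chunk in batches(rows, size):
        apply_batch(chunk)  # hypothetical: one API call or one DB transaction
        applied += len(chunk)
    return applied

committed = []
print(apply_in_batches(range(7), 3, committed.append), committed)
# 7 [[0, 1, 2], [3, 4, 5], [6]]
```

Smaller batches mean more round trips, so the batch size trades total throughput against how long other requests wait on locks.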
EngageIP SQL Database
EngageIP is designed to work with Microsoft SQL Server 2012 and 2014, Standard and Enterprise editions. It is strongly recommended to install the database on separate hardware from the app and web servers.
SQL Server Version
A typical deployment uses Standard edition. EngageIP currently supports Microsoft SQL Server 2012 and 2014, Standard and Enterprise editions.
For SQL Server 2014 (Standard and Enterprise editions), compatibility level 120 is recommended for new deployments. Larger deployments should strongly consider Enterprise edition for its higher level of vertical scaling. Please consult your solutions consultant about which edition of SQL Server you should use. Service providers that run Standard edition when their workload calls for Enterprise edition can expect significant performance bottlenecks on the database. Microsoft provides a useful matrix illustrating the differences between editions, which can be accessed at the following link.
Enterprise Edition 2014 contains everything included in Standard Edition, plus many additional features required for large deployments. For example, like Standard Edition, Enterprise supports databases of up to 524 petabytes. However, Enterprise goes further in scalability and performance by supporting the maximum amount of RAM and number of processor cores that the host system offers. Standard edition is limited to the lesser of 4 sockets or 16 cores, whereas Enterprise edition can use the maximum number of cores supported by the operating system. Likewise, Standard edition is limited to 128 GB of memory per Database Engine instance, whereas Enterprise edition supports the OS maximum. As a result, Enterprise Edition provides a higher level of vertical scaling than Standard edition, which may be of interest to larger customers.
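Setting the compatibility level is a single `ALTER DATABASE` statement. The sketch below builds it from the native level for each supported version (110 for SQL Server 2012, 120 for SQL Server 2014) and includes a query for checking current levels; the database name `EngageIP` is a hypothetical placeholder.

```python
# Native compatibility level for each supported SQL Server version.
COMPAT_LEVEL = {"2012": 110, "2014": 120}

# Query that lists the current compatibility level of every database.
CHECK_SQL = "SELECT name, compatibility_level FROM sys.databases;"

def set_compat_level_sql(database: str, version: str = "2014") -> str:
    """Build the ALTER DATABASE statement that sets the compatibility level."""
    return f"ALTER DATABASE [{database}] SET COMPATIBILITY_LEVEL = {COMPAT_LEVEL[version]};"

print(set_compat_level_sql("EngageIP"))
# ALTER DATABASE [EngageIP] SET COMPATIBILITY_LEVEL = 120;
```

Level 120 also enables the SQL Server 2014 cardinality estimator, so when raising the level on an existing database it is worth checking query performance afterwards.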
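The edition limits above can be turned into a quick estimate of how much of a given host SQL Server 2014 can actually use. This is an approximation that ignores hyper-threading, licensing and non-buffer-pool memory; the `Host` type and the example host are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Host:
    sockets: int
    cores: int
    memory_gb: int

def usable_by_sql2014(host: Host, edition: str):
    """Approximate (cores, memory_gb) that SQL Server 2014 can use on `host`."""
    if edition == "Enterprise":
        return host.cores, host.memory_gb  # limited only by the OS
    if edition == "Standard":
        # Lesser of 4 sockets or 16 cores; 128 GB per Database Engine instance.
        cores_per_socket = host.cores // host.sockets
        cores = min(host.cores, 4 * cores_per_socket, 16)
        return cores, min(host.memory_gb, 128)
    raise ValueError(f"unknown edition: {edition}")

host = Host(sockets=2, cores=24, memory_gb=256)
print(usable_by_sql2014(host, "Standard"))    # (16, 128)
print(usable_by_sql2014(host, "Enterprise"))  # (24, 256)
```

On this example host, Standard edition would leave a third of the cores and half of the memory unused by the database engine.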
Enterprise Edition includes scalability features such as table and index partitioning, data compression, partitioned table parallelism and online indexing. An administrator can also create database snapshots, Always On availability groups, multi-subnet clusters and mirrored backups.
Online indexing in particular can provide significant performance improvements to EngageIP. As tables grow, their indexes become fragmented and SQL Server requires index rebuilds. When you perform data definition language (DDL) operations offline, such as building or rebuilding a clustered index, these operations hold exclusive locks on the underlying data and associated indexes, which prevents modifications and queries to the underlying data until the index operation is complete. The ONLINE option provided with Enterprise edition allows concurrent user access to the underlying table or clustered index data and any associated nonclustered indexes during these index operations. For example, while a clustered index is being rebuilt by one user, that user and others can continue to update and query the underlying data.
Whether you use Standard or Enterprise edition, database configuration is essential to keeping the database performant, and it requires the expertise of a skilled database administrator. For on-prem customers, LogiSense performs routine database maintenance at installation and on an ongoing basis. This includes tuning settings such as min and max server memory, max degree of parallelism and the cost threshold for parallelism. In addition, statistics may need to be updated and indexes rebuilt on an ongoing basis. LogiSense can provide best practices for database configuration and maintenance; please talk to your LogiSense representative for more details.
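A maintenance job typically reads `avg_fragmentation_in_percent` from `sys.dm_db_index_physical_stats` and decides per index whether to reorganize or rebuild. The sketch below makes that decision; the 5% and 30% thresholds are commonly cited rules of thumb rather than EngageIP guidance, and the table and index names are hypothetical, not the actual EngageIP schema.

```python
from typing import Optional

def index_maintenance_sql(table: str, index: str, fragmentation_pct: float,
                          enterprise: bool) -> Optional[str]:
    """Choose a maintenance statement from avg_fragmentation_in_percent."""
    if fragmentation_pct < 5:
        return None  # lightly fragmented: leave as-is
    if fragmentation_pct <= 30:
        # REORGANIZE is always an online operation, in every edition.
        return f"ALTER INDEX [{index}] ON {table} REORGANIZE;"
    # Online rebuilds require Enterprise edition on SQL Server 2012/2014.
    online = "ON" if enterprise else "OFF"
    return f"ALTER INDEX [{index}] ON {table} REBUILD WITH (ONLINE = {online});"

# Hypothetical table and index names.
print(index_maintenance_sql("[dbo].[Usage]", "IX_Usage_Date", 42.0, enterprise=True))
# ALTER INDEX [IX_Usage_Date] ON [dbo].[Usage] REBUILD WITH (ONLINE = ON);
```

On Standard edition the same rebuild runs with `ONLINE = OFF` and holds the exclusive locks described above, so it is usually scheduled for a maintenance window.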
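The instance-level settings mentioned above are changed with `sp_configure`. The sketch below generates such a script; the option names are the real SQL Server configuration options, but the values in the example are placeholders only, not recommendations. Appropriate values depend on the host and workload and should come from your DBA or LogiSense.

```python
def server_config_sql(min_memory_mb: int, max_memory_mb: int,
                      maxdop: int, cost_threshold: int) -> str:
    """Build an sp_configure script for memory and parallelism settings."""
    if min_memory_mb > max_memory_mb:
        raise ValueError("min server memory cannot exceed max server memory")
    settings = [
        ("min server memory (MB)", min_memory_mb),
        ("max server memory (MB)", max_memory_mb),
        ("max degree of parallelism", maxdop),
        ("cost threshold for parallelism", cost_threshold),
    ]
    # All four are advanced options, so they must be made visible first.
    lines = ["EXEC sp_configure 'show advanced options', 1;", "RECONFIGURE;"]
    for name, value in settings:
        lines.append(f"EXEC sp_configure '{name}', {value};")
    lines.append("RECONFIGURE;")
    return "\n".join(lines)

# Placeholder values for illustration only.
print(server_config_sql(min_memory_mb=8192, max_memory_mb=57344,
                        maxdop=8, cost_threshold=50))
```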