RE: [LINUX:815] Re: Yedek sunucu

umite@migros.com.tr
Tue, 15 Sep 1998 12:57:43 +0300


Hello, I have been out of town for a few days, so I could only reply now.

My apologies.

> > Hello
> >
> > > :) :)
> > > That is exactly what I was asking: how do I do this?
> > >
> > Why are you limiting yourself to Linux?
> > NT or Novell 5.xx can do this job easily.
>
> Do the NT or Novell you recommend have an option such as "add a new
> machine" when you configure them?
It is not exactly that command, but you can follow the link below
and see step by step how it is done.

http://www.microsoft.com/ntserverenterprise/mscsdemo/

From the documents I have, I am also attaching the FAQ; it may
be useful to you.

See also:

http://www.microsoft.com/ntserverenterprise/basics/features/clustering/clustarchit.asp

> He is not asking how to take a backup. He is asking how to keep a
> standby machine in the system for every eventuality. A server may
> already have a spare CPU that takes over on a fault, or use RAID so
> that the mirror disk is used when a disk fails; he is trying to find
> an answer to the question of what happens if the whole machine goes
> down.
Thank you for the explanation; that is how I had understood it too.
Novell has provided what you call a standby machine since version 4.0.

> At the theoretical level, the book Computer Science and Information
> Technology (Prentice-Hall) has some introductory material, but
> experienced colleagues need to work out how to do this in practice
> on Linux.
>
> Çağlar
>
> >
> I wish you success in your work.
> <<clusterfaq>>

[Attachment: clusterfaq.htm]

Clustering FAQ

 

Frequently Asked Questions About Microsoft Cluster Server

 

Topic Index

Cluster Basics

Intro to Microsoft Cluster Server

High Availability

Manageability

Scalability

Application & Service Support

Hardware Validation

Servers

Storage

Interconnect

Networking

Software Licensing

Deployment

Troubleshooting

Developer Issues

Comparisons to Other Products

 

Cluster Basics

What is a server "cluster"?

A group of independent servers managed as a single system for higher availability, easier manageability, and greater scalability.

What does it take to create a server cluster?

The minimum requirements for a server cluster are (a) two servers connected by a network, (b) a method for each server to access the other's disk data, and (c) special cluster software like Microsoft Cluster Server (MSCS). The special software provides services such as failure detection, recovery, and the ability to manage the servers as a single system.

What are the benefits of server clustering?

There are three primary benefits to server clustering: improved availability, easier manageability, and more cost-effective scalability. Using Microsoft Cluster Server (MSCS) as an example:

    • Availability: MSCS can automatically detect the failure of an application or server, and quickly restart it on a surviving server. Users only experience a momentary pause in service.
    • Manageability: MSCS lets administrators quickly inspect the status of all cluster resources, and easily move workload around onto different servers within the cluster. This is useful for manual load balancing, and to perform "rolling updates" on the servers without taking important data and applications offline.
    • Scalability: "Cluster-aware" applications can use the MSCS services via its Application Programming Interface (API) to do dynamic load-balancing and scale across multiple servers within a cluster.

What are clusters used for?

Customer surveys indicate that MSCS clusters will be used as highly available multipurpose platforms, mirroring the current uses of Windows NT Server. Surveyed customers suggested that the most common uses of MSCS clusters will be mission critical database management, file/intranet data sharing, messaging, and general business applications.

When a cluster is recovering from a server failure, how does the surviving server get access to the failed server's disk data?

There are basically three techniques that clusters use to make disk data available to more than one server:

    • Shared disks: The earliest server clusters permitted every server to access every disk. This originally required expensive cabling and switches, plus specialized software and applications. (The specialized software that mediates access to shared disks is generally called a Distributed Lock Manager, or DLM.) Today, standards like SCSI have eliminated the requirement for expensive cabling and switches. However, shared-disk clustering still requires specially modified applications. This means it is not broadly useful for the wide variety of applications deployed on the millions of servers sold each year. Shared-disk clustering also has inherent limits on scalability, since DLM contention grows geometrically as you add servers to the cluster. Examples of shared-disk clustering solutions include Digital® VAX® Clusters, and Oracle® Parallel Server®.
    • Mirrored disks: A more flexible alternative is to let each server have its own disks, and to run software that "mirrors" every write from one server to a copy of the data on at least one other server. This is a great technique for keeping data at a disaster recovery site in sync with a primary server. There are a large number of disk mirroring solutions available today; examples for the Windows NT Server environment are available from Network Specialists (NSI®), Octopus®, Veritas®, and Vinca®. Many of these mirroring vendors also offer cluster-like high availability extensions that can switch workload over to a different server using a mirrored copy of data. However, mirrored-disk failover solutions cannot deliver the scalability benefits of clusters. It is also arguable that they can never deliver as high a level of availability and manageability as shared-disk clustering, since there is always a finite amount of time during the mirroring operation in which the data at both servers is not 100% identical.
    • "Shared nothing": In response to the limitations of shared-disk clustering, modern cluster solutions employ a "shared nothing" architecture in which each server owns its own disk resources (that is, they share "nothing" at any point in time). In the event of a server failure, a shared-nothing cluster has software that can transfer ownership of a disk from one server to another. This provides the same high level of availability as shared-disk clusters, and potentially higher scalability since it does not have the inherent bottleneck of a DLM. Best of all, it works with standard applications since there are no special disk access requirements. Examples of shared-nothing clustering solutions include Tandem® NonStop®, Informix® Online/XPS®, and Microsoft Cluster Server.
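
As a rough illustration of the shared-nothing idea, the toy model below keeps exactly one owner per disk and moves a failed server's disks to the survivor. The class and method names are hypothetical and are not part of any MSCS interface; the sketch only mirrors the ownership-transfer step described above.

```python
class SharedNothingCluster:
    """Toy model: every disk has exactly one owning server at a time."""

    def __init__(self, ownership):
        # ownership maps a disk name to the server that currently owns it
        self.ownership = dict(ownership)

    def disks_of(self, server):
        """Return the sorted list of disks currently owned by `server`."""
        return sorted(d for d, s in self.ownership.items() if s == server)

    def fail_over(self, failed, survivor):
        """Transfer every disk owned by the failed server to the survivor."""
        moved = self.disks_of(failed)
        for disk in moved:
            self.ownership[disk] = survivor
        return moved


cluster = SharedNothingCluster({"disk1": "A", "disk2": "A", "disk3": "B"})
moved = cluster.fail_over(failed="A", survivor="B")
```

At no point do two servers own the same disk, which is why no lock manager is needed in this model.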

 


 

Intro to Microsoft Cluster Server

What is "Wolfpack"?

"Wolfpack" was the code name for Microsoft Cluster Server.

What is Microsoft Cluster Server (MSCS)?

MSCS is a built-in feature of Windows NT Server, Enterprise Edition. It is software that supports the connection of two servers into a "cluster" for higher availability and easier manageability of data and applications. MSCS can automatically detect and recover from server or application failures. It can be used to move server workload to balance utilization and to provide for planned maintenance without downtime. And, over time, MSCS will also become a platform for highly scalable, cluster aware applications.

How many servers can be in an MSCS cluster?

The initial release of MSCS will be supported on clusters with 2 servers. A future version referred to as MSCS "Phase 2" will support larger clusters, and will include enhanced services to simplify the creation of highly scalable, cluster aware applications.

When will MSCS be available?

MSCS has been in development since 1995, and has been in beta test since December of 1996. The initial release of MSCS will be in version 4.0 of Microsoft Windows NT Server, Enterprise Edition. This product is currently in beta test and will be released once customers indicate that it's ready. Microsoft currently expects that to happen this summer. The "Phase 2" version of MSCS -- enhanced to support larger clusters and highly scalable applications -- is expected to enter beta test in 1998.

What other companies were involved in the development of MSCS?

Microsoft worked closely with leading hardware vendors, software vendors, and customers in the specification and development of MSCS and its Application Programming Interface (API). These other companies participated via five different programs:

    • Strategic alliances: Microsoft formed strategic alliances with two of the key pioneers in clustering technology: Digital Equipment Corporation® (in 1995) and Tandem Computers® (in 1996). In both of these alliances, patent portfolios were cross-licensed, and Microsoft gained access to proven clustering expertise and technology, plus a strong partner committed to helping extend that technology to benefit customers of Windows NT Server.
    • Early Adopter vendors: Starting with the announcement of the MSCS project in October 1995 and extending through the beta test program, Microsoft worked closely with six leading system vendors who provided support, expertise, and sample cluster configurations to support the development of MSCS. The Early Adopter system vendors were Compaq Computer Corporation®, Digital Equipment, Hewlett-Packard®, IBM®, NCR®, and Tandem Computers.
    • Open Process: Whenever Microsoft extends the Win32 API, as it did with MSCS, it enlists the participation of vendors and customers in its "Open Process". This is a series of confidential design previews and specification reviews which assures the resulting API is robust, complete, and usable by a broad segment of the industry. Over 60 organizations participated in the MSCS Open Process sessions, which took place between January and July of 1996.
    • SDK previews: Microsoft first provided early copies of the MSCS Software Development Kit to the 60+ Open Process organizations in September of 1996, and distributed a more advanced preview SDK to over 2,000 developers at the November 1996 Microsoft Professional Developers Conference.
    • Beta test program: MSCS Beta 1 was shipped in December 1996 to 350 customer and vendor sites. Beta 2 shipped in April 1997 to over 750 sites. And Beta 3 of MSCS was shipped as an embedded feature of Windows NT Server, Enterprise Edition 4.0 Beta 2 in July 1997 to over 2,100 sites. Each of these betas was also available to thousands of additional developers and customers via Microsoft Developers Network (MSDN) Level III.

In what languages will MSCS be available?

Microsoft Windows NT Server, Enterprise Edition 4.0, including MSCS 1.0, will be offered in English, French, German, Japanese, and Spanish.

Through what channels will Windows NT Server, Enterprise Edition be available?

Microsoft Windows NT Server/E will be available to customers through all standard channels: resellers, retail, OEM, and the Microsoft Select licensing program.

What versions of Windows NT Server will MSCS support?

MSCS software will only be available as a built-in feature of Windows NT Server, Enterprise Edition.

Will MSCS be extended beyond Windows NT Server to Windows NT Workstation?

There is currently no plan to extend cluster support to Windows NT Workstation. MSCS software has been designed and written to closely integrate with the architecture and features of Windows NT Server, including its server-oriented networking and directory services capabilities.

What clients can connect to an MSCS cluster?

Any client that can connect to Windows NT Server via TCP/IP will work with MSCS. This includes MS-DOS®, Windows 3.x, Windows 95, Windows NT, Apple® Macintosh®, and UNIX®. MSCS does not require any special software on the client for transparent recovery of services that connect to clients via standard IP protocols.


 

High Availability

How does MSCS provide high availability?

MSCS uses software "heartbeats" to detect failed applications or servers. In the event of a server failure, it employs a "shared nothing" clustering architecture that automatically transfers ownership of resources (such as disk drives and IP addresses) from a failed server to a surviving server. It then re-starts the failed server's workload on the surviving server. All of this -- from detection to re-start -- typically takes under a minute. If an individual application fails (but the server does not), MSCS will typically try to re-start the application on the same server; if that fails, it moves the application's resources and re-starts it on the other server. The cluster administrator can use a graphical console to set various recovery policies such as dependencies between applications, whether or not to re-start an application on the same server, and whether or not to automatically "failback" (re-balance) workloads when a failed server comes back online.
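
The detection and recovery sequence above can be sketched in a few lines. This is a minimal model, not the MSCS implementation: the timeout value, class name, and callback names are assumptions made for illustration, and timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Declare a server failed when no heartbeat arrived within timeout_s."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # assumed value; the FAQ gives no figure
        self.last_seen = {}          # server name -> time of last heartbeat

    def beat(self, server, now):
        self.last_seen[server] = now

    def failed_servers(self, now):
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout_s)


def recover_application(restart_locally, restart_on_peer):
    """Try the same server first; if that fails, restart on the other one."""
    if restart_locally():
        return "restarted locally"
    if restart_on_peer():
        return "failed over to peer"
    return "offline"


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.beat("A", now=0.0)
monitor.beat("B", now=0.0)
monitor.beat("B", now=4.0)           # A stays silent after t=0
failed = monitor.failed_servers(now=6.0)
outcome = recover_application(lambda: False, lambda: True)
```

The two callbacks stand in for whatever restart action the administrator's recovery policy selects.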

Can MSCS provide "zero downtime"?

No. MSCS can dramatically reduce planned and unplanned downtime. However, even with MSCS, a server could still experience downtime from the following events:

    • MSCS failover time: If MSCS recovers from a server or application failure, or if it is used to move applications from one server to another, the application(s) will be unavailable for a non-zero period of time (typically under a minute.)
    • Failures which MSCS can't recover: … such as loss of a disk not protected by RAID, loss of power when a UPS isn't used, or loss of a site when there's no fast-recovery disaster recovery plan. In other words, there are types of failure that MSCS does not protect against, but most of these can be survived with minimal downtime if precautions are taken in advance.
    • Server maintenance that requires downtime: MSCS can keep applications and data online through many types of server maintenance, but not all. Examples include completely upgrading both servers in a cluster, or installing a new version of an application which has a new on-disk data format that requires reformatting pre-existing data.

Microsoft recommends that clusters be used as one element in customers' overall program to provide high integrity and high availability for their mission-critical server-based data and applications.

Is MSCS failover transparent to users?

MSCS does not require any special software on client computers, so the user experience during failover depends on the nature of the client side of their client-server application. Client reconnection is often transparent, because MSCS has restarted the applications, file shares, etc. at exactly the same IP address.

If a client is using "state-less" connections such as a standard browser connection, then they would be unaware of a failover if it occurred between server requests. If a failure occurs while a client is connected to the failed resource, then the client will receive whatever standard notification is provided by the client side of their application when the server side becomes unavailable. This might be, for example, the standard "Abort, Retry, or Cancel?" prompt you get when using the Windows Explorer to download a file at the time a server or network goes down. In this case, client reconnection is not automatic (the user must choose "Retry"), but the user is fully informed of what's happening and has a simple, well-understood method of re-establishing contact with the server. Of course, in the meantime, MSCS is busily re-starting the service or application so that, when the user chooses "Retry", it re-appears as if it never went away.

For client-side applications which have "state-full" connections to the server, a new logon is typically required following a server failure. In many cases, this approach is required for security purposes. For example, this is how SAP R/3 works -- if the server connection is lost, the user is prompted to logon again to make sure it's the same user accessing the application.

Even with state-full connections, it's possible for an application to automatically re-connect following a failover. For example, when Microsoft demonstrated SAP R/3 failover at Microsoft Scalability Day in New York City on May 20, it was accessed via an Active browser application that had automatically (and securely) cached the user's ID and password from the initial logon. Thus, when the server connection was momentarily lost during the failover demo, the client application automatically logged on again using the cached ID and password. This was done using standard IP connections, running a simple Visual Basic program within an HTML document via the Microsoft ActiveX technology.
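
A client-side retry of this kind might look like the sketch below. It is not the demo code mentioned above; the function names and the fake server are hypothetical, and a real client would need to store the cached credentials securely.

```python
def call_with_relogon(request, logon, cached_credentials, max_attempts=3):
    """Run request(session); on a lost connection, log on again and retry."""
    session = logon(cached_credentials)
    for _ in range(max_attempts):
        try:
            return request(session)
        except ConnectionError:
            # The server side went away (for example during failover):
            # log on again with the cached credentials instead of
            # prompting the user.
            session = logon(cached_credentials)
    raise ConnectionError("server did not come back after retries")


# Tiny fake server: the first request fails, as if a failover happened
# mid-session; later requests succeed.
calls = {"logons": 0, "requests": 0}

def fake_logon(credentials):
    calls["logons"] += 1
    return {"user": credentials["user"]}

def fake_request(session):
    calls["requests"] += 1
    if calls["requests"] == 1:
        raise ConnectionError("connection lost")
    return "report for " + session["user"]

result = call_with_relogon(fake_request, fake_logon,
                           {"user": "alice", "password": "secret"})
```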

When a server comes back online following a failure, is there any human intervention required to get it back "up and running", or is the heartbeat enough for the other server to include it once again?

No manual intervention is required. When a server running Microsoft Cluster Server, say "Server A", boots, it starts the MSCS service automatically. MSCS in turn checks the interconnect (and network if necessary) to find the other server in its cluster, say "Server B". If Server A finds Server B, then Server A re-joins the cluster and Server B updates it with current cluster status info. Server A then initiates "failback", moving back failed-over workload from Server B to Server A at an appropriate time.
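
The decision about which workload to move back, and when, can be modeled as a simple policy check. The sketch below assumes each cluster group carries a preferred server, a flag allowing failback, and a window of permitted hours; the field names are hypothetical and windows that wrap past midnight are not handled.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    preferred_server: str
    current_server: str
    allow_failback: bool = True
    window: tuple = (0, 24)          # permitted hours, as [start, end)


def groups_to_fail_back(groups, rejoined_server, hour):
    """Return names of groups that should move back to the rejoined server."""
    result = []
    for g in groups:
        start, end = g.window
        if (g.allow_failback
                and g.preferred_server == rejoined_server
                and g.current_server != rejoined_server
                and start <= hour < end):
            result.append(g.name)
    return result


groups = [
    Group("sql", "A", "B", True, (22, 24)),   # may move back late evening only
    Group("files", "A", "B", False),          # failback disabled
    Group("web", "B", "B"),                   # already on its preferred server
]
late = groups_to_fail_back(groups, "A", hour=23)
midday = groups_to_fail_back(groups, "A", hour=10)
```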

What is "failback", and how does it work in MSCS?

"Failback" is the ability to automatically re-balance the workload in a cluster when a failed server comes back online. This is a standard feature of MSCS. For example, say "Server A" has crashed and its workload failed-over to "Server B". When Server A re-boots, it automatically finds Server B and re-joins the cluster. It then checks to see if any of the cluster groups running on Server B would "prefer" to be running on Server A. If so, it automatically moves those groups from Server B to Server A as soon as the time is right. Failback properties -- that is, which groups can failback, which is their preferred server, and during what hours the time is "right" for failback -- are all set from the cluster administration console.

Can the servers in an MSCS cluster be located at separate locations for recovery from site disasters?

Not at this time. All of the cluster configurations currently being considered for validation use SCSI connections to storage resources, which limits the distance between clustered servers to the distance supported by standard SCSI. This is typically no more than 25 meters, though there are SCSI extender technologies that can potentially stretch the connection up to 1,000 meters.

Note that Windows NT Server customers already have several choices for software that can mirror data to remote disaster recovery sites, including solutions from N.S.I.®, Octopus®, Veritas®, and Vinca®. Most of these vendors have already announced that their disaster site mirroring solutions will also work with MSCS clusters.

Can MSCS restore registry keys for an application from one server to the other when doing failover?

Yes. Recovery of an application's registry information is a configurable feature that is available to the Generic Application and Generic Service resource types. Basically, you tell it what registry keys to log and recover, and that’s all there is to it. This capability should be used if the application or service stores volatile information in specific registry keys. If this is done, when the resource comes online on another node, it will have the same registry information as the previously online resource.
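
Conceptually, this amounts to saving a chosen set of keys on one node and replaying them on the other. The sketch below models each node's registry as a plain dictionary; the function names and key names are hypothetical and no real registry API is used.

```python
def checkpoint(registry, watched_keys):
    """Copy only the keys the administrator asked to log."""
    return {k: registry[k] for k in watched_keys if k in registry}


def restore(registry, saved):
    """Apply the saved keys on the node where the resource comes online."""
    registry.update(saved)


node1 = {"App\\Port": 8080, "App\\Mode": "fast", "Unrelated": 1}
node2 = {"App\\Port": 80}

saved = checkpoint(node1, ["App\\Port", "App\\Mode"])
restore(node2, saved)
```

Keys that were not listed, such as `Unrelated` here, are left alone on both nodes.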

When an application re-starts on another server following a failure, does it re-start from a copy of the application?

No. The new server (say, "Server 2") would start the application from the same physical disks as Server 1, since ownership of the application's disks on the shared SCSI bus had been moved from Server 1 to Server 2 as one of the first steps in the failover process. This approach assures that the application always re-starts from its last known state, as recorded on its disk drives (and, if you use the available option, as recorded in its registry keys.)

Can MSCS restore an application's "state" at the time of its failure rather than requiring a complete restart?

MSCS can restore the state of an application's registry keys, but any other state information must be managed and restored by the application. Applications need to provide some model for persistence to ensure that state can be recaptured. For example, Microsoft SQL Server uses transaction logs to provide this assurance. If a server running Microsoft SQL Server crashes, upon restart the application uses its transaction logs to bring the database back to a known state. With a cluster, just as with a single server, good application design and the use of ACID (Atomic, Consistent, Isolated, and Durable) transaction properties are important.
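
The general idea of log-based recovery can be shown with a toy log replay. This is only a sketch of the technique, assuming a simple list of `set` and `commit` records; it is not how Microsoft SQL Server's transaction log is actually structured.

```python
def recover(log):
    """Rebuild state from a log, applying only committed transactions."""
    committed = {txid for kind, txid, *_ in log if kind == "commit"}
    state = {}
    for kind, txid, *rest in log:
        if kind == "set" and txid in committed:
            key, value = rest
            state[key] = value
    return state


# Transaction 2 wrote a value but the server crashed before it committed,
# so its write must not appear after recovery.
log = [
    ("set", 1, "balance", 100),
    ("commit", 1),
    ("set", 2, "balance", 50),
]
state = recover(log)
```

Because the log lives on the disk whose ownership moves during failover, the surviving server can run the same replay.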

What is the granularity of resource failover?

MSCS supports failover of "virtual servers", which usually correspond to applications, web sites, print queues, or file shares (including their disk spindles, files, IP addresses, etc.). MSCS also provides cluster-wide services that are simultaneously available on all servers in the cluster, including cluster administration, performance monitoring, event viewing, a cluster name, and cluster time synchronization.

What is a "quorum disk" and how does it help MSCS provide high availability?

It's a disk spindle that MSCS uses to determine whether or not another server is up or down. Technically, it's a resource which can only be owned by one server at a time, and for which servers can negotiate for ownership. Negotiating for the quorum drive allows MSCS to avoid "split brain" situations where both servers are active and think the other server is down. (This can happen when, for example, the cluster interconnect is lost and network response time is problematic.) The use of a quorum resource is one of the sophisticated algorithms that Microsoft got by working with pioneers in clustering such as Digital and Tandem.
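
The arbitration idea can be sketched with a resource that only one party can hold. In the toy model below an in-process lock stands in for the quorum disk; the class and function names are hypothetical, and the real mechanism by which servers negotiate for the disk is not described in this FAQ.

```python
import threading


class QuorumResource:
    """A resource that can be owned by only one server at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_acquire(self, server):
        # Non-blocking attempt: succeeds only if nobody owns the resource.
        if self._lock.acquire(blocking=False):
            self.owner = server
            return True
        return False

    def release(self, server):
        if self.owner == server:
            self.owner = None
            self._lock.release()


def arbitrate(quorum, servers):
    """Servers that lost contact with each other compete for the quorum."""
    return {s: ("active" if quorum.try_acquire(s) else "standby")
            for s in servers}


quorum = QuorumResource()
roles = arbitrate(quorum, ["A", "B"])
```

Whichever server wins stays active; the other stands down, so the two never both act as the sole survivor.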


 

Manageability

How does MSCS improve the manageability of servers?

MSCS gives administrators a graphical console from which they can monitor and manage all of the resources in a cluster as if it were a single system. Using the familiar standards of a Microsoft Windows® graphical user interface, an administrator can use the cluster console to:

    • audit the status of all servers and applications in the cluster
    • set up new applications, file shares, print queues, etc. for high availability
    • administer the recovery policies for applications and resources
    • take applications offline, bring them back online, and move them from one server to another.

The ability to graphically move workload from one server to another with only a momentary pause in service (typically less than a minute) means administrators can easily unload servers for planned maintenance without taking important data and applications offline for long periods of time.

Does MSCS provide administrators with a "single system image"?

Yes. MSCS provides administrators a single graphical console to manage all of the applications and resources in a cluster. The MSCS console presents cluster resources by physical server, and by "virtual server" (or "cluster group".) This allows administrators to centrally manage the cluster as a collection of virtual application-oriented servers, or as a collection of physical resources when appropriate.

Can MSCS be remotely managed?

Yes. An authorized user can run the MSCS administration console from any Windows NT Workstation or Windows NT Server on the network. In the version of MSCS accompanying Windows NT Server, Enterprise Edition 5.0, the cluster administration console will be a "snap-in" to the Microsoft Management Console, providing scriptable, remoteable access, including access via Internet protocols from a browser.

How does MSCS help administrators do "rolling upgrades" of their servers?

With MSCS, server administrators no longer have to do all their maintenance within those rare windows of opportunity when no users are online. Instead, they can simply wait until a convenient off-peak time when one of the servers in the cluster has enough horsepower for all of the cluster workload. They then point-and-click to move all the workload onto one server, and they're ready to perform maintenance on the unloaded server. Once the maintenance is complete and tested, they bring that server back online and it automatically re-joins the cluster, ready for work. When convenient, the administrator repeats the process to perform maintenance on the other server in the cluster. This ability to keep applications and data online while performing server maintenance is often referred to as doing "rolling upgrades" to your servers.
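
The procedure above can be written down as a short loop for a two-server cluster. This is a sketch under stated assumptions: the function name, the `upgrade` callback, and the step of moving each group back after maintenance are illustrative choices, not a documented MSCS procedure.

```python
def rolling_upgrade(servers, groups, upgrade):
    """Upgrade each server in turn while its workload runs on the peer.

    groups maps a cluster group name to the server currently hosting it.
    Returns the list of steps taken, in order.
    """
    steps = []
    for server in servers:
        peer = next(s for s in servers if s != server)
        hosted = [g for g, owner in groups.items() if owner == server]
        for g in hosted:                      # unload the server
            groups[g] = peer
            steps.append(f"move {g} -> {peer}")
        upgrade(server)                       # maintenance on the idle server
        steps.append(f"upgrade {server}")
        for g in hosted:                      # server rejoins, work returns
            groups[g] = server
            steps.append(f"move {g} -> {server}")
    return steps


upgraded = []
groups = {"sql": "A", "web": "B"}
steps = rolling_upgrade(["A", "B"], groups, upgraded.append)
```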

Will Microsoft support "rolling upgrades" of future server products using MSCS clusters?

It is Microsoft's goal to support "rolling upgrades" between releases of Microsoft server software using MSCS clusters. However, we cannot commit to this for all releases of all products. Persistent storage formats must occasionally change to accommodate new capabilities, and changes in persistent storage occasionally require applications to be taken offline while storage or indices are restructured. Microsoft will commit to always provide smooth upgrades between releases of all our products, and we'll use MSCS to provide seamless rolling upgrades whenever possible.


 

Scalability

How will MSCS enhance server scalability?

The manageability benefits of the initial version of MSCS will simplify many of the processes currently used to improve scalability, such as upgrading server hardware and installing new versions of applications.

A future version of MSCS, "Phase 2", will support clusters containing large numbers of servers, and will provide enhanced abilities that simplify the creation of highly scalable, cluster aware applications.

The Microsoft cluster strategy whitepaper said MSCS is already architected for multiple nodes. Has MSCS been tested on multi-node clusters? If so, why is Microsoft waiting to deliver multi-node support?

Yes, Microsoft and other vendors have tested MSCS clusters with more than 2 servers. These clusters "work" in that they are stable and the administrator's console provides basic management for the multi-server environment. However, the algorithms and features in the current software must be extended and thoroughly tested on larger clusters before customers can reliably use a multi-node MSCS cluster for production work, or gain enhanced cluster benefits. In addition, Microsoft will have to extend the cluster hardware validation procedures to accommodate the additional requirements of multi-node clusters.

Microsoft has architected MSCS for multi-node support in preparation for the coming "Phase 2" version. Today's multi-node tests have proven the architecture is correct. However, there are two key reasons Microsoft is limiting the initial release to 2-server clusters:

    1. Customer surveys show that 80% of the demand for clusters is to improve the availability of mission-critical data and applications. Two-server clusters satisfy this overwhelming customer requirement. Focusing on this customer requirement allowed Microsoft to focus its efforts, and the efforts of other vendors, on delivering very high quality high availability clustering solutions in the initial release.
    2. One of the key requirements for developing scalable, cluster-aware applications is a globally accessible, programmable naming service that clients use to locate cluster resources. The enhanced Directory Services of Windows NT Server 5.0 will be an excellent cluster naming service, so it was decided to develop MSCS "Phase 2" support for large, scalable clusters using the Active Directory of Windows NT Server 5.0.

How will MSCS help do load balancing?

"Load balancing" is the ability to move work from a very busy server to a less-busy server. MSCS will support load balancing in four ways over time:

    1. Manual load balancing: With the initial release of MSCS, the person administering a cluster will be able to use the cluster console to point-and-click whole cluster groups (i.e. related applications and resources) from a loaded server to a less-loaded server. They can easily determine when server loads justify load balancing using the built-in Performance Monitor of Windows NT Server.
    2. Automatic cluster group load balancing: A future release of MSCS will allow administrators to specify performance-related failover policies for cluster groups, using the graphical cluster administration console. This would be similar, for example, to the way "fail back" policies are set in the initial release of MSCS. The administrator will go to a "Load Balancing" tab in the Properties window for a cluster group, and use point-and-click and fill-in-the-blank actions to specify which values of which Performance Monitor counters should trigger load-balancing failover.
    3. Automatic workload balancing in "cluster aware" applications: Over time, some software vendors will use the evolving services of MSCS to create a new generation of cluster-aware applications which automatically spread their workload over multiple servers in a cluster to achieve higher scalability. Examples that have already been publicly discussed include future versions of Microsoft SQL Server, Oracle Parallel Server, and Tandem NonStop SQL/MX.
    4. Automatic transaction load balancing via Microsoft Transaction Server: Today's Microsoft Transaction Server provides multi-threading as a "free" service that automatically improves the scalability of component-based applications running on single servers. A future release of Microsoft Transaction Server will similarly provide automatic distribution of transaction processing loads across the servers in a cluster as a "free" service. This will be the easiest way for corporate developers and application vendors to achieve cluster-enhanced scalability.
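
A threshold-driven policy of the kind described in item 2 could be sketched as follows. The counter names, threshold values, and function names are all hypothetical; the FAQ describes this as a future feature and does not specify how it will work.

```python
def triggered_counters(samples, thresholds):
    """Return the counters whose sampled value exceeds its threshold."""
    return sorted(name for name, limit in thresholds.items()
                  if samples.get(name, 0.0) > limit)


def pick_target(load_by_server, busy_server):
    """Choose the least-loaded server other than the busy one."""
    candidates = {s: load for s, load in load_by_server.items()
                  if s != busy_server}
    return min(candidates, key=candidates.get) if candidates else None


thresholds = {"cpu_percent": 85.0, "disk_queue": 4.0}
samples_on_a = {"cpu_percent": 92.0, "disk_queue": 1.0}

hot = triggered_counters(samples_on_a, thresholds)
target = pick_target({"A": 92.0, "B": 30.0}, busy_server="A")
```

When `hot` is non-empty, a cluster group would be moved from the busy server to `target`, the same move an administrator performs by hand in item 1.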

Should cluster aware applications developed for MSCS use a shared-disk or shared-nothing architecture for greatest scalability?

Microsoft recommends a shared-nothing architecture for cluster-aware applications because of its greater scalability potential. With shared-disk applications, copies of the application running on two or more servers in the cluster share concurrent read/write access to a single set of disk files, mediating ownership of the files using a "distributed lock manager" (DLM). A shared-nothing application, on the other hand, avoids the potential bottleneck of shared resources and a DLM by partitioning or replicating the data so that each server in the cluster works primarily with its own data and disk resources. In theory, MSCS can support either type of application. However, Microsoft has no plans at this time to include a DLM in the MSCS cluster services, so vendors would have to develop or license a DLM to implement a shared-disk application on MSCS. Microsoft has chosen to use the shared-nothing architecture for future versions of the BackOffice applications because of that architecture’s greater potential for cluster-enabled scalability.

Will MSCS ever have a Distributed Lock Manager (DLM)?

Microsoft will not include a distributed lock manager in the first release of MSCS. Enhancements in future releases will be determined based on customer requirements.

When will Microsoft offer a parallel version of Microsoft SQL Server that runs on multiple servers at the same time for automatic load balancing and scalability?

The next major release after Microsoft SQL Server 7.0 is planned to offer cluster-enabled scalability on MSCS clusters. It will use a scalable "shared nothing" architecture to spread a single database across multiple servers. A whitepaper on the strategy for Microsoft SQL Server on clusters can be downloaded from http://microsoft.com/sql/WhitePapers.htm. Although this is an important direction for Microsoft SQL Server, it must be kept in perspective: it will only be needed by a small percentage of customers. Cluster-enabled scalability will only be needed by extremely large enterprise applications that (a) are too large to run on a single high-end SMP server (e.g., 8-processor SMP with 4GB of RAM), and (b) cannot be partitioned to run on a distributed network using Microsoft Transaction Server.

What are Microsoft's plans for supporting Distributed Message Passing (DMP)?

Distributed Message Passing is one of the intracluster communications techniques that are planned for Phase 2 of MSCS. (Another is I/O shipping.) Applications will be able to access MSCS DMP services through extensions to the Cluster API. MSCS in turn will host the DMP services over a variety of interconnect technologies including new low-latency drivers based on the Virtual Interface (VI) Architecture. The result will be a standard infrastructure for supporting a new generation of scalable, cluster-aware applications.


Application & Service Support

What types of applications and services will benefit from MSCS clustering?

There are 3 types of server applications that will benefit from MSCS clusters:

    • "In the box" services of Windows NT Server, Enterprise Edition: File shares, print queues, Internet/intranet sites managed by Microsoft Internet Information Server, Microsoft Message Queue Server services, and Microsoft Transaction Server services.
    • Generic applications: MSCS includes a point-and-click wizard for setting up any well-behaved server application for basic error detection, automatic recovery, and operator-initiated management (e.g., move from one server to the other.) A "well behaved" server application is one which keeps a recoverable state on shared SCSI disk(s), and whose client can gracefully handle a pause in service of up to a minute as the application is automatically re-started by MSCS.
    • Cluster-aware applications: Software vendors will test and support their application products on MSCS. Over time, vendors will provide MSCS-based enhancements, from simpler setup and faster failover, to cluster-enabled scalability and load balancing.
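As a rough illustration of what "keeps a recoverable state on shared disk" could mean for a generic application, here is a minimal Python sketch. It assumes the state fits in a small JSON file and that the file path stands in for a location on the shared SCSI disk. The function names are hypothetical and nothing here is an MSCS API.

```python
import json
import os
import tempfile

def save_state(path: str, state: dict) -> None:
    """Write state atomically: write a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before the rename
    os.replace(tmp, path)

def load_state(path: str, default: dict) -> dict:
    """Recover the last saved state, or start from `default` on first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)
```

An application written this way can be stopped on one server and restarted on the other, picking up from the last saved state once the disk has moved.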

What software vendors will offer cluster-aware applications for MSCS?

Software vendors that have already announced plans to offer products for MSCS clusters include Baan, Cheyenne, Computer Associates (CA/Unicenter TNG), HP (ClusterView), IBM (DB2), NetIQ, Octopus, Oracle (Oracle 7 Failsafe), SAP, Vinca, and, of course, Microsoft (Microsoft SQL Server, Enterprise Edition, and Exchange Server, Enterprise Edition.) For an up-to-date list of announced products that support MSCS, refer to the Microsoft Windows NT Server, Enterprise Edition Solutions Directory at http://microsoft.com/ntserver/info/ntsedir.htm.

Will Microsoft validate or logo software products that work with MSCS?

Microsoft will not have a validation program for MSCS-based software products at first. It is expected that once MSCS clusters are deployed in volume and there are sufficient examples of cluster-aware application products to evaluate, Microsoft will extend its Microsoft BackOffice logo program to include, at a minimum, validation of support for basic failover operation on an MSCS cluster.

What are Microsoft's plans for supporting Microsoft SQL Server on MSCS clusters?

Microsoft will support the Enterprise Edition of Microsoft SQL Server on MSCS clusters. Microsoft SQL Server, Enterprise Edition version 6.5 will provide "active/active" cluster support in the second half of 1997 (i.e., both servers can be running SQL Server, with each server supporting its own databases). Microsoft SQL Server 7.0, currently in beta test, will include additional cluster-aware enhancements that provide for faster recovery in the event of a server or application failure. The version of Microsoft SQL Server that follows Release 7.0 will include new features for shared-nothing scalability on MSCS clusters (i.e., a single database will be able to span multiple servers).

What are Microsoft's plans for supporting Microsoft Exchange Server on MSCS clusters?

Microsoft will support the Enterprise Edition of Microsoft Exchange Server on MSCS clusters. The "Osmium" release of Exchange Server, Enterprise Edition will provide "active/passive" failover on an MSCS cluster. This means Exchange/E "Osmium" will be able to run on one server in the cluster at a time, and MSCS will be able to automatically re-start Exchange on the other server following an application or server failure. Future versions of Exchange will be enhanced for active/active failover (i.e., ability to run Exchange simultaneously on both servers.)
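The active/active and active/passive arrangements described in the two answers above differ only in where work sits before a failure. The following Python sketch models that under simple assumptions: a two-node cluster, hypothetical node and database names, and a `fail_over` helper invented for illustration. It does not reflect how MSCS itself tracks resources.

```python
def fail_over(placement: dict, failed: str) -> dict:
    """Return a new placement with every group on `failed` moved to a survivor.

    `placement` maps a node name to the list of groups it currently hosts.
    """
    survivors = [n for n in placement if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node to take over")
    target = survivors[0]  # a two-node cluster has exactly one survivor
    result = {n: list(groups) for n, groups in placement.items() if n != failed}
    result[target].extend(placement.get(failed, []))
    return result
```

In an active/active layout both nodes start with work of their own, so the survivor ends up carrying both loads. In an active/passive layout the survivor starts empty.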

Can the standard versions of Microsoft SQL Server 6.5 or Exchange Server 5.0 be set up for failover on a cluster using the "generic application" capability of MSCS?

Technically proficient customers who want to test Microsoft SQL Server 6.5 or Exchange Server 5.0 on a cluster may do so using the generic application capability of MSCS. However, the setup can be complex, and will not be supported by Microsoft support services. Therefore, customers should only do so for testing purposes, not for production deployments. Microsoft SQL Server, Enterprise Edition version 6.5, and the "Osmium" release of Exchange Server, Enterprise Edition will feature a simplified cluster setup procedure, and will be fully supported for failover on MSCS clusters.

Will Microsoft SNA Server benefit from MSCS?

No, because Microsoft SNA Server already provides a hot failover capability independent of MSCS.

Will Microsoft Proxy Server benefit from MSCS?

No, because the current version of Microsoft Proxy Server has its own capability for chaining together multiple servers for high availability and scalability.

Will Microsoft Systems Management Server benefit from MSCS?

No, MSCS will not provide high availability for the current release of Microsoft Systems Management Server. Microsoft intends to provide cluster-enabled high availability for Systems Management Server in a future release.

Can MSCS failover an NT Server Directory (Domain) Controller?

No, because it is already possible to have backup directory service controllers for high availability. Servers in an MSCS cluster may be either primary or backup directory controllers for Windows NT Directory Services.

Can MSCS failover a WINS (Windows Internet Name Service) server?

No, because it is already possible to have backup WINS servers for high availability.

Can MSCS failover Remote Access Services (RAS)?

Remote Access Services cannot benefit from MSCS at this time since there is no standard method for doing software failover of modem connections. For higher reliability of dial-up connections, you can use the RAS Multi-Link capability first introduced in Windows NT Server 4.0.

Can MSCS failover Microsoft Distributed File System (Dfs) directories?

Not in Windows NT Server, Enterprise Edition 4.0. The version of Dfs in Windows NT Server 5.0 will provide directory replication for fault tolerance. When used on the Enterprise Edition of Windows NT Server 5.0, Dfs will also work with MSCS failover for fast recovery from server crashes.

What versions of Oracle will benefit from MSCS clusters?

Oracle has announced that Oracle Failsafe 2.0 will be available at no extra cost with Oracle 7 databases. It provides "active/passive" database failover on MSCS clusters (i.e., can run on one server at a time, and failover to the other server in the event of an application or server failure). For more information, refer to http://www.oracle.com/NT/solution/clusters/index.html.

 

Does Tandem NonStop SQL/MX use MSCS?

Tandem NonStop SQL/MX uses MSCS clustering services when running on a two-server cluster. NonStop SQL/MX uses its own single-application clustering services when running on a cluster with more than two servers. Customers who want high availability plus database scalability up to the performance provided by two high-end SMP servers will benefit from running NonStop SQL/MX on MSCS, because MSCS also provides high availability for other services and applications on the cluster. Customers who require additional scalability would use the built-in single-application cluster services of NonStop SQL/MX, trading off general availability services for the ability to scale on more than two servers.


Hardware Validation

How is MSCS cluster hardware validated?

Complete cluster configurations (i.e., 2 servers, a storage solution, and an interconnect) are tested and validated using an MSCS Cluster Hardware Compatibility Test that will be available for download from the Microsoft web site when MSCS releases. Anyone with an appropriate lab setup can run the test. The test procedure takes at least 2 weeks and requires about half the time of one full-time Microsoft Certified Professional. The result of a successful test is an encrypted file that is returned to Microsoft. Upon validation of the test results, Microsoft will post the tested configuration on a Cluster Hardware Compatibility List on its web site.

Are there restrictions on who can validate configurations, or on how many configurations they can validate for MSCS?

There is no limit to the number of cluster configurations anyone can validate once Microsoft Cluster Server is released. The Cluster Hardware Compatibility Test will be available for download from the Microsoft web site. Anyone with the expertise and proper lab setup will be able to download the test, run it, and submit the encrypted results file to Microsoft. Once Microsoft validates the results, the validated configuration will be added to the Cluster Hardware Compatibility List on the Microsoft web site. Because of the lab setup, personnel, and time required to validate a cluster configuration, it is likely that system vendors, component vendors, and system integration firms will primarily do validations.

Where will Microsoft post the Hardware Compatibility List (HCL) for MSCS?

The Cluster HCL will be posted when MSCS releases. It will be available on the Windows NT Server web site at http://www.microsoft.com/ntserver/info/hwcompatibility.htm.

Where will Microsoft post the Cluster Hardware Compatibility Test (HCT) for MSCS?

The Cluster HCT will be posted when MSCS releases. It will be available on the Microsoft web site at http://www.microsoft.com/hwtest.

What are the general requirements for MSCS cluster hardware?

The most important criterion for MSCS hardware is that it be listed on the Microsoft Cluster Hardware Compatibility List, indicating it has passed the MSCS Cluster Hardware Compatibility Test. Microsoft will only support MSCS when used on a validated cluster configuration. Validation is only available for complete configurations that were tested together, not for individual components.

A cluster configuration is composed of 2 servers, storage, and networking. Here are the general requirements for MSCS cluster hardware for Windows NT Server, Enterprise Edition 4.0:

Servers

    • Two PCI-based machines running Windows NT Server, Enterprise Edition. MSCS can run on Intel and compatible systems (Pentium 90 or higher processor), or on RISC-based systems with an Alpha processor. However, you cannot mix Intel Architecture and RISC servers in the same cluster.
    • Each server needs at least 64MB of RAM; at least 500MB of available hard disk space; a CD-ROM drive; Microsoft Mouse or compatible pointing device; and a VGA, Super VGA, or video graphics adapter compatible with Windows NT Server 4.0.

Storage

    • Each server needs to be attached to a shared, external SCSI bus that is separate from the system disk bus. The SCSI adapters need to be PCI. Applications and data are stored on one or more disks attached to this bus. There must be enough storage capacity on this bus for all of the applications running in the cluster environment. This configuration allows MSCS to migrate the applications between machines.
    • Microsoft recommends hardware RAID for all disks on the shared SCSI bus, to eliminate disk drives as a potential single point of failure. This means using either a RAID storage unit, or a SCSI host adapter that implements RAID across "dumb" disks.

Network

    • Each server needs at least two network cards. Typically, one is the public/corporate net and the other is a private net between the two nodes. The net adapters need to be PCI.
    • A static IP address is needed for each group of applications that move as a unit between nodes. MSCS can project the identity of multiple servers from a single cluster by using multiple IP addresses and computer names.


Servers

What system vendors will offer MSCS cluster configurations?

All of the following system vendors have announced plans to offer MSCS-based clusters: Amdahl®, Compaq®, Data General®, Dell®, Digital Equipment Corporation®, Fujitsu®, Hitachi®, Hewlett-Packard®, IBM®, NCR®, Olivetti®, Siemens Nixdorf®, Stratus®, Tandem®, and Unisys®. Prior to the release of MSCS, a list of hardware vendor announcements relating to MSCS clusters can be found in the Microsoft Windows NT Server, Enterprise Edition Solutions Directory at http://microsoft.com/ntserver/info/ntsedir.htm. Following the release of MSCS in Windows NT Server, Enterprise Edition, the list of supported cluster configurations will be in the Cluster Hardware Compatibility List on the Microsoft Windows NT Server web site at http://www.microsoft.com/ntserver/info/hwcompatibility.htm.

Is it necessary that both servers within a cluster be identical?

The Cluster Hardware Compatibility Test does not require that both servers in a validated configuration be identical. MSCS runs on Windows NT Server, Enterprise Edition so a validated MSCS cluster can potentially contain any two servers that are validated to run that version of Windows NT. (One exception: you cannot mix Alpha and Intel Architecture processors in the same cluster.) Note that MSCS hardware validation will apply to a complete cluster configuration – 2 servers, an interconnect, and a storage solution – so it is unlikely that system vendors will validate clusters containing servers from more than one system manufacturer. However, it is conceivable that system integrators or component vendors might validate mixed-vendor clusters in response to customer demand.

Will MSCS run on our existing servers?

This depends on whether or not your existing servers have been validated within a complete cluster configuration. There will be a hardware validation process for MSCS clusters, just as there is for other Microsoft system software. An MSCS validation will test a complete cluster configuration, including specific models of servers, storage systems, and cluster interconnect. Customers concerned about whether servers they buy today will work in MSCS clusters in the future should question their hardware vendor about the vendor’s plans to validate MSCS cluster configurations.

Do you expect customers to implement clusters on their existing equipment?

This is possible, and could eventually become quite common, but most of the initial customers will probably acquire new cluster systems. The MSCS validation process covers complete cluster configurations (i.e., servers, storage, and interconnect), not just individual components. Thus, if customers are already using servers and/or storage subsystems that have been validated within a complete MSCS cluster configuration, they would be able to implement a cluster with those components by adding the rest of the hardware included in the validated configuration.


Storage

What storage connection techniques will MSCS support?

MSCS is architected to work with standard Windows NT Server storage drivers, so it can potentially support any of the current or anticipated storage interconnections available through Win32 or Windows Driver Model. However, all of the cluster configurations currently being considered for MSCS validation use standard PCI-based SCSI connections (including SCSI over fiber.)

Will MSCS support fiber disk connections in addition to SCSI?

Yes, once there are standard fiber disk drivers for Windows NT Server. In reality this doesn't fundamentally change the way MSCS uses disks. Fiber connections will still be using SCSI devices, but they will be hosted on a Fibre Channel bus instead of a SCSI bus. Conceptually, this is encapsulating the SCSI commands within Fibre Channel. Therefore, the SCSI commands upon which MSCS relies (Reserve/Release and Bus Reset) will still function as they do over standard (i.e., non-fiber) SCSI.
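The SCSI commands named above can be pictured with a toy in-memory model. The Python sketch below only mimics the general semantics of SCSI-2 Reserve/Release and Bus Reset on a bus with more than one initiator. The class and method names are hypothetical, and the sketch says nothing about how MSCS actually sequences these commands.

```python
class SharedDisk:
    """Toy model of SCSI-2 Reserve/Release on a multi-initiator bus."""

    def __init__(self):
        self.holder = None  # initiator currently holding the reservation

    def reserve(self, initiator: str) -> bool:
        """Grant the reservation if the disk is free or already held by the caller."""
        if self.holder in (None, initiator):
            self.holder = initiator
            return True
        return False  # reservation conflict: another initiator holds the disk

    def release(self, initiator: str) -> None:
        """Drop the reservation; a release from a non-holder has no effect."""
        if self.holder == initiator:
            self.holder = None

    def bus_reset(self) -> None:
        """A bus reset clears any existing reservation."""
        self.holder = None
```

The same model applies whether the commands travel over a parallel SCSI bus or are encapsulated in Fibre Channel, which is the point the answer above makes.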

Does MSCS prefer one type of SCSI signaling over the other (i.e., differential versus single-ended)?

MSCS works best with differential SCSI with the 'Y' cables. The termination should be outside the systems so that losing power in a system does not cause the termination on the SCSI bus to be lost. High-quality drives in well-built electrical/mechanical enclosures also improve reliability.

Will MSCS support RAID on disks in a cluster?

Yes. Hardware RAID may be used to protect disks connected to the shared multi-initiator SCSI bus. Other disks in the cluster may be protected by either hardware RAID or by the built-in software RAID ("FTDISK") capability of Windows NT Server.

Why doesn't MSCS support Windows NT Server software RAID ("FTDISK") for disks connected to the shared SCSI bus?

The current FTDISK capability in Windows NT Server provides excellent, cost-effective protection of disks connected to a single server. However, its architecture is not well suited to some situations that can occur when doing failover of disk resources connected to two servers via multi-initiator SCSI. Microsoft plans to enhance FTDISK in a future release to address this issue. In the meantime, disks connected to a Windows NT Server machine via multi-initiator SCSI can be fully protected by widely available hardware RAID.

Which hardware RAID devices will MSCS support?

Support for any particular RAID device will depend on its inclusion in a validated cluster configuration.

Will MSCS support PCI RAID controllers?

Selected PCI RAID controllers may be validated within an MSCS cluster configuration. Some of these controllers store information about the state of the array on the card -- not on the drives themselves -- so it's possible that the cards in the two servers might not be in sync at the moment a failover occurs. For this reason, RAID controllers that store array state in the controller will not work with MSCS. MSCS will only be validated with RAID solutions that store the meta-data for RAID sets on the disks themselves so that it is independent of the controllers.

Are there any plans to support a shared solid state drive?

No shared solid state drives have yet been tested, but there is nothing that would preclude their use. As long as the SCSI 2 reserve/release and bus reset functions are available, these devices should work with MSCS.

Is it possible to add hard drives to an MSCS cluster without rebooting?

It depends on whether the drive cabinet supports this, since Windows NT will not do so until the Windows NT 5.0 release. There are examples of RAID cabinets validated for Windows NT that support changing volumes on the fly (with RAID parity.)


Interconnect

What is a cluster "interconnect"?

It is recommended that MSCS clusters have a private network between the servers in the cluster. This private network is generally called an "interconnect", or a "system area network" (SAN). The interconnect is used for cluster-related communications. Carrying this communication over a private network provides dependable response time, which can enhance cluster performance. It also enhances reliability by providing an alternate communication path between the servers. This assures MSCS services will continue to function even if one of the servers in the cluster loses its network connections.

What type of information is carried over the cluster interconnect?

The interconnect in an MSCS cluster will potentially carry the following five types of information:

    • Server "heartbeats": These tell MSCS that another server is up and running.
    • Replicated state information: MSCS does this so that every server in the cluster knows which cluster groups and resources are running on every other server.
    • Cluster commands: MSCS software on one server can issue a command to the MSCS software on another server. For example, when moving an application, MSCS actually tells its current server to take it offline, and then tells the new server to bring it online.
    • Application commands: A cluster-aware application might use the interconnect to communicate among copies of the application running on multiple servers. This is generally referred to as "function shipping".
    • Application data: A cluster-aware application might use the interconnect to transfer data between servers. This is generally called "input/output (I/O) shipping".
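The first item in the list, server heartbeats, can be sketched as a small timeout-based monitor. This Python example is an illustration under stated assumptions: the 5-second timeout in the usage note, the node names, and the `HeartbeatMonitor` class are all invented here, and the source does not say what interval or detection rule MSCS uses.

```python
import time

class HeartbeatMonitor:
    """Track the last heartbeat per node and report nodes that went silent."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}

    def beat(self, node: str) -> None:
        """Record that a heartbeat from `node` has just arrived."""
        self.last_seen[node] = self.clock()

    def silent_nodes(self):
        """Return nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

For example, `HeartbeatMonitor(timeout_s=5.0)` would report a node once more than five seconds pass without a call to `beat` for it. A real cluster service would then decide whether to start recovery.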

Can a cluster have more than one interconnect?

An MSCS cluster can only have a single private network, but MSCS will automatically revert to a public network connection for heartbeat and other cluster communications should it ever lose the heartbeat over the interconnect. Also, note that some vendors offer high-performance interconnect products that include redundant paths for fault tolerance.

What type of network is required for an MSCS cluster interconnect?

A validated MSCS cluster configuration can use as its interconnect virtually any network technology that is validated for Windows NT Server. This includes, for example, 10BaseT Ethernet, 100BaseT Ethernet, and specialized interconnect technologies such as Tandem® ServerNet®.

When is it necessary to have a high performance interconnect such as 100BaseT Ethernet or Tandem ServerNet?

Interconnect performance can potentially affect cluster performance under two scenarios: (1) the cluster is running thousands of cluster groups and/or resources, or (2) the cluster is running a scalable, cluster-aware application that uses the interconnect to transfer high volumes of transactions and/or data. In either of these cases, customers should choose a cluster configuration with a higher-speed interconnect such as 100BaseT, or Tandem ServerNet.

Cluster-aware applications that use MSCS to achieve very high levels of scalability will most likely become common in the MSCS "Phase 2" timeframe. Thus higher-speed interconnects are likely to become more important in larger, Phase 2 clusters.

There has been a lot of talk about "man in the middle" and "replay" attacks on machines connected across the Internet. Will MSCS clusters be vulnerable to this same type of attack if someone illegally connects to the interconnect between the servers?

No. MSCS employs packet signing for intracluster communications to protect against replay attacks.
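The source does not describe the signing scheme MSCS uses, so the following Python sketch shows only the general technique: a keyed signature over a sequence number plus payload, with the receiver rejecting any sequence number it has already seen. HMAC-SHA256, the 8-byte counter, and the names `sign` and `Receiver` are assumptions made for illustration.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # length of an HMAC-SHA256 digest in bytes

def sign(key: bytes, seq: int, payload: bytes) -> bytes:
    """Return seq (8 bytes, big-endian) + payload + HMAC-SHA256 tag over both."""
    body = struct.pack("!Q", seq) + payload
    return body + hmac.new(key, body, hashlib.sha256).digest()

class Receiver:
    def __init__(self, key: bytes):
        self.key = key
        self.highest_seq = -1

    def accept(self, packet: bytes):
        """Return the payload, or None if the packet is forged or replayed."""
        if len(packet) < 8 + TAG_LEN:
            return None
        body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # signature mismatch
        (seq,) = struct.unpack("!Q", body[:8])
        if seq <= self.highest_seq:
            return None  # replayed or stale packet
        self.highest_seq = seq
        return body[8:]
```

A captured packet that is sent again carries a valid signature but an old sequence number, so the receiver drops it.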

 

When will MSCS support interconnects based on the Virtual Interface Architecture?

Microsoft expects to support interconnects based on the VI Architecture specification in Phase 2 of MSCS, which is scheduled for beta test in 1998.


Networking

Will MSCS support the failover of IP addresses?

Yes.

Will MSCS support other network protocols such as IPX?

No other protocols are planned at this time.

How does MSCS do IP failover?

MSCS has the ability to failover (move) an IP address from one cluster node to another. The ability to failover an IP address depends on two things: 1) support for dynamic registration and deregistration of IP addresses, and 2) the ability to update the physical network address translation caches of other systems attached to the subnet on which an address is registered.

Dynamic address (de)registration is already implemented in Windows NT Server to support leasing IP addresses using the Dynamic Host Configuration Protocol (DHCP). To bring an IP Address resource online, the MSCS software issues a command to the TCP/IP driver to register the specified address. A similar command exists to deregister an address when the corresponding MSCS resource is taken offline.

The procedure for updating the address translation caches of other systems on a LAN is contained in the Address Resolution Protocol (ARP) procedure, which is implemented by Windows NT Server. ARP is an IETF standard, RFC 826. RFC 826 can be obtained on the Internet from ftp://ds.internic.net/rfc/rfc826.txt.

How does MSCS update router tables when doing IP failover?

As part of its automatic recovery procedures, MSCS will issue IETF standard ARP "flush" commands to routers to flush the machine addresses (MACs) related to IP addresses that are being moved to a different server.

How does the Address Resolution Protocol (ARP) cause systems on a LAN to update their tables that translate IP addresses to physical machine (MAC) addresses?

The ARP specification states that all systems receiving an ARP request must update their physical address mapping for the source of the request. (The source IP address and physical network address are contained in the request.) As part of the IP address registration process, the Windows NT TCP/IP driver broadcasts an ARP request on the appropriate LAN several times. This request asks the owner of the specified IP address to respond with its physical network address. By issuing a request for the IP address being registered, Windows NT Server can detect IP address conflicts; if a response is received, the address cannot be safely used. When it issues this request, though, Windows NT Server specifies the IP address being registered as the source of the request. Thus, all systems on the network will update their ARP cache entries for the specified address, and the registering system becomes the new owner of the address.

Note that if an address conflict does occur, the responding system can send out another ARP request for the same address, forcing the other systems on the subnet to update their caches again. Windows NT Server does this when it detects a conflict with an address that it has successfully registered.
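The ARP request described above, in which the IP address being registered appears as both the source and the target, can be built by hand to show its layout. This Python sketch packs the 28-byte ARP body defined in RFC 826 for Ethernet and IPv4. The MAC and IP values in the usage note are made up, the function name is hypothetical, and the sketch builds the bytes only. It does not send anything on the network.

```python
import socket
import struct

def build_arp_announcement(mac: bytes, ip: str) -> bytes:
    """Build a 28-byte ARP request body in which sender and target IP match."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    addr = socket.inet_aton(ip)
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        mac,          # sender hardware address (the registering NIC)
        addr,         # sender protocol address (the IP being registered)
        b"\x00" * 6,  # target hardware address: not yet known
        addr,         # target protocol address: the same IP
    )
```

For instance, `build_arp_announcement(bytes.fromhex("020000000001"), "192.168.1.10")` yields a body whose sender fields pair that MAC with that IP. Systems that receive it update their cache entry for the address accordingly.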

MSCS uses ARP broadcasts to re-set MAC addresses, but ARP broadcasts don't pass routers. So what about clients behind the routers?

If the clients were behind routers, they would be using the router(s) to access the subnet where the MSCS servers were located. Accordingly, the clients would use their router (gateway) to pass the packets to the routers through whatever route (OSPF, RIP, etc) is designated. The end result is that their packet is forwarded to a router on the same subnet as the MSCS cluster. This router's ARP cache is consistent with the MAC address(es) that have been modified during a failover. Packets thereby get to the correct Virtual server, without the remote clients ever having seen the original ARP broadcast.

Can an MSCS cluster be connected to different IP subnets? (This is possible with a single Windows NT server, even with a single NIC, by binding different IP addresses to the NIC and by letting Windows NT Server route between them.) For example, can MSCS support the following configuration:

Yes, MSCS permits servers in a cluster to be connected to multiple subnets. MSCS supports physical multi-homing no differently than Windows NT Server does. The scenario in question, with the cluster servers attached to two external subnets (#1 and #2) and a private subnet (#3), is perfectly acceptable. The two external subnets (1&2) could connect the same clients (redundant fabrics) or two different sets of clients. In this scenario, one of the external subnets (#1 or #2) would also have to be a backup for intracluster communication (i.e., backup the private subnet #3), in order to eliminate all single points of failure that could split the cluster.

Note that MSCS will not support a slightly different scenario: NodeA on Subnet1, NodeB on Subnet2, with Subnet1 & Subnet2 connected by a router. This is because there is no way for MSCS to failover an IP address resource between two different subnets.
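The subnet restriction above can be checked mechanically. The Python sketch below uses the standard `ipaddress` module to test whether an IP address falls inside a subnet that every node is attached to. The addresses, node names, and the `can_fail_over` helper are hypothetical, and the check is a simplification of what a real cluster setup would verify.

```python
import ipaddress

def can_fail_over(ip: str, node_interfaces: dict) -> bool:
    """True if every node has an interface on a subnet that contains `ip`.

    `node_interfaces` maps a node name to a list of "address/prefix" strings.
    """
    address = ipaddress.ip_address(ip)
    return all(
        any(address in ipaddress.ip_interface(i).network for i in ifaces)
        for ifaces in node_interfaces.values()
    )
```

With NodeA on 10.0.1.0/24 and NodeB on 10.0.2.0/24, an address such as 10.0.1.50 fails the check, which matches the unsupported routed scenario described above.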

Can MSCS use a second Network Interface Card (NIC) as a hot backup to a primary NIC?

MSCS can only do this for the cluster interconnect. That is, it provides the ability to use an alternate network for the cluster interconnect if the primary network fails. This eliminates an interconnect NIC from being a single point of failure. There are vendors who offer fault tolerant NICs for Windows NT Server, and these can be used for the NICs that connect the servers to the client network.

How do you specify to MSCS which NIC to use for the interconnect, and which NIC(s) to use as backup interconnects?

The MSCS setup allows administrators to specify the exact role that a NIC provides to the cluster. There are three possible roles for each NIC in a cluster:

    • Use for all Communications (Cluster and Client)
    • Use only for internal cluster communications (Cluster only)
    • Use only for Client Access

The typical MSCS cluster will have one NIC on each server designated for internal communications (cluster only), and one or more other NICs designated for all communications (cluster and client.) In that case, the cluster-only NIC is the primary interconnect, and the "all communications" NIC(s) serve as backup interconnects if the primary ever fails.

Examples of client-only NICs include a LAN/WAN/Internet connection where it would be ineffective/impolite to do heartbeats and cluster traffic.
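The three NIC roles and the primary/backup behaviour described above can be expressed as a small selection rule. This Python sketch is illustrative only: the `Role` enum, the `pick_interconnect` function, and the NIC names are invented here and do not correspond to any MSCS setup option or API.

```python
from enum import Enum

class Role(Enum):
    ALL = "cluster and client"
    CLUSTER_ONLY = "cluster only"
    CLIENT_ONLY = "client only"

def pick_interconnect(nics):
    """Pick the NIC to carry cluster traffic, or None if none is usable.

    `nics` is a list of (name, Role, is_up) tuples. Cluster-only NICs are
    preferred, "all communications" NICs act as backups, and client-only
    NICs are never used for cluster traffic.
    """
    for wanted in (Role.CLUSTER_ONLY, Role.ALL):
        for name, role, is_up in nics:
            if role is wanted and is_up:
                return name
    return None
```

If the cluster-only NIC is marked down, the function falls back to an "all communications" NIC, mirroring the backup behaviour the answer describes.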

Can MSCS work with "smart switches" that maintain a 1-to-1 mapping of MAC addresses to IP addresses? These switches are quite common in VLAN configurations in which the level 2 network fabric uses level 3 address information for switching packets. These switches only cache one IP address for each MAC address. Such layering "violation" allows switch vendors to do better lookups and use existing routing protocols to distribute host routes plus MAC addresses. Will MSCS be continually forcing these devices to flush and reset their MAC-to-IP maps due to its use of multiple IPs per MAC, plus the ARP flushes when doing IP failover?

MSCS can work with these switches, but it might affect their performance. If customers experience this problem, there are two possible solutions: (1) have a router sit between the cluster and the switch, or (2) disable the "smarts" on the smart switches.


Software Licensing

How will Microsoft license MSCS?

MSCS is a built-in feature of Windows NT Server, Enterprise Edition (Windows NT Server/E), so customers must license Windows NT Server/E for both servers in a cluster.

Are Client Access Licenses required for accessing an MSCS cluster?

The question of whether a Client Access License (CAL) is required is unaffected by whether a server is standalone or in an MSCS cluster. For example, the standard Microsoft End User License Agreement for Windows NT Server requires a CAL for each client that accesses the shared file services of Windows NT Server. This is true whether the client is accessing a file share on a standalone server, or on an MSCS cluster. Put another way: there is no special CAL requirement related to accessing an MSCS cluster.

How will applications be licensed on MSCS clusters?

Each application vendor will determine their own licensing policies for applications running on MSCS clusters. Microsoft's current policy for server application licensing will still apply to MSCS clusters: an application must be separately licensed for each server on which it is installed. In an MSCS cluster, if an application is to run on both servers, or even if it only runs on one server at a time but must be installed on both servers to permit failover, then the application must be licensed for both servers.

How will Microsoft Client Access Licenses for BackOffice applications be handled on MSCS clusters?

If the customer is using "per-seat" Client Access Licenses for the application, then those licenses apply when a client is accessing the application on either server in the cluster. If the customer is using "per-server" (or "concurrent use") Client Access Licenses for the application, then each machine in the cluster should have a sufficient number of per-server Client Access Licenses for the expected peak load of the application on that machine. (Note that "per-server" Client Access Licenses do not "fail over" from one machine in the cluster to the other.)

Deployment

What support services are available for MSCS?

MSCS will be eligible for support from all of Microsoft’s customer support resources, including Enterprise Phone Support, Premier Support Technical Account Managers, and Microsoft Consulting Services. In addition, MSCS customers will be able to acquire training from Microsoft Authorized Training and Education Centers (ATECs), support services from the system vendors providing MSCS-validated cluster configurations, and value-added services from Microsoft Solution Providers that choose to offer MSCS-related services.

Will Microsoft extend the Microsoft Certified Professional (MCP) program to include certification of cluster-related skills?

Microsoft will not include cluster-related certification in the MCP program in 1997. Cluster-related certification is being considered for future updates to the program.

In the two-server cluster configuration, should the second server be a "hot standby", or can the two servers be running separate jobs up until the time when one fails and the other takes over?

MSCS provides true "active/active clustering", which means every machine in the cluster is available to do real work, and each machine in the cluster is also available to recover the resources and workload of any other machine in the cluster. Thus, there is no need to have a wasted, idle server standing by waiting for a failure. Of course, a customer might choose to run a light workload or a non-critical function that can be easily pre-empted on one of the machines in an MSCS cluster if they want to make sure there’s sufficient processing power available for recovery of performance-sensitive workload.

Besides clustering, what else should be done to provide highly available Windows NT Server services?

MSCS complements other high-availability techniques such as data mirroring, RAID disk protection, uninterruptible power supplies, and duplicated hardware such as fans and network interface cards. The availability role of MSCS is to automatically restore user access to data and services following the failure of individual applications or servers. MSCS and other high-availability technology should be used in concert with prudent IT administration procedures for data backup and disaster-site recovery to ensure continuous availability of mission-critical IT resources.

Will client software have to be updated to take advantage of an MSCS cluster?

No. MSCS does not require any special software on the client for transparent recovery of services that connect to clients via standard IP protocols, such as web sites or Windows file shares. Note that, since server resources and applications can potentially be unavailable for up to a minute or so during MSCS recovery procedures, the client component of a client/server application should ideally be able to gracefully handle pauses in service. However, that characteristic is already common in Microsoft client software, browsers, and most modern packaged applications.
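
To make "gracefully handle pauses in service" concrete, here is a minimal Python sketch of a client-side retry loop. It is an illustration only: the call_with_retry helper, the 90-second budget and the back-off values are assumptions chosen for this example (the text only says recovery can take "up to a minute or so"), and the injected sleep function exists purely so the example can be run without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    max_wait_seconds: float = 90.0,
    first_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` with a doubling delay until it succeeds or the
    total time spent waiting would exceed `max_wait_seconds`."""
    waited = 0.0
    delay = first_delay
    while True:
        try:
            return request()
        except ConnectionError:
            if waited + delay > max_wait_seconds:
                raise  # the outage lasted longer than we are willing to wait
            sleep(delay)
            waited += delay
            delay = min(delay * 2, 15.0)  # cap the back-off step


attempts = {"count": 0}


def flaky_request() -> str:
    # Fails three times (simulating a failover in progress), then succeeds.
    attempts["count"] += 1
    if attempts["count"] <= 3:
        raise ConnectionError("server not reachable")
    return "ok"


print(call_with_retry(flaky_request, sleep=lambda seconds: None))  # ok
```

A client written this way simply sees a slow request during failover instead of an error.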

Does an application need to be installed separately on both servers in a cluster?

Yes, typically each application that is part of a cluster group must be installed separately on both nodes so that it can be started on either node during a failover. Typically, this is done by (1) "failing over" the application's disks on the shared SCSI bus to the first server, (2) installing the application on the first server using those disks for application files, (3) failing the disks over to the other server, and (4) repeating the installation process on the second server, using the same disks.

Suppose there are several services running on 1 node (say, IIS, SQL, and Exchange). On the failure of that node, can you set up the cluster so that only 1 service fails over to the 2nd node?

Yes. Only the services you set up in the MSCS cluster administration console will fail over. If you only set up one service to fail over, then the other two will not fail over.

Should servers in a cluster be directory service (domain) controllers?

Domain controllers already have their own high availability backup capability, so there are no additional restrictions or issues related to clusters. For example, without an MSCS cluster:

    • If you have a Primary Domain Controller (PDC) and a Backup Domain Controller (BDC) and one of them fails, the other is still available to process logons.
    • If you have 2 BDCs and one of them fails, the other is still available.
    • If you have a single PDC and it fails, then you have no domain controller.

All of this remains true if the servers are in a cluster. MSCS neither adds to nor subtracts from the current high availability capabilities of Windows NT Directory Services.

Should servers in an MSCS cluster use the Microsoft Distributed File System (Dfs)?

All of the distributed services of Windows NT Server – including the Microsoft Distributed File System, NT Directory Services, security services, remote administration, etc. – are important building blocks for creating manageable, secure, easily utilized networks of servers. Servers tightly connected within an MSCS cluster benefit from these distributed services just as do servers loosely connected by a network.

How can MSCS help do load balancing between web servers?

The two most common techniques used to load-balance between multiple mirrors of a web site are Network Address Translation (NAT) routing, and DNS round-robin routing. Cisco and other vendors sell routers that use NAT as well as some sort of load balancing. A site has one URL and one IP address. If a server goes down, the router sees this and stops sending requests to the web server. This offers good performance and easy manageability, but these NAT routers can be expensive.

An easier, less expensive technique is to use simple round-robin DNS to split requests among a number of Web servers that all have the same data on them. A site has one URL, but several IP addresses, and loads are randomly distributed across all of the IP addresses. A problem with round-robin DNS is that, if a server goes down, someone typically has to manually remove the IP address from the DNS round-robin list.

MSCS can complement round-robin routing by eliminating the need to manually remove failed IP addresses from the round-robin list. You set up an MSCS cluster running IIS on each server with each site's web files on the shared SCSI bus. You synchronize the data between the two sites. If one of the servers fails, the virtual root of the failed machine is transferred to the other server in the cluster along with its IP addresses, so both sites continue to serve customers. And, once the failed server resumes operation, MSCS can automatically "fail back" its virtual root to re-balance the workload.
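
The interaction between a round-robin list and IP address failover can be sketched with a toy Python model. The addresses, node names and helper functions below are hypothetical and exist only for this illustration; the model hands out addresses in strict rotation and ignores DNS caching.

```python
from itertools import cycle
from typing import Dict, List

# Hypothetical addresses and node names, for illustration only.
dns_round_robin: List[str] = ["10.0.0.1", "10.0.0.2"]

# Which cluster node currently answers for each IP address.
ip_owner: Dict[str, str] = {"10.0.0.1": "node-a", "10.0.0.2": "node-b"}


def fail_over(failed_node: str, survivor: str) -> None:
    """Move every IP address owned by the failed node to the survivor."""
    for ip, owner in ip_owner.items():
        if owner == failed_node:
            ip_owner[ip] = survivor


def serve(requests: int) -> List[str]:
    """Hand out addresses in rotation and report which node served each."""
    rotation = cycle(dns_round_robin)
    return [ip_owner[next(rotation)] for _ in range(requests)]


print(serve(4))          # ['node-a', 'node-b', 'node-a', 'node-b']
fail_over("node-b", "node-a")
print(serve(4))          # ['node-a', 'node-a', 'node-a', 'node-a']
print(dns_round_robin)   # ['10.0.0.1', '10.0.0.2']
```

The round-robin list is never edited: both addresses stay valid because the surviving node answers for them.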

I need to create many file shares. Is there an alternative to doing them one at a time through the MSCS New Resource Wizard?

One answer would be to write a resource DLL patterned after the SMB Share sample to manage the shares. It would use API calls to create the shares when coming online and "destroy" the shares when going offline.
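
The shape of such a resource can be sketched in Python. This is not the SDK sample and not real resource DLL code: the ManyShares class and the in-memory share table are hypothetical stand-ins. In an actual resource DLL the two marked places would call the operating system's share-management API (Win32 functions such as NetShareAdd and NetShareDel are one plausible choice, which is an assumption about what the answer above has in mind).

```python
from typing import Dict

# Stand-in for the operating system's share table. A real resource DLL would
# call the platform's share-management API here instead of editing a dict.
active_shares: Dict[str, str] = {}


class ManyShares:
    """Hypothetical resource that manages a whole set of shares at once."""

    def __init__(self, shares: Dict[str, str]) -> None:
        self.shares = shares  # share name -> path on the shared disk

    def online(self) -> None:
        # Create every share when the resource comes online on this node.
        for name, path in self.shares.items():
            active_shares[name] = path

    def offline(self) -> None:
        # Remove the shares again when the resource goes offline.
        for name in self.shares:
            active_shares.pop(name, None)

    def is_alive(self) -> bool:
        # Healthy only if every managed share is still present.
        return all(name in active_shares for name in self.shares)


resource = ManyShares(
    {"user{:03d}".format(i): "S:\\home\\user{:03d}".format(i) for i in range(200)}
)
resource.online()
print(len(active_shares), resource.is_alive())   # 200 True
resource.offline()
print(len(active_shares), resource.is_alive())   # 0 False
```

One resource definition then covers all 200 shares, instead of 200 passes through the wizard.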

What are the criteria for running a resource in a separate cluster resource monitor?

The tradeoff is extra isolation from application/resource failures, versus more consumption of server resources by MSCS. You should run a resource in a separate resource monitor when testing a new resource DLL. This assures that, if the resource DLL compromises the resource monitor, it won't affect the core cluster services of MSCS.

Should the quorum disk be on a separate physical disk?

The quorum disk does not have to be on a separate physical disk. You can use the quorum disk for applications, also. However, if you want to allocate a specific volume for this role you can do so. This will, in some cases, marginally improve failover time.

Would a shared solid state drive provide higher availability than standard disk drives?

Perhaps. Solid state drives reduce the seek and rotational latency that is associated with conventional DASD. This performance can be leveraged by applications to minimize possibilities of data loss by essentially writing through the cache without totally destroying system performance. But even in such a case, there remains the possibility for cable, operator, and other failures that can result in inconsistent data. No matter how quickly the data is written to the media, there is a window of vulnerability. For this reason, applications still need to provide some model for persistence to ensure that state can be recaptured. A good example of this is the transaction semantics used by database management systems to maintain the integrity of their on-disk data.

Troubleshooting

When diagnosing problems that appear to be cluster-related, how can I determine what is happening in the cluster services?

For problem reporting with the initial release of MSCS, you must use the "cluster log". (Future releases will make greater use of the Windows NT Server Event Monitor.) To turn on the cluster log, set a system environment variable called clusterlog to some path on your system. For example, set clusterlog to %windir%\cluster\cluster.log and then reboot. When the cluster service starts, it will log failure reasons and other information to the clusterlog file, which makes the problem easier to diagnose.
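
Before rebooting, it can help to check what path the clusterlog variable would resolve to. The Python sketch below is an assumption-laden convenience, not an MSCS tool: the resolve_cluster_log helper is hypothetical, and it only imitates %NAME% expansion and case-insensitive variable lookup on a mapping you pass in.

```python
import os
import re
from typing import Mapping, Optional


def resolve_cluster_log(env: Mapping[str, str]) -> Optional[str]:
    """Return the expanded cluster log path, or None if the variable is unset.

    Looks up `clusterlog` case-insensitively and expands %NAME% references
    using the same environment mapping.
    """
    lowered = {key.lower(): value for key, value in env.items()}
    raw = lowered.get("clusterlog")
    if not raw:
        return None
    # Replace each %NAME% with its value; leave unknown names untouched.
    return re.sub(
        r"%([^%]+)%",
        lambda match: lowered.get(match.group(1).lower(), match.group(0)),
        raw,
    )


example_env = {
    "windir": r"C:\WINNT",
    "clusterlog": r"%windir%\cluster\cluster.log",
}
print(resolve_cluster_log(example_env))   # C:\WINNT\cluster\cluster.log
print(resolve_cluster_log(os.environ))    # None unless clusterlog is set
```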

Should the cluster administration console be connected to the cluster name, or to a node name?

Connect using the node name instead of the cluster name, as documented in the Cluster administrator's guide. If you connected to the cluster name you would utilize the RPC service to the cluster endpoint mapper. Since this gets failed over, your RPC session for cluster admin has to wait to timeout, which can take a relatively long time. When you connect using the node name, the cluster does not thrash in the event of such a failure. Instead, it simply arbitrates for ownership of the quorum device. After this is settled, one cluster node remains, where the appropriate failover services are running. You can then reconnect the cluster administration console to the surviving server.

From CMD shell on one server, if you try to access a drive owned by the other server you get "Incorrect function". Why?

MSCS is a "shared nothing" environment, meaning that disk resources are owned by only one server at any point in time. "Incorrect function" is the message you get when trying to do local access to disks that are owned by a different server.

How come stopping the server service on either cluster node does not cause failover? It appears that the cluster software is not monitoring the server service but just the local cluster objects directly, not via the server service.

MSCS does not explicitly check for the server service, but it does monitor the LanManServer. Therefore, with SMB shares, it will fail these over in the event that the LanManServer service failed or was stopped. If you want MSCS to monitor and restart the server service also, you can easily do so using the admin wizard to set it up as a "generic service".

A corporate network failure didn’t cause failover of any resources. The Cluster Admin tool fails with an error dialog stating that the cluster service has stopped. How do the cluster nodes identify when a net failure occurs?

This is a case where clustering by itself cannot eliminate every potential single point of failure in a system. Just as highly available clusters should employ hardware RAID to protect against loss of physical disk drives, they should also include dual-path SCSI and redundant NICs to protect against loss of a single SCSI controller or network interface card.

How do you move the quorum resource to another disk?

This is done in Cluster Admin by selecting the cluster and right clicking. One of the three tabs is Quorum Resource, which allows you to modify this entry.

If the heartbeat link is down and both machines are performing quorum, how should the machine that cannot reserve the SCSI bus react? In the normal case, should only the machine that can reserve the SCSI bus survive and the other machine go down?

First, both nodes cannot have the quorum resource. However, both nodes can be operating in the cluster if one node has the quorum resource and the second node joins the cluster. When a partition is discovered, both nodes arbitrate for the quorum resource. One node wins the arbitration (if they are still partitioned) and the other node loses. The loser shuts down the cluster service; the winner fails over all groups and continues to operate.
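
The outcome of arbitration can be modelled in a few lines of Python. This is a deliberately simplified sketch: the QuorumDisk class and arbitrate function are hypothetical, the "reservation" is just a field, and the winner is whichever contender is listed first, standing in for whichever node reserves the disk first in practice.

```python
from typing import Dict, List, Optional


class QuorumDisk:
    """Models a disk that only one node can reserve at a time."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def try_reserve(self, node: str) -> bool:
        if self.owner is None:
            self.owner = node
            return True
        return self.owner == node


def arbitrate(quorum: QuorumDisk, contenders: List[str],
              groups: Dict[str, str]) -> str:
    """Resolve a partition: one node wins the quorum and takes every group."""
    quorum.owner = None  # simplification: both nodes race for the disk again
    winner = next(node for node in contenders if quorum.try_reserve(node))
    for group in groups:
        groups[group] = winner  # the winner fails over all groups
    return winner


groups = {"payroll": "node-a", "web": "node-b"}
quorum = QuorumDisk()
winner = arbitrate(quorum, ["node-b", "node-a"], groups)
losers = [node for node in ("node-a", "node-b") if node != winner]
print(winner, losers, groups)
# node-b ['node-a'] {'payroll': 'node-b', 'web': 'node-b'}
```

The nodes in the losers list are the ones that would shut down their cluster service.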

What's the recommended procedure if you want to run CHKDSK on a disk connected to the shared SCSI bus of a cluster?

CLUSSVC has start options that allow the service to be started without quorum logging. This can be done either from a command prompt or from the service panel, using the -noquorumlogging option. At that point, CHKDSK can be run on the storage devices on the shared SCSI bus.

Developer Issues

Customers and software vendors are interested in developing DLLs to make applications "cluster aware". Is there any documentation, sample code, etc. to assist them in the process?

Yes, there is a Software Development Kit (SDK) for MSCS. The MSCS SDK has an SMB file share example DLL (with code). Developers can take this as a template and fill in their own application specific code in the specific routines (Online, Offline, Is Alive, Looks Alive, etc.)

How will Microsoft distribute the MSCS Software Development Kit (SDK)?

During the beta program, the beta SDK for MSCS will be distributed on the Beta CD that accompanies Microsoft Developers Network (MSDN) Level III, and on the Beta Evaluation CD provided to organizational customers who have executed Microsoft Select volume licensing agreements. Following release of Windows NT Server/E, the MSCS SDK will be distributed in the Platform SDK via MSDN Level III.

MSCS SDK documentation says, "Registry replication is a configurable feature that is available to the Generic Application and Generic Service resource types. Basically, you tell it what registry key to watch/replicate and that’s all there is to it. If the application/service stores volatile information in a specific registry key, then the key should be declared in the properties section of the resource so that it may be replicated. If this is done, when the resource comes online on another node, it will have the same registry information as the previously online resource. Application/service registry keys, by default, are not replicated or stored within the cluster database." Why should anyone use the cluster APIs to write registry keys to the cluster database? When should one use one over the other?

If you're just going to use a generic application resource, then you should just use registry checkpointing. However, using the generic application resource type has some limitations. For example:

    • You can't do active/active, which will limit load balancing if you also want failover (more about this below.)
    • When you go offline, it simply terminates the process. If you have a GUI application, you may only get 300ms to clean up.
    • The application isn't configurable via the cluster administrator's tools.

Alternatively, you can write a resource DLL for the application. At that point you face additional issues. First of all, if you're talking about user configurable parameters, they should be using private properties associated with the resource type. It gives a common method by which admin tools can query and set the parameters for a given resource. These property requests are ultimately handled by the resource DLL. That leads to the question of why the resource DLL and application should use the cluster database.

Each resource has its own section of the cluster database, as opposed to the general per-application focus of the Windows NT registry. This becomes an issue if you want your resource to be more granular than just your application. For example, if your resource is your database server, you can only run the server on one node at a time. On the other hand, if your resource is databases presented by that server, then you can have the database server running on both nodes. (For example, one node might have a payroll database, while the other will have an orders database.) If one node goes down, the server on the other node can pick up the database that no longer has a host. This is the active/active configuration mentioned above. To do this your settings need to be per-resource, not per application.

Also, registry checkpointing is only done when the resource is running. If you make any settings changes via a separate admin tool when the resource isn't online, those changes won't get propagated.

With service resources there is an option to have part of the registry entries fail over to the secondary node. Since all file share information is stored in the registry, can this be used as an alternate way to provide file share failover?

No. Share information is stored in the registry, but that doesn't mean modifying the registry is the correct way to create shares. One problem would be that you have to reboot for the registry changes to result in the creation of a share. There also remains the problem of what you do when you fail over. If both machines are set up with shares pointing to a drive on the shared bus, one machine is going to have shares referring to a device the machine can't access.

What mechanisms are advised with respect to Named Pipes and Semaphores in a cluster application environment for process-to-process communication (for example, registry settings changed on one node of the cluster, how are they updated at the other node, etc.)?

Since the main issue is the transfer of inner transactional state information, you could use the transacted registry feature of MSCS to get registry information over to the other node in case of a failover or, even better, make your transactions small enough so they can be replayed easily. Use Microsoft Transaction Server to get the best support for your (D)COM objects.

The MSCS SDK references the file MSCLUS.DLL. What is this and where is it located?

MSCLUS.DLL is the COM interface to the CLUSAPI. Because it is close to completion, it was included in the initial MSCS SDK documentation. However, it was not completed in time to ship with the original release of Windows NT Server, Enterprise Edition 4.0. Microsoft plans to release it via web and MSDN distribution in the 2nd half of 1997.

If you have cluster calls in an application, what do you need to do to make your application work in a non-cluster environment as well?

Right now you should ensure that you can install on a cluster as well as on a single machine. Note that MSCS does not yet support an application level channel through the cluster. The Cluster SDK gives you an idea of what you can do today to become aware of a cluster and what you can do with it.

Comparison to Other Products

What other solutions are currently available to facilitate high availability in a Microsoft Windows NT Server environment? What are the major differences between MSCS and these other solutions?

Clustering and high availability software for Windows NT Server is currently available from a variety of vendors including Amdahl, Compaq, Data General, Digital, Fujitsu, IBM, Marathon, NCR, Netframe, N.S.I., Octopus, Stratus, Tandem, Unisys, Veritas, and Vinca. The features, benefits, pricing, and hardware requirements of these products vary considerably. Each has its own unique strengths, and each is currently providing value to satisfied customers.

However, these same vendors have also participated in the Open Process design reviews for MSCS, and many have already announced plans to offer MSCS-based clustering solutions. Why? Because enterprise customers want broadly available, cross-platform solutions like MSCS that enhance flexibility, reduce lock-in, expand their choices, and drive competitive pricing. In addition, MSCS is unique in its ability to deliver all of the following benefits:

    • Integrated: MSCS is designed to integrate with existing systems (networks, data, applications, and platforms) and to enjoy wide industry support. It also features tight integration with Windows NT Server plus open interfaces for value-added extensions to assure integration with future server technologies and cluster-aware solutions.
    • Comprehensive: MSCS protects against application failures, protects against server failures, protects against planned downtime, and provides single-system-image management. It also features a cross-platform API that will be used by software developers to extend its capabilities, and to create a new generation of highly scalable, cluster-aware applications.
    • Easy: No scripting or programming is required. All cluster setup and management can be done from anywhere on the network using a graphical administration console that fully exploits familiar Microsoft Windows interface standards. It's also easy for developers to extend the value of MSCS via its Win32-based Cluster API, and easy to deploy their solutions on cluster hardware from dozens of vendors.

In 1995 Microsoft announced it had licensed clustering technology from Digital Equipment. In 1996, Microsoft announced it had licensed clustering technology from Tandem. How closely is MSCS related to the Digital and/or Tandem clustering products?

Both Digital Clusters for NT Server and Tandem Cluster Availability Solution share many of the key benefits of MSCS including active/active clusters, automatic failover/failback, and graphical administration. However, all three are different products, written and supported by different vendors. MSCS benefited from the proven clustering technology of both Digital and Tandem. Microsoft developers built on that foundation, adding tight integration with Windows NT Server distributed services, support for industry networking and storage standards, plus a dramatically new level of ease-of-use for administrators and developers. Digital and Tandem, plus other leading system vendors, supported the development of MSCS and will offer solutions based on MSCS so that their customers can benefit from its advances and wide industry support.

MSCS provides failover for individual applications and for whole servers. Other high availability solutions only provide failover for servers. What are the implications of this difference?

Many of the simpler failover products currently available for Windows NT Server can only recognize and recover from complete server failures. MSCS, on the other hand, is a true clustering solution that can also monitor individual applications and resources. This allows MSCS to automatically recognize and recover from more failure conditions, and provides administrators with greater flexibility in managing the workload within a cluster.

Simple failover products monitor a single "heartbeat" per server. MSCS can monitor server heartbeats PLUS up to two different types of heartbeat for each application and resource: a quick "looks alive" heartbeat, plus an optional "is alive" heartbeat that can perform a more extensive check to detect subtle failure conditions. These heartbeats are very efficient and typically have no appreciable impact on cluster performance. However, the person administering a cluster can easily change the polling rate for any of these heartbeats at any time using the MSCS graphical administrator’s console.
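
The difference between the two heartbeat types can be illustrated with a small Python simulation. Everything here is a sketch under stated assumptions: poll_schedule is a hypothetical helper, the 5-second and 60-second polling rates are example values chosen for the demonstration, and time is simulated rather than real.

```python
from typing import Callable, List, Tuple


def poll_schedule(
    looks_alive: Callable[[], bool],
    is_alive: Callable[[], bool],
    looks_alive_every: int,
    is_alive_every: int,
    duration: int,
) -> List[Tuple[int, str, bool]]:
    """Simulate a monitor that runs both checks at their own intervals.

    Times are seconds of simulated time. Polling stops at the first failed
    check, which is the point where recovery would begin.
    """
    log: List[Tuple[int, str, bool]] = []
    for t in range(1, duration + 1):
        if t % is_alive_every == 0:
            check, healthy = "is_alive", is_alive()
        elif t % looks_alive_every == 0:
            check, healthy = "looks_alive", looks_alive()
        else:
            continue
        log.append((t, check, healthy))
        if not healthy:
            break
    return log


events = poll_schedule(
    looks_alive=lambda: True,    # the process still exists
    is_alive=lambda: False,      # but it no longer answers real requests
    looks_alive_every=5,
    is_alive_every=60,
    duration=120,
)
print(events[-1])   # (60, 'is_alive', False)
print(len(events))  # 12
```

In this run the cheap check passes eleven times and only the more extensive check at the 60-second mark exposes the subtle failure.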

With the increasing performance of standard server hardware, many customers today are running mixed workloads, rather than having dedicated single-purpose servers. Unlike simple failover products that can only manage entire physical servers, MSCS simplifies the management of mixed workloads with its concept of "cluster groups": a collection of applications and resources that, together, constitute a single business process, or a "virtual server". MSCS lets administrators establish different failover policies and priorities for each cluster group so that mixed workloads are recovered correctly in the event of an application or server failure. MSCS also lets administrators easily adjust server workload within a cluster by moving individual business processes (i.e., cluster groups) between servers with a simple point-and-click action from the graphical MSCS administrator’s console.
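
A toy Python model shows how per-group policies play out. The Group and Cluster classes, the failback flag and the node names are hypothetical constructs for this sketch and do not reflect the MSCS object model or its API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Group:
    name: str
    preferred_owner: str
    failback: bool          # return to the preferred owner when it recovers?
    owner: str = ""

    def __post_init__(self) -> None:
        self.owner = self.owner or self.preferred_owner


class Cluster:
    def __init__(self, nodes: List[str], groups: List[Group]) -> None:
        self.up: Dict[str, bool] = {node: True for node in nodes}
        self.groups = groups

    def node_failed(self, node: str) -> None:
        self.up[node] = False
        survivor = next((n for n, alive in self.up.items() if alive), None)
        if survivor is None:
            return  # no node left to take over
        for group in self.groups:
            if group.owner == node:
                group.owner = survivor

    def node_recovered(self, node: str) -> None:
        self.up[node] = True
        for group in self.groups:
            if group.failback and group.preferred_owner == node:
                group.owner = node

    def move(self, group_name: str, node: str) -> None:
        """Administrator-initiated move of one group to another node."""
        for group in self.groups:
            if group.name == group_name and self.up[node]:
                group.owner = node

    def placement(self) -> Dict[str, str]:
        return {group.name: group.owner for group in self.groups}


cluster = Cluster(
    ["node-a", "node-b"],
    [Group("web", "node-a", failback=True),
     Group("sql", "node-b", failback=False)],
)
cluster.node_failed("node-a")
print(cluster.placement())   # {'web': 'node-b', 'sql': 'node-b'}
cluster.node_recovered("node-a")
print(cluster.placement())   # {'web': 'node-a', 'sql': 'node-b'}
cluster.node_failed("node-b")
cluster.node_recovered("node-b")
print(cluster.placement())   # {'web': 'node-a', 'sql': 'node-a'}
cluster.move("sql", "node-b")
print(cluster.placement())   # {'web': 'node-a', 'sql': 'node-b'}
```

The "web" group returns to its preferred node automatically, while "sql" stays where it landed until an administrator moves it.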

N.S.I., Octopus, Vinca and some other vendors offer high availability solutions that use mirrored disks rather than shared SCSI like MSCS. What are the criteria customers should use when comparing mirrored-disk solutions to MSCS?

When you compare the relative strengths of a mirrored-disk failover solution to a true clustering solution like MSCS, it's obvious that they actually complement each other. The strengths of MSCS for high availability sites are:

    • Performance: MSCS is true "active/active" clustering, meaning both servers are running their own workload, and both can also take over the workload of the other in the event of a failure. Most mirroring solutions assume one-way failover, from a primary server to a stand-by or backup server.
    • Availability: MSCS can monitor and recover individual applications. Most mirroring solutions only monitor for server failures. Also, MSCS's ability to move ownership of shared SCSI disks assures that applications always start up with exactly the same disk-based data as at the moment of failure. Mirrored solutions have a finite amount of time during each mirroring operation in which the local and remote disks are out of sync.

The strengths of mirrored-disk failover solutions for live backup and disaster recovery are:

    • Live backup: Mirrored solutions create a near-real-time second copy of data, possibly at a remote disaster recovery site.
    • Distance: Mirrored-disk failover solutions have virtually no distance limitation between the primary server and the recovery server. In an MSCS cluster, the two servers can typically be no further apart than allowed by the shared SCSI bus.

Clusters like MSCS are a preferred solution for providing highly available services in data centers and other mission-critical sites. Mirrored-disk failover solutions can complement clusters by providing live backup of important data plus automatic failover to disaster recovery sites in the event of a total site failure. It is because of these complementary roles that vendors such as N.S.I., Octopus, and Vinca have already announced plans to offer data mirroring and remote site recovery that works with MSCS clusters.

How does a clustering solution like MSCS differ from a "fault tolerant" or "non stop" server?

MSCS clusters offer high availability. The term "fault tolerant" is generally used to describe technology that offers a higher level of resilience and recovery. "Fault tolerant" servers typically use a high degree of hardware redundancy plus specialized software to provide near-instantaneous recovery from any single hardware or software fault. Examples of fault tolerant servers include Tandem NonStop and Marathon Endurance 4000 (which is based on Windows NT Server.) These solutions cost significantly more than a clustering solution, because you must pay for redundant hardware that waits idly for a fault from which to recover. Fault tolerant servers are used for applications that support very high value, high rate transactions such as check clearinghouses, Automated Teller Machines (ATMs), or stock exchanges.

How does MSCS compare to Marathon Endurance 4000?

Both MSCS and Endurance 4000 are designed to provide high reliability for standard Windows NT Server applications running on standard hardware. However, they are optimized for different types of customer applications, and are not competitive alternatives. MSCS is a high-availability clustering product, while Endurance 4000 is what's generally referred to as a "fault tolerant" product. There are basically three differences between these products:

    • An MSCS cluster recovers from failures in typically less than a minute. An Endurance 4000 system recovers much faster, typically in less than a second.
    • An MSCS cluster is composed of two servers, both of which are available to run separate workloads. An Endurance 4000 system is composed of 4 servers that perform at the level of a single server. The additional processing power in an Endurance system is used for redundancy and overhead to provide its very fast recovery times.
    • Because of the dramatic difference in available processing power, an Endurance 4000 system is far more expensive than an MSCS cluster in terms of "bang for the buck".
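
The "bang for the buck" comparison in the list above reduces to simple arithmetic, shown here as a short Python sketch. It assumes every server costs the same and that each MSCS node runs a full workload, neither of which is stated in the text; work_per_server is a hypothetical helper.

```python
def work_per_server(servers_purchased: int, servers_of_useful_work: int) -> float:
    """Useful work delivered per server bought, in single-server units."""
    return servers_of_useful_work / servers_purchased


# MSCS: two servers, both running their own workload.
print(work_per_server(2, 2))   # 1.0
# Endurance 4000: four servers performing at the level of one.
print(work_per_server(4, 1))   # 0.25
```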

MSCS will be the preferred solution for applications and data that can afford to be unavailable for up to a minute at a time. Marathon Endurance 4000 or other fault-tolerant solutions would be a preferred solution for applications that must sustain very high value, high throughput transactions without pause.

Digital VAX clusters would allow multiple nodes to boot from a single node's OS. Does MSCS offer this ability?

No, since that approach would compromise the availability and scalability benefits of the shared-nothing architecture used by MSCS. Instead, MSCS will rely on system management tools such as Microsoft System Management Server to automate the installation and maintenance of software on distributed and clustered servers.

What is the relationship between Tandem ServerNet™ and Microsoft Cluster Server?

Tandem ServerNet is a high performance, high reliability communications technology that can be used by MSCS as the "interconnect" (i.e. private network) between the servers in a cluster. Microsoft expects Tandem ServerNet to be a popular interconnect choice for high performance clusters, both because of its advanced technology, and because of the number of system vendors which have licensed ServerNet from Tandem. As a convenience to customers, Microsoft will package Tandem's software drivers for ServerNet with the MSCS feature of Windows NT Server, Enterprise Edition.

Does Oracle Parallel Server™ for Windows NT Server use MSCS?

No. Oracle Parallel Server (OPS) contains its own clustering services. This means that an OPS cluster only provides high availability for OPS.

 

© 1997 Microsoft Corporation. All rights reserved.

 

------ =_NextPart_000_01BDE090.6E77EEA0--