Hello. I've been out of town for a few days, which is why I'm only writing now; sorry about that.
> > hello
> >
> > > :) :)
> > > That is exactly what I was asking myself: how would I do this?
> > >
> > Why limit yourself to Linux?
> > NT or Novell 5.xx can do this job easily.
>
> I wonder, do the NT or Novell you suggest have an option like "add a new machine"
> or something similar during configuration?
It may not be exactly that command, but you can follow the link below and see
step by step how it is done.
http://www.microsoft.com/ntserverenterprise/mscsdemo/
I'm also attaching the FAQ from the documents I have; it may be useful to you.
See also:
http://www.microsoft.com/ntserverenterprise/basics/features/clustering/clustarchit.asp
> He isn't asking how to take backups or anything like that. He is asking how a
> system can have a standby machine ready for any eventuality. A server can keep a
> spare CPU that takes over when the main one fails, and RAID can switch to a
> mirror disk when a disk fails. He is looking for the equivalent answer for the
> case where the whole machine goes down.
Thank you for the explanation; I had interpreted it the same way.
Novell has been doing what you call a standby machine since version 4.0.
> The book Computer Science and Information Technology (Prentice-Hall) has some
> introductory-level information, though only in theory. Experienced people still
> need to find a practical answer to "how do we do this on Linux?"
>
> Çağlar,
>
> >
> Best of luck with your work
> <<clusterfaq>>
Attachment: clusterfaq.htm (clusterfaq)
Intro to Microsoft Cluster Server
What is a server "cluster"?
A group of independent servers managed as a single system for higher availability, easier manageability, and greater scalability.
What does it take to create a server cluster?
The minimum requirements for a server cluster are (a) two servers connected by a network, (b) a method for each server to access the other's disk data, and (c) special cluster software like Microsoft Cluster Server (MSCS). The special software provides services such as failure detection, recovery, and the ability to manage the servers as a single system.
What are the benefits of server clustering?
There are three primary benefits to server clustering: improved availability, easier manageability, and more cost-effective scalability. Using Microsoft Cluster Server (MSCS) as an example:
What are clusters used for?
Customer surveys indicate that MSCS clusters will be used as highly available multipurpose platforms, mirroring the current uses of Windows NT Server. Surveyed customers suggested that the most common uses of MSCS clusters will be mission critical database management, file/intranet data sharing, messaging, and general business applications.
When a cluster is recovering from a server failure, how does the surviving server get access to the failed server's disk data?
There are basically three techniques that clusters use to make disk data available to more than one server:
Intro to Microsoft Cluster Server
What is "Wolfpack"?
"Wolfpack" was the code name for Microsoft Cluster Server.
What is Microsoft Cluster Server (MSCS)?
MSCS is a built-in feature of Windows NT Server, Enterprise Edition. It is software that supports the connection of two servers into a "cluster" for higher availability and easier manageability of data and applications. MSCS can automatically detect and recover from server or application failures. It can be used to move server workload to balance utilization and to provide for planned maintenance without downtime. And, over time, MSCS will also become a platform for highly scalable, cluster aware applications.
How many servers can be in an MSCS cluster?
The initial release of MSCS will be supported on clusters with 2 servers. A future version referred to as MSCS "Phase 2" will support larger clusters, and will include enhanced services to simplify the creation of highly scalable, cluster aware applications.
When will MSCS be available?
MSCS has been in development since 1995, and has been in beta test since December of 1996. The initial release of MSCS will be in version 4.0 of Microsoft Windows NT Server, Enterprise Edition. This product is currently in beta test and will be released once customers indicate that it's ready. Microsoft currently expects that to happen this summer. The "Phase 2" version of MSCS -- enhanced to support larger clusters and highly scalable applications -- is expected to enter beta test in 1998.
What other companies were involved in the development of MSCS?
Microsoft worked closely with leading hardware vendors, software vendors, and customers in the specification and development of MSCS and its Application Programming Interface (API). These other companies participated via five different programs:
In what languages will MSCS be available?
Microsoft Windows NT Server, Enterprise Edition 4.0, including MSCS 1.0, will be offered in English, French, German, Japanese, and Spanish.
Through what channels will Windows NT Server, Enterprise Edition be available?
Microsoft Windows NT Server/E will be available to customers through all standard channels: resellers, retail, OEM, and the Microsoft Select licensing program.
What versions of Windows NT Server will MSCS support?
MSCS software will only be available as a built-in feature of Windows NT Server, Enterprise Edition.
Will MSCS be extended beyond Windows NT Server to Windows NT Workstation?
There is currently no plan to extend cluster support to Windows NT Workstation. MSCS software has been designed and written to closely integrate with the architecture and features of Windows NT Server, including its server-oriented networking and directory services capabilities.
What clients can connect to an MSCS cluster?
Any client that can connect to Windows NT Server via TCP/IP will work with MSCS. This includes MS-DOS®, Windows 3.x, Windows 95, Windows NT, Apple® Macintosh®, and UNIX®. MSCS does not require any special software on the client for transparent recovery of services that connect to clients via standard IP protocols.
High Availability
How does MSCS provide high availability?
MSCS uses software "heartbeats" to detect failed applications or servers. In the event of a server failure, it employs a "shared nothing" clustering architecture that automatically transfers ownership of resources (such as disk drives and IP addresses) from a failed server to a surviving server. It then re-starts the failed server's workload on the surviving server. All of this -- from detection to re-start -- typically takes under a minute. If an individual application fails (but the server does not), MSCS will typically try to re-start the application on the same server; if that fails, it moves the application's resources and re-starts it on the other server. The cluster administrator can use a graphical console to set various recovery policies such as dependencies between applications, whether or not to re-start an application on the same server, and whether or not to automatically "failback" (re-balance) workloads when a failed server comes back online.
Can MSCS provide "zero downtime"?
No. MSCS can dramatically reduce planned and unplanned downtime. However, even with MSCS, a server could still experience downtime from the following events:
Microsoft recommends that clusters be used as one element in customers' overall program to provide high integrity and high availability for their mission-critical server-based data and applications.
Is MSCS failover transparent to users?
MSCS does not require any special software on client computers, so the user experience during failover depends on the nature of the client side of their client-server application. Client reconnection is often transparent, because MSCS has restarted the applications, file shares, etc. at exactly the same IP address. If a client is using "stateless" connections such as a standard browser connection, then they would be unaware of a failover if it occurred between server requests. If a failure occurs while a client is connected to the failed resource, then the client will receive whatever standard notification is provided by the client side of their application when the server side becomes unavailable. This might be, for example, the standard "Abort, Retry, or Cancel?" prompt you get when using the Windows Explorer to download a file at the time a server or network goes down. In this case, client reconnection is not automatic (the user must choose "Retry"), but the user is fully informed of what's happening and has a simple, well-understood method of re-establishing contact with the server. Of course, in the meantime, MSCS is busily re-starting the service or application so that, when the user chooses "Retry", it re-appears as if it never went away.
For client-side applications which have "stateful" connections to the server, a new logon is typically required following a server failure. In many cases, this approach is required for security purposes. For example, this is how SAP R/3 works: if the server connection is lost, the user is prompted to log on again to make sure it's the same user accessing the application. Even with stateful connections, it's possible for an application to automatically re-connect following a failover.
For example, when Microsoft demonstrated SAP R/3 failover at Microsoft Scalability Day in New York City on May 20, it was accessed via an Active browser application that had automatically (and securely) cached the user's ID and password from the initial logon. Thus, when the server connection was momentarily lost during the failover demo, the client application automatically logged on again using the cached ID and password. This was done using standard IP connections, running a simple Visual Basic program within an HTML document via the Microsoft ActiveX technology.
When a server comes back online following a failure, is there any human intervention required to get it back "up and running", or is the heartbeat enough for the other server to include it once again?
No manual intervention is required. When a server running Microsoft Cluster Server, say "Server A", boots, it starts the MSCS service automatically. MSCS in turn checks the interconnect (and network if necessary) to find the other server in its cluster, say "Server B". If Server A finds Server B, then Server A re-joins the cluster and Server B updates it with current cluster status info. Server A then initiates "failback", moving back failed-over workload from Server B to Server A at an appropriate time.
What is "failback", and how does it work in MSCS?
"Failback" is the ability to automatically re-balance the workload in a cluster when a failed server comes back online. This is a standard feature of MSCS. For example, say "Server A" has crashed and its workload failed-over to "Server B". When Server A re-boots, it automatically finds Server B and re-joins the cluster. It then checks to see if any of the cluster groups running on Server B would "prefer" to be running on Server A. If so, it automatically moves those groups from Server B to Server A as soon as the time is right.
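The rejoin-and-failback check described above can be sketched as a small Python model. This is not MSCS code. The `Group` record, the `plan_failback` and `in_window` functions, and the hour-based window are hypothetical. The sketch assumes each group records three things: whether failback is allowed, a preferred server, and a range of hours during which failback may run. These correspond to the kinds of failback properties this FAQ says are set from the administration console.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical record for a cluster group ("virtual server")."""
    name: str
    owner: str               # server currently running the group
    preferred: str           # server the group "prefers" to run on
    allow_failback: bool     # whether automatic failback is enabled
    window: tuple[int, int]  # allowed failback hours [start, end), 24-hour clock


def in_window(hour: int, window: tuple[int, int]) -> bool:
    """True if `hour` is inside the window; a window may wrap past midnight."""
    start, end = window
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def plan_failback(groups: list[Group], rejoined: str, hour: int) -> list[str]:
    """Names of groups that should move back to the server that just rejoined."""
    return [
        g.name
        for g in groups
        if g.allow_failback
        and g.preferred == rejoined
        and g.owner != rejoined
        and in_window(hour, g.window)
    ]


# Server A crashed earlier, so every group is currently running on Server B.
groups = [
    Group("SQL", owner="B", preferred="A", allow_failback=True, window=(22, 6)),
    Group("Files", owner="B", preferred="A", allow_failback=False, window=(0, 24)),
    Group("Web", owner="B", preferred="B", allow_failback=True, window=(0, 24)),
]
print(plan_failback(groups, rejoined="A", hour=23))  # ['SQL']
print(plan_failback(groups, rejoined="A", hour=12))  # []
```

In this sketch, a group whose window is closed stays on Server B. Calling `plan_failback` again later picks it up, which is one simple reading of "as soon as the time is right."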
Failback properties -- that is, which groups can failback, which is their preferred server, and during what hours the time is "right" for failback -- are all set from the cluster administration console.
Can the servers in an MSCS cluster be located at separate locations for recovery from site disasters?
Not at this time. All of the cluster configurations currently being considered for validation use SCSI connections to storage resources, which limits the distance between clustered servers to the distance supported by standard SCSI. This is typically no more than 25 meters, though there are SCSI extender technologies that can potentially stretch the connection up to 1,000 meters. Note that Windows NT Server customers already have several choices for software that can mirror data to remote disaster recovery sites, including solutions from N.S.I.®, Octopus®, Veritas®, and Vinca®. Most of these vendors have already announced that their disaster site mirroring solutions will also work with MSCS clusters.
Can MSCS restore registry keys for an application from one server to the other when doing failover?
Yes. Recovery of an application's registry information is a configurable feature that is available to the Generic Application and Generic Service resource types. Basically, you tell it what registry keys to log and recover, and that’s all there is to it. This capability should be used if the application or service stores volatile information in specific registry keys. If this is done, when the resource comes online on another node, it will have the same registry information as the previously online resource.
When an application re-starts on another server following a failure, does it re-start from a copy of the application?
No.
The new server (say, "Server 2") would start the application from the same physical disks as Server 1, since ownership of the application's disks on the shared SCSI bus had been moved from Server 1 to Server 2 as one of the first steps in the failover process. This approach assures that the application always re-starts from its last known state, as recorded on its disk drives (and, if you use the available option, as recorded in its registry keys.)
Can MSCS restore an application's "state" at the time of its failure rather than requiring a complete restart?
MSCS can restore the state of an application's registry keys, but any other state information must be managed and restored by the application. Applications need to provide some model for persistence to ensure that state can be recaptured. For example, Microsoft SQL Server uses transaction logs to provide this assurance. If a server running Microsoft SQL Server crashes, upon restart the application uses its transaction logs to bring the database back to a known state. With a cluster, just as with a single server, good application design and the use of ACID (Atomic, Consistent, Isolated, and Durable) transaction properties are important.
What is the granularity of resource failover?
MSCS supports failover of "virtual servers", which usually correspond to applications, web sites, print queues, or file shares (including their disk spindles, files, IP addresses, etc.). MSCS also provides cluster-wide services that are simultaneously available on all servers in the cluster, including cluster administration, performance monitoring, event viewing, a cluster name, and cluster time synchronization.
What is a "quorum disk" and how does it help MSCS provide high availability?
It's a disk spindle that MSCS uses to determine whether or not another server is up or down. Technically, it's a resource which can only be owned by one server at a time, and for which servers can negotiate for ownership.
Negotiating for the quorum drive allows MSCS to avoid "split brain" situations where both servers are active and think the other server is down. (This can happen when, for example, the cluster interconnect is lost and network response time is problematic.) The use of a quorum resource is one of the sophisticated algorithms that Microsoft got by working with pioneers in clustering such as Digital and Tandem.
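The heartbeat and quorum negotiation described in this section can be modeled in a few lines of Python. This is an illustrative sketch under stated assumptions, not how MSCS implements arbitration. The `QuorumDisk` and `Node` classes, the 5-second heartbeat timeout, and modeling the quorum disk as an in-process lock are all hypothetical. The model shows the key point: when heartbeats stop, both servers may suspect each other, but only the one that wins ownership of the quorum resource takes over. The cluster therefore never ends up in a "split brain" with two active owners.

```python
from __future__ import annotations

import threading


class QuorumDisk:
    """Hypothetical stand-in for the quorum resource: at most one owner at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # makes check-and-claim atomic
        self.owner: str | None = None

    def try_acquire(self, node: str) -> bool:
        with self._lock:
            if self.owner is None or self.owner == node:
                self.owner = node
                return True
            return False


class Node:
    def __init__(self, name: str, quorum: QuorumDisk, timeout: float = 5.0) -> None:
        self.name = name
        self.quorum = quorum
        self.timeout = timeout     # assumed heartbeat timeout, in seconds
        self.last_heartbeat = 0.0  # when the peer's last heartbeat arrived
        self.active = True

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Decide what this node does at time `now`."""
        if now - self.last_heartbeat <= self.timeout:
            return "normal"
        # Heartbeats are missing: the peer may be down, or only the
        # interconnect may be broken. Negotiating for the quorum decides.
        if self.quorum.try_acquire(self.name):
            return "take over resources"
        self.active = False  # lost the negotiation, so stand down
        return "stop (lost quorum)"


# The interconnect fails at t=0, so neither node hears the other afterwards.
quorum = QuorumDisk()
a, b = Node("A", quorum), Node("B", quorum)
a.on_heartbeat(0.0)
b.on_heartbeat(0.0)
print(a.check(3.0))   # normal
print(a.check(10.0))  # take over resources
print(b.check(10.0))  # stop (lost quorum)
```

In this toy, the winner is simply whichever node claims the quorum first. The FAQ does not describe MSCS's actual arbitration protocol, so treat that ordering as arbitrary.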
Manageability
How does MSCS improve the manageability of servers?
MSCS gives administrators a graphical console from which they can monitor and manage all of the resources in a cluster as if it were a single system. Using the familiar standards of a Microsoft Windows® graphical user interface, an administrator can use the cluster console to:
The ability to graphically move workload from one server to another with only a momentary pause in service (typically less than a minute) means administrators can easily unload servers for planned maintenance without taking important data and applications offline for long periods of time.
Does MSCS provide administrators with a "single system image"?
Yes. MSCS provides administrators a single graphical console to manage all of the applications and resources in a cluster. The MSCS console presents cluster resources by physical server, and by "virtual server" (or "cluster group".) This allows administrators to centrally manage the cluster as a collection of virtual application-oriented servers, or as a collection of physical resources when appropriate.
Can MSCS be remotely managed?
Yes. An authorized user can run the MSCS administration console from any Windows NT Workstation or Windows NT Server on the network. In the version of MSCS accompanying Windows NT Server, Enterprise Edition 5.0, the cluster administration console will be a "snap-in" to the Microsoft Management Console, providing scriptable, remoteable access, including access via Internet protocols from a browser.
How does MSCS help administrators do "rolling upgrades" of their servers?
With MSCS, server administrators no longer have to do all their maintenance within those rare windows of opportunity when no users are online. Instead, they can simply wait until a convenient off-peak time when one of the servers in the cluster has enough horsepower for all of the cluster workload. They then point-and-click to move all the workload onto one server, and they're ready to perform maintenance on the unloaded server. Once the maintenance is complete and tested, they bring that server back online and it automatically re-joins the cluster, ready for work. When convenient, the administrator repeats the process to perform maintenance on the other server in the cluster.
This ability to keep applications and data online while performing server maintenance is often referred to as doing "rolling upgrades" to your servers.
Will Microsoft support "rolling upgrades" of future server products using MSCS clusters?
It is Microsoft's goal to support "rolling upgrades" between releases of Microsoft server software using MSCS clusters. However, we cannot commit to this for all releases of all products. Persistent storage formats must occasionally change to accommodate new capabilities, and changes in persistent storage occasionally require applications to be taken offline while storage or indices are restructured. Microsoft will commit to always provide smooth upgrades between releases of all our products, and we'll use MSCS to provide seamless rolling upgrades whenever possible.
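The rolling-upgrade procedure described above has four steps: move everything to one server, maintain the idle one, let it rejoin, then repeat on the other. It can be written out as a simple plan. This Python sketch is hypothetical. `rolling_upgrade_plan`, the numeric load and capacity figures, and the capacity check are assumptions that stand in for "a convenient off-peak time when one of the servers in the cluster has enough horsepower for all of the cluster workload." The sketch also assumes the two-server clusters of the initial MSCS release.

```python
from __future__ import annotations


def rolling_upgrade_plan(nodes: list[str], load: dict[str, float],
                         capacity: dict[str, float]) -> list[str]:
    """Ordered steps for upgrading a two-server cluster one server at a time.

    load:     current workload on each server (arbitrary units, assumed)
    capacity: workload each server can carry on its own (same units, assumed)
    """
    if len(nodes) != 2:
        raise ValueError("this sketch assumes a two-server cluster")
    total = sum(load.values())
    steps: list[str] = []
    for target in nodes:
        survivor = nodes[1] if target == nodes[0] else nodes[0]
        # Only proceed when one server can carry the whole cluster workload.
        if capacity[survivor] < total:
            raise ValueError(f"{survivor} cannot carry {total} units; wait for off-peak")
        steps += [
            f"move all groups from {target} to {survivor}",
            f"perform maintenance on {target}",
            f"bring {target} back online (it rejoins the cluster)",
        ]
    return steps


for step in rolling_upgrade_plan(["A", "B"], load={"A": 30, "B": 40},
                                 capacity={"A": 80, "B": 80}):
    print(step)
```

The capacity check runs before each half of the upgrade. The plan refuses to start when neither server could host the whole workload alone, which mirrors the advice to wait for an off-peak period.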
Scalability
How will MSCS enhance server scalability?
The manageability benefits of the initial version of MSCS will simplify many of the processes currently used to improve scalability, such as upgrading server hardware and installing new versions of applications. A future version of MSCS, "Phase 2", will support clusters containing large numbers of servers, and will provide enhanced abilities that simplify the creation of highly scalable, cluster aware applications.
The Microsoft cluster strategy whitepaper said MSCS is already architected for multiple nodes. Has MSCS been tested on multi-node clusters? If so, why is Microsoft waiting to deliver multi-node support?
Yes, Microsoft and other vendors have tested MSCS clusters with more than 2 servers. These clusters "work" in that they are stable and the administrator's console provides basic management for the multi-server environment. However, the algorithms and features in the current software must be extended and thoroughly tested on larger clusters before customers can reliably use a multi-node MSCS cluster for production work, or gain enhanced cluster benefits. In addition, Microsoft will have to extend the cluster hardware validation procedures to accommodate the additional requirements of multi-node clusters. Microsoft has architected MSCS for multi-node support in preparation for the coming "Phase 2" version. Today's multi-node tests have proven the architecture is correct. However, there are two key reasons Microsoft is limiting the initial release to 2-server clusters:
How will MSCS help with load balancing?
"Load balancing" is the ability to move work from a very busy server to a less-busy server. MSCS will support load balancing in four ways over time:
Should cluster-aware applications developed for MSCS use a shared-disk or shared-nothing architecture for greatest scalability?
Microsoft recommends a shared-nothing architecture for cluster-aware applications because of its greater scalability potential. With shared-disk applications, copies of the application running on two or more servers in the cluster share concurrent read/write access to a single set of disk files, mediating ownership of the files using a "distributed lock manager" (DLM). A shared-nothing application, on the other hand, avoids the potential bottleneck of shared resources and a DLM by partitioning or replicating the data so that each server in the cluster works primarily with its own data and disk resources. In theory, MSCS can support either type of application. However, Microsoft has no plans at this time to include a DLM in the MSCS cluster services, so vendors would have to develop or license a DLM to implement a shared-disk application on MSCS. Microsoft has chosen to use the shared-nothing architecture for future versions of the BackOffice applications because of that architecture’s greater potential for cluster-enabled scalability.
Will MSCS ever have a Distributed Lock Manager (DLM)?
Microsoft will not include a distributed lock manager in the first release of MSCS. Enhancements in future releases will be determined based on customer requirements.
When will Microsoft offer a parallel version of Microsoft SQL Server that runs on multiple servers at the same time for automatic load balancing and scalability?
The next major release after Microsoft SQL Server 7.0 is planned to offer cluster-enabled scalability on MSCS clusters. It will use a scalable "shared nothing" architecture to spread a single database across multiple servers. A whitepaper on the strategy for Microsoft SQL Server on clusters can be downloaded from http://microsoft.com/sql/WhitePapers.htm. Although this is an important direction for Microsoft SQL Server, it must be kept in perspective: it will only be needed by a small percentage of customers. Cluster-enabled scalability will only be needed by extremely large enterprise applications which (a) are too large to run on a single high-end SMP server (e.g., 8-processor SMP with 4GB of RAM), and (b) cannot be partitioned to run on a distributed network using Microsoft Transaction Server.
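In a shared-nothing design, each server owns a disjoint slice of the data, and every request is routed to the owning server. No distributed lock manager is needed to arbitrate access. The FAQ does not say how Microsoft SQL Server will partition a database. This Python sketch assumes simple hash partitioning, and the `SharedNothingStore` class and server names are hypothetical.

```python
from __future__ import annotations

import hashlib


class SharedNothingStore:
    """Toy shared-nothing table: each server owns a disjoint partition of the keys."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        # One private partition per server; no partition is shared.
        self.partitions: dict[str, dict[str, str]] = {s: {} for s in servers}

    def owner(self, key: str) -> str:
        # A stable hash, so every client routes a given key to the same server.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]

    def put(self, key: str, value: str) -> None:
        # Only the owning server ever writes this key, so no DLM is involved.
        self.partitions[self.owner(key)][key] = value

    def get(self, key: str) -> str | None:
        return self.partitions[self.owner(key)].get(key)


store = SharedNothingStore(["node1", "node2", "node3"])
for i in range(6):
    store.put(f"customer-{i}", f"record {i}")
print({server: sorted(rows) for server, rows in store.partitions.items()})
print(store.get("customer-4"))  # record 4
```

SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings, so separate clients could otherwise disagree about a key's owner. Under this simple modulo scheme, adding a server reassigns most keys to new owners.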
What are Microsoft's plans for supporting Distributed Message Passing (DMP)?
Distributed Message Passing is one of the intracluster communications techniques that are planned for Phase 2 of MSCS. (Another is I/O shipping.) Applications will be able to access MSCS DMP services through extensions to the Cluster API. MSCS in turn will host the DMP services over a variety of interconnect technologies including new low-latency drivers based on the Virtual Interface (VI) Architecture. The result will be a standard infrastructure for supporting a new generation of scalable, cluster-aware applications.
Application & Service Support
What types of applications and services will benefit from MSCS clustering?
There are 3 types of server applications that will benefit from MSCS clusters:
What software vendors will offer cluster-aware applications for MSCS?
Software vendors that have already announced plans to offer products for MSCS clusters include Baan, Cheyenne, Computer Associates (CA/Unicenter TNG), HP (ClusterView), IBM (DB2), NetIQ, Octopus, Oracle (Oracle 7 Failsafe), SAP, Vinca, and, of course, Microsoft (Microsoft SQL Server, Enterprise Edition, and Exchange Server, Enterprise Edition). For an up-to-date list of announced products that support MSCS, refer to the Microsoft Windows NT Server, Enterprise Edition Solutions Directory at http://microsoft.com/ntserver/info/ntsedir.htm.
Will Microsoft validate or logo software products that work with MSCS?
Microsoft will not have a validation program for MSCS-based software products at first. It is expected that once MSCS clusters are deployed in volume and there are sufficient examples of cluster-aware application products to evaluate, Microsoft will extend its Microsoft BackOffice logo program to include, at a minimum, validation of support for basic failover operation on an MSCS cluster.
What are Microsoft's plans for supporting Microsoft SQL Server on MSCS clusters?
Microsoft will support the Enterprise Edition of Microsoft SQL Server on MSCS clusters. Microsoft SQL Server, Enterprise Edition version 6.5 will provide "active/active" cluster support in the second half of 1997 (i.e., both servers can be running SQL Server, with each server supporting its own databases). Microsoft SQL Server 7.0, currently in beta test, will include additional cluster-aware enhancements that provide for faster recovery in the event of a server or application failure. The version of Microsoft SQL Server that follows Release 7.0 will include new features for shared-nothing scalability on MSCS clusters (i.e., a single database will be able to span multiple servers).
What are Microsoft's plans for supporting Microsoft Exchange Server on MSCS clusters?
Microsoft will support the Enterprise Edition of Microsoft Exchange Server on MSCS clusters. The "Osmium" release of Exchange Server, Enterprise Edition will provide "active/passive" failover on an MSCS cluster. This means Exchange/E "Osmium" will be able to run on one server in the cluster at a time, and MSCS will be able to automatically re-start Exchange on the other server following an application or server failure. Future versions of Exchange will be enhanced for active/active failover (i.e., the ability to run Exchange simultaneously on both servers).
Can the standard versions of Microsoft SQL Server 6.5 or Exchange Server 5.0 be set up for failover on a cluster using the "generic application" capability of MSCS?
Technically proficient customers who want to test Microsoft SQL Server 6.5 or Exchange Server 5.0 on a cluster may do so using the generic application capability of MSCS. However, the setup can be complex, and will not be supported by Microsoft support services. Therefore, customers should only do so for testing purposes, not for production deployments. Microsoft SQL Server, Enterprise Edition version 6.5, and the "Osmium" release of Exchange Server, Enterprise Edition will feature a simplified cluster setup procedure, and will be fully supported for failover on MSCS clusters.
Will Microsoft SNA Server benefit from MSCS?
No, because Microsoft SNA Server already provides a hot failover capability independent of MSCS.
Will Microsoft Proxy Server benefit from MSCS?
No, because the current version of Microsoft Proxy Server has its own capability for chaining together multiple servers for high availability and scalability.
Will Microsoft Systems Management Server benefit from MSCS?
No, MSCS will not provide high availability for the current release of Microsoft Systems Management Server. Microsoft intends to provide cluster-enabled high availability for Systems Management Server in a future release.
Can MSCS fail over an NT Server Directory (Domain) Controller?
No, because it is already possible to have backup directory service controllers for high availability. Servers in an MSCS cluster may be either primary or backup directory controllers for Windows NT Directory Services.
Can MSCS fail over a WINS (Windows Internet Name Service) server?
No, because it is already possible to have backup WINS servers for high availability.
Can MSCS fail over Remote Access Services (RAS)?
Remote Access Services cannot benefit from MSCS at this time since there is no standard method for doing software failover of modem connections. For higher reliability of dial-up connections, you can use the RAS Multi-Link capability first introduced in Windows NT Server 4.0.
Can MSCS fail over Microsoft Distributed File System (Dfs) directories?
Not in Windows NT Server, Enterprise Edition 4.0. The version of Dfs in Windows NT Server 5.0 will provide directory replication for fault tolerance. When used on the Enterprise Edition of Windows NT Server 5.0, Dfs will also work with MSCS failover for fast recovery from server crashes.
What versions of Oracle will benefit from MSCS clusters?
Oracle has announced that Oracle Failsafe 2.0 will be available at no extra cost with Oracle 7 databases. It provides "active/passive" database failover on MSCS clusters (i.e., it can run on one server at a time and fail over to the other server in the event of an application or server failure). For more information, refer to
Tandem NonStop SQL/MX uses MSCS clustering services when running on a two-server cluster. NonStop SQL/MX uses its own single-application clustering services when running on a cluster with more than two servers. Customers who want high availability plus database scalability up to the performance provided by two high-end SMP servers will benefit by running NonStop SQL/MX on MSCS to gain the additional benefits of high availability for other services and applications on the cluster. Customers who require additional scalability would use the built-in single-application cluster services of NonStop SQL/MX, trading off general availability services for the ability to scale on more than two servers.
Hardware Validation
How is MSCS cluster hardware validated?
Complete cluster configurations (i.e., 2 servers, a storage solution, and an interconnect) are tested and validated using an MSCS Cluster Hardware Compatibility Test that will be available for download from the Microsoft web site when MSCS releases. Anyone with an appropriate lab setup can run the test. The test procedure takes at least 2 weeks and one-half of a full-time-equivalent Microsoft Certified Professional. The result of a successful test is an encrypted file that is returned to Microsoft. Upon validation of the test results, Microsoft will post the tested configuration on a Cluster Hardware Compatibility List on its web site.
Are there restrictions on who can validate configurations, or on how many configurations they can validate for MSCS?
There is no limit to the number of cluster configurations anyone can validate once Microsoft Cluster Server is released. The Cluster Hardware Compatibility Test will be available for download from the Microsoft web site. Anyone with the expertise and proper lab setup will be able to download the test, run it, and submit the encrypted results file to Microsoft. Once Microsoft validates the results, the validated configuration will be added to the Cluster Hardware Compatibility List on the Microsoft web site. Because of the lab setup, personnel, and time required to validate a cluster configuration, it is likely that system vendors, component vendors, and system integration firms will primarily do validations.
Where will Microsoft post the Hardware Compatibility List (HCL) for MSCS?
The Cluster HCL will be posted when MSCS releases. It will be found from the Windows NT Server web site at
Where will Microsoft post the Cluster Hardware Compatibility Test (HCT) for MSCS?
The Cluster HCT will be posted when MSCS releases. It will be found from the Microsoft web site at
What are the general requirements for MSCS cluster hardware?
The most important criterion for MSCS hardware is that it be listed on the Microsoft Cluster Hardware Compatibility List, indicating it has passed the MSCS Cluster Hardware Compatibility Test. Microsoft will only support MSCS when used on a validated cluster configuration. Validation is only available for complete configurations that were tested together, not for individual components.
A cluster configuration is composed of 2 servers, storage, and networking. Here are the general requirements for MSCS cluster hardware for Windows NT Server, Enterprise Edition 4.0:
Servers
Storage
Network
Servers
What system vendors will offer MSCS cluster configurations?
All of the following system vendors have announced plans to offer MSCS-based clusters: Amdahl®, Compaq®, Data General®, Dell®, Digital Equipment Corporation®, Fujitsu®, Hitachi®, Hewlett-Packard®, IBM®, NCR®, Olivetti®, Siemens Nixdorf®, Stratus®, Tandem®, and Unisys®. Prior to the release of MSCS, a list of hardware vendor announcements regarding MSCS clusters can be found in the Microsoft Windows NT Server, Enterprise Edition Solutions Directory at http://microsoft.com/ntserver/info/ntsedir.htm. Following the release of MSCS in Windows NT Server, Enterprise Edition, the list of supported cluster configurations will be in the Cluster Hardware Compatibility List, available from the Microsoft Windows NT Server web site at http://www.microsoft.com/ntserver/info/hwcompatibility.htm.
Is it necessary that both servers within a cluster be identical?
The Cluster Hardware Compatibility Test does not require that both servers in a validated configuration be identical. MSCS runs on Windows NT Server, Enterprise Edition, so a validated MSCS cluster can potentially contain any two servers that are validated to run that version of Windows NT. (One exception: you cannot mix Alpha and Intel Architecture processors in the same cluster.) Note that MSCS hardware validation will apply to a complete cluster configuration (2 servers, an interconnect, and a storage solution), so it is unlikely that system vendors will validate clusters containing servers from more than one system manufacturer. However, it is conceivable that system integrators or component vendors might validate mixed-vendor clusters in response to customer demand.
Will MSCS run on our existing servers?
This depends on whether your existing servers have been validated within a complete cluster configuration. There will be a hardware validation process for MSCS clusters, just as there is for other Microsoft system software. An MSCS validation will test a complete cluster configuration, including specific models of servers, storage systems, and cluster interconnect. Customers concerned about whether servers they buy today will work in MSCS clusters in the future should ask their hardware vendor about the vendor’s plans to validate MSCS cluster configurations.
Do you expect customers to implement clusters on their existing equipment?
This is possible, and could eventually become quite common, but most of the initial customers will probably acquire new cluster systems. The MSCS validation process will cover complete cluster configurations (i.e., servers, storage, and interconnect), not just individual components. Thus, if customers are already using selected servers and/or storage subsystems that have been validated within a complete MSCS cluster configuration, they would be able to implement a cluster with those components by adding the rest of the hardware included in the validated configuration.
Storage
What storage connection techniques will MSCS support?
MSCS is architected to work with standard Windows NT Server storage drivers, so it can potentially support any of the current or anticipated storage interconnections available through Win32 or the Windows Driver Model. However, all of the cluster configurations currently being considered for MSCS validation use standard PCI-based SCSI connections (including SCSI over fiber).
Will MSCS support fiber disk connections in addition to SCSI?
Yes, once there are standard fiber disk drivers for Windows NT Server. In reality this doesn't fundamentally change the way MSCS uses disks. Fiber connections will still use SCSI devices, but they will be hosted on a Fibre Channel bus instead of a SCSI bus; conceptually, the SCSI commands are encapsulated within Fibre Channel. Therefore, the SCSI commands upon which MSCS relies (Reserve/Release and Bus Reset) will still function as they do over standard (i.e., non-fiber) SCSI.
Does MSCS prefer one type of SCSI signaling over the other (i.e., differential versus single-ended)?
MSCS works best with differential SCSI using 'Y' cables. The termination should be outside the systems so that a power loss in one system does not remove the termination from the SCSI bus. Good drives in good electrical and mechanical enclosures also make this work better.
Will MSCS support RAID on disks in a cluster?
Yes. Hardware RAID may be used to protect disks connected to the shared multi-initiator SCSI bus. Other disks in the cluster may be protected either by hardware RAID or by the built-in software RAID ("FTDISK") capability of Windows NT Server.
Why doesn't MSCS support Windows NT Server software RAID ("FTDISK") for disks connected to the shared SCSI bus?
The current FTDISK capability in Windows NT Server provides excellent, cost-effective protection of disks connected to a single server. However, its architecture is not well suited to some situations that can occur when failing over disk resources connected to two servers via multi-initiator SCSI. Microsoft plans to enhance FTDISK in a future release to address this issue. In the meantime, disks connected to a Windows NT Server machine via multi-initiator SCSI can be fully protected by widely available hardware RAID.
Which hardware RAID devices will MSCS support?
Support for any particular RAID device will depend on its inclusion in a validated cluster configuration.
Will MSCS support PCI RAID controllers?
Selected PCI RAID controllers may be validated within an MSCS cluster configuration. Some of these controllers store information about the state of the array on the card, not on the drives themselves, so the cards in the two servers might not be in sync at the moment a failover occurs. For this reason, RAID controllers that store this information in the controller will not work with MSCS. MSCS will only be validated with RAID solutions that store the metadata for RAID sets on the disks themselves, so that it is independent of the controllers.
Are there any plans to support a shared solid state drive?
No shared solid state drives have been tested yet, but nothing would preclude their use. As long as the SCSI-2 Reserve/Release and Bus Reset functions are available, these devices should work with MSCS.
Is it possible to add hard drives to an MSCS cluster without rebooting?
It depends on whether the drive cabinet supports this, since Windows NT itself will not do so until the Windows NT 5.0 release. There are examples of RAID cabinets validated for Windows NT that support changing volumes on the fly (with RAID parity).
Interconnect
What is a cluster "interconnect"?
It is recommended that MSCS clusters have a private network between the servers in the cluster. This private network is generally called an "interconnect", or a "system area network" (SAN). The interconnect is used for cluster-related communications. Carrying this communication over a private network provides dependable response time, which can enhance cluster performance. It also enhances reliability by providing an alternate communication path between the servers. This ensures that MSCS services will continue to function even if one of the servers in the cluster loses its network connections.
What type of information is carried over the cluster interconnect?
The interconnect in an MSCS cluster will potentially carry the following five types of information:
Can a cluster have more than one interconnect?
An MSCS cluster can only have a single private network, but MSCS will automatically revert to a public network connection for heartbeat and other cluster communications should it ever lose the heartbeat over the interconnect. Also, note that some vendors offer high-performance interconnect products that include redundant paths for fault tolerance.
What type of network is required for an MSCS cluster interconnect?
A validated MSCS cluster configuration can use as its interconnect virtually any network technology that is validated for Windows NT Server. This includes, for example, 10BaseT Ethernet, 100BaseT Ethernet, and specialized interconnect technologies such as Tandem® ServerNet®.
When is it necessary to have a high-performance interconnect such as 100BaseT Ethernet or Tandem ServerNet?
Interconnect performance can potentially affect cluster performance in two scenarios: (1) the cluster is running thousands of cluster groups and/or resources, or (2) the cluster is running a scalable, cluster-aware application that uses the interconnect to transfer high volumes of transactions and/or data. In either case, customers should choose a cluster configuration with a higher-speed interconnect such as 100BaseT or Tandem ServerNet. Cluster-aware applications that use MSCS to achieve very high levels of scalability will most likely become common in the MSCS "Phase 2" timeframe. Thus, higher-speed interconnects are likely to become more important in larger, Phase 2 clusters.
There has been a lot of talk about "man in the middle" and "replay" attacks on machines connected across the Internet. Will MSCS clusters be vulnerable to this same type of attack if someone illegally connects to the interconnect between the servers?
No. MSCS employs packet signing for intracluster communications to protect against replay attacks.
When will MSCS support interconnects based on the Virtual Interface Architecture?
Microsoft expects to support interconnects based on the VI Architecture specification in Phase 2 of MSCS, which is scheduled for beta test in 1998.
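The FAQ does not say how MSCS signs its packets, so the following is only a generic illustration of why signing plus a sequence number defeats replay. It is a minimal Python sketch that uses HMAC-SHA-256 from the standard library and a strictly increasing sequence number; the message layout, the key, and the choice of algorithm are assumptions, not the MSCS wire format. Without the key, an attacker on the interconnect cannot forge or alter a message, and a captured message that is sent again is rejected because its sequence number has already been seen (the sketch also rejects messages that arrive out of order).

```python
import hashlib
import hmac
import struct
from typing import Optional

SEQ_FORMAT = "!Q"  # 8-byte big-endian sequence number
SEQ_LEN = struct.calcsize(SEQ_FORMAT)
TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA-256 tag


def sign(key: bytes, seq: int, payload: bytes) -> bytes:
    """Prefix a sequence number and append an HMAC covering both."""
    header = struct.pack(SEQ_FORMAT, seq)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


class Receiver:
    """Accepts a message only if its tag verifies and its sequence number is new."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._last_seq = 0

    def accept(self, message: bytes) -> Optional[bytes]:
        if len(message) < SEQ_LEN + TAG_LEN:
            return None
        header = message[:SEQ_LEN]
        payload = message[SEQ_LEN:-TAG_LEN]
        tag = message[-TAG_LEN:]
        expected = hmac.new(self._key, header + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # forged, or altered in transit
        (seq,) = struct.unpack(SEQ_FORMAT, header)
        if seq <= self._last_seq:
            return None  # validly signed but already seen: a replay
        self._last_seq = seq
        return payload


key = b"hypothetical-cluster-key"
node_b = Receiver(key)
heartbeat = sign(key, 1, b"heartbeat")
print(node_b.accept(heartbeat))  # b'heartbeat'
print(node_b.accept(heartbeat))  # None
```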
Networking
Will MSCS support the failover of IP addresses?
Yes.
Will MSCS support other network protocols such as IPX?
No other protocols are planned at this time.
How does MSCS do IP failover?
MSCS has the ability to fail over (move) an IP address from one cluster node to another. The ability to fail over an IP address depends on two things: 1) support for dynamic registration and deregistration of IP addresses, and 2) the ability to update the physical network address translation caches of other systems attached to the subnet on which an address is registered. Dynamic address (de)registration is already implemented in Windows NT Server to support leasing IP addresses using the Dynamic Host Configuration Protocol (DHCP). To bring an IP Address resource online, the MSCS software issues a command to the TCP/IP driver to register the specified address. A similar command exists to deregister an address when the corresponding MSCS resource is taken offline. The procedure for updating the address translation caches of other systems on a LAN is defined by the Address Resolution Protocol (ARP), which is implemented by Windows NT Server. ARP is an IETF standard, RFC 826, which can be obtained on the Internet from ftp://ds.internic.net/rfc/rfc826.txt.
How does MSCS update router tables when doing IP failover?
As part of its automatic recovery procedures, MSCS will issue IETF standard ARP "flush" commands to routers to flush the machine addresses (MACs) related to IP addresses that are being moved to a different server.
How does the Address Resolution Protocol (ARP) cause systems on a LAN to update their tables that translate IP addresses to physical machine (MAC) addresses?
The ARP specification states that all systems receiving an ARP request must update their physical address mapping for the source of the request. (The source IP address and physical network address are contained in the request.)
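RFC 826 defines both the request format and what a receiver does with it. The following Python sketch is illustrative only: it builds the 28-byte ARP payload for Ethernet and IPv4 (without the Ethernet frame header) and applies the receiver logic to a dictionary standing in for an ARP cache. It sends nothing on the network, omits the reply step, and is not the Windows NT TCP/IP driver's code; the function names and the MAC and IP addresses are made up. RFC 826 phrases the update as a merge: a system that already has an entry for the sender's IP address overwrites it with the sender's MAC address, and the system being asked also adds a new entry. The usage at the end names the same address as sender and target, which is how a node taking over an address refreshes its neighbors' caches.

```python
import ipaddress
import struct
from typing import Dict

# RFC 826 layout for Ethernet (htype 1) carrying IPv4 (ptype 0x0800):
# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP.
ARP_FORMAT = "!HHBBH6s4s6s4s"
HTYPE_ETHERNET = 1
PTYPE_IPV4 = 0x0800
OPER_REQUEST = 1


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Return the 28-byte ARP payload of a request (no Ethernet header)."""
    return struct.pack(
        ARP_FORMAT,
        HTYPE_ETHERNET,
        PTYPE_IPV4,
        6,  # hardware address length: a 6-byte MAC
        4,  # protocol address length: a 4-byte IPv4 address
        OPER_REQUEST,
        sender_mac,
        ipaddress.IPv4Address(sender_ip).packed,
        bytes(6),  # target MAC is unknown in a request
        ipaddress.IPv4Address(target_ip).packed,
    )


def handle_arp(cache: Dict[str, str], my_ip: str, packet: bytes) -> None:
    """Apply RFC 826's receiver logic, using a dict as the ARP cache."""
    _, _, _, _, _, sha, spa, _, tpa = struct.unpack(ARP_FORMAT, packet)
    sender_ip = str(ipaddress.IPv4Address(spa))
    sender_mac = sha.hex(":")
    merged = sender_ip in cache
    if merged:
        cache[sender_ip] = sender_mac  # known sender: overwrite the old MAC
    if str(ipaddress.IPv4Address(tpa)) == my_ip and not merged:
        cache[sender_ip] = sender_mac  # we are the target: learn the sender


# A client still maps 10.0.0.50 to the failed node's MAC address.
client_cache = {"10.0.0.50": "00:a0:c9:00:00:01"}
# The node taking over 10.0.0.50 asks for that address, naming itself as sender.
announcement = build_arp_request(bytes.fromhex("00a0c9000002"), "10.0.0.50", "10.0.0.50")
handle_arp(client_cache, "10.0.0.7", announcement)
print(client_cache["10.0.0.50"])  # 00:a0:c9:00:00:02
```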
As part of the IP address registration process, the Windows NT TCP/IP driver broadcasts an ARP request on the appropriate LAN several times. This request asks the owner of the specified IP address to respond with its physical network address. By issuing a request for the IP address being registered, Windows NT Server can detect IP address conflicts; if a response is received, the address cannot be safely used. When it issues this request, though, Windows NT Server specifies the IP address being registered as the source of the request. Thus, all systems on the network will update their ARP cache entries for the specified address, and the registering system becomes the new owner of the address. Note that if an address conflict does occur, the responding system can send out another ARP request for the same address, forcing the other systems on the subnet to update their caches again. Windows NT Server does this when it detects a conflict with an address that it has successfully registered.
MSCS uses ARP broadcasts to reset MAC address mappings, but ARP broadcasts don't pass routers. So what about clients behind the routers?
Clients behind routers use their router(s) to reach the subnet where the MSCS servers are located. Accordingly, each client uses its router (gateway) to pass packets on to other routers along whatever route (OSPF, RIP, etc.) is designated. The end result is that the client's packets are forwarded to a router on the same subnet as the MSCS cluster. This router's ARP cache is consistent with the MAC address(es) that were modified during a failover. Packets therefore reach the correct virtual server without the remote clients ever having seen the original ARP broadcast.
Can an MSCS cluster be connected to different IP subnets? (This is possible with a single Windows NT server, even with a single NIC, by binding different IP addresses to the NIC and letting Windows NT Server route between them.)
For example, can MSCS support the following configuration:
Yes, MSCS permits servers in a cluster to be connected to multiple subnets. MSCS supports physical multi-homing no differently than Windows NT Server does. The scenario shown in the picture above is perfectly acceptable. The two external subnets (1 and 2) could connect the same clients (redundant fabrics) or two different sets of clients. In this scenario, one of the external subnets (#1 or #2) would also have to serve as a backup for intracluster communication (i.e., back up the private subnet #3) in order to eliminate all single points of failure that could split the cluster.
Note that MSCS will not support a slightly different scenario: NodeA on Subnet1, NodeB on Subnet2, with Subnet1 and Subnet2 connected by a router. This is because there is no way for MSCS to fail over an IP address resource between two different subnets.
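The restriction above follows from how an IP address resource moves: the address is only usable on a node that has an interface on the subnet the address belongs to. A minimal Python sketch using the standard `ipaddress` module makes that check explicit. `failover_targets` is a hypothetical helper, not part of MSCS, and the node names and subnets are made up to mirror the two scenarios: both nodes on shared subnets (supported) versus NodeA and NodeB on separate, routed subnets (not supported).

```python
import ipaddress
from typing import Dict, List


def failover_targets(ip: str, node_subnets: Dict[str, List[str]]) -> List[str]:
    """Return the nodes with an interface on a subnet that contains `ip`.

    Only these nodes can host the IP address resource: on any other subnet,
    routers would keep delivering the address's traffic to the original subnet.
    """
    addr = ipaddress.ip_address(ip)
    return [
        node
        for node, subnets in node_subnets.items()
        if any(addr in ipaddress.ip_network(net) for net in subnets)
    ]


# Supported: both nodes share the client subnet and the private subnet.
shared = {"NodeA": ["192.168.1.0/24", "10.0.3.0/24"],
          "NodeB": ["192.168.1.0/24", "10.0.3.0/24"]}
# Not supported: each node on its own subnet, joined by a router.
routed = {"NodeA": ["192.168.1.0/24"], "NodeB": ["192.168.2.0/24"]}

print(failover_targets("192.168.1.10", shared))  # ['NodeA', 'NodeB']
print(failover_targets("192.168.1.10", routed))  # ['NodeA']
```

In the routed case the only candidate is the node that already owns the address, so there is nowhere to fail over to.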
Can MSCS use a second Network Interface Card (NIC) as a hot backup to a primary NIC?
MSCS can only do this for the cluster interconnect. That is, it provides the ability to use an alternate network for the cluster interconnect if the primary network fails. This prevents an interconnect NIC from being a single point of failure. There are vendors who offer fault-tolerant NICs for Windows NT Server, and these can be used for the NICs that connect the servers to the client network.
How do you specify to MSCS which NIC to use for the interconnect, and which NIC(s) to use as backup interconnects?
The MSCS setup allows administrators to specify the exact role that a NIC provides to the cluster. There are three possible roles for each NIC in a cluster:
The typical MSCS cluster will have one NIC on each server designated for internal communications (cluster only), and one or more other NICs designated for all communications (cluster and client). In that case, the cluster-only NIC is the primary interconnect, and the "all communications" NIC(s) serve as backup interconnects if the primary ever fails.
Examples of client-only NICs include a LAN/WAN/Internet connection where it would be ineffective or impolite to carry heartbeats and cluster traffic.
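The selection rule in the answer above can be summarized in a short Python sketch. The role labels, the `Nic` class and `choose_interconnect` are hypothetical names, not MSCS setup options or APIs; the sketch only encodes the stated preference: a working cluster-only NIC first, then a working all-communications NIC as backup, and never a client-only NIC.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for the three NIC roles described above.
CLUSTER_ONLY = "cluster only"
ALL_COMMUNICATIONS = "all communications"
CLIENT_ONLY = "client only"


@dataclass
class Nic:
    name: str
    role: str
    link_up: bool


def choose_interconnect(nics: List[Nic]) -> Optional[Nic]:
    """Pick the NIC that carries heartbeats and other cluster traffic.

    A working cluster-only NIC is preferred; a working all-communications
    NIC is the backup; client-only NICs are never used for cluster traffic.
    """
    for role in (CLUSTER_ONLY, ALL_COMMUNICATIONS):
        for nic in nics:
            if nic.role == role and nic.link_up:
                return nic
    return None


nics = [Nic("Private", CLUSTER_ONLY, True),
        Nic("Corporate LAN", ALL_COMMUNICATIONS, True),
        Nic("Internet", CLIENT_ONLY, True)]
print(choose_interconnect(nics).name)  # Private
nics[0].link_up = False  # the private interconnect fails
print(choose_interconnect(nics).name)  # Corporate LAN
```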
Can MSCS work with "smart switches" that maintain a 1-to-1 mapping of MAC addresses to IP addresses? These switches are quite common in VLAN configurations in which the layer 2 network fabric uses layer 3 address information for switching packets. These switches only cache one IP address for each MAC address. This layering "violation" allows switch vendors to do better lookups and use existing routing protocols to distribute host routes plus MAC addresses. Will MSCS be continually forcing these devices to flush and reset their MAC-to-IP maps due to its use of multiple IPs per MAC, plus the ARP flushes when doing IP failover?
MSCS can work with these switches, but it might affect their performance. If customers experience this problem, there are two possible solutions: (1) have a router sit between the cluster and the switch, or (2) disable the "smarts" on the smart switches.
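The churn the smart-switch question worries about can be shown with a toy Python model: a table that may hold only one IP address per MAC address, fed with frames from a node whose single NIC answers for its own address and for two virtual server addresses. The model, the function name and the addresses are assumptions for illustration only; real switches age and refresh entries in vendor-specific ways, and the ARP flushes during failover are not modeled.

```python
from typing import Dict, Iterable, Tuple


def count_rewrites(frames: Iterable[Tuple[str, str]]) -> int:
    """Count how often a one-IP-per-MAC table must replace an entry.

    `frames` is a sequence of (source MAC, source IP) pairs seen by the switch.
    """
    table: Dict[str, str] = {}
    rewrites = 0
    for mac, ip in frames:
        if mac in table and table[mac] != ip:
            rewrites += 1  # the single cached IP for this MAC is overwritten
        table[mac] = ip
    return rewrites


node_a = "00:a0:c9:00:00:01"
# Node A sends from its own address and from two virtual server addresses.
traffic = [(node_a, "10.0.0.1"), (node_a, "10.0.0.50"), (node_a, "10.0.0.51")] * 3
print(count_rewrites(traffic))  # 8: every frame after the first forces a rewrite
```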
Software Licensing
How will Microsoft license MSCS?
MSCS is a built-in feature of Windows NT Server, Enterprise Edition (Windows NT Server/E), so customers must license Windows NT Server/E for both servers in a cluster.
Are Client Access Licenses required for accessing an MSCS cluster?
Whether a Client Access License (CAL) is required does not depend on whether a server is standalone or in an MSCS cluster. For example, the standard Microsoft End User License Agreement for Windows NT Server requires a CAL for each client that accesses the shared file services of Windows NT Server. This is true whether the client is accessing a file share on a standalone server or on an MSCS cluster. Put another way: there is no special CAL requirement related to accessing an MSCS cluster.
How will applications be licensed on MSCS clusters?
Each application vendor will determine its own licensing policies for applications running on MSCS clusters. Microsoft's current policy for server application licensing will still apply to MSCS clusters: an application must be separately licensed for each server on which it is installed. In an MSCS cluster, if an application is to run on both servers, or even if it only runs on one server at a time but must be installed on both servers to permit failover, then the application must be licensed for both servers.
How will Microsoft Client Access Licenses for BackOffice applications be handled on MSCS clusters?
If the customer is using "per-seat" Client Access Licenses for the application, then those licenses apply when a client is accessing the application on either server in the cluster. If the customer is using "per-server" (or "concurrent use") Client Access Licenses for the application, then each machine in the cluster should have a sufficient number of per-server Client Access Licenses for the expected peak load of the application on that machine. (Note that "per-server" Client Access Licenses do not "fail over" from one machine in the cluster to the other.)
Deployment
What support services are available for MSCS?
MSCS will be eligible for support from all of Microsoft’s customer support resources, including Enterprise Phone Support, Premier Support Technical Account Managers, and Microsoft Consulting Services. In addition, MSCS customers will be able to acquire training from Microsoft Authorized Training and Education Centers (ATECs), support services from the system vendors providing MSCS-validated cluster configurations, and value-added services from Microsoft Solution Providers that choose to offer MSCS-related services.
Will Microsoft extend the Microsoft Certified Professional (MCP) program to include certification of cluster-related skills?
Microsoft will not include cluster-related certification in the MCP program in 1997. Cluster-related certification is being considered for future updates to the program.
In the two-server cluster configuration, should the second server be a "hot standby", or can the two servers be running separate jobs up until the time when one fails and the other takes over?
MSCS provides true "active/active clustering", which means every machine in the cluster is available to do real work, and each machine in the cluster is also available to recover the resources and workload of any other machine in the cluster. Thus, there is no need to have a wasted, idle server standing by waiting for a failure. Of course, customers who want to make sure there is sufficient processing power available to recover a performance-sensitive workload might choose to run only a light workload, or a non-critical function that can easily be pre-empted, on one of the machines in an MSCS cluster.
Besides clustering, what else should be done to provide highly available Windows NT Server services?
MSCS complements other high-availability techniques such as data mirroring, RAID disk protection, uninterruptible power supplies, and duplicated hardware such as fans and network interface cards.
The availability role of MSCS is to automatically restore user access to data and services following the failure of individual applications or servers. MSCS and other high-availability technology should be used in concert with prudent IT administration procedures for data backup and disaster-site recovery to ensure continuous availability of mission-critical IT resources.
Will client software have to be updated to take advantage of an MSCS cluster?
No. MSCS does not require any special software on the client for transparent recovery of services that connect to clients via standard IP protocols, such as web sites or Windows file shares. Note that, since server resources and applications can be unavailable for up to a minute or so during MSCS recovery procedures, the client component of a client/server application should ideally be able to handle pauses in service gracefully. However, that characteristic is already common in Microsoft client software, browsers, and most modern packaged applications.
Does an application need to be installed separately on both servers in a cluster?
Yes. Typically, each application that is part of a cluster group must be installed separately on both nodes so that it can be started on either node during a failover. This is usually done by (1) failing over the application's disks on the shared SCSI bus to the first server, (2) installing the application on the first server using those disks for application files, (3) failing the disks over to the other server, and (4) repeating the installation process on the second server, using the same disks.
Suppose there are several services running on one node (say, IIS, SQL, and Exchange). On the failure of that node, can you set up the cluster so that only one service fails over to the second node?
Yes. Only the services you set up in the MSCS cluster administration console will fail over. If you set up only one service to fail over, the other two will not fail over.
Should servers in a cluster be directory service (domain) controllers?
Domain controllers already have their own high-availability backup capability, so there are no additional restrictions or issues related to clusters. For example, without an MSCS cluster:
All of this is true if the servers are in a cluster. MSCS neither adds to nor subtracts from the current high-availability capabilities of Windows NT Directory Services.
Should servers in an MSCS cluster use the Microsoft Distributed File System (Dfs)?
All of the distributed services of Windows NT Server (including the Microsoft Distributed File System, NT Directory Services, security services, remote administration, etc.) are important building blocks for creating manageable, secure, easily utilized networks of servers. Servers tightly connected within an MSCS cluster benefit from these distributed services just as servers loosely connected by a network do.
How can MSCS help do load balancing between web servers?
The two most common techniques used to load-balance between multiple mirrors of a web site are Network Address Translation (NAT) routing and DNS round-robin routing. Cisco and other vendors sell routers that use NAT as well as some sort of load balancing. A site has one URL and one IP address. If a server goes down, the router detects this and stops sending requests to that web server. This offers good performance and easy manageability, but these NAT routers can be expensive. An easier, less expensive technique is to use simple round-robin DNS to split requests among a number of web servers that all have the same data on them. A site has one URL but several IP addresses, and the load is distributed across all of the IP addresses. A problem with round-robin DNS is that, if a server goes down, someone typically has to remove its IP address from the DNS round-robin list manually. MSCS can complement round-robin routing by eliminating the need to remove failed IP addresses from the round-robin list manually. You set up an MSCS cluster running IIS on each server, with each site's web files on the shared SCSI bus. You synchronize the data between the two sites.
If one of the servers fails, the virtual root of the failed machine is transferred to the other server in the cluster along with its IP addresses, so both sites continue to serve customers. And, once the failed server resumes operation, MSCS can automatically "fail back" its virtual root to rebalance the workload.
I need to create many file shares. Is there an alternative to creating them one at a time through the MSCS New Resource Wizard?
One answer would be to write a resource DLL patterned after the SMB Share sample to manage the shares. It would use API calls to create the shares when coming online and "destroy" them when going offline.
What are the criteria for running a resource in a separate cluster resource monitor?
The tradeoff is extra isolation from application/resource failures versus more consumption of server resources by MSCS. You should run a resource in a separate resource monitor when testing a new resource DLL. This ensures that, if the resource DLL compromises the resource monitor, it won't affect the core cluster services of MSCS.
Should the quorum disk be on a separate physical disk?
The quorum disk does not have to be on a separate physical disk. You can also use the quorum disk for applications. However, if you want to allocate a specific volume for this role, you can do so. In some cases this will marginally improve failover time.
Would a shared solid state drive provide higher availability than standard disk drives?
Perhaps. Solid state drives reduce the seek and rotational latency associated with conventional DASD. Applications can leverage this performance to minimize the possibility of data loss by essentially writing through the cache without destroying system performance. But even in such a case, there remains the possibility of cable, operator, and other failures that can result in inconsistent data. No matter how quickly the data is written to the media, there is a window of vulnerability.
For this reason, applications still need to provide some model for persistence to ensure that state can be recaptured. A good example of this is the transaction semantics used by database management systems to maintain the integrity of their on-disk data.
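The load-balancing answer earlier in this section (round-robin DNS, with MSCS moving a failed server's IP address to the surviving node) can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions: each client simply uses the next address in the rotation (real DNS servers rotate the order of the addresses in each answer), the addresses come from the 192.0.2.0/24 documentation range, all function names are hypothetical, and `fail_over` only moves IP ownership; it does not model the virtual roots or shared SCSI disks that MSCS moves along with the addresses.

```python
from itertools import cycle, islice
from typing import Dict, List, Set


def round_robin(addresses: List[str], requests: int) -> List[str]:
    """Address used by each successive client under a rotating DNS answer."""
    return list(islice(cycle(addresses), requests))


def fail_over(owner: Dict[str, str], failed: str, survivor: str) -> Dict[str, str]:
    """Move every IP address owned by the failed node to the surviving node."""
    return {ip: survivor if node == failed else node for ip, node in owner.items()}


def served(requests: List[str], owner: Dict[str, str], live: Set[str]) -> int:
    """Count requests whose address is currently owned by a live node."""
    return sum(1 for ip in requests if owner.get(ip) in live)


dns_list = ["192.0.2.10", "192.0.2.11"]
owner = {"192.0.2.10": "NodeA", "192.0.2.11": "NodeB"}
requests = round_robin(dns_list, 6)
live = {"NodeB"}  # NodeA has failed

print(served(requests, owner, live))  # 3: half the requests hit a dead address
print(served(requests, fail_over(owner, "NodeA", "NodeB"), live))  # 6
```

Without failover, the dead address stays in the rotation until someone removes it; with the address moved to NodeB, every rotated answer still reaches a live server.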
Troubleshooting
When diagnosing problems that appear to be cluster-related, how can I determine what is happening in the cluster services?
For problem reporting with the initial release of MSCS, you must use the "cluster log". (Future releases will make greater use of the Windows NT Server Event Monitor.) To turn on the cluster log, set a system environment variable called clusterlog to a path on your system. For example, set clusterlog to %windir%\cluster\cluster.log and then reboot. When the cluster service starts, it will log failure reasons and other information in the clusterlog file, which makes it easier to diagnose the problem.
Should the cluster administration console be connected to the cluster name, or to a node name?
Connect using the node name instead of the cluster name, as documented in the cluster administrator's guide. If you connected to the cluster name, you would go through RPC to the cluster endpoint mapper. Since this gets failed over, your RPC session for cluster administration has to wait for a timeout, which can take a relatively long time. When you connect using the node name, the cluster does not thrash in the event of such a failure. Instead, it simply arbitrates for ownership of the quorum device. After this is settled, one cluster node remains, where the appropriate failover services are running. You can then reconnect the cluster administration console to the surviving server.
From a CMD shell on one server, if you try to access a drive owned by the other server you get "Incorrect function". Why?
MSCS is a "shared nothing" environment, meaning that disk resources are owned by only one server at any point in time. "Incorrect function" is the message you get when trying to do local access to disks that are owned by a different server.
How come stopping the server service on either cluster node does not cause failover? It appears that the cluster software is not monitoring the server service, but is monitoring the local cluster objects directly rather than via the server service.
MSCS does not explicitly check for the server service, but it does monitor LanManServer. Therefore, with SMB shares, it will fail these over in the event that the LanManServer service fails or is stopped. If you want MSCS to monitor and restart the server service as well, you can easily do so by using the admin wizard to set it up as a "generic service".
A corporate network failure didn’t cause failover of any resources, and the Cluster Admin tool fails with an error dialog stating that the cluster service has stopped. How do the cluster nodes identify when a network failure occurs?
This is a case where clustering by itself cannot eliminate every potential single point of failure in a system. Just as highly available clusters should employ hardware RAID to protect against loss of physical disk drives, they should also include dual-path SCSI and redundant NICs to protect against loss of a single SCSI controller or network interface card.
How do you move the quorum resource to another disk?
This is done in Cluster Admin by selecting the cluster and right-clicking it. One of the three tabs is Quorum Resource, which allows you to modify this entry.
If the heartbeat link is down and both machines are arbitrating for the quorum, how should the machine that cannot reserve the SCSI bus react? In the normal case, should only the machine that can reserve the SCSI bus survive, with the other machine going down?
First, both nodes cannot have the quorum resource. However, both nodes can be operating in the cluster if one node has the quorum resource and the second node joins the cluster. When a partition is discovered, both nodes arbitrate for the quorum resource. One node wins the arbitration (if they are still partitioned) and the other node loses. The loser shuts down its cluster service; the winner fails over all groups and continues to operate.
What's the recommended procedure if you want to run CHKDSK on a disk connected to the shared SCSI bus of a cluster?
CLUSSVC has start options that allow the service to be started without quorum logging, either at a command prompt or from the service panel, using the -noquorumlogging option. At that point, CHKDSK can be run on the storage devices on the shared SCSI bus.
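The arbitration answer earlier in this section (one node wins the quorum resource, the loser shuts down its cluster service, the winner takes over all groups) can be sketched in Python. This is a simplified model under stated assumptions: a lock-protected owner field stands in for the quorum disk reservation, whereas the real arbitration relies on the SCSI Reserve/Release and Bus Reset commands mentioned in the storage answers, with timing rules that are left out here. `QuorumDevice` and `on_partition` are hypothetical names.

```python
import threading
from typing import Dict, Optional


class QuorumDevice:
    """Stand-in for the quorum disk: only one node can hold the reservation."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self.owner: Optional[str] = None

    def reserve(self, node: str) -> bool:
        with self._guard:  # models the all-or-nothing nature of a reservation
            if self.owner is None:
                self.owner = node
            return self.owner == node


def on_partition(node: str, quorum: QuorumDevice,
                 group_owner: Dict[str, str], status: Dict[str, str]) -> None:
    """What each node does once it stops hearing the other node's heartbeat."""
    if quorum.reserve(node):
        for group in group_owner:  # the winner takes over every group
            group_owner[group] = node
        status[node] = "online"
    else:
        status[node] = "cluster service stopped"  # the loser shuts down


quorum = QuorumDevice()
groups = {"Payroll": "NodeA", "Orders": "NodeB"}
status: Dict[str, str] = {}
threads = [threading.Thread(target=on_partition, args=(n, quorum, groups, status))
           for n in ("NodeA", "NodeB")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(quorum.owner, status, groups)  # one node online and owning both groups
```

Which node wins depends on timing, but exactly one node ends up online.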
Developer Issues
Customers and software vendors are interested in developing DLLs to make applications "cluster aware". Is there any documentation, sample code, etc. to assist them in the process?
Yes, there is a Software Development Kit (SDK) for MSCS. The MSCS SDK includes an SMB file share example DLL (with code). Developers can use this as a template and fill in their own application-specific code in the specific routines (Online, Offline, Is Alive, Looks Alive, etc.).
How will Microsoft distribute the MSCS Software Development Kit (SDK)?
During the beta program, the beta SDK for MSCS will be distributed on the Beta CD that accompanies Microsoft Developer Network (MSDN) Level III, and on the Beta Evaluation CD provided to organizational customers who have executed Microsoft Select volume licensing agreements. Following the release of Windows NT Server/E, the MSCS SDK will be distributed in the Platform SDK via MSDN Level III.
MSCS SDK documentation says, "Registry replication is a configurable feature that is available to the Generic Application and Generic Service resource types. Basically, you tell it what registry key to watch/replicate and that’s all there is to it. If the application/service stores volatile information in a specific registry key, then the key should be declared in the properties section of the resource so that it may be replicated. If this is done, when the resource comes online on another node, it will have the same registry information as the previously online resource. Application/service registry keys, by default, are not replicated or stored within the cluster database." Why should anyone use the cluster APIs to write registry keys to the cluster database? When should one use one over the other?
If you're just going to use a generic application resource, then you should just use registry checkpointing. However, using the generic application resource type has some limitations. For example:
Alternatively, you can write a resource DLL for the application. At that point you face additional issues. First, if you're talking about user-configurable parameters, they should be stored as private properties associated with the resource type. This gives admin tools a common method for querying and setting the parameters of a given resource, and these property requests are ultimately handled by the resource DLL.

That leads to the question of why the resource DLL and the application should use the cluster database. Each resource has its own section of the cluster database, as opposed to the per-application focus of the Windows NT registry. This matters if you want your resource to be more granular than just your application. For example, if your resource is your database server, you can only run the server on one node at a time. If, on the other hand, your resources are the individual databases presented by that server, then you can have the database server running on both nodes. (For example, one node might host a payroll database while the other hosts an orders database.) If one node goes down, the server on the other node can pick up the database that no longer has a host. This is the active/active configuration mentioned above. To do this, your settings need to be per resource, not per application.

Also, registry checkpointing is only done while the resource is running. If you change any settings with a separate admin tool while the resource isn't online, those changes won't be propagated.

With service resources there is an option to have part of the registry entries fail over to the secondary node. Since all file share information is stored in the registry, can this be used as an alternative way to provide file share failover?

No. Share information is stored in the registry, but that doesn't mean modifying the registry is the correct way to create shares.
One problem is that you would have to reboot before the registry changes resulted in the creation of a share. There is also the question of what you do when you fail over: if both machines are set up with shares pointing to a drive on the shared bus, one machine is going to have shares referring to a device it can't access.

What mechanisms are advised for process-to-process communication, such as Named Pipes and Semaphores, in a cluster application environment? (For example, when registry settings are changed on one node of the cluster, how are they updated on the other node?)

Since the main issue is transferring internal transactional state information, you could use the transacted registry feature of MSCS to get registry information over to the other node in case of a failover or, even better, make your transactions small enough that they can be replayed easily. Use Microsoft Transaction Server to get the best support for your (D)COM objects.

The MSCS SDK references the file MSCLUS.DLL. What is this and where is it located?

MSCLUS.DLL is the COM interface to the CLUSAPI. Because it was close to completion, it was included in the initial MSCS SDK documentation. However, it was not completed in time to ship with the original release of Windows NT Server, Enterprise Edition 4.0. Microsoft plans to release it via web and MSDN distribution in the second half of 1997.

If you have cluster calls in an application, what do you need to do to make your application work in a non-cluster environment as well?

For now, you should ensure that your application can be installed on a cluster as well as on a single machine. Note that MSCS does not yet support an application-level channel through the cluster. The Cluster SDK gives you an idea of what you can do today to detect a cluster and what you can do with it.
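The Online, Offline, Looks Alive and Is Alive routines named in the SDK answer fit together roughly as follows: a cheap "looks alive" check runs often, a more thorough "is alive" check runs less often, and a failed check leads to Offline/Online or, eventually, failover. The Python model below is a hypothetical illustration of that polling logic, not the SDK interface (the real routines are functions supplied by a resource DLL). The class, function and parameter names (HypotheticalResource, monitor, is_alive_every, restart_limit) are assumptions, as is the simple "restart in place up to a limit, then fail the group over" policy.

```python
from typing import Callable, List, Optional


class HypotheticalResource:
    """Toy resource exposing the four routines named in the SDK answer."""

    def __init__(self) -> None:
        self.running = False
        self.hung = False

    def online(self) -> None:
        self.running = True
        self.hung = False

    def offline(self) -> None:
        self.running = False

    def looks_alive(self) -> bool:
        # Cheap check: is the process there at all?
        return self.running

    def is_alive(self) -> bool:
        # Thorough check: does it actually respond? This one catches a hang.
        return self.running and not self.hung


def monitor(resource: HypotheticalResource, ticks: int, is_alive_every: int,
            restart_limit: int,
            inject_fault: Optional[Callable[[int, HypotheticalResource], None]] = None
            ) -> List[str]:
    """Poll LooksAlive every tick and IsAlive every `is_alive_every` ticks.

    On a failed check, restart the resource in place (Offline, then Online)
    up to `restart_limit` times; after that, fail the group over.
    """
    events: List[str] = []
    restarts = 0
    resource.online()
    for tick in range(1, ticks + 1):
        if inject_fault is not None:
            inject_fault(tick, resource)
        healthy = resource.looks_alive()
        if healthy and tick % is_alive_every == 0:
            healthy = resource.is_alive()
        if healthy:
            continue
        resource.offline()
        if restarts < restart_limit:
            restarts += 1
            resource.online()
            events.append(f"tick {tick}: restarted in place")
        else:
            events.append(f"tick {tick}: failing group over to the other node")
            break
    return events


def hang_at_3_and_11(tick: int, res: HypotheticalResource) -> None:
    # Simulated fault: the process keeps running but stops responding.
    if tick in (3, 11):
        res.hung = True


events = monitor(HypotheticalResource(), ticks=20, is_alive_every=5,
                 restart_limit=1, inject_fault=hang_at_3_and_11)
for e in events:
    print(e)
# tick 5: restarted in place
# tick 15: failing group over to the other node
```

The hang injected at tick 3 is invisible to the cheap check and is only caught by the thorough check at tick 5. This is the trade-off behind having two heartbeat levels: frequent polling stays inexpensive, while subtle failures are still detected within one "is alive" interval.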
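The point made in the developer answers that registry checkpointing only happens while the resource is running has a practical consequence. The hypothetical Python model below illustrates it; every name in it, including the registry key path, is invented for illustration and is not the MSCS API. A change made while the resource is online is captured and travels with the resource to the other node. A change made with a separate tool while the resource is offline is never captured, so it is not propagated.

```python
from typing import Dict, Optional

# Hypothetical model: each node's registry is a dict of key path -> value.
Registry = Dict[str, str]


class CheckpointedResource:
    """Toy model of a generic resource with one checkpointed registry key."""

    def __init__(self, watched_key: str) -> None:
        self.watched_key = watched_key
        self.checkpoint: Optional[str] = None   # copy held in the cluster database
        self.online_node: Optional[str] = None

    def bring_online(self, node: str, registry: Registry) -> None:
        # Restore the last checkpoint onto the node that is taking over.
        if self.checkpoint is not None:
            registry[self.watched_key] = self.checkpoint
        self.online_node = node

    def take_offline(self) -> None:
        self.online_node = None

    def on_registry_change(self, node: str, registry: Registry) -> None:
        # Only changes made on the node where the resource is online are saved.
        if node == self.online_node and self.watched_key in registry:
            self.checkpoint = registry[self.watched_key]


key = r"SOFTWARE\ExampleApp\Settings"   # hypothetical key
node_a: Registry = {}
node_b: Registry = {}
res = CheckpointedResource(key)

res.bring_online("A", node_a)
node_a[key] = "v1"
res.on_registry_change("A", node_a)   # saved: resource is online on A

res.take_offline()
node_a[key] = "v2"                    # admin tool edits the key while offline
res.on_registry_change("A", node_a)   # ignored: resource is not online

res.bring_online("B", node_b)         # resource moves to node B
print(node_b[key])                    # v1, not v2
```

This is one reason the answer recommends per-resource private properties in the cluster database for settings that administrators may change while the resource is offline.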
Comparison to Other Products

What other solutions are currently available to facilitate high availability in a Microsoft Windows NT Server environment? What are the major differences between MSCS and these other solutions?

Clustering and high availability software for Windows NT Server is currently available from a variety of vendors, including Amdahl, Compaq, Data General, Digital, Fujitsu, IBM, Marathon, NCR, Netframe, N.S.I., Octopus, Stratus, Tandem, Unisys, Veritas, and Vinca. The features, benefits, pricing, and hardware requirements of these products vary considerably. Each has its own unique strengths, and each currently provides value to satisfied customers. However, these same vendors have also participated in the Open Process design reviews for MSCS, and many have already announced plans to offer MSCS-based clustering solutions. Why? Because enterprise customers want broadly available, cross-platform solutions like MSCS that enhance flexibility, reduce lock-in, expand their choices, and drive competitive pricing. In addition, MSCS is unique in its ability to deliver all of the following benefits:
In 1995, Microsoft announced it had licensed clustering technology from Digital Equipment. In 1996, Microsoft announced it had licensed clustering technology from Tandem. How closely is MSCS related to the Digital and/or Tandem clustering products?

Both Digital Clusters for NT Server and Tandem Cluster Availability Solution share many of the key benefits of MSCS, including active/active clusters, automatic failover/failback, and graphical administration. However, these are three different products, written and supported by different vendors. MSCS benefited from the proven clustering technology of both Digital and Tandem. Microsoft developers built on that foundation, adding tight integration with Windows NT Server distributed services, support for industry networking and storage standards, and a dramatically new level of ease of use for administrators and developers. Digital and Tandem, along with other leading system vendors, supported the development of MSCS and will offer solutions based on MSCS so that their customers can benefit from its advances and wide industry support.

MSCS provides failover for individual applications and for whole servers, whereas other high availability solutions only provide failover for servers. What are the implications of this difference?

Many of the simpler failover products currently available for Windows NT Server can only recognize and recover from complete server failures. MSCS, on the other hand, is a true clustering solution that can also monitor individual applications and resources. This allows MSCS to automatically recognize and recover from more failure conditions, and gives administrators greater flexibility in managing the workload within a cluster. Simple failover products monitor a single "heartbeat" per server.
MSCS can monitor server heartbeats PLUS up to two different types of heartbeat for each application and resource: a quick "looks alive" heartbeat, plus an optional "is alive" heartbeat that can perform a more extensive check to detect subtle failure conditions. These heartbeats are very efficient and typically have no appreciable impact on cluster performance. Even so, the person administering a cluster can easily change the polling rate for any of these heartbeats at any time using the MSCS graphical administrator’s console.

With the increasing performance of standard server hardware, many customers today run mixed workloads rather than dedicated single-purpose servers. Unlike simple failover products that can only manage entire physical servers, MSCS simplifies the management of mixed workloads with its concept of "cluster groups": a collection of applications and resources that together constitute a single business process, or "virtual server". MSCS lets administrators establish different failover policies and priorities for each cluster group so that mixed workloads are recovered correctly in the event of an application or server failure. MSCS also lets administrators easily adjust server workload within a cluster by moving individual business processes (i.e., cluster groups) between servers with a simple point-and-click action in the graphical MSCS administrator’s console.

N.S.I., Octopus, Vinca, and some other vendors offer high availability solutions that use mirrored disks rather than the shared SCSI used by MSCS. What criteria should customers use when comparing mirrored-disk solutions to MSCS?

When you compare the relative strengths of a mirrored-disk failover solution and a true clustering solution like MSCS, it becomes clear that they actually complement each other. The strengths of MSCS for high availability sites are:
The strengths of mirrored-disk failover solutions for live backup and disaster recovery are:
Clusters like MSCS are a preferred solution for providing highly available services in data centers and other mission-critical sites. Mirrored-disk failover solutions can complement clusters by providing live backup of important data plus automatic failover to disaster recovery sites in the event of a total site failure. It is because of these complementary roles that vendors such as N.S.I., Octopus, and Vinca have already announced plans to offer data mirroring and remote site recovery that work with MSCS clusters.

How does a clustering solution like MSCS differ from a "fault tolerant" or "non stop" server?

MSCS clusters offer high availability. The term "fault tolerant" is generally used to describe technology that offers a higher level of resilience and recovery. "Fault tolerant" servers typically use a high degree of hardware redundancy plus specialized software to provide near-instantaneous recovery from any single hardware or software fault. Examples of fault tolerant servers include Tandem NonStop and Marathon Endurance 4000 (which is based on Windows NT Server). These solutions cost significantly more than a clustering solution, because you must pay for redundant hardware that sits idle waiting for a fault from which to recover. Fault tolerant servers are used for applications that support very high value, high rate transactions, such as check clearinghouses, Automated Teller Machines (ATMs), or stock exchanges.

How does MSCS compare to Marathon Endurance 4000?

Both MSCS and Endurance 4000 are designed to provide high reliability for standard Windows NT Server applications running on standard hardware. However, they are optimized for different types of customer applications and are not competing alternatives. MSCS is a high-availability clustering product, while Endurance 4000 is what is generally referred to as a "fault tolerant" product. There are basically three differences between these products:
MSCS will be the preferred solution for applications and data that can afford to be unavailable for up to a minute at a time. Marathon Endurance 4000 or other fault-tolerant solutions would be the preferred solution for applications that must sustain very high value, high throughput transactions without pause.

Digital VAX clusters allow multiple nodes to boot from a single node's OS. Does MSCS offer this ability?

No, since that approach would compromise the availability and scalability benefits of the shared-nothing architecture used by MSCS. Instead, MSCS will rely on system management tools such as Microsoft Systems Management Server to automate the installation and maintenance of software on distributed and clustered servers.

What is the relationship between Tandem ServerNet™ and Microsoft Cluster Server?

Tandem ServerNet is a high performance, high reliability communications technology that MSCS can use as the "interconnect" (i.e., the private network) between the servers in a cluster. Microsoft expects Tandem ServerNet to be a popular interconnect choice for high performance clusters, both because of its advanced technology and because of the number of system vendors that have licensed ServerNet from Tandem. As a convenience to customers, Microsoft will package Tandem's software drivers for ServerNet with the MSCS feature of Windows NT Server, Enterprise Edition.

Does Oracle Parallel Server™ for Windows NT Server use MSCS?

No. Oracle Parallel Server (OPS) contains its own clustering services. This means that an OPS cluster only provides high availability for OPS.

© 1997 Microsoft Corporation. All rights reserved.
------ =_NextPart_000_01BDE090.6E77EEA0--