Volume 7, Number 1 - Grid Computing
MCNC Grid Computing & Networking Services, a member of the MCNC non-profit independent family of companies located in North Carolina's Research Triangle Park, and the University of North Carolina's 16-campus system are grid-enabling the existing statewide research and education network that interconnects universities throughout North Carolina.
This statewide grid is anticipated to be the first statewide research and education grid in the country. The initiative is viewed as the most ambitious upgrade to the state's computing infrastructure in history and a catalyst for economic development. It will support multiple scientific disciplines in addition to other grid information technology applications, such as administrative and library services.
The initiative emerged from MCNC's development of one of the country's first scientific computing grid networks in 2001, the North Carolina Bioinformatics Grid Test Bed, and has now been expanded to include:
Details of these initiative subsets are provided in the following sections of this article.
Grid computing will benefit urban and rural areas of the state, spanning business, academia and government. It is especially important to smaller institutions that only need computing resources periodically and often cannot afford to invest in new technologies.
Most of the state's high-performance computing resources are at the large research universities surrounding Research Triangle Park: the University of North Carolina at Chapel Hill, Duke University and N.C. State University. Smaller universities, many in the more rural areas of the state, have historically lacked access to advanced computing resources. By enabling researchers and educators throughout the state to take advantage of computing resources that already exist at the large universities, and enabling all researchers in the state to pool resources and intellectual expertise, resources available to an individual researcher anywhere in the state will vastly increase. The statewide grid will be a catalyst for greater levels of innovation, the creation of more intellectual property and ultimately more businesses started with local entrepreneurial leadership throughout North Carolina.
The North Carolina Research and Education Network (NCREN), established in 1985 through a collaboration of MCNC and the University of North Carolina system, is the backbone infrastructure for the statewide grid. Operated by MCNC, it has evolved along with the Internet from a research project to a critical infrastructure for the research and education community. NCREN is a production-level, IP (Internet Protocol) network providing advanced communications and Internet services to more than 180 locations, including universities and other government and non-profit institutions throughout North Carolina. It serves about half a million students, faculty and staff from the University of North Carolina's 16-campus system, Duke University, Wake Forest University, and others. The network provides high-speed Internet service, access to Internet2 and the national research and education Abilene network, and interactive, near broadcast-quality video conferencing for distance-learning classes.
Scientific communities, with exponentially increasing storage and resource needs, are driving the development of grid computing frameworks to support the next generation of innovation. Biology and life sciences researchers, historically not heavy users of high-performance computing, are now at the leading edge of this evolution as the sequencing of entire genomes has unlocked a new horizon of opportunity. The availability of massive compilations of genomic and related data is merging biology with information science. Storage and management of these data sets will require systems capable of managing petabytes of data, and analysis and modeling will require high-performance computing capabilities.
With enhanced capabilities to address computational and data requirements, grid computing is an ideal solution to the needs of life sciences researchers. Transparent, seamless access to compute and data storage resources, as if they were located on a single computer, makes grid solutions an even more compelling fit.
MCNC and the North Carolina Biotechnology Center's Genomics and Bioinformatics Consortium, in collaboration with IBM, launched the N.C. BioGrid in 2001 as one of the nation's first grid test beds for computing, data storage and networking resources for life sciences research. The N.C. Bioinformatics Consortium includes more than 80 organizations representing academia, business and industry. Members include the University of North Carolina's 16-campus system, Duke University, Wake Forest University, GlaxoSmithKline Inc., IBM, the Research Triangle Institute, SAS Institute, Biogen, the National Institute of Environmental Health Sciences, and the U.S. Environmental Protection Agency.
Work is being conducted to test software for a better understanding of the issues associated with storage, analysis and movement of large bioinformatics data sets in a high-speed networked environment. The objective is to enable participants to share data and computing resources, thus eliminating the need for costly duplication of data sets and computing resources at each institution.
Currently, the test bed involves resources from four organizations: the University of North Carolina at Chapel Hill, North Carolina State University, Duke University and MCNC.
In planning the BioGrid test bed, a number of key objectives were identified:
Perhaps the biggest challenge for the N.C. BioGrid was to identify the appropriate grid middleware. This led to an evaluation of numerous grid platforms and the testing of a mix of solutions. A hybrid solution of multiple grid middleware platforms was developed, working with technologies that have a relationship with or roadmap to Open Grid Services Architecture (OGSA) standards. Currently, the following grid platforms are deployed:
Globus Toolkit 2.4 - for core grid functionality such as job scheduling across administrative domains, a resource registry, and a framework for developing grid-aware applications.
Avaki Data Grid 3.0 - to provide a globally available file system using a global namespace.
In addition, we are working with the following supporting technologies:
Platform LSF - for job scheduling on clusters and large SMP servers.
Sun Grid Engine - for job scheduling on clusters.
Sun ONE Directory Server - LDAP infrastructure for managing user accounts.
CHEF - a framework for building grid-aware collaborative portals.
MyProxy - an online repository that enables remote management of grid credentials.
The first prototype application selected for the N.C. BioGrid was NCBI BLAST. This tool is widely used in the bioinformatics research community to search for similarities between candidate proteomic or nucleotide sequences and target genomes.
IBM worked with MCNC and its university partners to develop more sophisticated grid applications. In an Extreme Blue project conducted during 2003, IBM teamed a group of four student interns with mentors to build a grid-enabled interface to the BioPerl libraries to address the computational needs of the Fungal Genomics Lab at N.C. State University. Results that took one to two weeks using a single system are now produced in near real time on the BioGrid. The Fungal Genomics Lab has also integrated one of its clusters with the BioGrid test bed.
In a second example, IBM and MCNC worked with researchers at the University of North Carolina at Chapel Hill to build a grid-enabled drug discovery application that screens candidate chemical compounds for biological activity. This is accomplished by performing a parameter space study to produce a training set to develop a model. The model is then applied to other data. Work that previously took a month is now accomplished in a single day.
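A parameter space study of this kind parallelizes naturally, because each combination of parameter values can run as an independent job. The following is a minimal Python sketch of that idea, not the application IBM and MCNC built: the parameter names (`descriptor_count`, `neighbors`), their values, and the number of workers are hypothetical, chosen only to illustrate how a sweep can be enumerated and dealt out to grid nodes.

```python
from itertools import product


def parameter_grid(space):
    """Yield one dict per point in the Cartesian product of the parameter values."""
    names = sorted(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def partition(jobs, workers):
    """Deal jobs round-robin into `workers` batches, one batch per grid node."""
    batches = [[] for _ in range(workers)]
    for index, job in enumerate(jobs):
        batches[index % workers].append(job)
    return batches


# Hypothetical parameters; the article does not name the real ones.
space = {"descriptor_count": [10, 20, 40], "neighbors": [1, 3, 5, 7]}
jobs = list(parameter_grid(space))
batches = partition(jobs, workers=4)
print(len(jobs), [len(batch) for batch in batches])
```

Because no job depends on another, the elapsed time of the sweep shrinks roughly in proportion to the number of nodes available, which is consistent with the kind of speedup described above.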
To address multiple disciplines beyond the original scope of the N.C. BioGrid test bed, MCNC has reconfigured its compute, storage, data and application resources into a grid architecture, the MCNC Enterprise Grid.
Biology research is one application for the grid, but the test bed infrastructure is now evolving to address multiple research disciplines and applications. As campus infrastructures are moving to grid frameworks, the development of the MCNC Enterprise Grid is a step towards the "grid of grids" concept. As computing and storage clusters evolve into grids, they will be interconnected into larger grids that will cross multiple organizational boundaries (firewalls), such as through the North Carolina statewide grid.
The initial launch of the MCNC Enterprise Grid in October 2003 included two high-performance computing systems:
A combination of direct-attached and network-attached (Network Appliance) disks complements the computer systems with over 10 terabytes of storage. Gigabit Ethernet (Cisco), InfiniBand (Topspin), and Fibre Channel (IBM) are the technologies used for interconnection and switching between the compute and storage nodes.
In addition to academic use, North Carolina commercial organizations may also use the MCNC Enterprise Grid as a fee-based service for research purposes. Charges are billed per CPU-hour or at negotiated rates for dedicated access. Services include up to 2 gigabytes of home directory space and a selection of software packages.
Researchers are using the grid resources for a variety of tasks, including scientific modeling and analysis. The on-demand utility computing services model allows customers to pay only for what they need, when they need it. The shared resources reduce the requirement for large investments in high performance computing hardware and support staff at businesses and universities.
MCNC's Enterprise Grid supplements the N.C. BioGrid test bed. It is a resource for the development of a new Grid Technology Evaluation Center and the North Carolina statewide grid.
The Grid Technology Evaluation Center (GTEC) is another development that emerged from MCNC's experience with the N.C. BioGrid. MCNC is working with commercial industry partners to develop the center, which will further address the challenges of moving grid technologies from the research lab and test bed environment into the core infrastructure of the enterprise and of the emerging "Next Generation Internet" that delivers a new generation of digital consumer services.
The GTEC will facilitate, enhance, enable, and expedite the development and deployment of grid computing infrastructure and services through:
-
-
-
GTEC services will include application benchmarking, interoperability verification, systems integration (including integration with legacy systems), and operational training.
Because grid computing is an emerging technology, it will take years before its ubiquitous use on MCNC's statewide network is realized. As an early adopter of grid technology and an active participant in standards bodies, MCNC has been able to identify the challenges in deploying, operating, and scaling a grid infrastructure beyond the test bed phase. MCNC Research & Development Institute's grid-related research focuses on filling the gaps in existing grid-ware to address these challenges, as shown in the accompanying illustration.
Some of the research efforts include:
GridIR: A grid-based information retrieval system that provides a scalable framework to uniformly search and retrieve diverse public and private data across the grid while allowing local control over the data. MCNC is also actively involved in the Global Grid Forum (GGF), forming and participating in the GGF GridIR working group to standardize the interfaces for providing information retrieval capability in a grid environment.
GridScope: An effort to build a grid monitoring and tracking tool that presents a logical view of the interactions within and between grid applications and captures the grid interactions as well.
Cluster-on-Demand (COD): A system to enable rapid, automated, on-the-fly partitioning of a physical cluster into multiple independent virtual clusters.
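The partitioning idea behind Cluster-on-Demand can be pictured with a small bookkeeping model. The Python sketch below is an illustration only and does not reflect the actual COD implementation: the class name, method names, node names and virtual cluster names are all hypothetical, and a real system would also have to reimage, boot and network the nodes it hands out.

```python
class PhysicalCluster:
    """Toy model: track which physical nodes belong to which virtual cluster."""

    def __init__(self, nodes):
        self.free = list(nodes)   # nodes not assigned to any virtual cluster
        self.virtual = {}         # virtual cluster name -> list of nodes

    def allocate(self, name, count):
        """Carve `count` free nodes out of the pool as a new virtual cluster."""
        if name in self.virtual:
            raise ValueError(f"virtual cluster {name!r} already exists")
        if count > len(self.free):
            raise RuntimeError(f"only {len(self.free)} nodes free, {count} requested")
        self.virtual[name], self.free = self.free[:count], self.free[count:]
        return self.virtual[name]

    def release(self, name):
        """Dissolve a virtual cluster and return its nodes to the free pool."""
        self.free.extend(self.virtual.pop(name))


# Hypothetical eight-node cluster shared by two independent workloads.
cluster = PhysicalCluster([f"node{i:02d}" for i in range(8)])
cluster.allocate("blast", 5)
cluster.allocate("docking", 2)
print(cluster.free)
cluster.release("blast")
print(len(cluster.free))
```

Each virtual cluster owns a disjoint set of nodes, so workloads stay independent, and released nodes become available for the next request.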
Most organizations today have firewalls around their organizational computer resources to protect their sensitive and proprietary data. Grid topologies span multiple administrative domains with autonomous security mechanisms. Unlike the Internet, the grid allows an outsider complete access to the resource, thus increasing the risk associated with it. The central idea of grid computing, enabling the sharing of resources across existing organizational and geographical boundaries, makes it difficult to use existing security mechanisms such as firewalls on the grid.
Though organizations may be willing to share resources and data with others for collaborative or monetary reasons, information assurance must be guaranteed for participation in a grid environment. It is imperative to bridge the gap between different security mechanisms, while providing local autonomy.
Joint Control of Virtual Organizations (JoVO): JoVO seeks to address the difficulty of mapping into virtual organizations some of the typical social and political arrangements that are associated with shared resources that need joint control. The objective of JoVO is to develop a scalable, reliable, distributed identity, authentication and authorization infrastructure to facilitate secure collaborations in grid environments. The envisioned JoVO framework will enable multiple parties to form a virtual coalition with jointly agreed and enforceable rules to enable timely information sharing and collaborative processing. The joint control of identity, attributes and access control policy is achieved through the use of threshold-based certification authorities. The framework is a public key infrastructure (PKI) that is both fault and intrusion tolerant.
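The "threshold" property that underlies joint control, where any k of n parties can act together but fewer than k cannot, can be illustrated with Shamir secret sharing. The Python sketch below is not the JoVO design: a real threshold certification authority would use a threshold signature scheme in which the signing key is never reassembled in one place. The prime, the function names and the example values are assumptions made for illustration.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this value


def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for coeff in reversed(coeffs):      # Horner's rule
            result = (result * x + coeff) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical coalition of five parties, any three of which may act jointly.
shares = split_secret(123456789, threshold=3, shares=5)
print(recover_secret(shares[:3]) == 123456789)
```

The modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later. The fault and intrusion tolerance follows from the same property: losing or compromising fewer than the threshold number of parties neither blocks the coalition nor exposes the secret.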
The technical approach to JoVO is based on MCNC's completed research project funded by the Defense Advanced Research Projects Agency (DARPA).
SITAR: Providing information assurance for data and applications requires intrusion- and fault-tolerant security capabilities. SITAR (Scalable Intrusion-Tolerant Architecture) is designed to ensure that critical services and applications remain operational, even while under attack.
SITAR, a completed DARPA-funded project, is an extensible framework that incorporates the fault tolerant concepts of redundancy, diversity, and ballot voting along with adaptive reconfiguration and proactive monitoring for extending fault tolerance to distributed services. Its fault tolerance approach focuses on detecting and mitigating the effects of known and unknown intruder attacks that attempt to interrupt service availability.
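Redundancy with ballot voting can be sketched in a few lines: the same request is sent to several diverse replicas, and the answer returned to the client is the one a majority agrees on. The Python sketch below is a generic majority vote under assumed names (`ballot`, `quorum`) and made-up replica responses; it is not taken from the SITAR implementation.

```python
from collections import Counter


def ballot(responses, quorum=None):
    """Return the majority response and the indexes of replicas that disagreed.

    `responses` holds one answer per redundant replica. By default a strict
    majority is required; a ValueError is raised if no answer reaches quorum.
    """
    if not responses:
        raise ValueError("no responses to vote on")
    if quorum is None:
        quorum = len(responses) // 2 + 1
    value, votes = Counter(responses).most_common(1)[0]
    if votes < quorum:
        raise ValueError("no response reached the required quorum")
    dissenters = [i for i, response in enumerate(responses) if response != value]
    return value, dissenters


# Hypothetical replies from three diverse replicas; the third has been tampered with.
value, dissenters = ballot(["200 OK:abc", "200 OK:abc", "200 OK:xyz"])
print(value, dissenters)
```

The list of dissenting replicas is what makes adaptive reconfiguration possible: a replica that is repeatedly outvoted is a candidate for being taken out of service and replaced.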
High-speed, on-demand, application-initiated provisioning of bandwidth will improve the efficiency and reduce the latency in a grid network. In January, MCNC announced the successful demonstration of an optical network provisioning protocol to enable more efficient computing applications. The demonstration of the Just-in-Time (JIT) protocol for provisioning and managing light path connections in the all-optical Advanced Technology Demonstration Network (ATDnet) in Washington, D.C., confirmed the viability of user-initiated, ultra-fast provisioning of all-optical network connections. The light paths linked host systems at the U.S. Department of Defense's Laboratory for Telecommunications Sciences, the Naval Research Laboratory's Center for Computational Science and the Defense Intelligence Agency.
With JIT, optical connections can be provisioned between sites in a few milliseconds through microelectromechanical switches, and in a few microseconds when faster photonic switches are deployed.
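To see why the difference between milliseconds and microseconds matters, it helps to compare the connection setup time with the time the data itself spends on the wire. The Python sketch below does that arithmetic. The payload size (1 MB), the link rate (10 Gb/s) and the specific setup times (5 ms and 5 microseconds) are assumptions chosen for illustration and are not figures from the demonstration.

```python
def setup_fraction(setup_seconds, payload_bytes, rate_bits_per_second):
    """Fraction of the total connection time spent on provisioning the light path."""
    transfer_seconds = payload_bytes * 8 / rate_bits_per_second
    return setup_seconds / (setup_seconds + transfer_seconds)


# Assumed workload: a 1 MB burst over a 10 Gb/s light path.
PAYLOAD_BYTES = 1_000_000
RATE_BPS = 10_000_000_000

mems = setup_fraction(5e-3, PAYLOAD_BYTES, RATE_BPS)       # millisecond-class switch
photonic = setup_fraction(5e-6, PAYLOAD_BYTES, RATE_BPS)   # microsecond-class switch
print(f"setup share with 5 ms provisioning: {mems:.1%}")
print(f"setup share with 5 us provisioning: {photonic:.1%}")
```

Under these assumptions, millisecond-class provisioning dominates the total time for a short burst, while microsecond-class provisioning makes the setup cost nearly negligible, so per-application light paths become practical even for small transfers.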
JIT research was partially funded by NASA and supported by the Advanced Research and Development Activity, a Department of Defense research and development community.
In developing the N.C. BioGrid, MCNC created the foundation for a statewide grid infrastructure that will support on-demand access to resources and services. The MCNC Enterprise Grid, which provides core computing and storage resources to researchers, serves as a model for integration with and access to inter-domain and global grids. The GTEC provides a platform for the development and integration of grid infrastructure, applications and services.
Grid represents a fundamental turning of the technology crank and is the next big thing in the evolution of the Internet. MCNC is moving aggressively to accelerate the integration of grid technology into both the fabric of North Carolina's Internet infrastructure and into the fabric of the new economy.
MCNC is a private, independent, non-profit corporation established in 1980 to advance technology-led economic development and job creation throughout North Carolina. MCNC Research & Development Institute develops new technologies through its own initiatives and as a research partner for private industry and the U.S. government, conducting advanced and applied research across a broad technology spectrum, including microsystems, flexible electronics, sensor development, signal electronics, wireless systems, microfabrication, high-speed secure networks and grid computing. MCNC Grid Computing & Networking Services delivers advanced communications resources statewide to more than 180 public and private institutions. MCNC Ventures provides early-stage funding and assistance to entrepreneurial start-up companies. The MCNC family of companies is located in North Carolina's Research Triangle Park. For more information, please visit www.mcnc.org.
Phil Emer is a senior member of MCNC Grid Computing & Networking Services' Advance Technologies Group technical staff and program director for The Grid Technology Evaluation Center. He has spent nearly 15 years working at the intersections of networking, research and academia. He was the chief architect of the N.C. BioGrid, managing the development of a heterogeneous, multi-institutional, grid test bed that spans MCNC, N.C. State University, Duke University, and the University of North Carolina at Chapel Hill.
Chuck Kesler is a program manager for MCNC's Grid Computing & Networking Services' grid deployment and data center services. He provides technical architecture and project management for MCNC's grid computing and hosting initiatives. His activities have included spearheading the deployment of the North Carolina BioGrid test bed and leading a collaborative grid infrastructure working group that includes representatives from the local university community.
Lavanya Ramakrishnan is a research engineer for MCNC Research & Development Institute. She is currently involved with various grid and security projects, including the development of a security infrastructure for grid applications. She is also involved with designing portal-based interfaces to grid functionality and with developing, testing and evaluating various grid middleware to be deployed on several grid test beds, including the NASA-funded Virtual Collaborative Center and N.C. BioGrid. She serves as a senior leader for the "Cluster-on-Demand" project, a collaboration with Duke University funded by the NSF Middleware Initiative, to develop a grid service for dynamic virtual clusters. She is also actively involved in the security working groups at the Global Grid Forum (GGF).
Scott Yates
Director of Corporate Communications
MCNC
P.O. Box 13910
3021 Cornwallis Road
Research Triangle Park, NC 27709
PH : (919)248-1907
FAX : (866)773-5617
E-mail : [email protected]