Newton Program Description
Prepared by Gerald Ragghianti
Last modified October 12, 2011
The Newton HPC Program is a joint initiative among the Office of Research, the Office of Information Technology (OIT), and the departments of the University of Tennessee. It establishes and supports a high-performance research computing environment that takes over the labor-intensive tasks associated with building and managing large computing installations. The program offers a flexible computing framework to facilitate work in a variety of research areas, and the support staff leverages this standard framework to provide effective and efficient support for computationally intensive research.
The Newton high-performance compute cluster consists of over 300 Linux compute nodes, 4,200 x86_64 architecture processors, and 8,000 GBytes of RAM. The compute cluster uses InfiniBand networking to provide high-bandwidth (up to 40 Gbit/sec), low-latency message passing between cluster nodes during parallel computations. The Newton Program also operates multiple high-memory compute nodes which can provide 128 GBytes of RAM for large in-memory dataset calculations. The cluster has a theoretical peak performance of 30 Tflop/s. All computing infrastructure is housed in a data center that is actively monitored 24 hours a day and is managed by a team of professional system administrators.
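As a rough consistency check on the figures above, the following Python sketch derives per-node and per-processor averages. It assumes exactly 300 nodes (the text says "over 300"), so the per-node values are approximate upper bounds, and the function and variable names are illustrative only.

```python
# Aggregate figures quoted in the description above.
# NODES is a lower bound, since the text says "over 300".
NODES = 300
PROCESSORS = 4200
RAM_GBYTES = 8000
PEAK_TFLOPS = 30


def cluster_averages(nodes, processors, ram_gbytes, peak_tflops):
    """Return average per-node and per-processor figures."""
    return {
        "processors_per_node": processors / nodes,
        "ram_gbytes_per_node": ram_gbytes / nodes,
        "ram_gbytes_per_processor": ram_gbytes / processors,
        # 1 Tflop/s = 1000 Gflop/s
        "peak_gflops_per_processor": peak_tflops * 1000 / processors,
    }


averages = cluster_averages(NODES, PROCESSORS, RAM_GBYTES, PEAK_TFLOPS)
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the averages come to about 14 processors and 27 GBytes of RAM per node, or roughly 1.9 GBytes of RAM and 7 Gflop/s of peak performance per processor.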
Newton storage resources consist of approximately 30 TBytes of high-availability mass storage and 50 TBytes of high-performance Lustre storage for use by computational jobs. All mass storage on the cluster is backed up nightly to a storage system that is housed in a geographically separate data center, and historical snapshots of the backup data are made available to users for data recovery purposes.
The Newton computing infrastructure is managed using a custom-designed system that supports a high degree of automation in management and monitoring while remaining flexible enough to support new technologies and new computational techniques. The system also supports automatic documentation and accounting of system configuration and changes. The Newton Program uses the Grid Engine batch-queue system to allocate cluster processing units to users’ computational jobs. Grid Engine supports fine-grained resource controls that allow the program to make service-level guarantees on job turnaround, job throughput, and resource reservations for high-priority projects.
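For readers unfamiliar with batch-queue systems, the sketch below builds the text of a simple Grid Engine submission script in Python. The `#$` directives shown (`-N`, `-cwd`, `-pe`, `-l h_rt`) are standard Grid Engine options, but the job name, program, slot count, and parallel environment name `example_pe` are hypothetical; the source does not describe Newton's actual queue configuration.

```python
def render_job_script(name, command, slots, pe_name, runtime="01:00:00"):
    """Build the text of a Grid Engine submission script.

    pe_name is site-specific; the value used in the example is hypothetical.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N {name}",              # job name
        "#$ -cwd",                    # run from the submission directory
        f"#$ -pe {pe_name} {slots}",  # parallel environment and slot count
        f"#$ -l h_rt={runtime}",      # hard wall-clock limit (HH:MM:SS)
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 16 slots for one hour.
script = render_job_script(
    "example_job", "./my_program", slots=16, pe_name="example_pe"
)
print(script)
```

The resulting text could be saved to a file and submitted with `qsub`; the scheduler then decides when and where the job runs based on the requested resources.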
All researchers affiliated with the University of Tennessee are eligible for accounts on the Newton systems. A basic account allows use of computing and storage resources when not in use by higher-priority members of the Newton Program. Through a direct buy-in process, researchers may gain higher priority in using system resources. The program uses buy-in funds to make infrastructure and computing capacity improvements and allocates priority use of the systems in proportion to each buy-in researcher’s financial contribution. In addition, computational resources are available for direct allocation to faculty members by the Office of Research.
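The proportional allocation described above can be illustrated with a minimal Python sketch. The group names and dollar amounts are hypothetical, and how the program translates these fractions into scheduler priority is not stated in this description.

```python
def priority_shares(contributions):
    """Map each buy-in researcher to their fraction of total buy-in funds."""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {who: amount / total for who, amount in contributions.items()}


# Hypothetical buy-in contributions in dollars.
example = {"group_a": 50_000, "group_b": 30_000, "group_c": 20_000}
shares = priority_shares(example)
for who, share in shares.items():
    print(f"{who}: {share:.0%}")
```

In this example a group that contributed half of the buy-in funds would receive half of the priority share, with the remainder split among the other contributors in the same proportion as their contributions.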
(PREVIOUS BOILERPLATE WORDING)
The University of Tennessee is dedicated to providing a secure and robust information technology (IT) infrastructure that supports the university’s mission. The university community is based on principles of honesty, academic integrity, respect for others, and respect for others’ privacy and property. To support this goal, the university has established an IT security strategy that seeks to protect the confidentiality, integrity, and availability of information and information systems by applying a risk-based methodology that aligns with industry-accepted security standards and best practices.
The university’s Information and Computer System Classification Policy (IT0115) establishes categories for information and information systems, based on the expected impact of a loss of confidentiality, integrity, or availability. It also defines the roles and responsibilities of all persons who create, handle, or manage the information. The policy aids in creating controls that limit the unauthorized disclosure, alteration, and destruction of information. This policy is available at http://www.tennessee.edu/policy.
The university’s Acceptable Use Policy (AUP, IT0110) helps the university protect the confidentiality and integrity of electronic information and the privacy of its users. It governs how university information and information systems may and may not be used. It further requires that all IT security best practices be followed where technically possible. This policy is also available at http://www.tennessee.edu/policy.
The university IT security best practices outline the optional, recommended, and required security controls for information and information systems, based on their assigned classification category. The following is a list of the topics and applications that currently have defined security controls:
- Availability Planning
- Change Management
- Encryption of Stored Data on End User Devices
- Incident Response Process
- Media Sanitization
- Multifunction Devices
- Network Access and Termination
ORNL (from their public access web page)
QUESTIONS: Is this the same thing as JICS?
What is the difference between JICS and NICS?
Extreme Scale System Center
The Extreme Scale System Center’s (ESSC) primary goal is to help enable the best and most productive use possible of emerging peta-/exa-scale high-performance computers. Of particular interest are the systems expected from the DARPA High Productivity Computing Systems (HPCS) program. The ESSC is intended to foster long-term collaborative relationships and interactions between DoD, DOE, DARPA, NRL and ORNL technical staff that will lead to improved and potentially revolutionary approaches to reducing time to solution of extreme-scale computing and computational science problems. The ESSC will support the major thrust areas required to accomplish this goal.
The Computer Science and Mathematics Division (CSM) is ORNL’s premier source of basic and applied research in high-performance computing, applied mathematics, and intelligent systems. Basic and applied research programs are focused on computational sciences, intelligent systems, and information technologies.
Our mission includes working on important national priorities with advanced computing systems, working cooperatively with U.S. industry to enable efficient, cost-competitive design, and working with universities to enhance science education and scientific awareness. Our researchers are finding new ways to solve problems beyond the reach of most computers and are putting powerful software tools into the hands of students, teachers, government researchers, and industrial scientists.
The Computational Sciences and Engineering Division (CSED) is a major research division at the Department of Energy’s Oak Ridge National Laboratory. CSED develops and applies creative information technology and modeling and simulation research solutions for national security and national energy infrastructure needs.
CSED focuses on the following national research efforts:
- National Security
- Energy Assurance
Petascale Computing on Jaguar
The National Center for Computational Sciences (NCCS), sponsored by the Department of Energy’s (DOE) Office of Science, manages the 1.64-petaflop Jaguar supercomputer for use by scientists and engineers in solving problems of national and global importance. The new petaflop machine will make it possible to address some of the most challenging scientific problems in areas such as climate modeling, renewable energy, materials science, fusion and combustion. Annually, 80 percent of Jaguar’s resources are allocated through DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, a competitively selected, peer-reviewed process open to researchers from universities, industry, government and non-profit organizations.
Through a close, 4-year partnership between ORNL and Cray, Jaguar has delivered state-of-the-art computing capability to scientists and engineers from UT, other national laboratories, and industry. The XT system was initially installed as a 25-teraflop XT3 in 2005. By early 2008 Jaguar was a 263-teraflop Cray XT4, able to solve some of the most challenging problems that could not be solved otherwise. In 2008 Jaguar was expanded with the addition of a 1.4-petaflop Cray XT5. The resulting system has over 181,000 processing cores connected internally with the Cray SeaStar2+ network. The XT4 and XT5 parts of Jaguar are combined into a single system using an InfiniBand network that links each piece to the Spider file system.
Throughout its series of upgrades, Jaguar has maintained a consistent programming model for the users. This programming model allows users to continue to evolve their existing codes rather than write new ones. Applications that ran on previous versions of Jaguar can be recompiled, tuned for efficiency, and then run on the new machine.
Jaguar is the most powerful computer system for science, with world-leading performance, more than three times the memory of any other computer, and world-leading bandwidth to disks and networks. The AMD Opteron processor is a powerful, general-purpose processor that uses the x86 instruction set, which has a rich set of applications, compilers, and tools. Hundreds of applications have been ported to and run on the Cray XT system, many of which have been scaled up to run on 25,000 to 150,000 cores. Jaguar is ready to take on the most challenging problems for the world.