Trustworthy – Reliable and Secure – Computing

Dr. Ravishankar K. Iyer and Dr. Zbigniew Kalbarczyk bring expertise and many years of experience in reliable system design, measurement, and validation. Our research has been supported by multiple sponsors, including NSF, DARPA, AFRL, DHS, DOE, NASA, and industrial partners, e.g., IBM, Sun Microsystems (now Oracle), HP, Motorola, Microsoft, Intel, and Boeing. The following sections describe selected research projects that exemplify our expertise and accomplishments.

(1)   Trusted ILLIAC: application-aware security and reliability. This project combines advances in security and reliability technologies, breakthroughs in compiler analysis, and novel modeling and measurement techniques to demonstrate a new paradigm for achieving reliability and security.

A unique feature of Trusted ILLIAC is a unified framework that uses smart compilers to dynamically configure hardware, enabling rapid deployment of low-cost application-aware engines and processing cores. Supporting OS and middleware layers facilitate model-driven trust management and oversight, protecting against a wide range of attacks and failures. Design and benchmarking tools, together with well-defined procedures, make the system services exposed to end users configurable. In addition, this configurability opens an unparalleled array of research and implementation opportunities for prototyping and testing fundamentally new reliability and security technologies. The infrastructure also eases technology transfer to real-world environments by enabling researchers to collaborate with developers from government and industry to determine how trustworthy hardware assists and software stacks can be integrated into products.

A multidisciplinary team of faculty, staff, and students (from the Information Trust Institute, the Coordinated Science Laboratory, and the ECE and CS Departments) is building the Trusted ILLIAC prototype. Multiple sponsors have contributed to the Trusted ILLIAC project, including NSF, DARPA, the Microelectronics Advanced Research Corporation (MARCO), AMD, Intel, HP, IBM, Xilinx, and Nallatech.

Work in the context of Trusted ILLIAC provides a solid foundation for exploring efficient error detection and rapid recovery in emerging computing paradigms such as cloud computing. Recently we have worked with industrial partners (e.g., IBM) and government (e.g., AFRL) on investigating the resiliency of the virtualized environments used in the cloud and on techniques for low-cost, checkpoint-based recovery of guest systems. Toward this goal, we have been developing a checkpoint and rollback technique, called VM-μCheckpoint, to enhance VM availability. VM-μCheckpoint provides: (i) low overhead, by using in-memory checkpoints and in-place recovery of VMs (i.e., recovery of a failed VM in its current context); (ii) a reduced chance of checkpoint corruption due to latent errors; and (iii) rapid recovery (within one second) compared with the stop-and-dump approach provided by VMMs. We initially designed and implemented VM-μCheckpoint in the Xen VMM. The evaluation results show that VM-μCheckpoint incurs an average overhead of 6.3% in program execution time with 50 ms checkpoint intervals when executing the SPEC CINT 2006 benchmarks.
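The following Python sketch illustrates the general idea of in-memory checkpointing with in-place rollback; it is not the VM-μCheckpoint or Xen implementation. It assumes a toy model in which guest memory is a dictionary mapping page numbers to page contents, and the names `GuestMemory` and `overhead_pct` are hypothetical. Before a page is first modified in a checkpoint interval, its previous contents are saved in memory, so a rollback can restore the failed guest in place without reloading a saved image.

```python
"""Toy sketch of in-memory checkpointing with in-place rollback.

Hypothetical model, not the VM-μCheckpoint/Xen code: guest memory is a
dict mapping page numbers to page contents.
"""
from typing import Dict, Optional


class GuestMemory:
    """Guest memory that logs each page's pre-image on its first write
    after a checkpoint, so rollback can restore the page in place."""

    def __init__(self, pages: Dict[int, bytes]) -> None:
        self.pages: Dict[int, bytes] = dict(pages)
        # Undo log: page number -> contents at the last checkpoint
        # (None if the page did not exist then).
        self._undo: Dict[int, Optional[bytes]] = {}

    def write(self, page: int, data: bytes) -> None:
        # Save the original contents only once per checkpoint interval.
        if page not in self._undo:
            self._undo[page] = self.pages.get(page)
        self.pages[page] = data

    def checkpoint(self) -> int:
        """Commit the current state; return the number of pages dirtied
        during the interval that just ended."""
        dirtied = len(self._undo)
        self._undo.clear()
        return dirtied

    def rollback(self) -> None:
        """Restore every page dirtied since the last checkpoint, in place."""
        for page, old in self._undo.items():
            if old is None:
                del self.pages[page]   # page was created after the checkpoint
            else:
                self.pages[page] = old
        self._undo.clear()


def overhead_pct(baseline_s: float, with_checkpoints_s: float) -> float:
    """Execution-time overhead of checkpointing, in percent."""
    return 100.0 * (with_checkpoints_s - baseline_s) / baseline_s


if __name__ == "__main__":
    mem = GuestMemory({0: b"boot", 1: b"data"})
    mem.checkpoint()
    mem.write(1, b"bad!")   # an error corrupts page 1
    mem.write(2, b"new")    # and creates page 2
    mem.rollback()          # in-place recovery to the last checkpoint
    print(mem.pages)        # {0: b'boot', 1: b'data'}
```

Saving a page's pre-image only on its first write in each interval means the state copied per interval grows with the number of pages actually modified, not with the size of guest memory. The `overhead_pct` helper expresses overhead as the relative increase in execution time; for hypothetical timings of 100.0 s without and 106.3 s with checkpointing, it returns approximately 6.3.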

(2)   Measurement-based analysis of operational failures and security incidents. Our experience includes extensive analysis of failure and incident data from industrial partners, e.g., Microsoft, IBM, and Tandem (now HP’s NonStop division), and from universities, e.g., the National Center for Supercomputing Applications at the University of Illinois.

A data-driven model for analyzing security vulnerabilities. We used an in-depth analysis of vulnerability reports and application source code to build a finite state machine model that captures the operations involved in exploiting application vulnerabilities and identifies the security checks needed to foil an attack. The practical usefulness of this modeling approach was demonstrated by the discovery of a new heap overflow vulnerability in the Null HTTPD application.
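A minimal Python sketch of the idea, loosely following the approach described above: an exploit is viewed as a walk through a chain of states, each guarded by one elementary operation. For every operation, the sketch compares the check the code should perform (`spec`) with the check it actually performs (`impl`); a step where the implementation accepts input that the specification rejects marks a missing security check. The `Operation` class, the request fields, and the two example checks are hypothetical and are not taken from Null HTTPD.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Request = Dict[str, Any]


@dataclass
class Operation:
    """One elementary step an exploit must pass through (hypothetical)."""
    name: str
    spec: Callable[[Request], bool]   # what the check should accept
    impl: Callable[[Request], bool]   # what the code actually accepts


def find_missing_check(ops: List[Operation], request: Request) -> Optional[str]:
    """Walk the operations in order, as transitions of a finite state machine.

    Returns the name of the first operation whose implementation accepts a
    request that its specification rejects, or None if the request is
    rejected (attack foiled) or every step behaves as specified.
    """
    for op in ops:
        by_impl, by_spec = op.impl(request), op.spec(request)
        if by_impl and not by_spec:
            return op.name        # hidden transition toward the attack state
        if not by_impl:
            return None           # the implemented check stops the request
    return None


BUF_SIZE = 1024  # hypothetical buffer size

# Hypothetical request-handling chain with two flawed checks.
OPS = [
    Operation("parse_length",
              spec=lambda r: 0 <= r["length"] <= BUF_SIZE,
              impl=lambda r: r["length"] <= BUF_SIZE),       # no lower bound
    Operation("copy_body",
              spec=lambda r: len(r["body"]) <= r["length"],
              impl=lambda r: True),                          # no bounds check
]

if __name__ == "__main__":
    print(find_missing_check(OPS, {"length": -1, "body": b"x"}))       # parse_length
    print(find_missing_check(OPS, {"length": 8, "body": b"x" * 64}))   # copy_body
    print(find_missing_check(OPS, {"length": 8, "body": b"x" * 8}))    # None
```

Each `Operation` stands for one state transition. Giving `parse_length` an `impl` that also enforces the lower bound would make the first request stop at that step, which is how the model points to the security check that foils the attack.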

This work provided the basis for the first complete design and implementation (as part of the Linux operating system) of memory layout randomization for security. The proposed transparent runtime randomization (TRR) protects systems against a large class of attacks that exploit vulnerabilities due to software design and implementation errors, such as unchecked buffers. TRR is based on our observation that attacks rely on correctly determining the runtime locations of key program objects such as buffers and pointers. By randomizing these locations at run time, TRR makes such calculations virtually impossible. Randomization is achieved by dynamically relocating a program’s stack, heap, libraries, and part of its data structures to random memory locations. TRR is implemented by modifying the Linux kernel and the dynamic program loader; hence, it is transparent to applications. Furthermore, it incurs less than 6% program startup overhead and no runtime overhead. Following this work, memory layout randomization was adopted by the community, and today it is a default feature of the Linux operating system.
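The core of layout randomization can be sketched in a few lines of Python: each memory region's base address is shifted by a random, page-aligned offset chosen at program load, so an attacker who hard-codes an address succeeds only with small probability. This is an illustration under stated assumptions, not the TRR kernel or loader code; the region names, default base addresses, page size, and the size of the randomization window are all hypothetical.

```python
import math
import secrets
from typing import Dict

PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical default (non-randomized) base addresses of a process image.
DEFAULT_LAYOUT: Dict[str, int] = {
    "stack": 0xBF000000,
    "heap": 0x08100000,
    "libraries": 0x40000000,
}


def randomize_layout(layout: Dict[str, int], window_pages: int) -> Dict[str, int]:
    """Shift every region by an independent random number of whole pages
    in [0, window_pages), as a loader might do at program start."""
    return {
        region: base + PAGE_SIZE * secrets.randbelow(window_pages)
        for region, base in layout.items()
    }


def guess_success_probability(window_pages: int) -> float:
    """Chance that one hard-coded address guess hits a single region."""
    return 1.0 / window_pages


def entropy_bits(window_pages: int) -> float:
    """Randomness per region, in bits."""
    return math.log2(window_pages)


if __name__ == "__main__":
    window = 1 << 12  # hypothetical: 4,096 possible page offsets (16 MiB)
    for region, addr in randomize_layout(DEFAULT_LAYOUT, window).items():
        print(f"{region:10s} 0x{addr:08x}")
    print(entropy_bits(window), guess_success_probability(window))
```

Because the offsets are drawn independently per region and per run, knowing the layout of one execution says nothing about the next. How much randomness is actually available on a real system depends on its address-space size and alignment constraints.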

An important outcome of our in-depth analysis of attacks was the first study to show the practicality of a new generic category of security threats, called non-control-data attacks. At that time, known memory corruption attacks followed a similar pattern, known as control-data attacks, because they alter the target program’s control data (data that are loaded into the processor’s program counter at some point during program execution, e.g., return addresses and function pointers) in order to execute injected malicious code or out-of-context library code (in particular, return-to-library attacks).

Our study provided experimental evidence that non-control-data attacks are realistic and can target a broad range of real-world applications. The target applications included various server implementations of the HTTP, FTP, SSH, and Telnet protocols. The demonstrated attacks exploit buffer overflow, heap corruption, format string, and integer overflow vulnerabilities. All the constructed non-control-data attacks resulted in security compromises as severe as those caused by traditional control-data attacks: the attacker gains the privileges of the victim process. Furthermore, the diversity of the application data attacked, including configuration data, user identity data, user input data, and decision-making data, shows that attack patterns can be highly varied. These results indicate that attackers can indeed compromise many real-world applications without breaking their control-flow integrity.
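The distinction can be made concrete with a small Python simulation of a stack frame held in a byte array. An unchecked copy overflows a 16-byte buffer into an adjacent user-ID field but stops short of the saved return address, so the program's control flow is untouched while its privilege decision changes. The frame layout, field sizes, and helper names are hypothetical, chosen only to illustrate the attack class described above.

```python
import struct

BUF_LEN = 16                 # hypothetical size of a name buffer
UID_OFF = BUF_LEN            # user-ID field placed right after the buffer
RET_OFF = BUF_LEN + 4        # saved return address after the user ID
FRAME_LEN = BUF_LEN + 8


def make_frame(uid: int, ret_addr: int) -> bytearray:
    """Build a toy frame: [buffer][uid (4 bytes)][return address (4 bytes)]."""
    frame = bytearray(FRAME_LEN)
    struct.pack_into("<I", frame, UID_OFF, uid)
    struct.pack_into("<I", frame, RET_OFF, ret_addr)
    return frame


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Model a strcpy-like copy into the buffer with no bounds check.
    Assumes the input does not run past the end of the toy frame."""
    frame[0:len(data)] = data


def uid_of(frame: bytearray) -> int:
    return struct.unpack_from("<I", frame, UID_OFF)[0]


def return_address(frame: bytearray) -> int:
    return struct.unpack_from("<I", frame, RET_OFF)[0]


if __name__ == "__main__":
    frame = make_frame(uid=1000, ret_addr=0x08048ABC)
    # 16 filler bytes, then 4 bytes that overwrite the uid with 0 (root).
    unchecked_copy(frame, b"A" * BUF_LEN + struct.pack("<I", 0))
    print(uid_of(frame), hex(return_address(frame)))  # 0 0x8048abc
```

A defense that only verifies return addresses or function pointers would find nothing wrong with this frame, which is why such attacks can succeed without breaking control-flow integrity.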

Measurement-based analysis of data on security incidents. We recently studied forensic data on security incidents that occurred during the last five years at the National Center for Supercomputing Applications at the University of Illinois. An in-depth examination of these incidents shows that it is possible to: (i) characterize incidents (in terms of incident category, severity, and detection phase), (ii) determine the progression of an attack, and (iii) identify the alerts (raised by the monitoring tools) responsible for detecting an intruder. Specifically, the results from this initial study indicate that about 27% of incidents were not detected by any alert and 26% involved credential theft (i.e., the attacker logs in to the system using the stolen credentials of a legitimate user and hence becomes an insider). This type of analysis is an essential step in characterizing the detection capabilities of the monitoring system, determining potential holes in the security monitoring and protection mechanisms, and guiding the design of techniques for improving system and application resiliency to malicious attacks.
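The characterization step can be sketched in Python as a pass over incident records that counts undetected incidents, credential-theft incidents, and how often each alert type contributed to detection. The `Incident` record format, the alert names, and the four sample records are invented for illustration; they are not NCSA data and do not reproduce the percentages above.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Incident:
    """Hypothetical forensic record of one security incident."""
    category: str
    severity: str
    alerts: List[str] = field(default_factory=list)  # alerts that fired
    stolen_credentials: bool = False


def characterize(incidents: List[Incident]) -> Dict[str, object]:
    n = len(incidents)
    undetected = sum(1 for i in incidents if not i.alerts)
    credential = sum(1 for i in incidents if i.stolen_credentials)
    # Count each alert type at most once per incident it helped detect.
    detecting = Counter(a for i in incidents for a in set(i.alerts))
    return {
        "undetected_pct": 100.0 * undetected / n,
        "credential_theft_pct": 100.0 * credential / n,
        "by_category": Counter(i.category for i in incidents),
        "by_severity": Counter(i.severity for i in incidents),
        "detecting_alerts": detecting,
    }


if __name__ == "__main__":
    sample = [  # invented records, for illustration only
        Incident("login", "high", ["anomalous_login"], stolen_credentials=True),
        Incident("login", "high", [], stolen_credentials=True),
        Incident("malware", "medium", ["ids_signature", "file_integrity"]),
        Incident("scan", "low", ["ids_signature"]),
    ]
    for key, value in characterize(sample).items():
        print(key, value)
```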

(3)   Methods and tools for automated validation and benchmarking of dependable computing systems. Over the years we have explored and designed methods and tools for the experimental and formal validation of computing systems and applications.

Experimental Validation. Our research included pioneering work on experimental methods, based on error injection, for evaluating the effectiveness of detection mechanisms and assessing overall application and system resiliency to perturbations. Error injection is a commonly used approach for conducting detailed studies of the complex interactions between errors and error-handling mechanisms and for assessing the efficiency of protection mechanisms. The NFTAPE software toolset, created in our research group, provides a controlled environment that supports the insertion of errors to mimic real failure scenarios, enabling accurate assessment of the efficiency and effectiveness of error detection and recovery mechanisms. NFTAPE is widely used in academia and industry (e.g., IBM, Huawei, Honeywell, and Motorola) for fault/error-injection-based assessment of systems and applications, including space-borne systems from NASA-JPL, telecommunication applications, and operating systems (Linux on Pentium, AIX on PowerPC, Solaris on SPARC).
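The basic loop of an error-injection campaign can be sketched in Python: flip one bit in a program's data, run the program and its error detector, and classify the outcome as detected, masked (no visible effect), or silent data corruption. The toy target (`program`, which returns the maximum of a list of readings), the range-check `detector`, and the 8-bit word width are hypothetical, and the sketch does not reflect NFTAPE's interfaces.

```python
import random
from collections import Counter
from typing import List

WORD_BITS = 8  # assumed width of each data word


def program(readings: List[int]) -> int:
    """Toy target computation: the maximum sensor reading."""
    return max(readings)


def detector(readings: List[int]) -> bool:
    """Toy range-check detector: valid readings lie in [0, 100)."""
    return all(0 <= r < 100 for r in readings)


def classify(golden: List[int], index: int, bit: int) -> str:
    """Inject one bit flip and classify the run against the fault-free one."""
    faulty = list(golden)
    faulty[index] ^= 1 << bit
    if not detector(faulty):
        return "detected"
    if program(faulty) == program(golden):
        return "masked"
    return "silent_data_corruption"


def campaign(golden: List[int], runs: int, seed: int = 0) -> Counter:
    """Random injection campaign: pick a word and a bit uniformly per run."""
    rng = random.Random(seed)
    return Counter(
        classify(golden, rng.randrange(len(golden)), rng.randrange(WORD_BITS))
        for _ in range(runs)
    )


if __name__ == "__main__":
    print(campaign([10, 50, 20], runs=1000))
```

The relative frequencies of the three outcomes estimate detector coverage for this fault model. Because injections are sampled at random, outcomes triggered by only a few specific flips can be missed entirely by a short campaign.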

Formal Validation. Formal reasoning about error sensitivity and the efficiency of error detectors complements experimental (error-injection-based) methods. Formal validation can uncover “corner cases” that conventional error injection may miss because of its inherently statistical nature. Our tool, SymPLFIED, uses symbolic execution and model checking to reason exhaustively about the effect of an error on a program. The tool was applied to assess error sensitivity and identify error propagation in a widely deployed aircraft collision avoidance application, tcas. SymPLFIED identified errors leading to catastrophic outcomes in the application that were not found by random fault injection in a comparable amount of time. This work was later extended into SymPLAID, a formal tool for uncovering security vulnerabilities in application code.
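The corner-case argument can be illustrated with a deliberately tiny Python example. Instead of the symbolic execution and model checking that SymPLFIED uses, the sketch simply enumerates every single-bit flip in a three-word state, which is feasible only because the state is so small. The decision logic (`alert_issued`), its field names, the 16-bit word width, and the notion of a "catastrophic" outcome (suppressing an alert that the fault-free run raises) are hypothetical and are not taken from tcas.

```python
from itertools import product
from typing import Dict, List, Tuple

WORD_BITS = 16  # assumed word width


def alert_issued(state: Dict[str, int]) -> bool:
    """Toy decision logic: alert when separation is below the threshold,
    unless alerts are inhibited."""
    return state["separation"] < state["threshold"] and state["inhibit"] == 0


def catastrophic_flips(golden: Dict[str, int]) -> List[Tuple[str, int]]:
    """Exhaustively try every single-bit flip and return those that suppress
    an alert raised by the fault-free run."""
    if not alert_issued(golden):
        return []
    found = []
    for name, bit in product(golden, range(WORD_BITS)):
        faulty = dict(golden)
        faulty[name] ^= 1 << bit
        if not alert_issued(faulty):
            found.append((name, bit))
    return found


def miss_probability(targets: int, space: int, trials: int) -> float:
    """Chance that `trials` uniform random injections (with replacement)
    hit none of `targets` specific flips among `space` possible flips."""
    return (1 - targets / space) ** trials


if __name__ == "__main__":
    golden = {"separation": 3, "threshold": 5, "inhibit": 0}
    flips = catastrophic_flips(golden)
    space = len(golden) * WORD_BITS
    print(len(flips), "of", space, "flips are catastrophic")
    # The single threshold flip is a corner case for random injection:
    print(("threshold", 2) in flips, miss_probability(1, space, trials=20))
```

With these toy values, enumeration finds the one flip of `threshold` (bit 2, turning 5 into 1) every time, whereas a campaign of 20 random injections misses that specific flip with a probability of about 0.66. Real programs have far too many states to enumerate this way, which is why symbolic techniques are needed.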

Reliable and Safe Biomedical Monitoring

(4)   This research project spans several domains of computer systems design and incorporates cutting-edge technologies used in embedded system design, reconfigurable hardware systems, and general-purpose graphics processing units (GPGPUs).

The overarching goal of the project is to monitor a person’s vital signs in several different ways to support automated decision-making about the individual’s health status. In particular, we focus on two scenarios:

  • The person moves freely and wears a device capable of carrying out the necessary monitoring and decision-making.
  • The person is stationary, and their vital signs are recorded and analyzed.

The two scenarios require different types of devices with different processing capabilities to achieve the same goal. The first calls for an embedded or cyber-physical system, while the second can take advantage of the individual being stationary and use a more powerful computer to perform the analysis.

The project focuses on several areas of health monitoring. The main ones are:

  1. Vitals / Biomedical Signals. Several vital signs can be monitored simultaneously in real time to achieve the goals mentioned earlier. The underlying question is which signals must be monitored to achieve the end goal with minimal overhead.
  2. Algorithms. The quality of the decision-making process depends on the ability of the underlying algorithms to exploit the information inherent in the signals to make coherent and reliable decisions. These algorithms must also be tailored to the underlying processing elements.
  3. Sensors. Once the signals to be monitored have been identified, appropriate sensors must be chosen to collect them. Both inexpensive commercial sensors and custom-built sensors are being considered.
  4. Processing Platform. This is one of the major components of the system architecture. Several platforms are being explored, built from commercial-off-the-shelf components, reconfigurable hardware, and GPGPUs.

As part of the project, a first-generation prototype device was built using microcontrollers and sensors to collect brain activity signals, blood oxygen levels, heart rate, and motion information. The device can accurately detect a subclass of abnormal brain activity, namely seizures. A next-generation device will incorporate the same principles but use more flexible hardware that allows custom configuration and eases the final algorithmic implementation. Novel algorithms for detecting various kinds of abnormalities are also under development.

The results and experimental work from all these studies have been published at international conferences, e.g., the IEEE Conference on Dependable Systems and Networks (DSN), the Symposium on Reliable Distributed Systems (SRDS), and the IFIP Information Security Conference (SEC), or included in books that address computing system benchmarking and protection against accidental errors and malicious attacks. Publications reporting our work on formal validation (the SymPLFIED tool) and on modeling stream processing applications for dependability evaluation won prestigious Best Paper Awards at the Conference on Dependable Systems and Networks in 2008 and 2011, respectively.

We currently participate in several research projects focused on the design and validation of next-generation, high-availability, secure computing systems that exploit emerging technologies such as resource virtualization and multicore processor architectures. Specific projects include: (i) NSF-funded research on data-driven analysis of security attacks in large-scale systems, (ii) a DOE- and DHS-supported project on trustworthy cyber infrastructure for the power grid, (iii) an AFRL-funded center of excellence on assured cloud computing, (iv) a Boeing-supported project on cyber situational awareness and network defense, and (v) IBM-sponsored work on methods and tools for error characterization of high-availability operating systems.