publications
2023
- Generalized self-cueing real-time attention scheduling with intermittent inspection and image resizingS. Liu, X. Fu, Y. Hu, M. Wigness, P. David, S. Yao, L. Sha, and T. AbdelzaherReal-Time Systems, 2023
This paper proposes a generalized self-cueing real-time attention scheduling framework for DNN-based visual machine perception pipelines on resource-limited embedded platforms. Self-cueing means we identify subframe-level regions of interest in a scene internally by exploiting temporal correlations among successive video frames as opposed to externally via a cueing sensor. One limitation of our original self-cueing-and-inspection strategy (Liu et al. in 28th IEEE real-time and embedded technology and applications symposium (RTAS), 2022b) lies in its lack of computational efficiency under high workloads, like busy traffic scenarios where a large number of objects are identified and separately inspected. We extend the conference publication by integrating image resizing with intermittent inspection and task batching in attention scheduling. The extension enhances the original algorithm by accelerating the processing of large objects by reducing their resolution at the cost of only a negligible degradation in accuracy, thereby achieving a higher overall object inspection throughput. After extracting partial regions around objects of interest, using an optical flow-based tracking algorithm, we allocate computation resources (i.e. DNN inspection) to them in a criticality-aware manner using a generalized batched proportional balancing algorithm (GBPB), to minimize a concept of generalized system uncertainty. It saves computational resources by inspecting low-priority regions intermittently at low frequencies and inspecting large objects at low resolutions. We implement the system on an NVIDIA Jetson Xavier platform and extensively evaluate its performance using a real-world driving dataset from Waymo. The proposed GBPB algorithm consistently outperforms the previous BPB algorithm that only uses intermittent inspection and a set of baselines. The performance gain of GBPB is larger in facing more significant resource constraints (i.e., lower sampling intervals or busy traffic scenarios) because its multi-dimensional scheduling strategy achieves better resource allocation of machine perception. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
- SL1-Simplex: Safe Velocity Regulation of Self-Driving Vehicles in Dynamic and Unforeseen EnvironmentsY. Mao, Y. Gu, N. Hovakimyan, L. Sha, and P. VoulgarisACM Transactions on Cyber-Physical Systems, 2023
This article proposes a novel extension of the Simplex architecture with model switching and model learning to achieve safe velocity regulation of self-driving vehicles in dynamic and unforeseen environments. To guarantee the reliability of autonomous vehicles, an gL1 adaptive controller that compensates for uncertainties and disturbances is employed by the Simplex architecture as a verified high-assurance controller (HAC) to tolerate concurrent software and physical failures. Meanwhile, the safe switching controller is incorporated into the HAC for safe velocity regulation in the dynamic (prepared) environments, through the integration of the traction control system and anti-lock braking system. Due to the high dependence of vehicle dynamics on the driving environments, the HAC leverages the finite-time model learning to timely learn and update the vehicle model for gL1 adaptive controller, when any deviation from the safety envelope or the uncertainty measurement threshold occurs in the unforeseen driving environments. With the integration of gL1 adaptive controller, safe switching controller and finite-time model learning, the vehicle’s angular and longitudinal velocities can asymptotically track the provided references in the dynamic and unforeseen driving environments, while the wheel slips are restricted to safety envelopes to prevent slipping and sliding. Finally, the effectiveness of the proposed Simplex architecture for safe velocity regulation is validated by the AutoRally platform. © 2023 Association for Computing Machinery.
- SchedGuard++: Protecting against Schedule Leaks Using Linux Containers on Multi-Core ProcessorsACM Transactions on Cyber-Physical Systems, 2023
Timing correctness is crucial in a multi-criticality real-time system, such as an autonomous driving system. It has been recently shown that these systems can be vulnerable to timing inference attacks, mainly due to their predictable behavioral patterns. Existing solutions like schedule randomization cannot protect against such attacks, often limited by the system’s real-time nature. This article presents "SchedGuard++": a temporal protection framework for Linux-based real-time systems that protects against posterior schedule-based attacks by preventing untrusted tasks from executing during specific time intervals. SchedGuard++ supports multi-core platforms and is implemented using Linux containers and a customized Linux kernel real-time scheduler. We provide schedulability analysis assuming the Logical Execution Time (LET) paradigm, which enforces I/O predictability. The proposed response time analysis takes into account the interference from trusted and untrusted tasks and the impact of the protection mechanism. We demonstrate the effectiveness of our system using a realistic radio-controlled rover platform. Not only is "SchedGuard++"able to protect against the posterior schedule-based attacks, but it also ensures that the real-time tasks/containers meet their temporal requirements. © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2022
- Real-Time Task Scheduling for Machine Perception in Intelligent Cyber-Physical SystemsIEEE Transactions on Computers, 2022
This paper explores criticality-based real-time scheduling of neural-network-based machine inference pipelines in cyber-physical systems (CPS) to mitigate the effect of algorithmic priority inversion. We specifically focus on the perception subsystem, an important subsystem feeding other components (e.g., planning and control). In general, priority inversion occurs in real-time systems when computations that are of lower priority are performed together with or ahead of those that are of higher priority. In current machine perception software, significant priority inversion occurs because resource allocation to the underlying neural network models does not differentiate between critical and less critical data within a scene. To remedy this problem, in recent work, we proposed an architecture to partition the input data into regions of different criticality, then formulated a utility-based optimization problem to batch and schedule their processing in a manner that maximizes confidence in perception results, subject to criticality-based time constraints. This journal extension matures the work in several directions: (i) We extend confidence maximization to a generalized utility optimization formulation that accounts for criticality in the utility function itself, offering finer-grained control over resource allocation within the perception pipeline; (ii) we further instantiate and compare two different criticality metrics (distance-based and relative velocity-based) to understand their relative advantages; and (iii) we explore the limitations of the approach, specifically how inaccuracies in criticality-based attention cueing affect performance. All experiments are conducted on the NVIDIA Jetson AGX Xavier platform with a real-world driving dataset. © 1968-2012 IEEE.
- Verifiable Obstacle DetectionProceedings - International Symposium on Software Reliability Engineering, ISSRE, 2022
Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open source autonomous driving implementations show a perception pipeline with complex interdependent Deep Neural Networks. These networks are not fully verifiable, making them unsuitable for safety-critical tasks. In this work, we present a safety verification of an existing LiDAR based classical obstacle detection algorithm. We establish strict bounds on the capabilities of this obstacle detection algorithm. Given safety standards, such bounds allow for determining LiDAR sensor properties that would reliably satisfy the standards. Such analysis has as yet been unattainable for neural network based perception systems. We provide a rigorous analysis of the obstacle detection system with empirical results based on real-world sensor data. © 2022 IEEE.
- Self-Cueing Real-Time Attention Scheduling in Criticality-Aware Visual Machine PerceptionS. Liu, X. Fu, M. Wigness, P. David, S. Yao, L. Sha, and T. AbdelzaherIEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, 2022
This paper presents a self-cueing real-time frame-work for attention prioritization in AI-enabled visual perception systems that minimizes a notion of state uncertainty. By attention prioritization we refer to inspecting some parts of the scene before others in a criticality-aware fashion. By self-cueing, we refer to not needing external cueing sensors for prioritizing attention, thereby simplifying design. We show that attention prioritization saves resources, thus enabling more efficient and responsive real-time object tracking on resource-limited embedded platforms. The system consists of two components: First, an optical flow-based module decides on the regions to be viewed on a subframe level, as well as their criticality. Second, a novel batched proportional balancing (BPB) scheduling policy decides how to schedule these regions for inspection by a deep neural network (DNN), and how to parallelize execution on the GPU. We implement the system on an NVIDIA Jetson Xavier platform, and empirically demonstrate the superiority of the proposed architecture through an extensive evaluation using a real-word driving dataset. © 2022 IEEE.
- Latency analysis of self-suspending task chainsT. Kloda, J. Chen, A. Bertout, L. Sha, and M. Caccamo2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022, 2022
Many cyber-physical systems are offloading computation-heavy programs to hardware accelerators (e.g., GPU and TPU) to reduce execution time. These applications will self-suspend between offloading data to the accelerators and obtaining the returned results. Previous efforts have shown that self-suspending tasks can cause scheduling anomalies, but none has examined inter-task communication. This paper aims to explore self-suspending tasks’ data chain latency with periodic activation and asynchronous message passing. We first present the cause for suspension-induced delays and worst-case latency analysis. We then propose a rule for utilizing the hardware co-processors to reduce data chain latency and schedulability analysis. Simulation results show that the proposed strategy can improve overall latency while preserving system schedulability. © 2022 EDAA.
- Synergistic Redundancy: Towards Verifiable Safety for Autonomous VehiclesarXiv preprint arXiv:2209.01710, 2022
2021
- An Analyzable Inter-core Communication Framework for High-Performance Multicore Embedded SystemsR. Tabish, J.-Y. Wen, R. Pellizzoni, R. Mancuso, H. Yun, M. Caccamo, and L.R. ShaJournal of Systems Architecture, 2021
Multicore processors provide great average-case performance. However, the use of multicore processors for safety-critical applications can lead to catastrophic consequences because of contention on shared resources. The problem has been well-studied in literature, and solutions such as partitioning of shared resources have been proposed. Strict partitioning of memory resources among cores, however, does not allow intercore communication. This paper proposes a Communication Core Model (CCM) that implements the inter-core communication by bounding the amount of intercore interference in a partitioned multicore system. A system-level perspective of how to realize such CCM along with the implementation details is provided. A formula to derive the WCET of the tasks using CCM is provided. We compare our proposed CCM with Contention-based Communication (CBC), where no private banking is enforced for any core. The analytical approach results using San Diego Vision Benchmark Suite (SD-VBS) for two models indicate that the CCM shows an improvement of up to 65 percent compared to the CBC. Moreover, our experimental results indicate that the measured WCET using SD-VBS is within the bounds calculated using the proposed analysis. © 2021
- Risk Ranked Recall: Collision Safety Metric for Object Detection Systems in Autonomous Vehicles2021 10th Mediterranean Conference on Embedded Computing, MECO 2021, 2021
Commonly used metrics for evaluation of object detection systems (precision, recall, mAP) do not give complete information about their suitability of use in safety critical tasks, like obstacle detection for collision avoidance in Autonomous Vehicles (AV). This work introduces the Risk Ranked Recall (R3) metrics for object detection systems. The R3 metrics categorize objects within three ranks. Ranks are assigned based on an objective cyber-physical model for the risk of collision. Recall is measured for each rank. © 2021 IEEE.
- Robust Vehicle Lane Keeping Control with Networked Proactive AdaptationAmerican Control Conference, 2021
Road condition is an important environmental factor for autonomous vehicle control. A dramatic change in the road condition from the nominal status is a source of uncertainty that can lead to a system failure. Once the vehicle encounters an uncertain environment, such as hitting an ice patch, it is too late to reduce the speed, and the vehicle can lose control. To cope with unforeseen uncertainties in advance, we study a proactive robust adaptive control architecture for autonomous vehicles’ lane-keeping control problems. The data center generates a prior environmental uncertainty estimate by combining weather forecasts and measurements from anonymous vehicles through a spatio-temporal filter. The prior estimate contributes to designing a robust heading controller and nominal longitudinal velocity for proactive adaptation to each new condition. The control parameters are updated based on posterior information fusion with on-board measurements. © 2021 American Automatic Control Council.
- SchedGuard: Protecting against schedule leaks using Linux containersIEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, 2021
Real-time systems have recently been shown to be vulnerable to timing inference attacks, mainly due to their predictable behavioral patterns. Existing solutions such as schedule randomization lack the ability to protect against such attacks, often limited by the system’s real-time nature. This paper presents ’SchedGuard’: a temporal protection framework for Linux-based hard real-time systems that protects against posterior scheduler side-channel attacks by preventing untrusted tasks from executing during specific time segments. SchedGuard is integrated into the Linux kernel using cgroups, making it amenable to use with container frameworks. We demonstrate the effectiveness of our system using a realistic radio-controlled rover platform and synthetically generated workloads. Not only is SchedGuard able to protect against the attacks mentioned above, but it also ensures that the real-time tasks/containers meet their temporal requirements. © 2021 IEEE.
- Checking is Believing: Event-Aware Program Anomaly Detection in Cyber-Physical SystemsL. Cheng, K. Tian, D.D. Yao, L. Sha, and R.A. BeyahIEEE Transactions on Dependable and Secure Computing, 2021
Securing cyber-physical systems (CPS) against malicious attacks is of paramount importance because these attacks may cause irreparable damages to physical systems. Recent studies have revealed that control programs running on CPS devices suffer from both control-oriented attacks (e.g., code-injection or code-reuse attacks) and data-oriented attacks (e.g., non-control data attacks). Unfortunately, existing detection mechanisms are insufficient to detect runtime data-oriented exploits, due to the lack of runtime execution semantics checking. In this work, we propose Orpheus, a new security methodology for defending against data-oriented attacks by enforcing cyber-physical execution semantics. We first present a general method for reasoning cyber-physical execution semantics of a control program (i.e., causal dependencies between the physical context/event and program control flows), including the event identification and dependence analysis. As an instantiation of Orpheus, we then present a new program behavior model, i.e., the event-Aware finite-state automaton (eFSA). eFSA takes advantage of the event-driven nature of CPS control programs and incorporates event checking in anomaly detection. It detects data-oriented exploits if a specific physical event is missing along with the corresponding event dependent state transition. We evaluate our prototype’s performance by conducting case studies under data-oriented attacks. Results show that eFSA can successfully detect different runtime attacks. Our prototype on Raspberry Pi incurs a low overhead, taking 0.0001s for each state transition integrity checking, and 0.063s∼∼ 0.211s for the cyber-physical contextual consistency checking. © 2004-2012 IEEE.
- Safety Constrained Multi-UAV Time Coordination: A Bi-level Control Framework in GPS Denied Environment∗AIAA Aviation and Aeronautics Forum and Exposition, AIAA AVIATION Forum 2021, 2021
Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can cause safety issues. To avoid intolerable sensor drifts while completing the time-critical coordination task for multi-UAV systems, we propose a safety constrained bi-level control framework. The first level is the time-critical coordination level that achieves a consensus of coordination states and provides a virtual target which is a function of the coordination state. The second level is the safety-critical control level that is designed to follow the virtual target while adapting the attacked UAV(s) at a path re-planning level to support resilient state estimation. In particular, the time-critical coordination level framework generates the desired speed and position profile of the virtual target based on the multi-UAV cooperative mission by the proposed consensus protocol algorithm. The safety-critical control level is able to make each UAV follow its assigned path while detecting the attacks, estimating the state resiliently, and driving the UAV(s) outside the effective range of the spoofing device within the escape time. The numerical simulations of a three-UAV system demonstrate the effectiveness of the proposed safety constrained bi-level control framework. © 2021, American Institute of Aeronautics and Astronautics Inc.. All rights reserved.
- Backup Plan Constrained Model Predictive ControlIEEE Conference on Decision and Control, 2021
This article proposes a new safety concept: dynamically formulated backup plan safety. The backup plan safety is defined as the ability to complete one of the alternative missions formulated in real-time in the case of primary mission abortion. To incorporate this new safety concept in control problems, we formulate a feasibility maximization problem that adopts additional (virtual) input horizons toward the alternative missions on top of the input horizon toward the primary mission. Cost functions for the primary and alternative missions construct multiple objectives, and multi-horizon inputs evaluate them. To address the feasibility maximization problem, we develop a multi-horizon multi-objective model predictive path integral control (3M) algorithm. Model predictive path integral control (MPPI) is a sampling-based scheme that can help the proposed algorithm deal with nonlinear dynamic systems and achieve computational efficiency by parallel computation. Simulations of the aerial vehicle control problems demonstrate the new concept of backup plan safety and the performance of the proposed algorithm. © 2021 IEEE.
2020
- A Safety Constrained Control Framework for UAVs in GPS Denied EnvironmentIEEE Conference on Decision and Control, 2020
Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can lead to potentially dangerous situations. To avoid intolerable sensor drifts in the presence of GPS spoofing attacks, we propose a safety constrained control framework that adapts the UAV at a path re-planning level to support resilient state estimation against GPS spoofing attacks. The attack detector is used to detect GPS spoofing attacks and provides a switching criterion between the robust control mode and emergency control mode. An attacker location tracker (ALT) is developed to track the attacker’s location and estimate the spoofing device’s output power by the unscented Kalman filter (UKF) with sliding window outputs. Using the estimates from ALT, we design an escape controller (ESC) based on the model predictive controller (MPC) such that the UAV escapes from the effective range of the spoofing device within the escape time. © 2020 IEEE.
- On Removing Algorithmic Priority Inversion from Mission-critical Machine Inference PipelinesProceedings - Real-Time Systems Symposium, 2020
The paper discusses algorithmic priority inversion in mission-critical machine inference pipelines used in modern neural-network-based cyber-physical applications, and develops a scheduling solution to mitigate its effect. In general, priority inversion occurs in real-time systems when computations that are of lower priority are performed together with or ahead of those that are of higher priority.1 In current machine intelligence software, significant priority inversion occurs on the path from perception to decision-making, where the execution of underlying neural network algorithms does not differentiate between critical and less critical data. We describe a scheduling framework to resolve this problem, and demonstrate that it improves the system’s ability to react to critical inputs, while at the same time reducing platform cost. © 2020 IEEE.
- SCE-Comm: A Real-Time Inter-Core Communication Framework for Strictly Partitioned Multi-core ProcessorsR. Tabish, J.-Y. Wen, R. Pellizzoni, R. Mancuso, H. Yun, M. Caccamo, and L. Sha2020 9th Mediterranean Conference on Embedded Computing, MECO 2020, 2020
Multicore processors provide great average case performance. However, the use of multicore processors for safety-critical applications can lead to catastrophic consequences because of contention on shared resources. The problem has been well-studied in literature and solutions such as partitioning of shared resources have been proposed. Strict partitioning of memory resources among cores, however, does not allow inter-core communication. In this paper, we propose Communication Core Model (CCM) that implements the inter-core communication by bounding the amount of intercore interference in a partitioned multi-core system. A system-level perspective of how to realize such CCM along with the implementation details is provided. We compare our proposed CCM with Contention-based Communication (CBC) model where no private banking is enforced for any core. For evaluation, we consider San Diego vision benchmark suite (SD-VBS). The results of the evaluation show that the CCM offers 56 percent improvement in worst case execution time (WCET) when compared with CBC. © 2020 IEEE.
- UACFinder: Mining Syntactic Carriers of Unspecified Assumptions in Medical Cyber-Physical System Design ModelsZ. Fu, C. Guo, Z. Zhang, S. Ren, and L. ShaACM Transactions on Cyber-Physical Systems, 2020
During the system development process, domain experts and developers often make assumptions about specifications and implementations. However, most of the assumptions being taken for granted by domain experts and developers are too tedious to be documented by them. When these unspecified assumptions are violated in an environment in which the system operates, failures can occur. According to the U.S. Food and Drug Administration (FDA) medical device recall database, medical device recalls caused by software failures are at an all-time high. One major cause of these recalls is violations of unspecified assumptions made in medical systems. Therefore, it is crucial to have tools to automatically identify such unspecified assumptions at an early stage of the systems development process to avoid fatal failures. In this article, we present a tool called Unspecified Assumption Carrier Finder (UACFinder) that uses data mining techniques to automatically identify potential syntactic carriers of unspecified assumptions in system design models. The main idea of this tool is based on the observation we obtained from our earlier analysis of software failures in medical device recalls caused by unspecified assumptions. We observed that unspecified assumptions often exist in medical systems through syntactic carriers, such as constant variables, frequently read/updated variables, and frequently executed action sequences. Therefore, we develop the UACFinder to automatically find these potential unspecified assumption syntactic carriers rather than unspecified assumptions themselves. Once the UACFinder identifies the potential unspecified assumption syntactic carriers, domain experts and developers can validate whether these syntactic carriers indeed carry unspecified assumptions. We use a simplified cardiac arrest treatment scenario as a case study to evaluate the UACFinder in mining potential syntactic carriers of unspecified assumptions. In addition, we invite a medical doctor to validate unspecified assumptions carried by the mined syntactic carriers. The case study demonstrates that the UACFinder is effective in helping to identify potential unspecified assumptions from system design models. © 2020 ACM.
- A framework for supporting the development of verifiably safe medical best practice guideline systemsC. Guo, Z. Fu, Z. Zhang, S. Ren, and L. ShaJournal of Systems Architecture, 2020
Improving safety of patient care is an ultimate objective for medical systems. Though many medical best practice guidelines exist and are in hospital handbooks, they are often lengthy and difficult for medical professionals to remember and apply clinically. Hence, developing safe medical best practice guideline systems is an urgent need. The paper presents a framework to support the development of verifiably safe medical best practice guideline systems. The framework facilitates medical professionals’ participation in computer modeling, clinical validation, formal verification and root cause identification of safety failures at both model and code levels. To implement the framework, our strategies are to maximally utilize existing models/tools designed for validation and verification respectively, but build bridges among different selected models/tools. In particular, we use statechart tool to build statechart models for medical best practice guidelines and use statechart models to interact with medical professionals for clinical validations. The statechart models are then automatically transformed to verifiable models by the framework so that the safety properties can be formally verified. The computer models that are both validated by medical professionals and verified by formal verification tools are then used to generate computer executable code. To improve code level safety, the framework further transforms safety properties specified at the model level to runtime code monitors to ensure that these safety properties are complied at runtime. We use a simplified version of cardiac arrest treatment scenario provided to our team by Carle Foundation Hospital as a case study to evaluate the framework in developing a verifiably safe medical system. © 2019 Elsevier B.V.
- Safety constrained multi-uav time coordination: A bi-level control framework in gps denied environment∗AIAA AVIATION 2020 FORUM, 2020
Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can cause safety issues. To avoid intolerable sensor drifts while completing the time-critical coordination task for multi-UAV systems, we propose a safety constrained bi-level control framework. The first level is the time-critical coordination level that achieves a consensus of coordination states and provides a virtual target which is a function of the coordination state. The second level is the safety-critical control level that is designed to follow the virtual target while adapting the attacked UAV(s) at a path re-planning level to support resilient state estimation. In particular, the time-critical coordination level framework generates the desired speed and position profile of the virtual target based on the multi-UAV cooperative mission by the proposed consensus protocol algorithm. The safety-critical control level is able to make each UAV follow its assigned path while detecting the attacks, estimating the state resiliently, and driving the UAV(s) outside the effective range of the spoofing device within the escape time. The numerical simulations of a three-UAV system demonstrate the effectiveness of the proposed safety constrained bi-level control framework. © 2020, American Institute of Aeronautics and Astronautics Inc, AIAA. All rights reserved.
2019
- Towards resilient UAV: Escape time in GPS denied environment with sensor driftIFAC-PapersOnLine, 2019
This paper considers a resilient state estimation framework for unmanned aerial vehicles (UAVs) that integrates a Kalman filter-like state estimator and an attack detector. When an attack is detected, the state estimator uses only IMU signals as the GPS signals do not contain legitimate information. This limited sensor availability induces a sensor drift problem questioning the reliability of the sensor estimates. We propose a new resilience measure, escape time, as the safe time within which the estimation errors remain in a tolerable region with high probability. This paper analyzes the stability of the proposed resilient estimation framework and quantifies a lower bound for the escape time. Moreover, simulations of the UAV model demonstrate the performance of the proposed framework and provide analytical results. Copyright © 2019. The Authors. Published by Elsevier Ltd. All rights reserved.
- Design Verifiably Correct Model Patterns to Facilitate Modeling Medical Best Practice Guidelines with StatechartsC. Guo, Z. Fu, Z. Zhang, S. Ren, and L. ShaIEEE Internet of Things Journal, 2019
Improving patient care safety is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients’ death rate can be significantly reduced by computerizing medical best practice guidelines. To facilitate the development of computerized medical best practice guidelines, statecharts are often used as a modeling tool because of their high resemblances to disease and treatment models and their capabilities to provide rapid prototyping and simulation for clinical validations. However, some implementations of statecharts, such as Yakindu statecharts, are priority-based and have synchronous execution semantics which makes it difficult to model certain functionalities that are essential in modeling medical guidelines, such as two-way communications and configurable execution orders. Rather than introducing new statechart elements or changing the statechart implementation’s underline semantics, we use existing basic statechart elements to design model patterns for the commonly occurring issues. In particular, we show the design of model patterns for two-way communications and configurable execution orders and formally prove the correctness of these model patterns. We further use a simplified airway laser surgery scenario as a case study to demonstrate how the developed model patterns address the two-way communication and configurable execution order issues and their impact on validation and verification of medical safety properties. © 2014 IEEE.
- Decision-driven schedulingJ.-E. Kim, T. Abdelzaher, L. Sha, A. Bar-Noy, R.L. Hobbs, and W. DronReal-Time Systems, 2019
This paper presents a scheduling model, called decision-driven scheduling, elaborates key optimality results for a fundamental scheduling model, and evaluates new heuristics solving more general versions of the problem. In the context of applications that need control and actuation, the traditional execution model has often been either time-driven or event-driven. In time-driven applications, sensors are sampled periodically, leading to the classical periodic task model. In event-driven applications, sensors are sampled when an event of interest occurs, such as motion-activated cameras, leading to an event-driven task activation model. In contrast, in decision-driven applications, sensors are sampled when a particular decision must be made. We offer a justification for why decision-driven scheduling might be of increasing interest to Internet-of-things applications, and explain why it leads to interesting new scheduling problems (unlike time-driven and event-driven scheduling), including the problems addressed in this paper. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
- A Container-based DoS Attack-Resilient Control Framework for Real-Time UAV SystemsJ. Chen, Z. Feng, J.-Y. Wen, B. Liu, and L. Sha2019 Design, Automation and Test in Europe Conference and Exhibition, DATE 2019, 2019
The Unmanned aerial vehicles (UAVs) sector is fast-expanding. Protection of real-time UAV applications against malicious attacks has become an urgent problem that needs to be solved. Denial-of-service (DoS) attack aims to exhaust system resources and cause important tasks to miss deadlines. DoS attack may be one of the common problems of UAV systems, due to its simple implementation. In this paper, we present a software framework that offers DoS attack-resilient control for real-time UAV systems using containers: ContainerDrone. The framework provides defense mechanisms for three critical system resources: CPU, memory, and communication channel. We restrict attacker’s access to CPU core set and utilization. Memory bandwidth throttling limits attacker’s memory usage. By simulating sensors and drivers in the container, a security monitor constantly checks DoS attacks over communication channels. Upon the detection of a security rule violation, the framework switches to the safety controller to mitigate the attack. We implemented a prototype quadcopter with commercially off-the-shelf (COTS) hardware and open-source software. Our experimental results demonstrated the effectiveness of the proposed framework defending against various DoS attacks. © 2019 EDAA.
2018
- Safety-assured model-driven design of the multifunction vehicle bus controllerY. Jiang, H. Liu, H. Song, H. Kong, R. Wang, Y. Guan, and L. ShaIEEE Transactions on Intelligent Transportation Systems, 2018
In this paper, we present a formal model-driven design approach to establish a safety-assured implementation of multifunction vehicle bus controller (MVBC), which controls the data transmission among the devices of the vehicle. First, the generic models and safety requirements described in International Electrotechnical Commission Standard 61375 are formalized as time automata and timed computation tree logic formulas, respectively. With model checking tool Uppaal, we verify whether or not the constructed timed automata satisfy the formulas and several logic inconsistencies in the original standard are detected and corrected. Then, we apply the code generation tool Times to generate C code from the verified model, which is later synthesized into a real MVBC chip, with some handwriting glue code. Furthermore, the runtime verification tool RMOR is applied on the integrated code, to verify some safety requirements that cannot be formalized on the timed automata. For evaluation, we compare the proposed approach with existing MVBC design methods, such as BeagleBone, Galsblock, and Simulink. Experiments show that more ambiguousness or bugs in the standard are detected during Uppaal verification, and the generated code of Times outperforms the C code generated by others in terms of the synthesized binary code size. The errors in the standard have been confirmed and the resulting MVBC has been deployed in the real train communication network. © 2018 IEEE.
- Model and Integrate Medical Resource Available Times and Relationships in Verifiably Correct Executable Medical Best Practice Guideline ModelsC. Guo, Z. Fu, Z. Zhang, S. Ren, and L. ShaProceedings - 9th ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS 2018, 2018
Improving patient care safety is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients’ death rate is significantly reduced by computerizing medical best practice guidelines [16]. Recent data also show that some morbidity and mortality in emergency care are directly caused by delayed or interrupted treatment due to lack of medical resources [15]. However, medical guidelines usually do not provide guidance on medical resource demands and how to manage potential unexpected delays in resource availability. If medical resources are temporarily unavailable, safety properties in existing executable medical guideline models may fail which may cause increased risk to patients under care. The paper presents a separately model and jointly verify (SMJV) architecture to separately model medical resource available times and relationships and jointly verify safety properties of existing medical best practice guideline models with resource models being integrated in. The separated modeling approach also allows different domain professionals to make independent model modifications, facilitates the management of frequent resource availability changes, and enables resource statechart reuse in multiple medical guideline models. A simplified stroke scenario is used as a case study to investigate the effectiveness and validity of the SMJV architecture. The case study indicates that the SMJV architecture is able to identify unsafe properties caused by unexpected resource delays. © 2018 IEEE.
- RSimplex: A robust control architecture for cyber and physical failuresX. Wang, N. Hovakimyan, and L. ShaACM Transactions on Cyber-Physical Systems, 2018
As the complexity of Cyber-Physical Systems (CPS) increases, it becomes increasingly challenging to ensure CPS reliability, especially in the presence of software and/or physical failures. The Simplex architecture is shown to be an efficient tool to address software failures in such systems. When physical failures exist, however, Simplex may not function correctly because physical failures could change system dynamics and the original Simplex design may not work for the new faulty system. To address concurrent software and physical failures, this article presents the RSimplex architecture, which integrates Robust Fault-Tolerant Control (RFTC) techniques into the Simplex architecture. It includes the uncertainty monitor, the High-Performance Controller (HPC), the Robust High-Assurance Controller (RHAC), and the decision logic that triggers the switch of the controllers. Based on the output of the uncertainty monitor, we introduce a monitor-based switching rule in the decision logic in addition to the traditional envelope-based rule. The RHAC is designed based on RFTCs. We show that RSimplex can efficiently handle a class of software and physical failures. © 2018 ACM
- Dependable model-driven development of CPS: From Stateflow simulation to verified implementationY. Jiang, H. Song, Y. Yang, H. Liu, M. Gu, Y. Guan, J. Sun, and L. ShaACM Transactions on Cyber-Physical Systems, 2018
Simulink is widely used for model-driven development (MDD) of cyber-physical systems. Typically, the Simulink-based development starts with Stateflow modeling, followed by simulation, validation, and code generation mapped to physical execution platforms. However, recent trends have raised the demands of rigorous verification on safety-critical applications to prevent intrinsic development faults and improve the system dependability, which is unfortunately challenging. Even though the constructed Stateflow model and the generated code pass the validation of Simulink Design Verifier and Simulink Polyspace, respectively, the system may still fail due to some implicit defects contained in the design model (design defect) and the generated code (implementation defects). In this article, we bridge the Stateflow-basedMDD and a well-defined rigorous verification to reduce development faults. First, we develop a self-contained toolkit to translate a Stateflow model into timed automata, where major advanced modeling features in Stateflow are supported. Taking advantage of the strong verification capability of Uppaal, we can not only find bugs in Stateflow models that are missed by Simulink Design Verifier but also check more important temporal properties. Next, we customize a runtime verifier for the generated non-intrusive VHDL and C code of a Stateflow model for monitoring. The major strength of the customization is the flexibility to collect and analyze runtime properties with a pure software monitor, which offers more opportunities for engineers to achieve high reliability of the target system compared with the traditional act that only relies on Simulink Polyspace. In this way, safety-critical properties are both verified at the model level and at the consistent system implementation level with physical execution environment in consideration. We apply our approach to the development of a typical cyber-physical system-train communication controller based on the IEC standard 61375. Experiments show that more ambiguousness in the standard are detected and confirmed and more development faults and those corresponding errors that would lead to system failure have been removed. Furthermore, the verified implementation has been deployed on real trains. © 2018 Association for Computing Machinery.
- IAFinder: Identifying potential implicit assumptions to facilitate validation in medical cyber-physical systemZ. Fu, Z. Wang, C. Guo, Z. Zhang, S. Ren, and L. ShaProceedings - Design Automation Conference, 2018
According to the U.S. Food and Drug Administration (FDA) medical device recall database, medical device recalls are at an all-time high. One of the major causes of the recalls is due to implicit assumptions of which either the medical device operating environment does not match, or the device operators are not aware of. In this paper, we present IAFinder (Implicit Assumption Finder), a tool that uses data mining techniques to automatically extract invariants from design models implemented with statecharts. By identifying invariants that are not explicitly specified in the design models, we are able to find implicit assumptions and better facilitate domain experts to validate them and make the validated implicit assumptions explicit. We use a cardiac arrest statechart model as a case study to illustrate the usage of IAFinder in identifying implicit assumptions. © 2018 Association for Computing Machinery.
- A cyber-physical system framework for early detection of paroxysmal diseasesZ. Gu, Y. Jiang, M. Zhou, M. Gu, X. Song, and L. ShaIEEE Access, 2018
Paroxysmal diseases of inpatients are globally recognized as one of the top challenges in medicine. Poor clinical outcomes are primarily caused by delayed recognition, especially due to diverse clinical diagnostic criteria with complex manifestations, irregular episodes, and already overloaded clinical activities. With the proliferation of measuring devices and increased computational capabilities, cyber-physical characterization plays an increasingly important role in many domains to provide enabling technologies. This paper presents a cyber-physical system (CPS) framework to assist physicians in making earlier diagnoses of paroxysmal sympathetic hyperactivity based on existing medical knowledge. We propose a configurable diagnostic knowledge model to characterize clinical criteria to reduce domain knowledge deficiency between physicians and computer scientists. We present a component-based medical CPS framework to employ the knowledge models and integrate medical devices. Our approach aims to relieve medical staff from the heavy burden of clinical activities and to provide timely decision support. We evaluate our approach on 128 real-world clinical cases. Compared with the state-of-the-art approach, the results demonstrate that we enable early detection in 11.02% more patients and detect the condition 16.57 hours earlier on average. © 2013 IEEE.
- SafeTrace: A safety-driven requirement traceability framework on device interaction hazards for MD PnPA.Y.-Z. Ou, M. Rahmaniheris, Y. Jiang, L. Sha, Z. Fu, and S. RenACM Symposium on Applied Computing, 2018
Requirements management and safety analysis have been the key foundations of the successful development of life-critical systems, and the traceability of safety-related artifacts across such systems is becoming ever more important. Unless safety analysts can trace when and how requirements and design change, their analysis will become inconsistent, and eventually fail as proof that a given system can mitigate certain faults during certification processes. However, most prior research on traceability has focused on requirements, design and source code changes, rather than the integration of safety analysis by considering device interactions such as the Medical Device plug-and-play (MD PnP) into traceability and change-impact analysis. To help fill this gap, this paper proposes a safety-driven requirement traceability framework, SafeTrace, that traces the relations between safety requirements, design, and safety analysis, and the impact of requirement and design changes on safety analysis for life-critical systems with a focus on medical device interaction hazards. © 2018 ACM.
- Study of Software-Related Causes in the FDA Medical Device RecallsZ. Fu, C. Guo, S. Ren, Y. Jiang, and L. ShaIEEE International Conference on Engineering of Complex Computer Systems, ICECCS, 2018
As technology advances, medical devices are playing increasingly more important roles in patient care. Unfortunately, based on the U.S. Food and Drug Administration (FDA) data, medical device recalls are at an all time high. One of the major causes of the recalls is due to defective software. In fact, one in every three medical devices that use software for operation has been recalled because of failures in the software itself. Unlike traditional software, software-based medical devices have specific domain fault modes, and these fault modes have been not addressed in software design literature, such as dosage calculation fault. In this paper, we first present a process that collects software-related medical device recalls from the FDA database. Collecting all software-related medical device recalls is an effort that needs the support and contributions from a large research, industrial, and medical community, To facility such effort, we have developed a web-based platform for different users to contribute and share new software-related medical device recalls into the collection. Second, we analyze one hundred software-related recalls that we have collected from the FDA database. Our analysis reveals that there are four major categories of software failures in medical device recalls and implicit assumptions made by medical device manufacturers are among one of the leading causes in medical device recalls. Last, we present an approach for implicit assumption management in medical cyber-physical system designs. © 2017 IEEE.
- Athena: Towards decision-centric anticipatory sensor information delivery†J. Lee, K. Marcus, T. Abdelzaher, T.A. Amin, A. Bar-Noy, W. Dron, R. Govindan, R. Hobbs, S. Hu, J.-E. Kim, L. Sha, S. Yao, and Y. ZhaoJournal of Sensor and Actuator Networks, 2018
The paper introduces a new direction in quality-of-service-aware networked sensing that designs communication protocols and scheduling policies for data delivery that are optimized specifically for decision needs. The work complements present decision monitoring and support tools and falls in the larger framework of decision-driven resource management. A hallmark of the new protocols is that they are aware of the inference structure used to arrive at decisions (from logical predicates), as well as the data (and data quality) that need to be furnished to successfully evaluate the unknowns on which these decisions are based. Such protocols can therefore anticipate and deliver precisely the right data, at the right level of quality, from the right sources, at the right time, to enable valid and timely decisions at minimum cost to the underlying network. This paper presents the decision model used and the protocol design philosophy, reviews the key recent results and describes a novel system, called Athena, that is the first to embody the aforementioned data delivery paradigm. Evaluation results are presented that compare the performance of decision-centric anticipatory information delivery to several baselines, demonstrating its various advantages in terms of decision timeliness, validity and network resources used. The paper concludes with a discussion of remaining future challenges in this emerging area. © 2018 by the authors.
2017
- Toward safe interoperations in network connected medical cyber-physical systems using open-loop safe protocolsA.Y.-Z. Ou, M. Rahmaniheris, Y. Jiang, P.-L. Wu, and L. ShaIEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD, 2017
Using wireless networks in medical Cyber-Physical Systems could be challenging. Because the medical system not only assists the medical personnel to deliver medical services to the patient but also needs to deal with accidental situations such as communication failures without compromising the patient’s safety. Previous research work tackled the communication failure problems in medical CPS from architecture perspectives. However, as medical devices configurations become more complex when a medical CPS is composed of many medical devices, we need to know that whether the certain configuration and a combination of the devices will not compromise the patient’s safety. We present an algorithm to tackle the problem that whether a given system configuration exists a possible series of system transitions that allows the physicians to perform medical operations; in the mean time, the system transitions ensure the patient’s safety while communication failures may happen during the transitions. © 2017 IEEE.
- Model and integrate medical resource availability into verifiably correct executable medical guidelinesC. Guo, Z. Fu, Z. Zhang, S. Ren, and L. ShaIEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD, 2017
Improving effectiveness and safety of patient care is an ultimate objective for medical cyber-physical systems. A recent study shows that the patients’ death rate can be reduced by computerizing medical guidelines [20]. Most existing medical guideline models are validated and/or verified based on the assumption that all necessary medical resources needed for a patient care are always available. However, the reality is that some medical resources, such as special medical equipment or medical specialists, can be temporarily unavailable for an individual patient. In such cases, safety properties validated and/or verified in existing medical guideline models without considering medical resource availability may not hold any more. © 2017 IEEE.
- Supporting Emergency Medical Care Teams with an Integrated Status Display Providing Real-Time Access to Medical Best Practices, Workflow Tracking, and Patient DataP.L. Wu, M.-Y. Nam, J. Choi, A. Kirlik, L. Sha, and Jr. BerlinJournal of Medical Systems, 2017
The work of a hospital’s medical staff is safety critical and often occurs under severe time constraints. To provide timely and effective cognitive support to medical teams working in such contexts, guidelines in the form of best practice workflows for healthcare have been developed by medical organizations. However, the high cognitive load imposed in such stressful and rapidly changing environments poses significant challenges to the medical staff or team in adhering to these workflows. In collaboration with physicians and nurses from Carle Foundation Hospital, we first studied and modeled medical team’s individual responsibilities and interactions in cardiac arrest resuscitation and decomposed their overall task into a set of distinct cognitive tasks that must be specifically supported to achieve successful human-centered system design. We then developed a medical Best Practice Guidance (BPG) system for reducing medical teams’ cognitive load, thus fostering real-time adherence to best practices. We evaluated the resulting system with physicians and nurses using a professional patient simulator used for medical training and certification. The evaluation results point to a reduction of cognitive load and enhanced adherence to medical best practices. © 2017, Springer Science+Business Media, LLC.
- Modeling and Integrating Human Interaction Assumptions in Medical Cyber-Physical System DesignZ. Fu, C. Guo, S. Ren, Y. Ou, and L. ShaProceedings - IEEE Symposium on Computer-Based Medical Systems, 2017
For a cyber-physical system, its execution behaviors are often impacted by human interactive behaviors. However, the assumptions about a cyber-physical systems expected human interactive behaviors are often informally documented, or even left implicit and unspecified in system design. Unfortunately, such implicit human interaction assumptions made by safety critical cyber-physical systems, such as medical cyber-physical systems (M-CPS), can lead to catastrophes. Several recent U.S. Food and Drug Administration (FDA) medical device recalls are due to implicit human interaction assumptions. In this paper, we classify the categories of constraints in human interaction assumptions in the medical domain and develop a mathematical assumption model that allow M-CPS engineers to explicitly and precisely specify assumptions about human interactions. Algorithms are developed to integrate mathematical assumption models with system model so that the safety of the system can be not only validated by both medical and engineering professionals but also formally verified by existing formal verification tools. We use an FDA recalled medical ventilator scenario as a case study to show how the mathematical assumption model and its integration in M-CPS design may improve the safety of the ventilator and M-CPS in general. © 2017 IEEE.
- Pattern-Based Statechart Modeling Approach for Medical Best Practice Guidelines - A Case StudyC. Guo, Z. Fu, S. Ren, Y. Jiang, M. Rahmaniheris, and L. ShaProceedings - IEEE Symposium on Computer-Based Medical Systems, 2017
Improving effectiveness and safety of patient care is an ultimate objective for medical cyber-physical systems. Many medical best practice guidelines exist in the format of hospital handbooks which are often lengthy and difficult for medical staff to remember and apply clinically. Statechart is an effective tool to model medical guidelines and enables clinical validation with medical staffs. However, some advanced statechart elements could result in high cost, such as low understandability, high difficulty in clinical validation, formal verification, and failure trace back. The paper presents a pattern-based statechart modeling approach for medical best practice guidelines, i.e., model medical guidelines with basic statechart elements and model patterns which are built upon these basic elements. For practical use, we implement the proposed approach based on open-source Yakindu statecharts. We also use a simplified cardiac arrest scenario provided to our team by Carle Foundation Hospital as a case study to validate the proposed approach. © 2017 IEEE.
- CPS runtime architecture and automated transformation of applicationsFEAST 2017 - 2017 Workshop on Forming an Ecosystem Around Software Transformation, co-located with CCS 2017, 2017
National Academy of Science’s study on dependable software systems concluded that simplicity is the key. Reducing the complexity of software has been investigated by the FEAST community on automated transformation of application software and by the CPS runtime community on the development of runtime architectures that simply the development of applications. This creates the need of collaboration between these two communities. The goal of this review paper is to bring this need to the attention of FEAST community. © 2017 Association for Computing Machinery.
- Toward physiology-aware DASH: Bandwidth-compliant prioritized clinical multimedia communication in ambulancesM. Hosseini, Y. Jiang, R.R. Berlin, L. Sha, and H. SongIEEE Transactions on Multimedia, 2017
The ultimate objective of medical cyber-physical systems is to enhance the safety and effectiveness of patient care. To ensure safe and effective care during emergency patient transfer from rural areas to center tertiary hospitals, reliable and real-time communication is essential. Unfortunately, real-time monitoring of patients involves transmission of various clinical multimedia data including videos, medical images, and vital signs, which requires use of mobile network with high-fidelity communication bandwidth. However, the wireless networks along the roads in rural areas range from 4G to 2G to low speed satellite links, which poses a significant challenge to transmit critical patient information. In this paper, we present a bandwidth-compliant criticality-aware system for transmission of massive clinical multimedia data adaptive to varying bandwidths during patient transport. Model-based clinical automata are used to determine the criticality of clinical multimedia data. We borrow concepts from DASH, and propose physiology-aware adaptation techniques to transmit more critical clinical data with higher fidelity in response to changes in disease, clinical states, and bandwidth condition. In collaboration with Carle’s ambulance service center, we develop a bandwidth profiler, and use it as proof of concept to support our experiments. Our preliminary evaluation results show that our solutions ensure that most critical patient’s clinical data are communicated with higher fidelity. © 1999-2012 IEEE.
- Physiology-Aware Rural Ambulance RoutingM. Hosseini, R.B. Berlin, and L. ShaProceedings - 2017 IEEE International Conference on Healthcare Informatics, ICHI 2017, 2017
The ultimate objective of medical cyber-physical systems is to enhance the effectiveness of patient care. During emergency patient transport from rural medical facility to center tertiary hospital, real-time monitoring of the patient in the ambulance by a physician expert at the tertiary center is crucial. The physician experts can provide vital assistance to the ambulance EMT personnel. While telemetry healthcare services using mobile networks may enable remote real-time monitoring of transported patients, physiologic measures and tracking are at least as important and requires the existence of high-fidelity communication coverage. However, the wireless networks along the roads especially in rural areas can range from 4G to low-speed 2G, some parts with communication breakage. From a patient care perspective, transport during critical illness can make route selection patient state dependent. Prompt decisions with the relative advantage of a longer more secure bandwidth route versus a shorter, more rapid transport route but with less secure bandwidth must be made. The trade-off between route selection and the quality of wireless communication is an important optimization problem which unfortunately has remained unaddressed by prior work.In this paper, we propose a novel physiology-aware route scheduling approach for emergency ambulance transport of rural patients with acute, high risk diseases in need of continuous remote monitoring. We mathematically model the problem into an NP-hard graph theory problem, and approximate a solution based on a trade-off between communication coverage and shortest path. We profile communication along two major routes in a large rural hospital settings in Illinois, and use the traces to manifest the concept. Further, we design our algorithms and run preliminary experiments for scalability analysis. We believe that our scheduling techniques can become a compelling aid that enables an always-connected remote monitoring system in emergency patient transfer scenarios aimed to prevent morbidity and mortality with early diagnosis and effective treatment. © 2017 IEEE.
- Towards Verifiable Safe and Correct Medical Best Practice Guideline SystemsC. Guo, Z. Fu, S. Ren, Y. Jiang, and L. ShaProceedings - International Computer Software and Applications Conference, 2017
Improving safety of patient care is an ultimate objective for medical systems. Though many medical best practice guidelines exist and are in hospital handbooks, they are often lengthy and difficult for medical professionals to remember and apply clinically. Hence, developing safe and correct medical best practice guideline systems is an urgent need. Many efforts have been made in modeling, clinical validation, model level formal verification of medical best practice guidelines. However, code level verification is also necessary to develop verifiable safe and correct medical guideline systems. The paper presents an approach to transform safety properties specified in verifiable medical guideline models to JavaMOP runtime monitor and specify JavaMOP monitors to runtime monitor these safety properties during execution of Java code generated from validated and verified statechart models. We use a simplified version of a cardiac arrest scenario provided by Carle Foundation Hospital as a case study to validate the proposed approach. © 2017 IEEE.
- Adaptive Clinical Data Communication for Remote Monitoring in Rural Ambulance TransportM. Hosseini, R.R. Berlin, Y. Jiang, and L. ShaProceedings - 2017 IEEE 2nd International Conference on Connected Health: Applications, Systems and Engineering Technologies, CHASE 2017, 2017
For emergency medical cyber-physical systems, enhancing the safety and effectiveness of patient care, especially in rural ambulance transport, is essential. Use of telecommunication technologies can enhance effectiveness and safety of remote monitoring of patients by the physicians at the center hospital during ambulance transport and provides vital assistance to the emergency medical technicians (EMT) to associate best treatments. However, the communication along the roads in rural areas can range from 4G to 2G to low speed satellite links, with some parts suffering from communication breakage. This unreliable and limited communication bandwidth together with the produced mass of clinical data and the many information exchanges pose a major challenge in real-time and smooth supervision of patients. In this paper, we present parts of developed physiology-aware communication architecture, a bandwidth-compliant criticality-aware system for transmission of clinical data adaptive to varying bandwidths during patient transport. We propose adaptation techniques to transmit more critical data with higher fidelity in response to changes in disease, clinical states, and bandwidth condition. In collaboration with Carle’s ambulance service center, we develop a bandwidth profiler, and use it to collect communication traces to support our experiments. Our evaluation results show that our solutions ensure that most critical patient’s clinical data are communicated with higher fidelity. © 2017 IEEE.
- Data-centered runtime verification of wireless medical cyber-physical systemY. Jiang, H. Song, R. Wang, M. Gu, J. Sun, and L. ShaIEEE Transactions on Industrial Informatics, 2017
Wireless medical cyber-physical systems are widely adopted in the daily practices of medicine, where huge amounts of data are sampled by the wireless medical devices and sensors, and is passed to the decision support systems (DSSs). Many text-based guidelines have been encoded for work-flow simulation of DSS to automate health care based on those collected data. But for some complex and life-critical diseases, it is highly desirable to automatically rigorously verify some complex temporal properties encoded in those data, which brings new challenges to current simulation-based DSS with limited support of automatical formal verification and real-time data analysis. In this paper, we conduct the first study on applying runtime verification to cooperate with current DSS based on real-time data. Within the proposed technique, a user-friendly domain specific language, named DRTV, is designed to specify vital real-time data sampled by medical devices and temporal properties originated from clinical guidelines. Some interfaces are developed for data acquisition and communication. Then, for medical practice scenarios described in DRTV model, we will automatically generate event sequences and runtime property verifier automata. If a temporal property violates, real-time warnings will be produced by the formal verifier and passed to medical DSS. We have used DRTV to specify different kinds of medical care scenarios and have applied the proposed technique to assist existing wireless medical cyber-physical system. As presented in experiment results, in terms of warning detection, it outperforms the only use of DSS or human inspection, and improves the quality of clinical health care of hospital. © 2005-2012 IEEE.
- A mobile geo-communication dataset for physiology-aware DASH in rural ambulance transportM. Hosseini, Y. Jiang, A. Yekkehkhany, R.R. Berlin, and L. Sha8th ACM Multimedia Systems Conference, MMSys 2017, 2017
Use of telecommunication technologies for remote, continuous monitoring of patients can enhance effectiveness of emergency ambulance care during transport from rural areas to a regional center hospital. However, the communication along the various routes in rural areas may have wide bandwidth ranges from 2G to 4G; some regions may have only lower satellite bandwidth available. Bandwidth fluctuation together with real-time communication of various clinical multimedia pose a major challenge during rural patient ambulance transport. The availability of a pre-transport route-dependent communication bandwidth database is an important resource in remote monitoring and clinical multimedia transmission in rural ambulance transport. Here, we present a geo-communication dataset from extensive profiling of 4 major US mobile carriers in Illinois, from the rural location of Hoopeston to the central referral hospital center at Urbana. In collaboration with Carle Foundation Hospital, we developed a profiler, and collected various geographical and communication traces for realistic emergency rural ambulance transport scenarios. Our dataset is to support our ongoing work of proposing "physiology-aware DASH", which is particularly useful for adaptive remote monitoring of critically ill patients in emergency rural ambulance transport. It provides insights on ensuring higher Quality of Service (QoS) for most critical clinical multimedia in response to changes in patients’ physiological states and bandwidth conditions. Our dataset is available online 1 for research community. © 2017 ACM.
- A schedulability test for software migration on multicore systemJ.-E. Kim, R. Bradford, T. Abdelzaher, and L. Sha2017 Design, Automation and Test in Europe, DATE 2017, 2017
This paper presents a new schedulability test for safety-critical software undergoing a transition from single-core to multicore systems - a challenge faced by multiple industries today. Our migration model consists of a schedulability test and execution model. Its properties enable us to obtain a utilization bound that places an allowable limit on total task execution times. Evaluation results demonstrate the advantages of our scheduling model over competing resource partitioning approaches, such as Periodic Server and TDMA. © 2017 IEEE.
- Modeling and integrating physical environment assumptions in medical cyber-physical system designZ. Fu, C. Guo, S. Ren, Y. Jiang, and L. Sha2017 Design, Automation and Test in Europe, DATE 2017, 2017
Implicit physical environment assumptions made by safety critical cyber-physical systems, such as medical cyber-physical systems (M-CPS), can lead to catastrophes. Several recent U.S. Food and Drug Administration (FDA) medical device recalls are due to implicit physical environment assumptions. In this paper, we develop a mathematical assumption model and composition rules that allow M-CPS engineers to explicitly and precisely specify assumptions about the physical environment in which the designed M-CPS operates. Algorithms are developed to integrate the mathematical assumption model with system model so that the safety of the system can be not only validated by both medical and engineering professionals but also formally verified by existing formal verification tools. We use an FDA recalled medical ventilator scenario as a case study to show how the mathematical assumption model and its integration in M-CPS design may improve the safety of the ventilator and M-CPS in general. © 2017 IEEE.
- VirtualDrone: Virtual sensing, actuation, and communication for attack-resilient unmanned aerial systemsM.-K. Yoon, B. Liu, N. Hovakimyan, and L. ShaProceedings - 2017 ACM/IEEE 8th International Conference on Cyber-Physical Systems, ICCPS 2017 (part of CPS Week), 2017
As modern unmanned aerial systems (UAS) continue to expand the frontiers of automation, new challenges to security and thus its safety are emerging. It is now difficult to completely secure modern UAS platforms due to their openness and increasing complexity. We present the VirtualDrone Framework, a software architecture that enables an attack-resilient control of modern UAS. It allows the system to operate with potentially untrustworthy software environment by virtualizing the sensors, actuators, and communication channels. The framework provides mechanisms to monitor physical and logical system behaviors and to detect security and safety violations. Upon detection of such an event, the framework switches to a trusted control mode in order to override malicious system state and to prevent potential safety violations. We built a prototype quadcoper running an embedded multicore processor that features a hardware-assisted virtualization technology. We present extensive experimental study and implementation details, and demonstrate how the framework can ensure the robustness of the UAS in the presence of security breaches. © 2017 ACM.
- Learning execution contexts from system call distribution for anomaly detection in smart embedded systemM.-K. Yoon, S. Mohan, J. Choi, M. Christodorescu, and L. ShaProceedings - 2017 IEEE/ACM 2nd International Conference on Internet-of-Things Design and Implementation, IoTDI 2017 (part of CPS Week), 2017
Existing techniques used for anomaly detection do not fully utilize the intrinsic properties of embedded devices. In this paper, we propose a lightweight method for detecting anomalous executions using a distribution of system call frequencies. We use a cluster analysis to learn the legitimate execution contexts of embedded applications and then monitor them at run-time to capture abnormal executions. Our prototype applied to a real-world open-source embedded application shows that the proposed method can effectively detect anomalous executions without relying on sophisticated analyses or affecting the critical execution paths. © 2017 ACM.
- WiP abstract: A physiology-aware communication architecture for distributed emergency medical CPSM. Hosseini, R.R. Berlin, and L. ShaProceedings - 2017 ACM/IEEE 8th International Conference on Cyber-Physical Systems, ICCPS 2017 (part of CPS Week), 2017
For emergency medical cyber-physical systems, enhancing the safety and effectiveness of patient care, especially in rural areas, is essential. While the doctor to patient ratio in the United States is 30 to 10,000 in large metropolitan areas, it is only 5 to 10,000 in most rural areas; and the highest death rates are often found in the most rural counties [3]. Use of telecommunication technologies can enhance effectiveness and safety of emergency ambulance transport of patients from rural areas to a regional center hospital. It enables remote monitoring of patients by the physicians at the center hospital and provides vital assistance to the emergency medical technicians (EMT) to associate best treatments. However, the communication along the roads in rural areas can range from 4G to 2G to low speed satellite links, with some parts suffering from communication breakage. This unreliable and limited communication bandwidth together with the produced mass of clinical data and the many information exchanges pose a major challenge in real-time supervision of patients. During this research, we are developing a novel adaptive physiology-aware communication architecture which is aware of the patient condition, the underlying network bandwidth, and the criticality of clinical data in the context of the specific disease to achieve an enhanced remote monitoring. Further, it features reliability, safety, and fault tolerance for cases such as network interruption. © 2017 ACM.
- Using human intellectual tasks as guidelines to systematically model medical cyber-physical systemsA.Y.-Z. Ou, J. Yu, P.-L. Wu, L. Sha, and Jr. Berlin2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 - Conference Proceedings, 2017
In a medical environment such as Intensive Care Unit, there are many possible reasons to cause errors, and one important reason is the effect of human intellectual tasks. In this paper, we first provide five categories of generic intellectual tasks of humans, where tasks among each category may lead to potential medical errors. Then, we present an integrated modeling framework to model a medical Cyber-Physical-Human System (CPHSystem) and use UPPAAL as the foundation to integrate and verify the whole medical CPHSystem design models. When designing a medical CPHSystem, developers need to consider whether the system design can mitigate the errors caused by these tasks or not. With a verified and comprehensive model, we can design a more accurate and acceptable system. We use a cardiac arrest resuscitation guidance and navigation system (CAR-GNSystem) as the motivation example for such medical CPHSystem modeling. Experimental results show that the CPHSystem models help determine system design flaws and can mitigate the potential medical errors caused by the human intellectual tasks. © 2016 IEEE.
- An integrated Medical CPS for early detection of paroxysmal sympathetic hyperactivityZ. Gu, H. Song, Y. Jiang, J. Choi, H. He, L. Sha, and M. GuProceedings - 2016 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2016, 2017
Paroxysmal sympathetic hyperactivity (PSH) is an important clinical problem of severe traumatic brain injury (TBI) which incurs approximately 90% of all TBI-related costs. However, current detection approach is hampered by no consensus clinical diagnostic criteria, paroxysmal episode feature with complex manifestations, and already overloaded clinical activities. These limitations cause delayed recognitions which result in poor clinical outcomes. In this paper, we design an integrated Medical Cyber-Physical System (Medical CPS) for early detection of paroxysmal sympathetic hyperactivity patients. First, a formal model is proposed to describe clinical diagnostic criteria. With the formalized models employed, we implement an early detector and integrate it with revised medical device adapters into Medical CPS. Our system will monitor patient conditions automatically and continuously to relieve medical staff from the heavy burden of clinical activities and provide timely decision supports. Evaluations on 107 clinical cases extracted from medical publications demonstrate the effectiveness and the efficiency of our integrated system. © 2016 IEEE.
- On exploiting structured human interactions to enhance sensing accuracy in cyber-physical systemsH. Wang, Y. Gao, S. Hu, S. Wang, R. Mancuso, M. Kim, P. Wu, L. Su, L. Sha, and T. AbdelzaherACM Transactions on Cyber-Physical Systems, 2017
In this article, we describe a general methodology for enhancing sensing accuracy in cyber-physical systems that involve structured human interactions in noisy physical environment. We define structured human interactions as domain-specific workflow. A novel workflow-aware sensing model is proposed to jointly correct unreliable sensor data and keep track of states in a workflow. We also propose a new inference algorithm to handle cases with partially known states and objects as supervision. Our model is evaluated with extensive simulations. As a concrete application, we develop a novel log service called Emergency Transcriber, which can automatically document operational procedures followed by teams of first responders in emergency response scenarios. Evaluation shows that our system has significant improvement over commercial off-theshelf (COTS) sensors and keeps track of workflow states with high accuracy in noisy physical environment. © 2017 ACM.
- Preventable Medical Errors Driven Modeling of Medical Best Practice Guidance SystemsA.Y.-Z. Ou, Y. Jiang, P.-L. Wu, L. Sha, and Jr. BerlinJournal of Medical Systems, 2017
In a medical environment such as Intensive Care Unit, there are many possible reasons to cause errors, and one important reason is the effect of human intellectual tasks. When designing an interactive healthcare system such as medical Cyber-Physical-Human Systems (CPHSystems), it is important to consider whether the system design can mitigate the errors caused by these tasks or not. In this paper, we first introduce five categories of generic intellectual tasks of humans, where tasks among each category may lead to potential medical errors. Then, we present an integrated modeling framework to model a medical CPHSystem and use UPPAAL as the foundation to integrate and verify the whole medical CPHSystem design models. With a verified and comprehensive model capturing the human intellectual tasks effects, we can design a more accurate and acceptable system. We use a cardiac arrest resuscitation guidance and navigation system (CAR-GNSystem) for such medical CPHSystem modeling. Experimental results show that the CPHSystem models help determine system design flaws and can mitigate the potential medical errors caused by the human intellectual tasks. © 2016, Springer Science+Business Media New York.
2016
- A Pathophysiological Model-Driven Communication for Dynamic Distributed Medical Best Practice Guidance SystemsM. Hosseini, Y. Jiang, P. Wu, Jr. Berlin, S. Ren, and L. ShaJournal of Medical Systems, 2016
There is a great divide between rural and urban areas, particularly in medical emergency care. Although medical best practice guidelines exist and are in hospital handbooks, they are often lengthy and difficult to apply clinically. The challenges are exaggerated for doctors in rural areas and emergency medical technicians (EMT) during patient transport. In this paper, we propose the concept of distributed executable medical best practice guidance systems to assist adherence to best practice from the time that a patient first presents at a rural hospital, through diagnosis and ambulance transfer to arrival and treatment at a regional tertiary hospital center. We codify complex medical knowledge in the form of simplified distributed executable disease automata, from the thin automata at rural hospitals to the rich automata in the regional center hospitals. However, a main challenge is how to efficiently and safely synchronize distributed best practice models as the communication among medical facilities, devices, and professionals generates a large number of messages. This complex problem of patient diagnosis and transport from rural to center facility is also fraught with many uncertainties and changes resulting in a high degree of dynamism. A critically ill patient’s medical conditions can change abruptly in addition to changes in the wireless bandwidth during the ambulance transfer. Such dynamics have yet to be addressed in existing literature on telemedicine. To address this situation, we propose a pathophysiological model-driven message exchange communication architecture that ensures the real-time and dynamic requirements of synchronization among distributed emergency best practice models are met in a reliable and safe manner. Taking the signs, symptoms, and progress of stroke patients transported across a geographically distributed healthcare network as the motivating use case, we implement our communication system and apply it to our developed best practice automata using laboratory simulations. Our proof-of-concept experiments shows there is potential for the use of our system in a wide variety of domains. © 2016, Springer Science+Business Media New York.
- On Maximizing Quality of Information for the Internet of Things: A Real-Time Scheduling Perspective (Invited Paper)J.-E. Kim, T. Abdelzaher, L. Sha, A. Bar-Noy, R. Hobbs, and W. DronProceedings - 2016 IEEE 22nd International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2016, 2016
The paper considers the challenge of maximizing the quality of information collected to meet decision needs of real-time Internet-of-Things applications. A novel scheduling model is proposed, where applications need multiple data items to make decisions, and where individual data items can be captured at different levels of quality. We assume the existence of a single bottleneck over which data objects are collected and schedule the transmission of these objects over the bottleneck to meet decision deadlines and data validity constraints, while maximizing quality. A family of heuristic algorithms is presented to solve this problem. Their performance is empirically compared leading to insights into the solution space. © 2016 IEEE.
- Real-Time Computing on Multicore ProcessorsL. Sha, M. Caccamo, R. Mancuso, J.-E. Kim, M.-K. Yoon, R. Pellizzoni, H. Yun, R.B. Kegley, D.R. Perlman, G. Arundale, and R. BradfordComputer, 2016
Architects of multicore chips for avionics must define and bound intercore interference, which requires assuming a constant worst-case execution time for tasks executing on the chip. With the Single Core Equivalent technology package, engineers can treat each core as if it were a single-core chip. © 2016 IEEE.
- An organ-centric best practice assist system for acute careM. Rahmaniheris, P. Wu, L. Sha, and R.R. BerlinProceedings - IEEE Symposium on Computer-Based Medical Systems, 2016
As patient condition changes rapidly in acute care, the monitoring and treatment plan must be adapted accordingly to ensure safe and effective patient care. Most current medical monitoring systems provide little contextual information on patient state. We present best practice assist system to help medical staff assess patient state more accurately and adapt her care plan according to the best practice guidelines and community consensus. The main components of our system are 1) an efficient and clinically sound representation of patient state in the form of disease and interacting organ states 2) a best practice manager that encodes the best practice monitoring and treatment guidelines for a given patient condition. Both components are modeled using finite state machine formalism. In addition, we have implemented a patient control panel and a graphical display to simulate clinical scenarios. Using a cardiac arrest scenario, we demonstrate how our system can help medical staff with patient assessment and adherence to best practice. © 2016 IEEE.
- A Self-Adaptively Evolutionary Screening Approach for Sepsis PatientY. Jiang, P. Tan, H. Song, B. Wan, M. Hosseini, and L. ShaProceedings - IEEE Symposium on Computer-Based Medical Systems, 2016
Today, sepsis syndrome is one of the leading cause of death globally, and is of great clinical importance. In this paper, we present a self-adaptively evolutionary sepsis screening system to shorten the time of syndrome detection and improve the positive effect of treatment, with the screening frequency and content can be automatically adjusted according to the current status of the patient. First, we propose a novel graphical computation model named AdapDBN with a clearly defined syntax for the medical knowledge presentation, especially for the presentation of the pathophysiology model of the disease. Then, the semantics of AdapDBN is formally defined for the evolutionary inference of syndrome onset probability. Finally, we demonstrate how to initialize AdapDBN with sepsis-related epidemiologic statics, published clinical research and physician’s knowledge and how to incorporate it into existing sepsis screening and decision support flow. We evaluate its effectiveness and superiority with comparisons to existing computation techniques. © 2016 IEEE.
- Sporadic Decision-Centric Data Scheduling with Normally-off SensorsJ.-E. Kim, T. Abdelzaher, L. Sha, A. Bar-Noy, and R. HobbsProceedings - Real-Time Systems Symposium, 2016
The Internet of Things heralds a new generation of data-centric applications, where controllers connect to large numbers of heterogeneous sensing devices. We consider a model, where the control loop does not execute periodically. Instead, controllers are prompted by contextual cues to make one-off decisions, resulting in sporadic activations. Since the need for data arises only sporadically, sensors do not sample data continuously. Rather, they are normally off (e.g., to save energy), but are activated by the controller on demand, when data is needed. Collected data has validity intervals, after which it must be re-sampled, since the measured value may change. Once a decision is made based on the data, sensors are turned off again. We call this model sporadic decision-centric data scheduling with normally-off sensors. It gives rise to novel scheduling problems because of the way the timing of activation of different sensors affects load attributed to data sampling; the shorter the interval between activation of a given sensor and the time a corresponding decision is made, the lower the number of samples taken by that sensor to support the decision, and thus decision cost. The paper defines the aforementioned decision-centric data scheduling problem and derives the optimal scheduling policy, called EDEF-LVF, for this task model. Simulation results confirm the superiority of EDEF-LVF compared to several baselines. © 2016 IEEE.
- The dragonbeam framework: Hardware-protected security modules for in-place intrusion detectionM.-K. Yoon, M. Christodorescu, L. Sha, and S. MohanSYSTOR 2016 - 9th ACM International Systems and Storage Conference, 2016
The sophistication of malicious adversaries is increasing every day and most defenses are often easily overcome by such attackers. Many existing defensive mechanisms often make differing assumptions about the underlying systems and use varied architectures to implement their solutions. This often leads to fragmentation among solutions and could even open up additional vulnerabilities in the system. We present the DragonBeam Framework that enables system designers to implement their own monitoring methods and analyses engines to detect intrusions in modern operating systems. It is built upon a novel hardware/software mechanism. Depending on the type of monitoring that is implemented using this framework, the impact on the monitored system is very low. This is demonstrated by the use cases presented in this paper that also showcase how the DragonBeam framework can be used to detect different types of attack. Copyright © 2016 ACM.
- Transforming Medical Best Practice Guidelines to Executable and Verifiable Statechart ModelsC. Guo, S. Ren, Y. Jiang, P.-L. Wu, L. Sha, and R.B. Berlin2016 ACM/IEEE 7th International Conference on Cyber-Physical Systems, ICCPS 2016 - Proceedings, 2016
Improving effectiveness and safety of patient care is an ultimate objective for medical cyber- physical systems. However, the existing medical best practice guidelines in hospital handbooks are often lengthy and difficult for medical staff to remember and apply clinically. Statechart is a widely used model in designing complex systems and enables rapid prototyping and clinical validation with medical doctors. However, clinical validation is often not adequate for guaranteeing the correctness and safety of medical cyber-physical systems, and formal verification is required. The paper presents an approach that transforms medical best practice guidelines to verifiable statechart models and supports both clinical validation in collaboration with medical doctors and formal verification. In particular, we use an open source statechart tool Yakindu to model best practice guidelines and use the statechart to interact with doctors for validating the model correctness. The statechart model is then automatically transformed to a verifiable formal model, such as timed automata, so that existing formal verification tool, such as UPPAAL, can be used to verify required safety properties. The approach also provides the ability to trace back to the paths in the statechart model (Yakindu model) when a specific property in its associated formal model (UPPAAL model) fails. A cardiac arrest scenario is used as a case study to validate the proposed approach. The tool is available on our website www.cs.iit.edu/ code/software/Y2U. © 2016 IEEE.
- Use runtime verification to improve the quality of medical care practiceY. Jiang, H. Liu, H. Kong, R. Wang, M. Hosseini, J. Sun, and L. ShaProceedings - International Conference on Software Engineering, 2016
Clinical guidelines and decision support systems (DSS) play an important role in daily practices of medicine. Many text-based guidelines have been encoded for work-flow simulation of DSS to automate health care. During the collaboration with Carle hospital to develop a DSS, we identify that, for some complex and life-critical diseases, it is highly desirable to automatically rigorously verify some complex temporal properties in guidelines, which brings new challenges to current simulation based DSS with limited support of automatical formal verification and real-time data analysis. In this paper, we conduct the first study on applying runtime verification to cooperate with current DSS based on real-time data. Within the proposed technique, a user-friendly domain specific language, named DRTV, is designed to specify vital real-time data sampled by medical devices and temporal properties originated from clinical guidelines. Some interfaces are developed for data acquisition and communication. Then, for medical practice scenarios described in DRTV model, we will automatically generate event sequences and runtime property verifier automata. If a temporal property violates, real-time warnings will be produced by the formal verifier and passed to medical DSS. We have used DRTV to specify different kinds of medical care scenarios, and applied the proposed technique to assist existing DSS. As presented in experiment results, in terms of warning detection, it outperforms the only use of DSS or human inspection, and improves the quality of clinical health care of hospital. © 2016 ACM.
- From Stateflow Simulation to Verified Implementation: A Verification Approach and A Real-Time Train Controller DesignY. Jiang, Y. Yang, H. Liu, H. Kong, M. Gu, J. Sun, and L. Sha2016 IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2016 - Proceedings, 2016
Simulink is widely used for model driven development (MDD) of industrial software systems. Typically, the Simulink based development is initiated from Stateflow modeling, followed by simulation, validation and code generation mapped to physical execution platforms. However, recent industrial trends have raised the demands of rigorous verification on safety-critical applications, which is unfortunately challenging for Simulink. In this paper, we present an approach to bridge the Stateflow based model driven development and a well- defined rigorous verification. First, we develop a self- contained toolkit to translate Stateflow model into timed automata, where major advanced modeling features in Stateflow are supported. Taking advantage of the strong verification capability of Uppaal, we can not only find bugs in Stateflow models which are missed by Simulink Design Verifier, but also check more important temporal properties. Next, we customize a runtime verifier for the generated nonintrusive VHDL and C code of Stateflow model for monitoring. The major strength of the customization is the flexibility to collect and analyze runtime properties with a pure software monitor, which opens more opportunities for engineers to achieve high reliability of the target system compared with the traditional act that only relies on Simulink Polyspace. We incorporate these two parts into original Stateflow based MDD seamlessly. In this way, safety-critical properties are both verified at the model level, and at the consistent system implementation level with physical execution environment in consideration. We apply our approach on a train controller design, and the verified implementation is tested and deployed on a real hardware platform. © 2016 IEEE.
- TaskShuffler: A Schedule Randomization Protocol for Obfuscation against Timing Inference Attacks in Real-Time SystemsM.-K. Yoon, S. Mohan, C.-Y. Chen, and L. Sha2016 IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2016 - Proceedings, 2016
The high degree of predictability in real-time systems makes it possible for adversaries to launch timing inference attacks such as those based on side-channels and covert-channels. We present TaskShuffler, a schedule obfuscation method aimed at randomizing the schedule for such systems while still providing the real-time guarantees that are necessary for their safe operation. This paper also analyzes the effect of these mechanisms by presenting schedule entropy - a metric to measure the uncertainty (as perceived by attackers) introduced by TaskShuffler. These mechanisms will increase the difficulty for would-be attackers thus improving the overall security guarantees for real-time systems. © 2016 IEEE.
- Sepsis Patient Detection and Monitor Based on Auto-BNY. Jiang, L. Sha, M. Rahmaniheris, B. Wan, M. Hosseini, P. Tan, and Jr. BerlinJournal of Medical Systems, 2016
Sepsis is a life-threatening condition caused by an inappropriate immune response to infection, and is a leading cause of elderly death globally. Early recognition of patients and timely antibiotic therapy based on guidelines improve survival rate. Unfortunately, for those patients, it is often detected late because it is too expensive and impractical to perform frequent monitoring for all the elderly. In this paper, we present a risk driven sepsis screening and monitoring framework to shorten the time of onset detection without frequent monitoring of all the elderly. Within this framework, the sepsis ultimate risk of onset probability and mortality is calculated based on a novel temporal probabilistic model named Auto-BN, which consists of time dependent state, state dependent property, and state dependent inference structures. Then, different stages of a patient are encoded into different states, monitoring frequency is encoded into the state dependent property, and screening content is encoded into different state dependent inference structures. In this way, the screening and monitoring frequency and content can be automatically adjusted when encoding the sepsis ultimate risk into the guard of state transition. This allows for flexible manipulation of the tradeoff between screening accuracy and frequency. We evaluate its effectiveness through empirical study, and incorporate it into existing medical guidance system to improve medical healthcare. © 2016, Springer Science+Business Media New York.
- Real-time reachability for verified simplex designT.T. Johnson, S. Bak, M. Caccamo, and L. ShaACM Transactions on Embedded Computing Systems, 2016
The Simplex architecture ensures the safe use of an unverifiable complex/smart controller by using it in conjunction with a verified safety controller and verified supervisory controller (switching logic). This architecture enables the safe use of smart, high-performance, untrusted, and complex control algorithms to enable autonomy without requiring the smart controllers to be formally verified or certified. Simplex incorporates a supervisory controller that will take over control from the unverified complex/smart controller if it misbehaves and use a safety controller. The supervisory controller should (1) guarantee that the system never enters an unsafe state (safety), but should also (2) use the complex/smart controller asmuch as possible (minimize conservatism). The problem of precisely and correctly defining the switching logic of the supervisory controller has previously been considered either using a control-theoretic optimization approach or through an offline hybrid-systems reachability computation. In this work, we show that a combined online/offline approach that uses aspects of the two earlier methods, along with a real-time reachability computation, also maintains safety, but with significantly less conservatism, allowing the complex controller to be used more frequently.We demonstrate the advantages of this unified approach on a saturated inverted pendulum system, inwhich the verifiable region of attraction is over twice as large compared to the earlier approach. Additionally, to validate the claims that the real-time reachability approach may be implemented on embedded platforms, we have ported and conducted embedded hardware studies using both ARM processors and Atmel AVR microcontrollers. This is the first ever demonstration of a hybrid-systems reachability computation in real time on actual embedded platforms, which required addressing significant technical challenges. © 2016 ACM.
- Memory bandwidth management for efficient performance isolation in multi-core platformsH. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. ShaIEEE Transactions on Computers, 2016
Memory bandwidth in modern multi-core platforms is highly variable for many reasons and it is a big challenge in designing real-time systems as applications are increasingly becoming more memory intensive. In this work, we proposed, designed, and implemented an efficient memory bandwidth reservation system, that we call MemGuard. MemGuard separates memory bandwidth in two parts: guaranteed and best effort. It provides bandwidth reservation for the guaranteed bandwidth for temporal isolation, with efficient reclaiming to maximally utilize the reserved bandwidth. It further improves performance by exploiting the best effort bandwidth after satisfying each core’s reserved bandwidth. MemGuard is evaluated with SPEC2006 benchmarks on a real hardware platform, and the results demonstrate that it is able to provide memory performance isolation with minimal impact on overall throughput. © 1968-2012 IEEE.
- Schedulability analysis for memory bandwidth regulated multicore real-time systemsG. Yao, H. Yun, Z.P. Wu, R. Pellizzoni, M. Caccamo, and L. ShaIEEE Transactions on Computers, 2016
Multicore architecture brings a significant challenge in designing critical real-time systems because of timing variability caused by concurrent accesses to shared memory. We propose a memory bandwidth regulated system architecture and a novel analysis method to address this challenge. In the proposed architecture, each core’s memory access rate is regulated in a globally coordinated manner. The architecture allows system designers to control the system to satisfy desired real-time performance. The proposed analysis method provides a way to calculate worst case response time of each real-time task independently from other activities on other cores; it only depends on the task under analysis, the assigned bandwidth, and the number of cores in the system. We believe this independence is critical to enable modular certification of critical real-time systems. We implement the proposed system model on the gem5 architecture simulator. We evaluate the proposed analysis method by comparing the computed runtime with the measured runtime on the modified simulator. We show that the analysis method provides reasonable upper-bounds based on the SPEC2006 benchmark suite. © 1968-2012 IEEE.
- Safety-assured formal model-driven design of the multifunction vehicle bus controllerY. Jiang, H. Liu, H. Song, H. Kong, M. Gu, J. Sun, and L. ShaLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016
In this paper, we present a formal model-driven engineering approach to establishing a safety-assured implementation of Multifunction vehicle bus controller (MVBC) based on the generic reference models and requirements described in the International Electrotechnical Commission (IEC) standard IEC-61375. First, the generic models described in IEC-61375 are translated into a network of timed automata, and some safety requirements tested in IEC-61375 are formalized as timed computation tree logic (TCTL) formulas. With the help of Uppaal, we check and debug whether the timed automata satisfy the formulas or not. Within this step, several logic inconsistencies in the original standard are detected and corrected. Then, we apply the tool Times to generate C code from the verified model, which was later synthesized into a real MVBC chip. Finally, the runtime verification tool RMOR is applied to verify some safety requirements at the implementation level. We set up a real platform with worldwide mostly used MVBC D113, and verify the correctness and the scalability of the synthesized MVBC chip more comprehensively. The errors in the standard has been confirmed and the resulted MVBC has been deployed in real train communication network. © Springer International Publishing AG 2016.
2015
- SINk: A middleware for synchronization of heterogeneous software interfacesM. Hosseini, Y. Jiang, P. Wu, Jr. Berlin, and L. Sha14th Workshop on Adaptive and Reflective Middleware, ARM 2015 - Collocated with ACM/IFIP/USENIX Middleware 2015, 2015
Software is everywhere. The increasing requirement of sup- porting a wide variety of domains has rapidly increased the complexity of software systems, making them hard to main- tain and the training process harder for end-users, which in turn ultimately demanded the development of user-friendly application software with simple interfaces that makes them easy, especially for non-professional personnel, to employ, and interact with. However, due to the lack of source code access for third- party software and the lack of non-graphical interfaces such as web-services or RMI (Remote Method Invocation) access to application functionality, synchronization between het- erogeneous closed-box software interfaces and a user-friendly version of those interfaces has become a major challenge. In this paper, we design SINk1, a middleware that enables synchronization of multiple heterogeneous software applica- tions, using only graphical interface, without the need for source code access or access to the entire platform’s con- trol. SINk helps with synchronization of closed-box industry software, where in fact the only possible way of communi- cation is through software interfaces. It leverages efficient client sever architecture, socket based protocol, adaptation to resolution changes, and parameter mapping mechanisms to transfer control events to ensure the real-time require- ments of synchronization among multiple interfaces are met. Our proof-of-concept evaluation shows there is in fact poten- tial usage of our middleware in a wide variety of domains. © 2015 ACM.
- Worst Case Analysis of Packet Delay in Avionics Systems for Environmental MonitoringK. Kang, M.-Y. Nam, and L. ShaIEEE Systems Journal, 2015
Early analysis of timing is essential in the design of reliable avionics systems. We consider an environmental monitoring system that allows the surroundings of an aircraft to be observed continuously in real time. We analyze timing aspects of the partitions within the sensor and monitoring nodes, and of the intermediate switches that connect them. We use the application-specific I/O integration support tool (ASIIST) to evaluate the worst case delay in the peripheral component interconnect buses of the end nodes. We describe a novel switching algorithm that guarantees a bounded delay for any feasible traffic through each switch, and then derive the worst case delay incurred in a switched network that contains switches operating with the proposed algorithm. By composing these delays, we are able to determine the end-to-end delay over the internal buses and network comprising the entire system, and show how it can be bounded by using our switching algorithm. Our worst case end-to-end delay analysis contributes to more reliable and better verified environmental monitoring services over packet-switched networks in avionics systems. We expect that our work will help reduce the cost of designing and implementing environmental monitoring avionics systems, by making it easier to identify unsatisfactory designs at an early stage. © 2014 IEEE.
- Safe Workflow Adaptation and Validation Protocol for Medical Cyber-Physical SystemsP.-L. Wu, L. Sha, R.B. Berlin, and J.M. GoldmanProceedings - 41st Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2015, 2015
In medical cyber-physical environments, synchronizing supervisory medical systems, physicians’ behavior and patient conditions in compliance with best practice workflow is essential for patient safety. However, patient conditions change rapidly and asynchronously, so workflows have to be adapted to the changes safely. In this paper, we propose a workflow adaptation and validation protocol to help physicians safely adapt workflows to react to patient adverse events based on the path physiological models. Unlike conventional validation protocols, the medical cyber-physical systems cannot lock or recover the states of physical components, such as patient conditions. Therefore, the proposed protocol dynamically adapts the workflow to the patient conditions while validating safety requirements in collaboration with physicians. Moreover, we use cardiac arrest resuscitation as a case study to verify the safety and correctness properties of the proposed protocol. © 2015 IEEE.
- WCET(m) Estimation in Multi-core Systems Using Single Core EquivalenceR. Mancuso, R. Pellizzoni, M. Caccamo, L. Sha, and H. YunProceedings - Euromicro Conference on Real-Time Systems, 2015
Multi-core platforms represent the answer of the industry to the increasing demand for computational capabilities. From a real-time perspective, however, the inherent sharing of resources, such as memory subsystem and I/O channels, creates inter-core timing interference among critical tasks and applications deployed on different cores. As a result, modular per-core certification cannot be performed, meaning that: (1) current industrial engineering processes cannot be reused, (2) software developed and certified for single-core chips cannot be deployed on multi-core platforms as is. In this work, we propose the Single Core Equivalence (SCE) technology: a framework of OS-level techniques designed for commercial (COTS) architectures that exports a set of equivalent single-core virtual machines from a multi-core platform. This allows per-core schedulability results to be calculated in isolation and to hold when multiple cores of the system run in parallel. Thus, SCE allows each core of a multi-core chip to be considered as a conventional single-core chip, ultimately enabling industry to reuse existing software, schedulability analysis methodologies and engineering processes. © 2015 IEEE.
- Memory Heat Map: Anomaly detection in real-time embedded systems using memory behaviorM.-K. Yoon, S. Mohan, J. Choi, and L. ShaProceedings - Design Automation Conference, 2015
In this paper, we introduce a novel mechanism that identifies abnormal system-wide behaviors using the predictable nature of real-time embedded applications. We introduce Memory Heat Map (MHM) to characterize the memory behavior of the operating system. Our machine learning algorithms automatically (a) summarize the information contained in the MHMs and then (b) detect deviations from the normal memory behavior patterns. These methods are implemented on top of a multicore processor architecture to aid in the process of monitoring and detection. The techniques are evaluated using multIPle attack scenarios including kernel rootkits and shellcode. To the best of our knowledge, this is the first work that uses aggregated memory behavior for detecting system anomalies especially the concept of memory heat maps. © 2015 ACM.
- Budgeted generalized rate monotonic analysis for the partitioned, yet globally scheduled uniprocessor modelJ.-E. Kim, T. Abdelzaher, and L. ShaIEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, 2015
This paper solves the challenge of offline response time analysis of independent periodic tasks with constrained deadlines early in the software development cycle, under generalized rate-monotonic scheduling. CPU budgets are allocated to different applications and each application is composed of multiple periodic tasks that must share the same budget. Physical application requirements impose specifications on task periods and deadlines from the very beginning, but unlike the common assumption in traditional response time analysis, task execution times are not known. This is because task execution times depend on the exact system implementation, which is not finalized until later in the development cycle. Questions facing designers become: will my task meet its deadline given lack of knowledge of other tasks’ execution times? What is the smallest deadline that my task can meet? These questions are traditionally addressed by using a two level scheduler: CPU is partitioned and assigned to application, and task priorities are determined within the scope of an application, and when server becomes active it schedules the tasks locally. Such two level scheduling approach introduces priority inversion across applications. In our approach, different applications’ tasks are globally scheduled and yet the CPU resource is still partitioned and assigned to applications as a CPU budget. We schedule all the tasks globally while enforcing application budgets. The proposed new form of response time analysis is called budgeted generalized rate-monotonic analysis to compute the maximum response time for each task given only application budgets and task periods, but without knowledge of task execution times. We formulate this schedulability problem as a mixed integer linear programming problem and demonstrate a solution that computes the exact worst-case response times. Evaluation shows that our solution outperforms, in terms of schedulability, both global utilization bounds and mechanisms that attain temporal modularity via resource partitioning. © 2015 IEEE.
- The design of safe networked supervisory medical systems using organ-centric hierarchical control architectureW. Kang, L. Sha, Jr. Berlin, and J.M. GoldmanIEEE Journal of Biomedical and Health Informatics, 2015
There are growing demands to leverage network connectivity and interoperability of medical devices in order to improve patient safety and the effectiveness of medical services. However, if not properly designed, the integration of medical devices through networking could significantly increase the complexity of the system and make the system more vulnerable to potential errors, jeopardizing patient safety. The system must be designed and verified to guarantee the safety of patients and the effectiveness of medical services in the face of potential problems such as network failures. In this paper, we propose organ-centric hierarchical control architecture as a viable solution that reduces the complexity in system design and verification. In our approach, medical devices are grouped into clusters according to organ-specific human physiology. Each cluster captures common patterns arising out of medical device interactions and becomes a survivable semiautonomous unit during network failures. Further, safety verification and runtime enforcement can be modularized along organ-centric hierarchical control structure. We showthe feasibility of the proposed approach under Simulink’s model-based development framework. A simplified scenario for airway laser surgery is used as a case study. © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
- Schedulability bound for integrated modular avionics partitionsJ.-E. Kim, T. Abdelzaher, and L. ShaProceedings -Design, Automation and Test in Europe, DATE, 2015
In the avionics industry, as a hierarchical scheduling architecture Integrated Modular Avionics System has been widely adopted for its isolating capability. In practice, in an early development phase, a system developer does not know much about task execution times, but only task periods and IMA partition information. In such a case the schedulability bound for a task in a given partition tells a developer how much of the execution time the task can have to be schedulable. Once the developer knows the bound, then the developer can deal with any combination of execution times under the bound, which is safe in terms of schedulability. We formulate the problem as linear programming that is commonly used in the avionics industry for schedulability analysis, and compare the bound with other existing ones which are obtained with no period information. © 2015 EDAA.
- Exploiting structured human interactions to enhance estimation accuracy in cyber-Physical systemsY. Gao, S. Hu, R. Mancuso, H. Wang, M. Kim, P. Wu, L. Su, L. Sha, and T. AbdelzaherACM/IEEE 6th International Conference on Cyber-Physical Systems, ICCPS 2015, 2015
In this paper, we describe a general methodology for enhancing measurement accuracy in cyber-physical systems that involve structured human interactions with a noisy physical environment. We define structured human interactions as those that follow a domain-specific workflow. The idea of the paper is simple: We exploit knowledge of the workflow to correct unreliable sensor data. The intellectual contribution lies in an algorithm for joint estimation of the current state of the workflow together with correction of noisy sensor measurements, given only the noisy measurements and an overall workflow description. We demonstrate through simulations and a physical implementation the degree to which knowledge of workflow can increase sensing accuracy. As a specific instantiation of this idea, we present a novel situation-awareness tool called the Emergency Transcriber designed to automatically document operational procedures followed by teams of first responders in emergency-response scenarios. Evaluation shows that our system provides a significant fidelity enhancement over the state of the art, effectively coping with the noisy environment of emergency teams.
- Medical-grade quality of service for real-time mobile healthcareK. Kang, Q. Wang, J. Hur, K.-J. Park, and L. ShaComputer, 2015
A wireless electrocardiogram case study suggests that current CDMA2000 cellular technology has considerable potential in medical telemetry. Modifications to the network protocol stack ensure the highest data integrity and lowest service delay. © 2015 IEEE.
- Real-time reachability for verified simplex designS. Bak, T.T. Johnson, M. Caccamo, and L. ShaProceedings - Real-Time Systems Symposium, 2015
The Simplex Architecture ensures the safe use of an unverifiable complex controller by using a verified safety controller and verified switching logic. This architecture enables the safe use of high-performance, untrusted, and complex control algorithms without requiring them to be formally verified. Simplex incorporates a supervisory controller and safety controller that will take over control if the unverified logic misbehaves. The supervisory controller should (1) guarantee the system never enters and unsafe state (safety), but (2) use the complex controller as much as possible (minimize conservatism). The problem of precisely and correctly defining this switching logic has previously been considered either using a control-theoretic optimization approach, or through an offline hybrid systems reach ability computation. In this work, we prove that a combined online/offline approach, which uses aspects of the two earlier methods along with a real-time reach ability computation, also maintains safety, but with significantly less conservatism. We demonstrate the advantages of this unified approach on a saturated inverted pendulum system, where the usable region of attraction is 227% larger than the earlier approach. © 2014 IEEE.
- Controller Redundancy Design for Cyber-Physical SystemsJ. Yao, X. Liu, G. Zhu, and L. Sha2015
Continuous Systems 72 3.4.2 MSR of Interpolation Gain-Scheduling Controller with Delays 73 3.5 Stability Property of Switched Linear Systems from NetMSR and Reach Set Perspective 74 3.5.1 Reach Set of States with Delay 74 3.5.2 Switch Logic 75 3.5.3 Stability Analysis 75 3.6 Evaluations 77 3.6.1 Satellite Altitude Control Model 77 3.6.2 Simulation Configuration 77 3.6.3 NetMSR Evaluations 78 3.6.4 Reliability Evaluations 80 3.7 Conclusion and Future Work 82 Acknowledgments 82 3.A Appendix 83 3.A.1 Proof of Lemma 3.3 83 3.A.2 Proof of Lemma 3.4 84 References 85 Systems where the physical subsystem is tightly integrated with the cyber subsystem are usually referred as cyber-physical systems (CPS) [1-5]. CPS applications can be found in many areas, such as automated manufacturing, health care, civil infrastructure, avionics, aircraft, and vehicular communications [6]. Networked control systems (NCS) are considered to be a prominent subcategory of CPS, in which networks provide a means for communication among sensors, actuators, and controllers. As networking technology continues to increase in bandwidth capability and decrease in price, NCS will become more pervasive and have the potential to radically change the way the physical systems interact with each other and their environment. Communication networks have been used to exchange information among sensors, controllers, and actuators in digital control systems since the 1970s, originally motivated by the need of reducing costs for cabling, modularizing, and integrating system flexibly in the automobile industry. Today, many other applications of NCS can be easily found in diverse industries. For example, Avionics Full-Duplex Switched Ethernet (AFDX) is used in control systems of the Airbus A380 and Boeing B787 [7], and a new automotive network communications protocol, FlexRay was developed for vehicle control and is used in automobiles, such as the BMW 7 Series (F01), Audi A8, and Rolls-Royce Ghost [8]. Employing communications networks in control systems may considerably reduce integration complexity and make system architecture more scalable. © 2016 by Taylor and Francis Group, LLC.
2014
- Applying software model checking to PALS systemsM.-Y. Nam, L. Sha, S. Chaki, and C. KimAIAA/IEEE Digital Avionics Systems Conference - Proceedings, 2014
Physically Asynchronous/Logically Synchronous (PALS) is an architecture pattern that allows developers to design and verify a system as though all nodes executed synchronously. The correctness of PALS protocol was formally verified. However, the implementation of PALS adds additional code that is otherwise not needed. In our case, we have a middleware (PALSWare) that supports PALS systems. In this paper, we introduce a verification framework that shows how we can apply Software Model Checking (SMC) to verify a PALS system at the source code level. SMC is an automated and exhaustive source code checking technology. Compared to verifying (hardware or software) models, verifying the actual source code is more useful because it minimizes any chance of false interpretation and eliminates the possibility of missing software bugs that were absent in the model but introduced during implementation. In other words, SMC reduces the semantic gap between what is verified and what is executed. Our approach is compositional, i.e., the verification of PALSWare is done separately from applications. Since PALSWare is inherently concurrent, to verify it via SMC we must overcome the statespace explosion problem, which arises from concurrency and asynchrony. To this end, we develop novel simplification abstractions, prove their soundness, and then use these abstractions to reduce the verification of a system with many threads to verifying a system with a relatively small number of threads. When verifying an application, we leverage the (already verified) synchronicity guarantees provided by the PALSWare to reduce the verification complexity significantly. Thus, our approach uses both ’abstraction’ and ’composition’, the two main techniques to reduce statespace explosion. This separation between verification of PALSWare and applications also provides better management against upgrades to either. We validate our approach by verifying the current PALSWare implementation, and several PALSWare-based distributed real time applications. © 2014 IEEE.
- qPALS: Quality-aware synchrony protocol for distributed real-time systemsW. Kang, and L. ShaKSII Transactions on Internet and Information Systems, 2014
Synchronous computing models provided by real-time synchrony protocols, such as TTA [1] and PALS [2], greatly simplify the design, implementation, and verification of real-time distributed systems. However, their application to real systems has been limited since their assumptions on underlying systems are hard to satisfy. In particular, most previous real-time synchrony protocols hypothesize the existence of underlying fault tolerant real-time networks. This, however, might not be true in most soft real-time applications. In this paper, we propose a practical approach to a synchrony protocol, called Quality-Aware PALS (qPALS), which provides the benefits of a synchronous computing model in environments where no fault-tolerant real-time network is available. qPALS supports two flexible global synchronization protocols: one tailored for the performance and the other for the correctness of synchronization. Hence, applications can make a negotiation flexibly between performance and correctness. In qPALS, the Quality-of-Service (QoS) on synchronization and consistency is specified in a probabilistic manner, and the specified QoS is supported under dynamic and unpredictable network environments via a control-theoretic approach. Our simulation results show that qPALS supports highly reliable synchronization for critical events while still supporting the efficiency and performance even when the underlying network is not stable. © 2014 KSII.
- A treatment validation protocol for cyber-physical-human medical systemsP.-L. Wu, D. Raguraman, L. Sha, R.B. Berlin, and J.M. GoldmanProceedings - 40th Euromicro Conference Series on Software Engineering and Advanced Applications, SEAA 2014, 2014
In cyber-physical-human medical environments, coordinating supervisory medical systems and medical staff to perform treatments in in accordance with best practice is essential for patient safety. However, the dynamics of patient conditions and the non-deterministic nature of potential side effects of treatments pose significant challenges. In this paper, we propose a validation protocol to enforce the correct execution sequence of performing treatment, regarding preconditions validation, side effects monitoring, and expected responses checking based on the path physiological models. The proposed protocol organizes the medical information concisely and comprehensively to help medical staff validate treatments. Unlike traditional validation mechanism for cyber systems, the medical system cannot lock or rollback the states of physical components, such as patient conditions. Therefore, the proposed protocol dynamically adapts to the patient conditions and side effects of treatments. Moreover, a cardiac arrest scenario is used as a case study to verify the safety and correctness properties of the proposed protocol. © 2014 IEEE.
- Integrated Modular Avionics (IMA) partition scheduling with conflict-free I/O for multicore avionics systemsJ.-E. Kim, M.-K. Yoon, R. Bradford, and L. ShaProceedings - International Computer Software and Applications Conference, 2014
The trend in the semiconductor industry toward multicore processors poses a significant challenge to many suppliers of safety-critical real-time embedded software. Having certified their systems for use on single-core processors, these companies may be forced to migrate their installed base of software onto multicore processors as single-core processors become harder to obtain. These companies naturally want to minimize the potentially high costs of recertifying their software for multicore processors. In support of this goal, we propose an approach to solving a fundamental problem in migrating legacy software applications to multicore systems, namely that of preventing conflicts among I/O transactions from applications residing on different cores. We formalize the problem as a partition scheduling problem that serializes I/O partitions. Although this problem is strongly NP-complete, we formulate it as a Constraint Programming (CP) problem. Since the CP approach scales poorly, we propose a heuristic algorithm that outperforms the CP approach in scalability. © 2014 IEEE.
- Towards a cyber-medical model for device configuration safety in acute careM. Rahmaniheris, L. Sha, R.B. Berlin, and J.M. Goldman2014 IEEE Healthcare Innovation Conference, HIC 2014, 2014
The safety and efficacy of acute patient care depends on continuously proper use of medical monitoring and therapeutic devices. The device configuration safety depends on the rapidly changing patient condition. Therefore, the constraints to be enforced must be adapted dynamically. In this paper, we model the patient condition as communicating organ state machines, and the change of patient condition is modeled as the transition between organ states. The states are represented based on a pathophysiological model, which captures the changes in organ states as a results of one or more diseases. Applying the proposed patient state representation, we have a different set of device configuration requirements for each state. We introduce a new process for runtime validation of monitoring/ therapeutic configuration safety requirements using the proposed communicating organ state machines. We illustrate the usefulness of the proposed approach using a cardiac arrest scenario. © 2014 IEEE.
- WiP abstract: A treatment coordination protocol for cyber-physical-human medical systemsP.-L. Wu, D. Raguraman, L. Sha, R.B. Berlin Jr., and J.M. Goldman2014 ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS 2014, 2014
Improving effectiveness and safety of healthcare infrastructure is an important issue for cyber-physical-human medical systems. However, statistics indicates that preventable medical error rate is highest in ICU as compared to other hospital units. Many preventable medical errors are result from unintended deviation from the best practice guidelines, because medical staff are under tremendous pressure and overloaded by great amount of unorganized information. In order to reduce preventable medical errors, we develop a coordination protocol to validate treatments based on the pathophys-iological models 1. © 2014 IEEE.
- Guaranteeing the end-to-end latency of an IMA system with an increasing workloadM.-Y. Nam, J. Lee, K.-J. Park, L. Sha, and K. KangIEEE Transactions on Computers, 2014
New features are often added incrementally to avionics systems to minimize the need for redesign and recertification. However, it then becomes necessary to check that the timing constraints of existing as well as new applications are met. We facilitate these checks by introducing a new data switch that bounds the latency of end-to-end communications across a network. This switch runs a clock-driven switching algorithm that is throughput-optimal with a bounded worst-case delay for all feasible traffic. We propose associated heuristics that determine whether the timing constraints of an integrated modular avionics (IMA) system network that uses this switch are met, even if new features have caused traffic to increase, and then search for alternative network configurations if necessary. Virtual integration is used to make a combined analysis of the worst-case delay in the network and the local buses of individual computing modules. This analysis considers the shared network topology, local hardware architectures, and specified IMA configurations. Our approach can be used by a system architect as an effective method for quickly determining which possible system architectures should be pursued to meet timing constraints, and it allows the cascading effects of changes to be tracked and managed. We demonstrate how these heuristics work through an example in which changes are made to an environmental monitoring facility within an avionics system that uses our switch. © 2013 IEEE.
2013
- Modeling and architecture design of an MDPnP acute care monitoring systemM. Rahmaniheris, W. Kang, L.-J. Lee, L. Sha, R.B. Berlin Jr., and J.M. GoldmanProceedings of CBMS 2013 - 26th IEEE International Symposium on Computer-Based Medical Systems, 2013
Medical Device Plug-and-Play (MDPnP) permits the mitigation of preventable medical errors. However, MDPnP results in a dynamic environment, which presents great challenges to traditional software architectures. This paper addresses these challenges by describing an abstract representation of dynamic clinical environment and a modular architecture is utilized to support safe system reconfiguration. © 2013 IEEE.
- Towards organ-centric compositional development of safe networked supervisory medical systemsW. Kang, P. Wu, M. Rahmaniheris, L. Sha, R.B. Berlin Jr., and J.M. GoldmanProceedings of CBMS 2013 - 26th IEEE International Symposium on Computer-Based Medical Systems, 2013
Medical devices are increasingly capable of interacting with each other by leveraging network connectivity and interoperability, promising a great benefit for patient safety and effectiveness of medical services. However, ad-hoc integration of medical devices through networking can significantly increase the complexity of the system and make the system more vulnerable to potential errors and safety hazards. In this paper, we address this problem and introduce an organ-centric compositional development approach. In our approach, medical devices are composed into semi-autonomous clusters according to organ-specific physiology in a network-fail-safe manner. Each organ-centric cluster captures common device interaction patterns of sensing and control to support human physiology. The library of these formally verified organ-centric architectural patterns enables rapid and safe composition of supervisory controllers, which are specialized for specific medical scenarios. Using airway-laser surgery as a case study of practical importance, we demonstrate the feasibility of our approach under Simulink’s model-driven development framework. © 2013 IEEE.
- L1Simplex: Fault-tolerant control of cyber-physical systemsX. Wang, N. Hovakimyan, and L. Sha2013 ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS 2013, 2013
As the complexity of Cyber-Physical Systems (CPS) increases, it becomes more and more challenging to ensure the reliability of CPS, especially in the presence of system failures. Simplex architecture is shown to be an efficient tool to address the software failure in such systems. However, when physical failures also appear, Simplex does not work any more because the physical dynamics change due to physical failures. The Simplex architecture designed for the original physical model may not be suitable for the new dynamics. To address both software and physical failures, this paper presents the L1Simplex architecture, which contains the safety monitor, the high-performance controller (HPC), the L1- based high-assurance controller (HAC), and the decision logic for controller switching. The safety monitor is used to monitor the system behavior. It leads to another controller switching rule besides the stability-envelope-based rule in the decision logic. The HAC is designed based on the L1 adaptive controller, with which the stability envelope is computed. We show that the L1Simplex architecture can efficiently handle a class of software and physical failures. © 2013 IEEE.
- A low complexity coordination architecture for networked supervisory medical systemsP.-L. Wu, W. Kang, A. Ai-Nayeem, L. Sha, R.B. Berlin Jr., and J.M. Goldman2013 ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS 2013, 2013
Cooperating medical devices, envisioned by Integrated Clinical Environment (ICE) of Medical Device Plug-and-Play (MDPnP), is expected to improve the safety and the quality of patient care. To ensure safety, the cooperating medical devices must be thoroughly verified and tested. However, concurrent control of devices without proper coordination poses a significant challenge for the verification of the safety, since complex interaction patterns between devices might cause the explosion of the verification state space. In this paper, we propose a low-complexity coordination architecture and protocol for networked supervisory medical systems. The proposed architecture organizes the systems in a hierarchical and organ-based manner in accordance to human physiology and home-ostasis. Further, the proposed protocol avoids potential conflicts and unsafe controls, while allowing efficient concurrent operations of medical devices. The evaluation results show that our approach reduce the complexity by several orders of magnitude. © 2013 IEEE.
- A low complexity coordination architecture for networked supervisory medical systemsP.-L. Wu, W. Kang, A. Al-Nayeem, L. Sha, R.B. Berlin Jr., and J.M. GoldmanACM/IEEE 4th International Conference on Cyber-Physical Systems, ICCPS 2013, 2013
Cooperating medical devices, envisioned by Integrated Clinical Environment (ICE) of Medical Device Plug-and-Play (MDPnP), is expected to improve the safety and the quality of patient care. To ensure safety, the cooperating medical devices must be thoroughly verified and tested. However, concurrent control of devices without proper coordination poses a significant challenge for the verification of the safety, since complex interaction patterns between devices might cause the explosion of the verification state space. In this paper, we propose a low-complexity coordination architecture and protocol for networked supervisory medical systems. The proposed architecture organizes the systems in a hierarchical and organ-based manner in accordance to human physiology and home-ostasis. Further, the proposed protocol avoids potential conflicts and unsafe controls, while allowing efficient concurrent operations of medical devices. The evaluation results show that our approach reduce the complexity by several orders of magnitude. Copyright 2013 ACM.
- L1Simplex: Fault-tolerant control of cyber-physical systemsX. Wang, N. Hovakimyan, and L. ShaACM/IEEE 4th International Conference on Cyber-Physical Systems, ICCPS 2013, 2013
As the complexity of Cyber-Physical Systems (CPS) increases, it becomes more and more challenging to ensure the reliability of CPS, especially in the presence of system failures. Simplex architecture is shown to be an efficient tool to address the software failure in such systems. However, when physical failures also appear, Simplex does not work any more because the physical dynamics change due to physical failures. The Simplex architecture designed for the original physical model may not be suitable for the new dynamics. To address both software and physical failures, this paper presents the L1Simplex architecture, which contains the safety monitor, the high-performance controller (HPC), the L1-based high-assurance controller (HAC), and the decision logic for controller switching. The safety monitor is used to monitor the system behavior. It leads to another controller switching rule besides the stability-envelope-based rule in the decision logic. The HAC is designed based on the L1 adaptive controller, with which the stability envelope is computed. We show that the L1Simplex architecture can efficiently handle a class of software and physical failures. Copyright 2013 ACM.
- SecureCore: A multicore-based intrusion detection architecture for real-time embedded systemsM.-K. Yoon, S. Mohan, J. Choi, J.-E. Kim, and L. ShaReal-Time Technology and Applications - Proceedings, 2013
Security violations are becoming more common in real-time systems - an area that was considered to be invulnerable in the past - as evidenced by the recent W32.Stuxnet and Duqu worms. A failure to protect such systems from malicious entities could result in significant harm to both humans as well as the environment. The increasing use of multicore architectures in such systems exacerbates the problem since shared resources on these processors increase the risk of being compromised. In this paper, we present the SecureCore framework that, coupled with novel monitoring techniques, is able to improve the security of realtime embedded systems. We aim to detect malicious activities by analyzing and observing the inherent properties of the real-time system using statistical analyses of their execution profiles. With careful analysis based on these profiles, we are able to detect malicious code execution as soon as it happens and also ensure that the physical system remains safe. © 2013 IEEE.
- MemGuard: Memory bandwidth reservation system for efficient performance isolation in multi-core platformsH. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. ShaReal-Time Technology and Applications - Proceedings, 2013
Memory bandwidth in modern multi-core platforms is highly variable for many reasons and is a big challenge in designing real-time systems as applications are increasingly becoming more memory intensive. In this work, we proposed, designed, and implemented an efficient memory bandwidth reservation system, that we call MemGuard. MemGuard distinguishes memory bandwidth as two parts: guaranteed and best effort. It provides bandwidth reservation for the guaranteed bandwidth for temporal isolation, with efficient reclaiming to maximally utilize the reserved bandwidth. It further improves performance by exploiting the best effort bandwidth after satisfying each core’s reserved bandwidth. MemGuard is evaluated with SPEC2006 benchmarks on a real hardware platform, and the results demonstrate that it is able to provide memory performance isolation with minimal impact on overall throughput. © 2013 IEEE.
- S3A: Secure system simplex architecture for enhanced security and robustness of cyber-physical systemsS. Mohan, S. Bak, E. Betti, H. Yun, L. Sha, and M. CaccamoHiCoNS 2013 - 2nd ACM International Conference on High Confidence Networked Systems, Part of CPSWeek 2013, 2013
The recently discovered ’W32.Stuxnet’ worm has drastically changed the perception that systems managing critical infrastructure are invulnerable to software security attacks. Here we present an architecture that enhances the security of safety-critical cyber-physical systems despite the presence of such malware. Our architecture uses the property that control systems have deterministic real-time) execution behavior to detect an intrusion within 0.6 μs while still guaranteeing the safety of the plant. We also show that even if an attacker is successful (or gains access to the operating system’s administrative privileges), the overall state of the physical system still remains safe. © 2013 ACM.
- Net simplex: Controller fault tolerance architecture in networked control systemsJ. Yao, X. Liu, G. Zhu, and L. ShaIEEE Transactions on Industrial Informatics, 2013
The assurance of reliability becomes increasingly challenging as the complexity of networked control systems (NCS) rapidly increases. Simplex architecture was designed to tolerate control software design and implementation. This architecture consists of a high assurance controller (HAC) and a high performance controller (HPC). The HAC uses the linear state feedback control to create a large maximum stability region (MSR). The HPC aims at achieving a better control performance and may use any design. However, the plant’s states under HPC must stay within the MSR, or the control is switched to HAC. © 2005-2012 IEEE.
- Model-based analysis of wireless system architectures for real-time applicationsK. Kang, M.-Y. Nam, and L. ShaIEEE Transactions on Mobile Computing, 2013
We propose a model-based description and analysis framework for the design of wireless system architectures. Its aim is to address the shortcomings of existing approaches to system verification and the tracking of anomalies in safety-critical wireless systems. We use Architecture Analysis and Description Language (AADL) to describe an analysis-oriented architecture model with highly modular components. We also develop the cooperative tool chains required to analyze the performance of a wireless system by simulation. We show how this framework can support a detailed and largely automated analysis of a complicated, networked wireless system using examples from wireless healthcare and video broadcasting. © 2002-2012 IEEE.
- Towards organ-centric compositional development of safe networked supervisory medical systemsW. Kang, P. Wu, M. Rahmaniheris, L. Sha, R.B. Berlin Jr., and J.M. GoldmanProceedings - IEEE Symposium on Computer-Based Medical Systems, 2013
Medical devices are increasingly capable of interacting with each other by leveraging network connectivity and interoperability, promising a great benefit for patient safety and effectiveness of medical services. However, ad-hoc integration of medical devices through networking can significantly increase the complexity of the system and make the system more vulnerable to potential errors and safety hazards. In this paper, we address this problem and introduce an organ-centric compositional development approach. In our approach, medical devices are composed into semi-autonomous clusters according to organ-specific physiology in a network-fail-safe manner. Each organ-centric cluster captures common device interaction patterns of sensing and control to support human physiology. The library of these formally verified organ-centric architectural patterns enables rapid and safe composition of supervisory controllers, which are specialized for specific medical scenarios. Using airway-laser surgery as a case study of practical importance, we demonstrate the feasibility of our approach under Simulink’s model-driven development framework. © 2013 IEEE.
- Modeling and architecture design of an MDPnP acute care monitoring systemM. Rahmaniheris, W. Kang, L.-J. Lee, L. Sha, R.B. Berlin Jr., and J.M. GoldmanProceedings - IEEE Symposium on Computer-Based Medical Systems, 2013
Medical Device Plug-and-Play (MDPnP) permits the mitigation of preventable medical errors. However, MDPnP results in a dynamic environment, which presents great challenges to traditional software architectures. This paper addresses these challenges by describing an abstract representation of dynamic clinical environment and a modular architecture is utilized to support safe system reconfiguration. © 2013 IEEE.
- Middleware design for physically-asynchronous logically-synchronous (PALS) systemsA. Al-Nayeem, C. Kim, W. Kang, P.-L. Wu, and L. Sha2013 International Conference on Embedded Software, EMSOFT 2013, 2013
The Physically-Asynchronous Logically-Synchronous (PALS) system is a recently proposed architectural pattern for cyber-physical systems. It guarantees a logically synchronous design abstraction for real-time distributed computations. In this work, we develop a new middleware, called PALSware, to support an efficient and robust implementation of the PALS system and its extensions. PALSware guarantees consistency in distributed applications by eliminating any asynchronous interactions resulting from distributed clocks and node failures. We present a layered design for this middle-ware that is both reusable in different system architectures and can be extended with architecture-specific solutions for fault management. We demonstrate the middleware for an academic control testbed and show the consistency in a fault injection framework designed for this middleware. © 2013 IEEE.
- On-chip control flow integrity check for real time embedded systemsF.A.T. Abad, J.V.D. Woude, Y. Lu, S. Bak, M. Caccamo, L. Sha, R. Mancuso, and S. Mohan2013 IEEE 1st International Conference on Cyber-Physical Systems, Networks, and Applications, CPSNA 2013, 2013
Modern industrial plants, vehicles and other cyber-physical systems are increasingly being built as an aggregation of embedded platforms. Together with the soaring number of such systems and the current trends of increased connectivity, new security concerns are emerging. Classic approaches to security are not often suitable for embedded platforms. In this paper we propose a hardware based approach for checking the integrity of code flow of real-time tasks whit precisely predictable overheads that do not affect the critical path. Specifically, we employ a hardware module to perform control flow graph (CFG) validation at run-time of real-time component. For this purpose, we developed a binary-based, CFG generation tool. In addition, we also present our implementation of a CFG integrity checking module. The proposed approach is aimed at improving real-time systems security. © 2013 IEEE.
- Optimized scheduling of multi-IMA partitions with exclusive region for synchronized real-time multi-core systemsJ.-E. Kim, M.-K. Yoon, S. Im, R. Bradford, and L. ShaProceedings -Design, Automation and Test in Europe, DATE, 2013
Integrated Modular Avionics (IMA) architecture has been widely adopted by the avionics industry due to its strong temporal and spatial isolation capability for safety-critical realtime systems. The fundamental challenge to integrating an existing set of single-core IMA partitions into a multi-core system is to ensure that the isolation of the partitions will be maintained without incurring huge redevelopment and recertification costs. To address this challenge, we developed an optimized partition scheduling algorithm which considers exclusive regions to achieve the synchronization between partitions across cores. We show that the problem of finding the optimal partition schedule is NP-complete and present a Constraint Programming formulation. In addition, we relax this problem to find the minimum number of cores needed to schedule a given set of partitions and propose an approximation algorithm which is guaranteed to find a feasible schedule of partitions if there exists a feasible schedule of exclusive regions. © 2013 EDAA.
- Holistic design parameter optimization of multiple periodic resources in hierarchical schedulingM.-K. Yoon, J.-E. Kim, R. Bradford, and L. ShaProceedings -Design, Automation and Test in Europe, DATE, 2013
Hierarchical scheduling of periodic resources has been increasingly applied to a wide variety of real-time systems due to its ability to accommodate various applications on a single system through strong temporal isolation. This leads to the question of how one can optimize over the resource parameters while satisfying the timing requirements of real-time applications. A great deal of research has been devoted to deriving the analytic model for the bounds on the design parameter of a single resource as well as its optimization. The optimization for multiple periodic resources, however, requires a holistic approach due to the conflicting requirements of the limited computational capacity of a system among resources. Thus, this paper addresses a holistic optimization of multiple periodic resources with regard to minimum system utilization. We extend the existing analysis of a single resource in order for the variable interferences among resources to be captured in the resource bound, and then solve the problem with Geometric Programming (GP). The experimental results show that the proposed method can find a solution very close to the one optimized via an exhaustive search and that it can explore more solutions than a known heuristic method. © 2013 EDAA.
- Design and QoS of a wireless system for real-time remote electrocardiographyK. Kang, J. Ryu, J. Hur, and L. ShaIEEE Journal of Biomedical and Health Informatics, 2013
Quality of service (QoS) and, in particular, reliability and a bounded low latency are essential attributes of safety-critical wireless systems for medical applications. However, wireless links are typically prone to bursts of errors, with characteristics which vary over time.We propose a wireless system suitable for real-time remote patient monitoring in which the necessary reliability and guaranteed latency are both achieved by an efficient error control scheme. We have paired an example remote electrocardiography application to this wireless system. We also developed a tool chain that uses a formal description of the proposed wireless medical system architecture in the architecture analysis and design language to assess various combinations of system parameters: we can determine the QoS in terms of packet-delivery ratio and the service latency, and also the size of jitter buffer required for seamless ECG monitoring. A realistic assessment, based on data from the MIT-BIT arrhythmia database, shows that the proposed wireless system can achieve an appropriate level of QoS for real-time ECG monitoring if link-level error control is correctly implemented. Additionally, we present guidelines for the design of energy-efficient link-level error control, derived from energy data, obtained from simulations. © 2013 IEEE.
- Design of a crossbar VOQ real-time switch with clock-driven scheduling for a guaranteed delay boundK. Kang, K.-J. Park, L. Sha, and Q. WangReal-Time Systems, 2013
Most commercial network switches are designed to achieve good average throughput and delay needed for Internet traffic, whereas hard real-time applications demand a bounded delay. Our real-time switch combines clearance-time-optimal switching with clock-based scheduling on a crossbar switching fabric. We use real-time virtual machine tasks to serve both periodic and aperiodic traffic, which simplifies analysis and provides isolation from other system operations. We can then show that any feasible traffic will be switched in two clock periods. This delay bound is enabled by introducing one-shot traffic, which can be constructed at the cost of a fixed delay of one clock period. We carry out simulation to compare our switch with the popular iSLIP crossbar switch scheduler. Our switch has a larger schedulability region, a bounded lower end-to-end switching delay, and a shorter clearance time which is the time required to serve every packet in the system. © 2012 Springer Science+Business Media New York.
- Real-time I/O management system with COTS peripheralsE. Betti, S. Bak, R. Pellizzoni, M. Caccamo, and L. ShaIEEE Transactions on Computers, 2013
Real-time embedded systems are increasingly being built using commercial-off-the-shelf (COTS) components such as mass-produced peripherals and buses to reduce costs, time-to-market, and increase performance. Unfortunately, COTS-interconnect systems do not usually guarantee timeliness, and might experience severe timing degradation in the presence of high-bandwidth I/O peripherals. Moreover, peripherals do not implement any internal priority-based scheduling mechanism, hence, sharing a device can result in data of high priority tasks being delayed by data of low priority tasks. To address these problems, we designed a real-time I/O management system comprised of 1) real-time bridges with I/O virtualization capabilities, and 2) a peripheral scheduler. The proposed framework is used to transparently put the I/O subsystem of a COTS-based embedded system under the discipline of real-time scheduling, minimizing the timing unpredictability due to the peripherals sharing the bus. We also discuss computing the maximum delay due to buffered I/O data transactions as well as determining the buffer size needed to avoid data loss. Finally, we demonstrate experimentally that our prototype real-time I/O management system successfully exports multiple virtual devices for a single physical device and prioritizes I/O traffic, guaranteeing its timeliness. © 2012 IEEE.
2012
- Model-based design of a wireless telemetry system and QoS assessment using AADLK. Kang, M.-Y. Nam, J. Lee, J. Park, H. Yoo, and L. ShaProceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012, 2012
High reliability and acceptable latency are major quality-of-service (QoS) considerations in the design of wireless telemetry systems. In this paper, we show how to analyze the QoS of the wireless telemetry system using its formal description written in the Architecture Analysis & Design Language (AADL) and integrated tool chain, with an exemplar electrocardiogram monitoring application. © 2012 IEEE.
- Modeling towards incremental early analyzability of networked avionics systems using virtual integrationM.-Y. Nam, K. Kang, R. Pellizzoni, K.-J. Park, J.-E. Kim, and L. ShaTransactions on Embedded Computing Systems, 2012
With the advance of hardware technology, more features are incrementally added to already existing networked systems. Avionics has a stronger tendency to use preexisting applications due to its complexity and scale. As resource sharing becomes intense among the network and the computing modules, it has become a difficult task for the system designer to make confident architectural decisions even for incremental changes. Providing a tailored environment to model and analyze incremental changes requires a combination of software tools and hardware support.We have built a virtual integration tool called ASIIST which can provide a worst-case end-to-end latency of data that is sent through a network and the internal bus architecture of the end-systems. Also, we have devised a new real-time switching algorithm which guarantees the worst-case network delay of preexisting network traffic under feasible conditions. With the real-time switch support, ASIIST can provide an early modularized analysis of the end-to-end latency to make architectural design choices and incremental changes easier for the user. © 2012 ACM.
- How to reliably integrate medical devices over wirelessC. Kim, M. Sun, M. Rahmaniheris, and L. ShaAnnual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks workshops, 2012
This demonstration presents our NASS (Network Aware Supervisory System) framework prototype for medical device integration systems. The NASS framework interconnects medical devices over wireless for convenience, seamlessness and sanitation, and provides safety-guaranteed supervision. Our prototype was developed in Sun Java Real-time Environment. Real-time Java provides well-formed convenience of dynamically loading and unloading medical application logic and safety rules on the fly in real-time environments. To tackle the complexity of using real-time Java in the safety-critical system, we also applied HW/SW codesign method. Real-time Java Environment Linux operating system may not be robust enough for medical devices to fully rely on. In our prototype, the supervisor software in Java performs all logical decisions including contingency plan generation derived from the safety rules. Once logic is decided, the decisions and plans for the devices are delivered to the hardware implemented in FPGA at each device to physically drive medical equipments. Since the execution of decisions and plans are delegated to the hardware, any failure in software does not harm the integrated safety. Our demonstration shows how safety is managed in different kinds of failures from wireless network failures to device software failures. © 2012 IEEE.
- Memory access control in multiprocessor for real-time systems with mixed criticalityH. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. ShaProceedings - Euromicro Conference on Real-Time Systems, 2012
Shared resource access interference, particularly memory and system bus, is a big challenge in designing predictable real-time systems because its worst case behavior can significantly differ. In this paper, we propose a software based memory throttling mechanism to explicitly control the memory interference. We developed analytic solutions to compute proper throttling parameters that satisfy schedulability of critical tasks while minimize performance impact caused by throttling. We implemented the mechanism in Linux kernel and evaluated isolation guarantee and overall performance impact using a set of synthetic and real applications. © 2012 IEEE.
- Pattern-based composition and analysis of virtually synchronized real-time distributed systemsA. Al-Nayeem, L. Sha, D.D. Cofer, and S.M. MillerProceedings - 2012 IEEE/ACM 3rd International Conference on Cyber-Physical Systems, ICCPS 2012, 2012
Designing and verifying distributed protocols in a multi-rate asynchronous system is, in general, extremely difficult when the distributed computations require consistent input views, consistent actions and synchronized state transitions. In this paper, we address this problem and introduce a formal, complexity-reducing architectural pattern, called Multi-Rate PALS system, to support virtual synchronization in multi-rate distributed computations. The pattern supports a component to be virtually synchronized with other components in different instantiations of this pattern. We present an application of a hierarchical control system to show that the composition of these instantiations can be used to achieve desired system-level properties, such as distributed consistency and distributed coordination. We verify the logical synchronization guarantee of this pattern, which holds as long as the pattern assumptions are satisfied. We also discuss the correctness analysis necessary to validate these assumptions and provide a tool support to perform this analysis automatically on the AADL models. © 2012 IEEE.
- Compositional verification of architectural modelsD. Cofer, A. Gacek, S. Miller, M.W. Whalen, B. LaValley, and L. ShaLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012
This paper describes a design flow and supporting tools to significantly improve the design and verification of complex cyber-physical systems. We focus on system architecture models composed from libraries of components and complexity-reducing design patterns having formally verified properties. This allows new system designs to be developed rapidly using patterns that have been shown to reduce unnecessary complexity and coupling between components. Components and patterns are annotated with formal contracts describing their guaranteed behaviors and the contextual assumptions that must be satisfied for their correct operation. We describe the compositional reasoning framework that we have developed for proving the correctness of a system design, and provide a proof of the soundness of our compositional reasoning approach. An example based on an aircraft flight control system is provided to illustrate the method and supporting analysis tools. © 2012 Springer-Verlag.
2011
- Optimizing tunable WCET with shared resource allocation and arbitration in hard real-time multicore systemsM.-K. Yoon, J.-E. Kim, and L. ShaProceedings - Real-Time Systems Symposium, 2011
The unpredictable worst-case timing behavior of multicore architectures has been the biggest stumbling block for a widespread use of multicores in hard real-time systems. A great deal of research effort has been devoted to address the issue. Among others, the development of a new multicore architecture has emerged as an attractive solution because it can eliminate the unpredictable interference sources in the first place. This opens a new possibility of system-level optimizations with multicore-based hard real-time systems. To address this issue, we propose a new perspective of WCET model called tunable WCET, in which the WCETs of tasks are elastically adjusted according to the optimal shared resource allocation and arbitration methods. For this, we propose novel WCET-aware harmonic round-robin bus scheduling and two-level cache partitioning method. We present a mixed integer linear programming formulation as the solution to the optimization of tunable WCETs. Our experimental results show that the proposed methods can significantly lower overall system utilizations. © 2011 IEEE.
- Limiting worst-case end-to-end latency when traffic increases in a switched avionics networkM.-Y. Nam, E. Seo, L. Sha, K.-J. Park, and K. KangProceedings - 17th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2011, 2011
New features are often added incrementally to avionics systems. This avoids redesign and recertification but still requires verifying the timing constraints of both new and existing applications. We introduce a new switch that facilitates this verification by bounding the latency of end-to-end communication across a network. Our clock-driven real-time switching algorithm is throughput-optimal with a bounded worst-case delay for all feasible traffic. Associated heuristics can verify whether the timing constraints of an avionics network are met, after new features have caused traffic to increase, and then search for alternative network configurations if necessary. We show how these heuristics cope with changes to an example environmental monitoring architecture within an avionics system that incorporates our switch. Our approach to analysis can be used to determine, quickly but rigorously, which system architecture meet timing constraints; and it allows the system architect to manage the cascading effects of component changes in a comprehensive manner. © 2011 IEEE.
- Resource allocation contracts for open analytic runtime modelsM.-Y. Nam, D. De Niz, L. Wrage, and L. ShaEmbedded Systems Week 2011, ESWEEK 2011 - 9th ACM International Conference on Embedded Software, EMSOFT’11, 2011
Open Analytic Runtime (OAR) Models embed analysis algorithms into runtime architectural models, thus integrating the model and its analytic interpretations. Such an integration is critical for Cyber-Physical Systems (CPS) when model parts are independently developed by different teams as it is the case in multi-tier industries, e.g. avionics and automotive. Analysis algorithms play a central role augmenting the designer’s capacity to automatically verify properties of interest in systems at the scale and complexity required by these industries. Unfortunately, the verification results are valid only if the assumptions of the different analysis algorithms (analytic assumptions) are consistent with each other. This paper presents our work on the automatic verification of one important class of analytic assumptions in OAR models: resource allocation assumptions. These assumptions are modeled as Resource Allocation (RA) contracts. RA contract constructs include not only the typical assumes and guarantees but also runtime facts and impli- cations. Finally, we automatically determine the correct sequence of execution of the analysis algorithms based on the contract input/output dependencies described in our models. Together these characteristics enable the automatic assumption verification that preserves the scalability of analytic models. We illustrate our approach using an example model with analysis algorithms for security, schedulability, and energy efficiency. Copyright © 2011 ACM.
- System-wide energy optimization for multiple DVS components and real-time tasksH. Yun, P.-L. Wu, A. Arya, C. Kim, T. Abdelzaher, and L. ShaReal-Time Systems, 2011
Most dynamic voltage and frequency scaling (DVS) techniques adjust only CPU parameters; however, recent embedded systems provide multiple adjustable clocks which can be independently tuned. When considering multiple components, energy optimal frequencies depend on task set characteristics such as the number of CPU and memory access cycles. In this work, we propose a realistic energy model considering multiple components with individually adjustable frequencies such as CPUs, system bus and memory, and related task set characteristics. The model is validated on a real platform and shows less than 2% relative error compared to measured values. Based on the proposed energy model, we present an optimal static frequency assignment scheme for multiple DVS components to schedule a set of periodic real-time tasks. We simulate the energy gain of the proposed scheme compared to other DVS schemes for various task and system configurations, showing up to a 20% energy reduction. We also experimentally verify energy savings of the proposed scheme on a real hardware platform. © 2011 Springer Science+Business Media, LLC.
- Pre-verified safety control framework for real-time medical systemsC. Kim, H. Yun, H.-G. Kim, and L. ShaInformation, 2011
Interoperability of medical devices is a growing need in modern healthcare systems, not just for convenience, but also to preclude potential human errors during medical procedures. Caregivers, as end users, strongly prefer the use of wireless networks for such interconnections between clinical devices due to its seamless connectivity and ease of use/maintenance. In [10], we introduced a Network-Aware Safety Supervisior framework to integrate medical devices into clinical supervisory systems using finite state machine (FSM). In this paper, we simplify FSM into Boolean Logic to minimize safety logic overhead and introduce a generic method, called pre-verified safety control (PVSC) framework to integrate medical devices into clinical management systems using wireless technologies that have their safety properties verified in a formal manner. Our method provides (i) a PVSC safety layer that automatically generates the safety engine to guarantee given safety requirements and (ii) an abstracted application development environment so that applications can be developed independent of underlying complications of wireless communication. To mitigate negative effects due to packet losses, the PVSC framework employs a pipelined "pre-planning" of the device controls. The key motivation of the work in this paper is to preserve safety and the application development environment, as is, even after adding unreliable communication media, such as wireless, along with a pre-planning mechanism. © 2011 International Information Institute.
- A medical-grade wireless architecture for remote electrocardiographyK. Kang, K.-J. Park, J.-J. Song, C.-H. Yoon, and L. ShaIEEE Transactions on Information Technology in Biomedicine, 2011
In telecardiology, electrocardiogram (ECG) signals from a patient are acquired by sensors and transmitted in real time to medical personnel across a wireless network. The use of IEEE 802.11 wireless LANs (WLANs), which are already deployed in many hospitals, can provide ubiquitous connectivity and thus allow cardiology patients greater mobility. However, engineering issues, including the error-prone nature of wireless channels and the unpredictable delay and jitter due to the nondeterministic nature of access to the wireless medium, need to be addressed before telecardiology can be safely realized. We propose a medical-grade WLAN architecture for remote ECG monitoring, which employs the point-coordination function (PCF) for medium access control and ReedSolomon coding for error control. Realistic simulations with uncompressed two-lead ECG data from the MIT-BIH arrhythmia database demonstrate reliable wireless ECG monitoring; the reliability of ECG transmission exceeds 99.99% with the initial buffering delay of only 2.4 s. © 2006 IEEE.
2010
- Reconstructing missing signals in multi-parameter physiologic data by mining the aligned contextual informationY. Li, Y. Sun, P. Sondhi, L. Sha, and C. ZhaiComputing in Cardiology, 2010
The PhysioNet Challenge 2010 is to recover missing segments of a particular signal in the given multiparameter physiologic data set. In this paper we propose a contextual information based framework to achieve robust reconstruction. For a given target signal that is to be reconstructed, our algorithm intelligently choose among three sub-algorithms to best recover the missing segments. Experiments are carried out on the Physionet/ CinC Challenge 2010 data sets. The results show that the proposed method is particularly effective on signals that have well aligned contextual signals.
- Exploring the design space of IMA system architecturesR. Bradford, S. Fliginger, M.-Y. Nam, S. Mohan, R. Pellizzoni, C. Kim, M. Caccamo, and L. ShaAIAA/IEEE Digital Avionics Systems Conference - Proceedings, 2010
In systems such as integrated modular avionics (IMA), there is a substantial benefit from maintaining significant portions of a product family’s architecture unchanged from one system to the next. When there are tight constraints on resources such as bandwidth and processor capacity, however, certain seemingly small changes in a few components have the potential to create a cascade of timing problems. The ability to rapidly analyze and quantify the impact of these changes prior to implementation and system integration provides the engineering team with early validation of the changes, which can prevent substantially increased costs for design, integration, and verification, as well as delays in the development schedule. However, detailed early evaluation of architecture performance involves analysis of many complex interrelated variables and is therefore challenging. Consider the case of moving a task from a processor’s IMA partition to another processor’s partition. The tasks sets need to be updated. The I/O and network traffic must be rerouted. The schedulability equations of processor, I/O and network need to be recreated, and the analysis needs to be propagated end to end. Last but not least, all the architecture specification documents have to be updated. In order to reduce the detailed architecture evaluation effort, we have automated the performance analysis process with a system integration tool prototype called ASIIST (Application-Specific I/O Integration Support Tool). To move a task, we can now use a graphical interface to drag and drop a task from one processor’s IMA partition to another. All the steps described above are done automatically, including the updating of the architecture specification in AADL (Architecture Analysis and Description Language). In this paper, we show how to use this tool to explore the design space of an IMA system architecture, so as to derive designs with specified performance properties. © 2010 IEEE.
- A formal pattern architecture for safe medical systemsM. Sun, J. Meseguer, and L. ShaLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2010
Design patterns have demonstrated major practical uses for cost savings and modular design in software engineering. For safety-critical systems, however, such patterns should also provide formal guarantees that critical safety properties are met. We leverage the power of rewriting logic and parameterization available in Real-Time Maude to add a formal basis for analysis of a novel safety pattern for medical devices. We demonstrate practicality and applicability of our pattern by instantiating it to a pacemaker specification, and we validate our pattern by verifying the safety invariant in the pacemaker instantiation. © 2010 Springer-Verlag Berlin Heidelberg.
- System-wide energy optimization for multiple DVS components and real-time tasksH. Yun, P.-L. Wu, A. Arya, T. Abdelzaher, C. Kim, and L. ShaProceedings - Euromicro Conference on Real-Time Systems, 2010
Most dynamic voltage and frequency scaling (DVS) techniques adjust only CPU parameters; however, recent embedded systems provide multiple adjustable clocks which can be independently tuned. When considering multiple components, energy optimal frequencies depend on task set characteristics such as the number of CPU and memory access cycles. In this work, we propose a realistic energy model considering multiple components with individually adjustable frequencies such as CPU, system bus and memory, and related task set characteristics. The model is validated on a real platform and shows less than 2% relative error compared to measured values. Based on the proposed energy model, we present an optimal static frequency assignment scheme for multiple DVS components to schedule a set of periodic realtime tasks. We simulate the energy gain of the proposed scheme compared to other DVS schemes for various task and system configurations, showing up to a 20% energy reduction. © 2010 IEEE.
- Cyber-physical systems: The next computing revolutionR. Rajkumar, I. Lee, L. Sha, and J. StankovicProceedings - Design Automation Conference, 2010
Cyber-physical systems (CPS) are physical and engineered systems whose operations are monitored, coordinated, controlled and integrated by a computing and communication core. Just as the internet transformed how humans interact with one another, cyber-physical systems will transform how we interact with the physical world around us. Many grand challenges await in the economically vital domains of transportation, health-care, manufacturing, agriculture, energy, defense, aerospace and buildings. The design, construction and verification of cyber-physical systems pose a multitude of technical challenges that must be addressed by a cross-disciplinary community of researchers and educators. Copyright 2010 ACM.
- A framework for the safe interoperability of medical devices in the presence of network failuresC. Kim, M. Sun, S. Mohan, H. Yun, L. Sha, and T.F. Abdelzaher1st ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS ’10, 2010
There exists a growing need for automated interoperability among medical devices in modern healthcare systems. This requirement is not just for convenience, but to prevent the possibility of errors due to the complexity of interactions between the devices and human operators. Hence, a system supporting such interoperability is supposed to provide the means to interconnect distributed medial devices in an open space, so must be designed to account for network failures. In this paper, we introduce a generic framework, the Network-Aware Supervisory System (NASS) to integrate medical devices into such a clinical interoperability system that uses real networks. It provides a development environment, in which medical-device supervisory logic can be developed based on the assumptions of an ideal, robust network. A case study shows that the NASS framework provides the same procedural effectiveness as the original logic based on the ideal network model but with protection against real-world network failures. © 2010 ACM.
- A reduced complexity design pattern for distributed hierarchical command and control systemH. Yun, P.-L. Wu, M. Rahmaniheris, C. Kim, and L. Sha1st ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS ’10, 2010
Cyber Physical Systems (CPS) get a lot of attention due to the strong demand for the integration of physical devices and computing systems. There are many design aspects involved in CPS, such as efficiency, real-time, reliability and security. One of the major issues is system integration and verification. In many safety critical systems verification plays an essential role in system design. However, the high complexity for the composition of diverse systems is a major challenge for system verification. In this paper, we focus on command and control systems for search and rescue missions and propose a systematic design pattern called Interruptible RPC to compose complex systems while keeping the verification costs low. This has been made possible due to the reduced state space of the systems designed using our pattern. Therefore, the system models can be efficiently verified using available verification tools. In our experiments, the search and rescue system based on Interruptible RPC pattern had fewer states than the asynchronous one by several orders of magnitude. © 2010 ACM.
- An interleaving structure for guaranteed QoS in real-time broadcasting systemsK. Kang, and L. ShaIEEE Transactions on Computers, 2010
Providing high-quality broadcast services for soft real-time applications over wireless networks such as CDMA2000, which have high bit error rates, requires the control of errors that occur during data transmission. Reed-Solomon (RS) forward error correction (FEC) in the medium access control (MAC) layer performs this role in 3G broadcast services. We propose new analytic models for predicting the performance of RS coding and its execution time, which take into account the memory property of a fading channel, different channel conditions, and a variable level of block interleaving. We identify RS decoding as a significant cause of variability in execution time, taking the form of jitter, which depends on the channel conditions. We analyze the size of buffer required to absorb the jitter under different channel conditions. We then formulate a trade-off between the performance of RS coding and the delay that it causes in transmitting a fixed amount of data with different levels of block interleaving. Finally, we show how to balance the quality with which content is presented against an acceptable buffering delay, which is very important to soft real-time applications, by using an adequate level of block interleaving. This study offers a guide for the provision of efficient broadcast services in real time with stochastically guaranteed quality. © 2006 IEEE.
- Design of robust adaptive frequency hopping for wireless medical telemetry systemsK.-J. Park, T.R. Park, C.D. Schmitz, and L. ShaIET Communications, 2010
The authors propose an adaptive frequency hopping (AFH) algorithm, entitled robust adaptive frequency hopping (RAFH), for providing increased reliability of a wireless medical telemetry system (WMTS) under coexistence environment with non-medical devices. The conventional AFH scheme classifies channels into ’good’ or ’bad’ according to the threshold-based on-off decision by packet error rate (PER) measurement, and only uses good channels with a uniform hop probability. Unlike the conventional AFH scheme, RAFH is a novel technique, which solves a constrained entropy maximisation problem and assigns every channel a different hop probability as a decreasing function of the measured PER. The key novelty of RAFH over existing AFH schemes is that it reflects the relative channel condition by assigning non-uniform hop probabilities. By adopting constrained entropy maximisation, RAFH not only improves the average PER, but also reduces the PER fluctuation over time under a dynamic interference environment, both of which increase the reliability of WMTS. Through extensive simulation, we show that RAFH outperforms basic frequency hopping (FH) and the conventional AFH with respect to the PER under various scenarios of dynamic interference. © 2010 The Institution of Engineering and Technology.
2009
- Handling mixed-criticality in SoC-based real-time embedded systemsR. Pellizzoni, P. Meredith, M.-Y. Nam, M. Sun, M. Caccamo, and L. ShaEmbedded Systems Week 2009 - 7th ACM International Conference on Embedded Software, EMSOFT ’09, 2009
System-on-Chip (SoC) is a promising paradigm to implement safety-critical embedded systems, but it poses significant challenges from a design and verification point of view. In particular, in a mixed-criticality system, low criticality applications must be prevented from interfering with high criticality ones. In this paper, we introduce a new design methodology for SoC that provides strong isolation guarantees to applications with different criticalities. A set of certificates describing the assumed application behavior is extracted from a functional Architectural Analysis and Design Language (AADL) specification. Our tools then automatically generate hardware wrappers that enforce at run-time the behavior described by the certificates. In particular, we employ run-time monitoring to formally check all data communication in the system, and we enforce timing reservations for both computation and communication resources. Verification is greatly simplified because certificates are much simpler than the components used to implement low-criticality applications. The effectiveness of our methodology is proven on a case study consisting of a medical pacemaker. Copyright 2009 ACM.
- Implementing logical synchrony in integrated modular avionicsS.P. Miller, D.D. Cofer, L. Sha, J. Meseguer, and A. Al-NayeemAIAA/IEEE Digital Avionics Systems Conference - Proceedings, 2009
Many avionics systems must be implemented as redundant, distributed systems in order to provide the necessary level of fault tolerance. To correctly perform their function, the individual nodes of these systems must agree on some part of the global system state. Developing protocols to achieve this agreement is greatly simplified if the nodes execute synchronously relative to each other, but many Integrated Modular Avionics architectures assume nodes will execute asynchronously. This paper presents a simple design pattern, Physically Asynchronous/Logically Synchronous (PALS), that allows developers to design and verify a distributed, redundant system as though all nodes execute synchronously. This synchronous design can then be distributed over a physically asynchronous architecture in such a way that the logical correctness of the design is preserved. Use of this complexity reducing design pattern greatly simplifies the development and verification of fault tolerant distributed applications, ensures optimal system performance, and provides a standard argument for system certification. ©2009 IEEE.
- Real-time control of I/O COTS peripherals for embedded systemsS. Bak, E. Betti, R. Pellizzoni, M. Caccamo, and L. ShaProceedings - Real-Time Systems Symposium, 2009
Real-time embedded systems are increasingly being built using commercial-off-the-shelf (COTS) components such as mass-produced peripherals and buses to reduce costs, time-to-market, and increase performance. Unfortunately, COTS interconnect systems do not usually guarantee timeliness, and might experience severe timing degradation in the presence of high-bandwidth I/O peripherals. To address this problem, we designed a real-time I/O management system comprised of 1) real-time bridges, and 2) a reservation controller. The proposed framework is used to transparently put the I/O subsystem of a COTS-based embedded system under the discipline of real-time scheduling. We also discuss computing a delay bound for I/O data transactions and determining worst-case buffer size. Finally, we demonstrate experimentally that our prototype real-time I/O management system successfully prioritizes I/O traffic and guarantees its timeliness. © 2009 IEEE.
- Rapid early-phase virtual integrationS. Mohan, M.-Y. Nam, R. Pellizoni, L. Sha, R. Bradford, and S. FligingerProceedings - Real-Time Systems Symposium, 2009
In complex hard real-time systems with tight constraints on system resources, small changes in one component of a system can cause a cascade of adverse effects on other parts of the system. We address the inherent complexity of making architectural decisions by raising the level of abstraction at which the analysis is performed. Our analysis approach gives the system architect a rigorous method for quickly determining which system architectures should be pursued, and it allows the architect to track and manage the cascading effects of subsystem/component changes in a comprehensive, quantitative manner. The end product is a virtual architecture analysis that systematically incorporates the inherent coupling among interacting system components that share limited system resources. © 2009 IEEE.
- A formal architecture pattern for real-time distributed systemsA. Al-Nayeem, M. Sun, X. Qiu, L. Sha, S.P. Miller, and D.D. CoferProceedings - Real-Time Systems Symposium, 2009
Pattern solutions [1] for software and architectures have significantly reduced design, verification, and validation times by mapping challenging problems into a solved generic problem. In the paper, we present an architecture pattern for ensuring synchronous computation semantics using the PALS protocol [2]. We develop a modeling framework in AADL to automatically transform a synchronous design of a real-time distributed system into an asynchronous design satisfying the PALS protocol. We present a detailed example of how the PALS transformation works for a dual-redundant system. From the example, we also describe the general transformation in terms of intuitively defined AADL semantics. Furthermore, we develop a static analysis checker to find necessary conditions that must be satisfied in order for the PALS transformation to work correctly. The transformations and static checks that we have described are implemented in OSATE using the generated EMF metamodel API for model manipulation. © 2009 IEEE.
- ASIIST: Application specific I/O integration support tool for real-time bus architecture designsM.-Y. Nam, R. Pellizzoni, L. Sha, and R.M. BradfordIEEE International Conference on Engineering of Complex Computer Systems, ICECCS, 2009
In hard real-time systems such as avionics, computer board level designs are typically customized to meet specific reliability and real time requirements. This paper focuses on computer-aided application-specific design of I/O architecture using PCI as an example. We have built a tool (ASIIST) that will enable engineers to explore design spaces at the I/O bus architecture level, performing analysis that incorporates bus protocols, to provide guarantees of real-time properties. © 2009 IEEE.
- Resilient mixed-criticality systemsCrossTalk, 2009
University of Illinois at Urbana-Champaign Most complex cyber-physical systems (CPSs) are mixed-criticality systems that have to be resilient against software design faults, hardware failures, and physical hazards under software control. This article reviews useful design principles and architecture patterns for the development of such systems.
- The system-level simplex architecture for improved real-time embedded system safetyS. Bak, D.K. Chivukula, O. Adekunle, M. Sun, M. Caccamo, and L. ShaIEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, 2009
Embedded systems in safety-critical environments demand safety guarantees while providing many useful services that are too complex to formally verify or fully test. Existing application-level fault-tolerance methods, even if formally verified, leave the system vulnerable to errors in the realtime operating system (RTOS), middleware, and microprocessor. We introduce the System-Level Simplex Architecture, which uses hardware/software co-design to provide failoperational guarantees for both logical application-level faults, as well as faults in previously dependent layers including the RTOS and microprocessor. We also provide an end-to-end design process for the System-Level Simplex Architecture where the AADL architecture description is automatically constructed and checked and the VHDL hardware code is generated. To show the efficacy of System-Level Simplex design, we apply the approach to both a classic inverted pendulum and a cardiac pacemaker. We perform fault-injection tests on the inverted pendulum design which demonstrate robustness in spite of software controller and operating system faults. For the pacemaker, we contrast the provided safety guarantees with those of a previous-generation pacemaker. © 2009 IEEE.
- End-to-end delay analysis of wireless ECG over cellular networksM.-K. Yoon, J.-E. Kim, K. Kang, K.-J. Park, M.-Y. Nam, and L. ShaWiMD 2009 - 1st ACM International Workshop on Medical-Grade Wireless Networks, co-located with MobiHoc 2009, 2009
In this paper, we present a medical-grade network architecture based on the wireless cellular technology of CDMA2000 1xEV-DO (Evolution-Data Optimized). It can provide highly reliable communication with a bounded delay by exploiting its inherent channel access structure and cooperative packet scheduler at the access network. Additionally, we propose a way of analyzing the worst-case end-to-end delay over the candidate cellular architecture, and apply it to wireless ECG (Electrocardiogram) to examine its applicability. Our simulation study shows how the proposed architecture can provide acceptable quality of service for wireless medical applications. Copyright 2009 ACM.
- Entropy-maximization based adaptive frequency hopping for wireless medical telemetry systemsK.-J. Park, T.R. Park, C.D. Schmitz, and L. ShaWiMD 2009 - 1st ACM International Workshop on Medical-Grade Wireless Networks, co-located with MobiHoc 2009, 2009
In this paper, we propose an adaptive frequency hopping algorithm, entitled robust adaptive frequency hopping (RAFH), for increased reliability of wireless medical telemetry system (WMTS) under coexistence environment with non-medical devices. The conventional adaptive frequency hopping (AFH) scheme classifies channels into "good" or "bad" according to the threshold-based on-off decision, and uses good channels with a uniform hop probability. Unlike the conventional AFH scheme, RAFH solves a constrained entropy maximization problem and assigns each channel a different hop probability, which is a decreasing function of the measured PER. By adopting constrained entropy maximization, RAFH not only improves the average PER, but also reduces the PER fluctuation under dynamic interference environment. Our simulation studies show that RAFH outperforms the basic FH and the conventional AFH with respect to the PER under various scenarios of dynamic interference. Copyright 2009 ACM.
- IEEE 802.11 WLAN for medical-grade QoSK.-J. Park, D.M. Shrestha, Y.-B. Ko, N.H. Vaidya, and L. ShaWiMD 2009 - 1st ACM International Workshop on Medical-Grade Wireless Networks, co-located with MobiHoc 2009, 2009
In this paper, we study the problem of how to design a medicalgrade wireless LAN for healthcare facilities. Unlike IEEE 802.11e MAC, which categorizes traffic primarily by delay constraint, we prioritize medical applications into access categories according to medical criticality. We design a fully-distributed contention control mechanism that can efficiently utilize wireless channel for improving medical-grade QoS. We further derive a sufficient condition for the convergence of the proposed algorithm. Our simulation result shows that our medical access categorization and the contention control mechanism significantly improve the performance of a medical network over the conventional IEEE 802.11e MAC. Copyright 2009 ACM.
- Cyber-physical systems: A new frontierL. Sha, S. Gopalakrishnan, X. Liu, and Q. Wang2009
The report of the President’s Council of Advisors on Science and Technology (PCAST) has placed cyber-physical systems on the top of the priority list for federal research investment in the United States of America in 2008. This article reviews some of the challenges and promises of cyber-physical systems. © 2009 Springer-Verlag US.
2008
- Coscheduling of CPU and I/O transactions in COTS-based embedded systemsR. Pellizzoni, B.D. Bui, M. Caccamo, and L. ShaProceedings - Real-Time Systems Symposium, 2008
Integrating COTS components in critical real-time systems is challenging. In particular, we show that the interference between cache activity and I/O traffic generated by COTS peripherals can unpredictably slow down a real-time task by up to 44%. To solve this issue, we propose a framework comprised of three main components: 1) a COTScompatible device, the peripheral gate, that controls peripheral access to the system; 2) an analytical technique that computes safe bounds on the I/O-induced task delay; 3) a coscheduling algorithm that maximizes the amount of allowed peripheral traffic while guaranteeing all real-time task constraints. We implemented the complete framework on a COTS-based system using PCI peripherals, and we performed extensive experiments to show its feasibility. © 2008 IEEE.
- Queueing-model-based adaptive control of multi-tiered web applicationsX. Liu, J. Heo, L. Sha, and X. ZhuIEEE Transactions on Network and Service Management, 2008
Web applications have been increasingly deployed on the Internet. How to effectively allocate system resources to meet the Service Level Objectives (SLOs) is a challenging problem for Web application providers. In this article, we propose a scheme for automated performance control of Web applications via dynamic resource allocations. The scheme uses a queueing model predictor and an online adaptive feedback loop that enforces admission control of the incoming requests to ensure the desired response time target is met. The proposed Queueing-Model-Based Adaptive Control approach combines both the modeling power of queueing theory and the self-tuning power of adaptive control. Therefore, it can handle both modeling inaccuracies and load disturbances in a better way. To evaluate the proposed approach, we built a multi-tiered Web application testbed with open-source components widely adopted in industry. Experimental studies conducted on the testbed demonstrated the effectiveness of the proposed scheme. © 2009 IEEE.
- ORTEGA: An efficient and flexible online fault tolerance architecture for real-time control systemsX. Liu, Q. Wang, S. Gopalakrishnan, W. He, L. Sha, H. Ding, and K. LeeIEEE Transactions on Industrial Informatics, 2008
Fault tolerance is an important aspect in real-time computing. In real-time control systems, tasks could be faulty due to various reasons. Faulty tasks may compromise the performance and safety of the whole system and even cause disastrous consequences. In this paper, we describe On-demand Real-TimE GuArd (ORTEGA), a new software fault tolerance architecture for real-time control systems. ORTEGA has high fault coverage and reliability. Compared with existing real-time fault tolerance architectures, such as Simplex, ORTEGA allows more efficient resource utilizations and enhances flexibility. These advantages are achieved through the on-demand detection and recovery of faulty tasks. ORTEGA is applicable to most industrial control applications where both efficient resource usage and high fault coverage are desired. © 2006 IEEE.
- Impact of cache partitioning on multi-tasking real time embedded systemsB.D. Bui, M. Caccamo, L. Sha, and J. MartinezProceedings - 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2008, 2008
Cache partitioning techniques have been proposed in the past as a solution for the cache interference problem. Due to qualitative differences with general purpose platforms, real-time embedded systems need to minimize task real-time utilization (function of execution time and period) instead of only minimizing the number of cache misses. In this work, the partitioning problem is presented as an optimization problem whose solution sets the size of each cache partition and assigns tasks to partitions such that system worst-case utilization is minimized thus increasing real-time schedulability. Since the problem is NP-Hard, a genetic algorithm is presented to find a near optimal solution. A case study and experiments show that in a typical real-time embedded system, the proposed algorithm is able to reduce the worst-case utilization by 15% (on average) if compared to the case when the system uses a shared cache or a proportional cache partitioned environment.
- ORTEGA: An efficient and flexible software fault tolerance architecture for real-time control systemsX. Liu, H. Ding, K. Lee, Q. Wang, and L. ShaProceedings - Euromicro Conference on Real-Time Systems, 2008
Fault tolerance is an important aspect in real-time computing. In real-time control systems, tasks could be faulty due to various reasons. Faulty tasks may compromise the performance and safety of the whole system and even cause disastrous consequences. In this paper, we describe ORTEGA (On-demand Real-TimE GuArd), a new software fault tolerance architecture for real-time control systems. ORTEGA has high fault coverage and reliability. Compared with existing real-time fault tolerance architectures, such as Simplex, ORTEGA allows more efficient resource utilizations and enhances flexibility. These advantages are achieved through the on-demand detection and recovery of faulty tasks. ORTEGA is applicable to most industrial control applications where both efficient resource usage and high fault coverage are desired. ©2008 IEEE.
- A switch design for real-time industrial networksQ. Wang, S. Gopalakrishnan, X. Liu, and L. ShaIEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, 2008
The convergence of computers and the physical world is the theme for next generation networking research. This trend calls for real-time network infrastructure, which requires a high-speed real-time WAN to serve as its backbone. However, commercially available high-speed WAN switches (routers) are designed for best-effort Internet traffic. A real-time switch design for the aforementioned networks is missing. We propose a real-time switch design using a crossbar switching fabric. The proposed switch can be implemented by making minimal modification, or even simplification, to the widely implemented iSLIP crossbar switch scheduler. Our real-time switch serves periodic and aperiodic traffic with real-time virtual machine tasks, which simplifies analysis, provides isolation, and facilitates future hierarchical scheduling and flow aggregation. Taking advantage of the fact that most industrial real-time network flows rarely change, our switch is better adapted to providing high bandwidths and low latencies. © 2008 IEEE.
- Cyber-physical systems: A new frontierL. Sha, S. Gopalakrishnan, X. Liu, and Q. WangProceedings - IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, 2008
The report of the President’s Council of Advisors on Science and Technology (PCAST) has placed CPS on the top of the priority list for federal research investment [6]. This article first reviews some of the challenges and promises of CPS, followed by an articulation of some specific challenges and promises that are more closely related to the Sensor Networks, Ubiquitous and Trustworthy Computing Conference. © 2008 IEEE.
- Sharp thresholds for scheduling recurring tasks with distance constraintsS. Gopalakrishnan, M. Caccamo, and L. ShaIEEE Transactions on Computers, 2008
The problem of identifying suitable conditions for the schedulability of (nonpreemptive) recurring tasks with deadlines is of great importance to real-time systems. In this paper, motivated by the problem of scheduling radar dwells, we show that scheduling problems of this nature show a sharp threshold behavior with respect to system utilization. Sharp thresholds are associated with phase transitions: When the utilization of a task set is less than a critical value, it can be scheduled almost surely and, when the utilization increases beyond the critical level, almost no task set can be scheduled. We make connections to work on random graphs to prove the sharp threshold behavior in the scheduling problem of interest. Using extensive experiments, we determine the threshold for the radar dwell scheduling problem and use it for performance optimization. The connections to random graph theory suggest new ways for understanding the average-case behavior of scheduling policies. These results emphasize the ease with which performance can be controlled in a variety of real-time systems. © 2008 IEEE.
- Design of complex cyber physical systems with formalized architectural patternsL. Sha, and J. MeseguerLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2008
The design of cyber physical systems (CPS) presents many challenges because of their complexity, strong safety requirements, distribution, and real-time nature. We propose a novel paradigm, based on the idea of using simplicity to control complexity, to achieve highly reliable CPS designs. The goal is to embody design rules of this complexity-control nature in highly reusable, very robust, and formally verified architectural patterns. We discuss some preliminary work and experiments illustrating how this can be done for CPS systems. © 2008 Springer Berlin Heidelberg.
- Lightning: A hard real-time, fast, and lightweight low-end wireless sensor election protocol for acoustic event localizationQ. Wang, R. Zheng, A. Tirumala, X. Liu, and L. ShaIEEE Transactions on Mobile Computing, 2008
We present the Lightning Protocol, a hard real-time, fast, and lightweight protocol to elect the sensor closest to an impulsive sound source. This protocol can serve proximity-based localization or leader election for sensor collaboration. It utilizes the fact that electromagnetic waves propagate much faster than acoustic waves to efficiently reduce the number of contending sensors in the election. With simple RF bursts, most basic comparison operations, no need of clock synchronization, and a memory footprint as small as 5,330 bytes of ROM and 187 bytes of RAM, the protocol incurs O(1) transmissions, irrespective of the sensor density, and guarantees hard real-time (O(1)) localization time cost. Experiment results using UC Berkeley Motes in a common office environment demonstrate that the time delay for the Lightning Protocol is on the order of milliseconds. The simplicity of the protocol reduces memory cost, computation complexity, and programming difficulty, making it desirable for low-end wireless sensors. © 2008 IEEE.
2007
- The simplex reference model: Limiting fault-propagation due to unreliable components in cyber-physical system architecturesT.L. Crenshaw, E. Gunter, C.L. Robinson, L. Sha, and P.R. KumarProceedings - Real-Time Systems Symposium, 2007
Cyber-Physical Systems are networked, component-based, real-time systems that control and monitor the physical world. We need software architectures that limit fault-propagation across unreliable components. This paper introduces our Simplex reference model which is distinguished by: a Plant being controlled in an external context, a Machine performing the control, a Domain Model that estimates the Plant state, and the Safety Requirements that must be met. The Simplex reference model assists with constructing CPS architectures which limit fault-propagation. We present a representative case study to highlight the ideas behind the model and our particular decomposition. © 2007 IEEE.
- GD-aggregate: A WAN virtual topology building tool for hard real-time and embedded applicationsQ. Wang, X. Liu, J. Hou, and L. ShaProceedings - Real-Time Systems Symposium, 2007
The convergence of computer and physical world calls for next generation Wide Area Network (WAN) infrastructures for hard real-time and embedded applications. Such networks need virtual topologies to achieve scalability, configurability, and flexibility. Virtual topologies are made of virtual links, for which, the state-of-the-art building tool is Guaranteed Rate server based aggregates (GR-aggregates). However, common-practice weight assignment scheme couples GR-aggregate End-to-End (E2E) delay bound with aggregate’s data throughput inverse proportionally. This is undesirable for many hard real-time embedded sensing/actuating applications, whose traffic has small data throughput but requires short E2E delay. We propose Guaranteed Delay server based aggregates (GD-aggregates), which allow assigning weights according to priorities instead of data throughput. This decouples E2E delay guarantee from data throughput, hence meets the needs of hard realtime embedded applications. In addition, GD-aggregates can be analyzed with simple closed form formulae, and can be easily planned with optimization tools. © 2007 IEEE.
- Building reliable MD PnP systemsM. Sun, Q. Wang, and L. ShaProceedings - 2007 Joint Workshop on High Confidence Medical Devices, Software, and Systems and Medical Device Plug-and-Play Interoperability, HCMDSS/MDPnP 2007, 2007
We address two necessary issues needed for developing safe and reliable MD PnP systems: robust wireless networking and automated checking for component interoperability and reliability. First, the robustness of a wireless network can be improved by the use of DSSS-CDMA, which tradeoff throughput to achieve higher reliability and persistence of connections as needed. Second, automated checking of components interoperability and reliability can be partially addressed by the use of the Integrated Framework for Assumptions and Dependencies (IFAD) by specifying a machine-checkable encoding of component information, checking interface assumptions, and finding how various environmental changes can impact the system. © 2007 IEEE.
- PAS: A wireless-enabled, sensor-integrated personal assistance system for independent and assisted livingJ.C. Hou, Q. Wang, B.K. Alshebli, L. Ball, S. Birge, M. Caccamo, C.-F. Cheah, E. Gilbert, C.A. Gunter, E. Gunter, C.-G. Lee, K. Karahalios, M.-Y. Nam, N. Nitya, C. Rohit, L. Sha, W. Shin, S. Yu, Y. Yu, and Z. ZengProceedings - 2007 Joint Workshop on High Confidence Medical Devices, Software, and Systems and Medical Device Plug-and-Play Interoperability, HCMDSS/MDPnP 2007, 2007
Advances in networking, sensors, and embedded devices have made it feasible to monitor and provide medical and other assistance to people in their homes. Aging populations will benefit from reduced costs and improved health-care through assisted living based on these technologies. However, these systems challenge current state-of-the-art techniques for usability, reliability, and security. In this paper we present the PAS open architecture for assisted living, which allows independently developed third party components to collaborate. We discuss key technological issues in assisted living systems, such as tracking, fall detection, security and privacy; and results from our pilot study in a real assisted living facility are presented. © 2007 IEEE.
- Building robust wireless LAN for industrial control with the DSSS-CDMA n cell phone network paradigmQ. Wang, X. Liu, W. Chen, L. Sha, and M. CaccamoIEEE Transactions on Mobile Computing, 2007
Wireless LAN for Industrial Control (IC-WLAN) provides many benefits, such as mobility, low deployment cost, and ease of reconfiguration. However, the top concern is robustness of wireless communications. Wireless control loops must be maintained under persistent adverse channel conditions, such as noise, large-scale path loss, fading, and many electromagnetic interference sources in industrial environments. The conventional IEEE 802.11 WLANs, originally designed for high bandwidth instead of high robustness, are therefore inappropriate for IC-WLAN. A solution lies in the Direct Sequence Spread Spectrum (DSSS) technology: By deploying the largest possible processing gain (slowest bit rate) that fully exploits the low data rate feature of industrial control, much higher robustness can be achieved. We hereby propose using DSSS-CDMA to build IC-WLAN. We carry out fine-grained physical layer simulations and Monte Carlo comparisons. The results show that DSSS-CDMA IC-WLAN provides much higher robustness than IEEE 802.11/802.15.4 WLAN, so that reliable wireless industrial control loops become feasible. We also show that deploying larger processing gain is preferable to deploying more intensive convolutional coding. The DSSS-CDMA IC-WLAN scheme also opens up a new problem space for interdisciplinary study, involving real-time scheduling, resource management, communication, networking, and control. © 2007 IEEE.
2006
- Static analysis to enforce safe value flow in embedded control systemsS. Kowshik, G. Rosu, and L. ShaInternational Conference on Dependable Systems and Networks, 2006
Embedded control systems consist of multiple components with different criticality levels interacting with each other. For example, in a passenger jet, the navigation system interacts with the passenger entertainment system in providing passengers the distance-to-destination information. It is imperative that failures in the non-critical subsystem should not compromise critical functionality. This architectural principle for robustness can, however, be easily compromised by implementation-level errors. We describe Safe-Flow, which statically analyzes core components in the system to ensure that they use non-core values communicated through shared memory only if they are run-time monitored for safety or recoverability. Using simple, local annotations and semantic restrictions on shared memory usage in the core component, SafeFlow precisely identifies accesses to unmonitored non-core values. With a few false positives, it identifies erroneous dependencies of critical data on non-core values that can arise due to programming errors, inadvertent accesses, or wrong assumptions regarding the absence of difficult-to-detect implementation errors such as data races and synchronization. We demonstrate the utility of SafeFlow by applying it to discover critical value flow dependencies in three prototype systems. © 2006 IEEE.
- Switch scheduling and network design for real-time systemsS. Gopalakrishnan, M. Caccamo, and L. ShaReal-Time Technology and Applications - Proceedings, 2006
The rapid need for high bandwidth and low latency communication in distributed real-time systems is driving system architects towards high-speed switches developed for high volume data transfer on the Internet. These switches employ complex scheduling algorithms for transferring data cells from the input line to the output line. From a real-time systems perspective, it is necessary to understand the behavior of these switching algorithms and obtain worst-case delay bounds for message transfer across these switches. Many researchers have derived average-case delay bounds for switching algorithms but mission-critical systems require guarantees for the worst-case. In this context, we derive upper bounds on cell delays across commonly available switches. Our bounds provide a starting point for research in this direction. Moreover, we use the delay bounds to construct low-cost networks of switches given a set of processors and their real-time communication requirements. Experiments with the heuristic algorithm that we propose for network design have produced encouraging results. Importantly, the algorithm is independent of the delay analysis technique and better techniques can be incorporated trivially. By addressing the network design problem, we hope to transform the system architecting process from a manual, ad-hoc operation to a simple and automated step that will result in better designs and cost savings. © 2006 IEEE.
- Autonomous delay regulation for multi-threaded internet serversJ. Heo, X. Liu, L. Sha, and T.F. AbdelzaherInternational Symposium on Performance Evaluation of Computer and Telecommunication Systems 2006, SPECTS’06, Part of the 2006 Summer Simulation Multiconference, SummerSim’06, 2006
How to properly assign system resources to satisfy the Service Level Agreement (SLA) is a challenging research problem. Previous approaches include queuing model based feedback control strategies with queuing predictor for feed-forward delay prediction. However, ignorance of the multi-threaded nature can induce large model errors. To compensate this, previous approaches typically perform offline identification, thus making the system dependent on a particular workload and a specific Internet application. In this paper, we propose a novel framework for autonomous delay regulation for multi-threaded Internet servers. We formulate a processor-sharing queuing model for the multi-threaded server architecture to precisely predict service rate for worker threads. In addition, the proposed scheme uses the sleep actuator to properly assign resources based on the calculated service rale. We evaluate our techniques experimentally using an Apache web server test-bed. We demonstrate that the proposed strategy performs better than the previous approaches under a realistic workload.
- A pattern for adaptive behavior in safety-critical, real-time middlewareT.L. Crenshaw, C.L. Robinson, H. Ding, P.R. Kumar, and L. ShaProceedings - Real-Time Systems Symposium, 2006
Patterns are a valuable method for communicating software engineering expertise about proven solutions for common problems. This paper evaluates the use of domain-independent patterns in a case study of Etherware, a middleware for networked control with a real-time, safety-critical applications model. The case study illustrates the positive and negative impact that four existing patterns have on availability, reliability, and robustness for real-time, safety-critical systems. In particular, we observe Etherware’s specialized usage of the Filter pattern, confirm this usage among other middleware technologies, and subsequently present the Adaptive Control Filter, a design pattern for real-time, safety-critical middleware which can mitigate timing dependencies in networked control. © 2006 IEEE.
- Adaptive control of multi-tiered Web applications using queueing predictorX. Liu, J. Heo, L. Sha, and X. ZhuIEEE Symposium Record on Network Operations and Management Symposium, 2006
How to effectively allocate system resources to meet Service Level Objectives (SLOs) is a challenging problem for Web services providers. In this paper, we propose a scheme for autonomous performance control of Web applications. It uses a queueing model predictor and an online adaptive feedback loop that enforces admission control of the incoming requests to ensure the desired response time target is met. The proposed Queueing-Model-Based Adaptive Control approach combines both the modeling power of queueing theory and self-tuning power of adaptive control. Therefore, it can handle both modeling inaccuracies and load disturbances in a better way. To evaluate the proposed approach, we built a multi-tiered Web application testbed with open-source components widely used in industry. Experimental studies conducted on the testbed demonstrated the effectiveness of the proposed approach. © 2006 IEEE.
- Schedulability envelope for real-time radar dwell schedulingC.-G. Lee, P.-S. Kang, C.-S. Shih, and L. ShaIEEE Transactions on Computers, 2006
This paper proposes novel techniques for scheduling radar dwells in phased array radar systems. In order to handle complex physical characteristics such as dwell interleaving, transmitting duty cycle constraint, and energy constraint, we propose a notion of schedulability envelope. The schedulability envelope designed offline hides the details of complex radar dwell scheduling and provides a simple measure for the schedulability check. Using the schedulability envelope, the proposed technique can efficiently perform the admission control for dynamic target tracking tasks. The simulation results show that the proposed approach can significantly improve the system utilization by taking advantage of dwell interleaving while guaranteeing the schedulability and physical constraints. © 2006 IEEE.
- Switch scheduling and network design for real-time systemsS. Gopalakrishnan, M. Caccamo, and L. ShaIEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, 2006
The rapid need for high bandwidth and low latency communication in distributed real-time systems is driving system architects towards high-speed switches developed for high volume data transfer on the Internet. These switches employ complex scheduling algorithms for transferring data cells from the input line to the output line. From a real-time systems perspective, it is necessary to understand the behavior of these switching algorithms and obtain worst-case delay bounds for message transfer across these switches. Many researchers have derived average-case delay bounds for switching algorithms but mission-critical systems require guarantees for the worst-case. In this context, we derive upper bounds on cell delays across commonly available switches. Our bounds provide a starting point for research in this direction. Moreover, we use the delay bounds to construct low-cost networks of switches given a set of processors and their real-time communication requirements. Experiments with the heuristic algorithm that we propose for network design have produced encouraging results. Importantly, the algorithm is independent of the delay analysis technique and better techniques can be incorporated trivially. By addressing the network design problem, we hope to transform the system architecting process from a manual, ad-hoc operation to a simple and automated step that will result in better designs and cost savings. © 2006 IEEE.
- Local group communication-aware MAC protocol in wireless sensor networksR. Zheng, J.S. Walia, and L. ShaInternational Journal of Wireless Information Networks, 2006
In this paper, we investigate the problem of providing efficient communication primitives across domains of wireless sensor network (WSN) applications. We argue both qualitatively and quantitatively that group communication among sensors of geographic proximity is one of the basic building blocks of many WSN applications. Furthermore, group communication awareness needs to be embedded and implemented at the MAC layer due to the broadcast nature of wireless medium. We devise a MAC protocol, called LGC-MAC to enable efficient single-hop one-to-many and many-to-one communication. We present case studies of two example applications, acoustic target tracking and propagation of information with feedback using LGC-MAC and demonstrate that LGC-MAC can improve the response time, alleviate channel contention and provide better fault tolerance to packet collisions and wireless errors. © Springer Science+Business Media, LLC 2006.
- Optimal block design for asynchronous wake-up schedules and its applications in multihop wireless networksR. Zheng, J.C. Hou, and L. ShaIEEE Transactions on Mobile Computing, 2006
In this paper, we consider the problem of designing optimal asynchronous wake-up schedules to facilitate distributed power management and neighbor discovery in multihop wireless networks. We first formulate it as a block design problem and derive the fundamental trade-offs between wake-up latency and the average duty cycle of a node. After the theoretical foundation is laid, we then devise a neighbor discovery and schedule bookkeeping protocol that can operate on the optimal wake-up schedule derived. To demonstrate the usefulness of asynchronous wake-up, we investigate the efficiency of neighbor discovery and the application of on-demand power management, which overlays a desirable communication schedule over the wake-up schedule mandated by the asynchronous wake-up mechanism. Simulation studies demonstrate that the proposed asynchronous wake-up protocol has short discovery time which scales with the density of the network; it can accommodate various traffic characteristics and loads to achieve an energy savings that can be as high as 70 percent, while the packet delivery ratio is comparable to that without power management. © 2006 IEEE.
- Optimal real-time sampling rate assignment for wireless sensor networksX. Liu, Q. Wang, W. He, M. Caccamo, and L. ShaACM Transactions on Sensor Networks, 2006
How to allocate computing and communication resources in a way that maximizes the effectiveness of control and signal processing, has been an important area of research. The characteristic of a multi-hop Real-Time Wireless Sensor Network raises new challenges. First, the constraints are more complicated and a new solution method is needed. Second, a distributed solution is needed to achieve scalability. This article presents solutions to both of the new challenges. The first solution to the optimal rate allocation is a centralized solution that can handle the more general form of constraints as compared with prior research. The second solution is a distributed version for large sensor networks using a pricing scheme. It is capable of incremental adjustment when utility functions change. This article also presents a new sensor device/network backbone architecture-Real-time Independent CHannels (RICH), which can easily realize multi-hop real-time wireless sensor networking. © 2006 ACM.
- Finite-horizon scheduling of radar dwells with online template constructionS. Gopalakrishnan, M. Caccamo, C.-S. Shih, C.-G. Lee, and L. ShaReal-Time Systems, 2006
Timing constraints for radar tasks are usually specified in terms of the minimum and maximum temporal distance between successive radar dwells. We utilize the idea of feasible intervals for dealing with the temporal distance constraints. In order to increase the freedom that the scheduler can offer a high-level resource manager, we introduce a technique for nesting and interleaving dwells online while accounting for the energy constraint that radar systems need to satisfy. Further, in radar systems, the task set changes frequently and we advocate the use of finite horizon scheduling in order to avoid the pessimism inherent in schedulers that assume a task will execute forever. The combination of feasible intervals and online dwell packing allows modular schedule updates whereby portions of a schedule can be altered without affecting the entire schedule, hence reducing the complexity of the scheduler. Through extensive simulations we validate our claims of providing greater scheduling flexibility without compromising on performance when compared with earlier work based on templates constructed offline. We also evaluate the impact of two parameters in our scheduling approach: the template length (or the extent of dwell nesting and interleaving) and the length of the finite horizon. © Springer Science + Business Media, LLC 2006.
- Performance analysis of power management policies in wireless networksR. Zheng, J.C. Hou, and L. ShaIEEE Transactions on Wireless Communications, 2006
It has long been recognized that energy conservation usually comes at the cost of degraded performance such as longer delay and lower throughput in stand-alone systems and communication networks. However, there have been very few research efforts in quantifying such trade-offs. In this paper, we develop analytical models to characterize the relationships among energy, delay and throughput for different power management policies in wireless communication. Based on the decision when to put nodes to low-power states, we divide power management policies into two categories, i.e., 1) time-out driven and 2) polling-based. M/G/1/K queues with multiple vacations and an attention span are used to model time-out driven policies while transient analysis is applied to derive the state transition probability in polling-based systems. We find that For time-out driven power management policies, the "optimal" policy exhibits a threshold structure, i.e., when the traffic load is below certain threshold, a node should switch to the low-power state whenever possible and always remain active otherwise. From our analysis, contrary to general beliefs, polling-based policies such as the IEEE 802.11 PSM are not energy efficient for light traffic load. © 2006 IEEE.
- High-Confidence medical device software and systemsI. Lee, G.J. Pappas, R. Cleaveland, J. Hatcliff, B.H. Krogh, P. Lee, H. Rubin, and L. ShaComputer, 2006
Given the shortage of caregivers and the increase in an aging US population, the future of US healthcare quality does not look promising and definitely is unlikely to be cheaper. Advances in health information systems and healthcare technology offer a tremendous opportunity for improving the quality of care while reducing costs. © 2006 IEEE.
- Real-time virtual machines for avionics software migrationL. Sha, and C.-G. LeeInternational Journal of Embedded Systems, 2006
Generalised Rate Monotonic Scheduling (GRMS) theory has now been widely adopted in practice and supported by open standards. In recent years, Ada runtime and COTS RTOSs supporting GRMS have met the DO 178B flight control standard (http://www.ddci.com/products_darts.shtml; http://ic.arc.nasa.gov/publications/pdf/2000–0213.pdf). This creates strong incentives for the avionics industry to migrate from a traditional cyclical executive based system called a federated architecture to a GRMS based system to take advantage of GRMS’s flexibility, and theoretical schedulability analysis. However, the industry is reluctant to migrate due to enormous potential re-certification cost. This paper presents a novel software architecture called a Logical Federated Architecture (LFA) that greatly reduces re-certification cost due to the change of scheduling methods. © 2006 Inderscience Enterprises Ltd.
- A microscopic study of power management in IEEE 802.11 wireless networksC. Hu, R. Zheng, J.C. Hou, and L. ShaInternational Journal of Wireless and Mobile Computing, 2006
In this paper, we demonstrate via a rigorous, analytical model that the periodic structure in IEEE 802.11 Power-Save Mode (PSM) together with its signalling overhead leads to both energy and bandwidth under-utilisation. We then devise Sleep In the Middle and Prolonged Activeness (SIMPA), a new power management protocol based on IEEE 802.11 PSM, to decouple the power management decision points and the Beacon Intervals (BIs), so as to allow fine grained control. In SIMPA, wireless devices can switch to the sleep state inside a BI or extend their active states beyond one BI. A comprehensive simulation study in both single hop wireless LANs without the AP support and multihop wireless networks demonstrates that as compared to IEEE 802.11 PSM, SIMPA can effectively reduce energy consumed under light to medium traffic loads and retain the network capacity for data transport at high traffic loads. Copyright © 2006 Inderscience Enterprises Ltd.
- I-living: An open system architecture for assisted livingW. Qixin, S. Wook, L. Xue, Z. Zheng, C. Oh, B.K. Alshebli, M. Caccamo, C.A. Gunter, E. Gunter, J. Hou, K. Karahalios, and S. LuiConference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, 2006
Advances in networking, sensors, and embedded devices have made it feasible to monitor and provide medical and other assistance to people in their homes. Aging populations will benefit from reduced costs and improved healthcare through assisted living based on these technologies. However, these systems challenge current state-of-the-art techniques for usability, reliability, and security. This is a particular challenge for open and extensible systems that combine software and hardware from many vendors and provide information to diverse clinicians. In this paper we present the I-Living architecture for assisted living that allows independent parties work together in a dependable, secure, and low-cost fashion with predictable properties. Our approach is based on an Assisted Living Service Provider (ALSP) who provides a server that collects and maintains encrypted Assisted Persons (APs)’ records. Our ALSP can be a third party distinct from APs, communication providers, and clinicians; or it can be part of an ISP, hospital or similar enterprise. We have explored the architecture by developing a collection of applications and implementing them in a prototype system. Our system shows the feasibility and opportunity of an open approach to assisted living systems. © 2006 IEEE.
- The dependency management framework: A case study of the ION CubeSatH. Ding, L. Arber, L. Sha, and M. CaccamoProceedings - Euromicro Conference on Real-Time Systems, 2006
Due to the complexity and requirements of modern real-time systems, multiple teams must often work concurrently and independently to develop the various components of the system. Since a team typically only knows the dependency relations between the components they wrote and those they directly use, keeping track of system-wide dependency relations is not possible for any individual team. To further complicate matters, dependency relations often change as software components are refined or their interactions modified. Because the robustness of any real-time system hinges on the availability of essential services in spite of faults and failures in useful but non-essential components, keeping track of the constantly evolving dependency relations between the system’s components is crucial. If a system’s designers cannot ensure that critical services only USE but do not DEPEND ON less critical components, a seemingly minor fault can propagate along complex and unforeseen dependency chains and bring down the entire system. Therefore, automatically tracking and analyzing system-wide dependency relations given only local dependency information is vital for the development of robust real time systems. This paper presents DMF (Dependency Management Framework), a prototype toolkit for dependency management in designing robust real-time systems. We demonstrate the usability and scalability of DMF with a case study of ION CubeSat, the University of Illinois at Urbana-Champaign ’s first student-developed satellite. © 2006 IEEE.
2005
- Dependency algebra: A theoretical framework for dependency management in real-time control systemsH. Ding, K. Lee, and L. ShaProceedings - 12th IEEE International Conference and Workshops on the Engineering of Computer-Based Systems, ECS 2005, 2005
The large and complex dependency relationship between software components is at the root of system instability. A seemingly minor fault can propagate along dependency chains and bring down the system. Therefore, the first step in the development of a robust system is the management and control of dependency relations between software components. The two contributions of this paper are: 1) a refinement of the commonly used failure semantics in the context of a real time control system: 2) the development of a Dependency Algebra. Component dependency relationships can be described in terms of a set of refined failure semantics mappings, enabling differentiation and comparison of different relationships and designs. The system-wide evaluation and tracking of these dependency relationships are performed using Dependency Algebra. © 2005 IEEE.
- Dependency algebra: A tool for designing robust real-time systemsH. Ding, and L. ShaProceedings - Real-Time Systems Symposium, 2005
A robust system is one that can ensure essential services in spite of faults and failures in useful but non-essential components. Unless we can ensure that critical services can only USE but not depend on less critical components, a seemingly minor fault can propagate along complex and implicit dependency chains and bring down the system. Modern real time systems are often developed concurrently by multiple teams. A team typically only knows the dependency relations between their components and neighboring components. In addition, dependency relations will change as software components and their interactions are being modified. Therefore, how to automatically track and analyze the system wide dependency from local information is important for the development of robust real time systems. This paper presents dependency algebra - a unified theoretical framework plus a prototype toolkit for dependency management in real-time systems. © 2005 IEEE.
- Improved timing control for web server systems using internal state informationX. Liu, R. Zheng, J. Heo, and L. Sha14th International World Wide Web Conference, WWW2005, 2005
How to effectively allocate system resource to meet the Service Level Agreement (SLA) of Web servers is a challenging problem. In this paper, we propose an improved scheme for autonomous timing performance control in Web servers under highly dynamic traffic loads. We devise a novel delay regulation technique called Queue Length Model Based Feedback Control utilizing server internal state information to reduce response time variance in presence of bursty traffic. Both simulation and experimental studies using synthesized workloads and real-world Web traces demonstrate the effectiveness of the proposed approach.
- A distributed, energy-aware, utility-based approach for data transport in wireless sensor networksW.-P. Chen, J.C. Hou, L. Sha, and M. CaccamoProceedings - IEEE Military Communications Conference MILCOM, 2005
Distinct from wireless ad hoc networks, wireless sensor networks are data-centric, application-oriented, collaborative, and energy-constrained in nature. In this paper, we formulate the problem of data transport in sensor networks as an optimization problem, with the objective of maximizing the amount of information (utility) collected at sinks, subject to both the channel band-width and energy constraints. We then devise a distributed solution of the convex optimization problem, and explore in three directions. First, we devise a simple node capacity estimation method to online measure the node capacity. Second, we linearize the energy constraint by properly setting the value of the system lifetime in advance and controlling the data rate of a node so as to sustain its battery lifetime longer than the specified lifetime. Finally, we incorporate the optimization results into routing so as to provide sensors with opportunities to select better routes. The simulation results show that the utility-based approach balances between system utility and system lifetime.
- Timing performance control in Web server systems utilizing server internal state informationX. Liu, R. Zheng, J. Heo, Q. Wang, and L. ShaJoint International Conference on Autonomic and Autonomous Systems and International Conference on Networking and Services, ICAS/ICNS 2005, 2005
How to effectively allocate system resource to meet the Service Level Agreement (SLA) of Web servers is a challenging problem. In this paper, we propose an improved scheme for autonomous timing performance control in Web servers under highly dynamic traffic loads. We devise a novel delay regulation technique called Queue Length Model Based Feedback Control utilizing server internal state information to reduce response time variance in presence of bursty traffic. Both simulation and experimental studies using synthesized workloads and real-world Web traces demonstrate the effectiveness of the proposed approach. © 2005 IEEE.
- MAC layer support for group communication in wireless sensor networksR. Zheng, L. Sha, and W. Feng2nd IEEE International Conference on Mobile Ad-hoc and Sensor Systems, MASS 2005, 2005
In this paper, we investigate the problem of providing efficient communication primitives across domains of wireless sensor network (WSN) applications. We argue both qualitatively and quantitatively that group communication among sensors of geographic proximity is one of the basic building blocks of many WSN applications. Furthermore, group communication awareness needs to be embedded and implemented at the MAC layer due to the broadcast nature of wireless medium. We devise a MAC protocol, called LGC-MAC to enable efficient single-hop one-to-many and many-to-one communication. We present case studies of two example applications, acoustic target tracking and propagation of information with feedback using LGC-MAC and demonstrate that LGC-MAC can improve the response time, alleviate channel contention and provide better fault tolerance to packet collisions and wireless errors. © 2005 IEEE.
- A dependable online testing and upgrade architecture for real-time embedded systemsK. Lee, and L. ShaProceedings - 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2005
When real-time embedded system software needs to be upgraded, it will be more dependable if the new software is sufficiently tested on the actual deployment platform. The challenge is to provide a safeguard for protecting the normal operations from faulty upgrades. However, the safeguard must be not only efficient but also able to be added and taken away as needed without shutting down the normal operations. We have developed an architecture based on Simplex Architecture and Process Resurrection and have applied it to the inverted pendulum control system. The measurements show that the overhead is small and justifiable. © 2005 IEEE.
- Modeling 3-tiered web applicationsX. Liu, J. Heo, and L. ShaProceedings - IEEE Computer Society’s Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS, 2005
The rapid advancement and deployment of Web applications call for a precise yet simple model for capacity planning and analysis purposes. The most widely deployed Web application architecture is the 3-tiered system, which is composed of a front-end Web server, an application server and a hackend database server. In this paper, we present an analytical model of the 3-tiered Web application architecture. We show by using queueing network theory, we can model the 3-tiered Web application architecture accurately. A test-bed is built to measure model parameters based on industry standard server components and TPC-W benchmark. Validation results show that the proposed model predicts performance measures such as response time and throughput accurately. © 2005 IEEE.
- Co-design based approach to improve robustness in networked control systemsS. Kowshik, G. Baliga, S. Graham, and L. ShaInternational Conference on Dependable Systems and Networks, 2005
Traditional control systems consist of sensors, controllers, and actuators operating with tight periodic dependencies, and communicating over dedicated real-time channels such as CAN or FDDI. However, best effort networks such as 802.11 are being increasingly used in such systems. The unpredictable delays and losses in such networks violate the periodicity assumptions of digital control design, and the consequent fail-safe actions incur significant performance penalties. In this paper, we propose a co-design based approach to address the periodicity requirements of digital control design, and improve robustness by extending deadlines through graceful degradation to the fail-safe action. In particular, we analytically demonstrate significant deadline extensions in the control loop of a traffic control testbed based on our approach. Such deadline extensions also facilitate fault tolerance techniques such as component restarts, and system management mechanisms such as online component upgrades. We validate the results by experiments in the testbed. © 2005 IEEE.
- Process Resurrection: A fast recovery mechanism for real-time embedded systemsK. Lee, and L. ShaIEEE Real-Time and Embedded Technology and Applications Symposium, RTAS, 2005
This paper describes a fast recovery mechanism that meets the requirements of embedded real-time systems. In general purpose computing, restart is an established technology for achieving high availability. Restart has also been used in soft real-time systems where the temporary interruption of service is undesirable but acceptable. However, it has not been widely used in small embedded realtime systems with hard deadlines, mainly because the traditional approaches do not meet their requirements of low memory and processor overhead, and fast response times with little variations. We have developed Process Resurrection, a novel restart mechanism for recovering from crash failures to meet these requirements. The experiments on an inverted pendulum control system shows that it can recover the control process in time after a crash (eg. segmentation fault). Another experiment conducted on an MP3 audio player shows that this technique is also applicable to some multimedia applications. © 2005 IEEE.
- A Framework for Time Indexing in Sensor NetworksG. He, I. Gupta, L. Sha, and R. ZhengACM Transactions on Sensor Networks, 2005
In this article, we define the time-indexing problem as the in-network storage and querying of sensor network data based solely on the time attribute. We argue qualitatively why existing storage schemes may be insufficient as solutions. We then present, analyze, and evaluate novel and lightweight solutions to both the storage and the querying subproblems for time indexing. First, the time-indexed storage problem is formally defined and two formulations are presented seeking to optimize generic utility functions that are derived from concerns about energy, bandwidth usage, and storage balancing. We present and analyze decentralized protocols to solve these formulations and prove the optimality of some of our solutions. Secondly, maintenance and use of simple overlays among rendezvous point nodes in order to enable fault-tolerant and efficient time-indexed queries are discussed. Finally, simulation results are presented to quantify performance characteristics of the protocols, and we find that our proposed scheme has low query overhead that scales with system size and density while exhibiting very good load-balancing and fault-tolerance properties. The use of time-indexed structure is shown to achieve more than double the lifetime of sensor networks compared to existing approaches in some scenarios. Copyright © 2005, ACM. All rights reserved.
- Design and analysis of an MST-based topology control algorithmN. Li, J.C. Hou, and L. ShaIEEE Transactions on Wireless Communications, 2005
In this paper, we present a minimum spanning tree (MST)-based algorithm, called local minimum spanning tree (LMST), for topology control in wireless multihop networks. In this algorithm, each node builds its LMST independently and only keeps on-tree nodes that are one-hop away as its neighbors in the final topology. We analytically prove several important properties of LMST: 1) the topology derived under LMST preserves the network connectivity; 2) the node degree of any node in the resulting topology is bounded by 6; and 3) the topology can be transformed into one with bidirectional links (without impairing the network connectivity) after removal of all unidirectional links. Simulation results show that LMST can increase the network capacity as well as reduce the energy consumption. © 2005 IEEE.
2004
- A case for resource heterogeneity in large sensor networksS. Kandula, J. Hou, and L. ShaProceedings - IEEE Military Communications Conference MILCOM, 2004
Sensor networks have traditionally consisted of nodes with the same amount of resources such as battery life and computational power. While such homogeneity has distinct advantages in terms of ease-of-fabrication, it has also been shown that in multi-hop ad-hoc scenarios homogeneity leads to large duty cycles, small end-to-end data throughput and poor deployment lifetimes. In this paper, we investigate sensor networks that have a single degree of heterogeneity, a random subset of the sensors, called accumulators, have more power and computational capability. To this end, we develop a decentralized, hierarchical clustering algorithm, called Hierarchical Clustering and Routing (HCR) algorithm. HCR exploits heterogeneity among sensor nodes to form a cluster hierarchy, with the objective of achieving better information throughput and improving network lifetime. A unique feature of HCR is its integration of routing with cluster formation and data delivery. Routing tables are constructed/updated in the course of cluster formation and data transmission, therefore incurring essentially no routing overhead. By exploiting geometrical features of hexagons, we show several desirable properties of HCR. In particular, we show via ns-2 simulations that HCR improves message overhead in cluster formation by 120-210%. We also show that heterogeneity achieves performance improvements of (i) upto 200% in terms of information throughput, (ii) power savings of upto 30% of the power spent in data transmission and (iii) a 2% density of accumulators is sufficient for most improvements. ©2004 IEEE.
- Fixed or dynamic priority? That is the questionD. Mossé, T. Baker, S. Baruah, G. Buttazzo, A. Burns, L. Sha, and J. StankovicProceedings - Real-Time Systems Symposium, 2004
Since the first results published in 1973 by Liu and Layland on the Rate Monotonic (RM) and Earliest Deadline First (EDF) algorithms, a lot of progress has been made in the schedulability analysis of periodic task sets. Nevertheless, many important results are not widely known and several misconceptions still exist about the properties of these two scheduling methods. Surprisingly, an exhaustive comparison between fixed priority and dynamic priority scheduling is still missing and there is not a general agreement in the real-time community about the suitability of these two scheduling approaches for future real-time systems. The purpose of the panel is to present the most controversial issues to a wide specialized audience, starting a constructive and interactive discussion aimed at clarifying some critical points and drawing some missing research lines.
- Finite-horizon scheduling of radar dwells with online template constructionS. Gopalakrishnan, M. Caccamo, C.-S. Shih, C.-G. Lee, and L. ShaProceedings - Real-Time Systems Symposium, 2004
Timing constraints for radar tasks are usually specified in terms of the minimum and maximum temporal distance between successive radar dwells. We utilize the idea of feasible intervals for dealing with the temporal distance constraints. In order to increase the freedom that the scheduler can offer a high-level resource manager, we introduce a technique for nesting and interleaving dwells online while accounting for the energy constraint that radar systems need to satisfy. Further, in radar systems, the task set changes frequently and we advocate the use of finite horizon scheduling in order to avoid the pessimism that is inherent in schedulers that assume a task will execute forever. We also develop the notion of modular schedule update which allows portions of a schedule to be altered without affecting the entire schedule, thereby simplifying the scheduler. Through extensive simulations we validate our claims of providing greater scheduling flexibility without compromising on performance when compared with earlier work based on templates constructed offline. © 2004 IEEE.
- Hard real-time communication in bus-based networksS. Gopalakrishnan, L. Sha, and M. CaccamoProceedings - Real-Time Systems Symposium, 2004
Route selection is an important aspect of the design of real-time systems in which messages might have to travel over multiple hops to reach their destination and multiple paths exist between a source and a destination. The length of a route affects the ability to meet deadlines and greedy routing might leave certain messages with no feasible route. We consider bus-based networks on which periodic message transmissions need to be scheduled and present a technique for synthesizing routes such that all messages meet their deadlines. Our offline technique enables system designers to configure routes in a large-scale embedded system. In our solution, we allow message fragmentation and utilize multiple paths to satisfy the requirements of each message. The routing problem is NP-complete and our approximation algorithm is based on a linear programming formulation. In our methodology, we deal with both earliest deadline first and rate monotonie scheduling at each bus in the system. Apart from point-to-point messages, we discuss scheduling multicast messages to facilitate the publisher/subscriber model. Finally, we also mention some heuristics for online routing which might be of value in soft real-time systems. © 2004 IEEE.
- Lightning: A fast and lightweight acoustic localization protocol using low-end wireless micro-sensorsQ. Wang, R. Zheng, A. Tirumala, X. Liu, and L. ShaProceedings - Real-Time Systems Symposium, 2004
Acoustic awareness is an important service in ubiquitous computing environments. This paper presents a fast lightweight acoustic event localization protocol, the Lightning protocol, to locate impulsive sound sources using arrays of wireless micro-sensors. This protocol utilizes domain-invariant knowledge of acoustic and electromagnetic wave propagation to efficiently reduce the number of contending sensors in the localization process. It incurs O(1) transmissions irrespective of the sensor density and guarantees O(1) time delay in localization. Experiment results using UC Berkeley Motes demonstrate that the time delay for Lightning Protocol to locate hand clap sounds is in terms of milliseconds. Dept. of Computer Science, Univ. of Houston © 2004 IEEE.
- Time indexing in sensor networksR. Zheng, G. He, I. Gupta, and L. Sha2004 IEEE International Conference on Mobile Ad-Hoc and Sensor Systems, 2004
In this paper, we define the time indexing problem as the in-network storage and querying of sensor network data based solely on the time attribute. We argue qualitatively why existing storage schemes may be insufficient as solutions. We then present, analyze, and evaluate novel and lightweight solutions to both the storage and the querying sub-problems for time indexing. First, the time-indexed storage problem is formally defined, and two formulations are presented, seeking to optimize generic utility functions that are derived from concerns about energy, bandwidth usage, and storage balancing. We present and analyze decentralized protocols to solve these formulations, and prove the optimality of some of our solutions. Secondly, maintenance and use of simple overlays among rendezvous point nodes, in order to enable fault-tolerant and efficient time-indexed queries, are discussed. Finally, simulation results are presented to quantify performance characteristics of the protocols, and we find that our proposed scheme has low query overhead that scales with system size and density while exhibiting very good load balancing and fault tolerance properties. ©2004 IEEE.
- On time-out driven power management policies in wireless networksR. Zheng, J.C. Hou, and L. ShaGLOBECOM - IEEE Global Telecommunications Conference, 2004
Switching the devices to low-power states in prolonged period of inactivity is a widely used technique to conserve energy for battery-powered wireless devices. In this paper, we present a mathematical abstraction of time-out driven power management policies together with different wakeup mechanism in wireless networks to characterize the energy-performance trade-offs. The time-out driven power management is modeled as a M/G/1/K queue with multiple vacations and an attention span. We then derive the steady state behaviors of such systems, and present a closed-form solution for systems with large buffers. The analysis reveals that the "best" power management policy to minimize energy-delay product exhibits a threshold structure, i.e., when the traffic load is below certain threshold, a node should switch to the low-power state whenever possible and always remain active otherwise, and suggests a threshold-based power management protocol. © 2004 IEEE.
- Real time scheduling theory: A historical perspectiveL. Sha, T. Abdelzaher, K.-E. Årzén, A. Cervin, T. Baker, A. Burns, G. Buttazzo, M. Caccamo, J. Lehoczky, and A.K. MokReal-Time Systems, 2004
In this 25th year anniversary paper for the IEEE Real Time Systems Symposium, we review the key results in real-time scheduling theory and the historical events that led to the establishment of the current real-time computing infrastructure. We conclude this paper by looking at the challenges ahead of us.
- Online QoS optimization using service classes in surveillance radar systemsC.-G. Lee, C.-S. Shih, and L. ShaReal-Time Systems, 2004
Many application level qualities are functions of available computation resources. Recent studies have handled the computation resource allocation problem to maximize the overall application quality. However, such QoS problems are fundamentally multi-dimensional optimization problems that require extensive computation. Therefore, online usage of optimization procedures may significantly reduce the computation resource available for applications. This raises the question of how to best use the optimization procedures for dynamic real-time task sets. In dynamic real-time systems, it is important to improve the performance by re-allocating the resources adapting to dynamic situations. However, the overhead of changing task parameters (i.e., algorithms and frequencies) for resource re-allocation is non-negligible in many applications. Thus, too frequent change of resource allocation may not be desirable. This paper proposes a method called service classes configuration to address the QoS problem with dynamic arrival and departure of tasks. The method avoids online usage of optimization procedures by offline designing templates (called service classes) of resource allocation, which will be adaptively used depending on online situations. The service classes are designed by best trading-off the accuracy of dynamic adaptation against the overhead of resource reallocation. A simplified radar application is used as an illustrative example.
- Etherware: Domainware for wireless control networksG. Baliga, S. Graham, L. Sha, and P.R. KumarProceedings - Seventh IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, 2004
The promise of middleware is to enable integration and evolution of complex systems dynamically. In demanding domains such as wireless control networks, fulfilling this promise while maintaining complete generality is extremely complicated. Understanding and exploiting the forcing functions of a domain helps manage this complexity by avoiding redundant generalizations. Domainware exploits this technique and adopts a simpler architecture to support more important non-functional requirements effectively. This paper presents Etherware, a Domainware for wireless control networks. Capitalizing on our development of a fairly complex control system testbed, commonly supported yet redundant generalizations are identified and eliminated. The resulting architecture is simple, and can support a wide range of trade-offs that can be manipulated easily at run-time. This is illustrated by showing how the performance of Control Time Protocol (CTP), an Etherware service, is optimized by the additional options available in Etherware.
- Synthesizing task periods for dwells in multi-function phased array radarsC.-S. Shih, P. Ganti, S. Gopalakrishnan, M. Caccamo, and L. ShaIEEE National Radar Conference - Proceedings, 2004
This paper addresses the problem of scheduling radar dwells in multi-function phased array radar systems. The timing constraint of radar tasks are usually modeled by the minimal and maximal temporal distance between any two consecutive dwells of a task. Such a timing constraint makes it difficult for traditional real-time scheduling techniques to provide predicatable timing guarantee, without over-consuming the resources. We propose a novel approach to model the dwells as periodic real-time tasks. The periods of the tasks are synthesized by the minimal and maximal temporal distance constraint of the dwells. The synthetic periods allow the template-based scheduling algorithm to compute efficient dwell schedules with low overhead. We evaluate the algorithms via extensive simulations. Simulation results show that this algorithm can significantly improve the resource utilization, comparing with traditional dwell scheduling algorithms.
- Dynamic clustering for acoustic target tracking in wireless sensor networksW.-P. Chen, J.C. Hou, and L. ShaIEEE Transactions on Mobile Computing, 2004
In the paper, we devise and evaluate a fully decentralized, light-weight, dynamic clustering algorithm for target tracking. Instead of assuming the same role for all the sensors, we envision a hierarchical sensor network that is composed of 1) a static backbone of sparsely placed high-capability sensors which will assume the role of a cluster head (CH) upon triggered by certain signal events and 2) moderately to densely populated low-end sensors whose function is to provide sensor information to CHs upon request. A cluster is formed and a CH becomes active, when the acoustic signal strength detected by the CH exceeds a predetermined threshold. The active CH then broadcasts an information solicitation packet, asking sensors in its vicinity to join the cluster and provide their sensing information. We address and devise solution approaches (with the use of Voronoi diagram) to realize dynamic clustering: (1×1) how CHs cooperate with one another to ensure that only one CH (preferably the CH that is closest to the target) is active with high probability, (1×2) when the active CH solicits for sensor information, instead of having all the sensors in its vicinity reply, only a sufficient number of sensors respond with nonredundant, essential information to determine the target location, and (1×3) both the packets that sensors send to their CHs and packets that CHs report to subscribers do not incur significant collision. Through both probabilistic analysis and ns-2 simulation, we show with the use of Voronoi diagram, the CH that is usually closest to the target is (implicitly) selected as the leader and that the proposed dynamic clustering algorithm effectively eliminates contention among sensors and renders more accurate estimates of target locations as a result of better quality data collected and less collision incurred.
- Enhanced Utilization Bounds for QoS ManagementC.-G. Lee, L. Sha, and A. PeddiIEEE Transactions on Computers, 2004
In many practical real-time applications, there is a given set of task frequencies (i.e., inverse of task periods) corresponding to predetermined QoS options that the applications can choose. For example, in audio applications, the typical choices of playback frequencies are for CD quality, radio quality, and phone quality. Similar configurations can be found in streaming video and in control applications. In such systems, applications dynamically arrive requesting one of the periods provided by the QoS manager. Thus, the accurate and efficient online schedulability test is essential for any task set whose periods are chosen from the QoS period set. For this purpose, this paper proposes new utilization bounds as a function of given QoS periods. As long as there is a set of given periods that can be chosen by applications, the bounds developed in this paper can be used by the QoS manager to quickly determine the schedulability of dynamic applications.
- Real-time synchronization protocolsL. Sha, and M. Caccamo2004
An important problem that arises in the context of real-time systems is the duration for which high-priority tasks are blocked by the execution of low-priority jobs. A milestone in this area was the discovery, analysis, and solutions of the unbounded priority inversion problem in the 1980s. © 2004 by CRC Press LLC.
- Real-time virtual machines for avionics software porting and developmentLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2004
Generalized rate monotonic scheduling (GRMS) theory has now been widely adopted in practice and supported by open standards. This creates strong incentives for the avionics industry to migrate from traditional cyclical executive based system to a GRMS based systems. This paper presents some of the important considerations in the migration of a cyclical executive based system to a GRMS based system. © Springer-Verlag 2004.
- An energy-aware data-centric generic utility based approach in wireless sensor networksW.-P. Chen, and L. ShaThird International Symposium on Information Processing in Sensor Networks, IPSN 2004, 2004
Distinct from wireless ad hoc networks, wireless sensor networks are data-centric, application-oriented, collaborative, and energy-constrained in nature. In this paper, formulate the problem of data transport in sensor networks as an optimization problem whose objective function is to maximize the amount of information (utility) collected at sinks (subscribers), subject to the flow, energy and channel bandwidth constraints. Also, based on a Markov model extended from [3], we derive the link delay and the node capacity in both the single and multi-hop environments, and figure them in the problem formulation. We study three special cases under the problem formulation. In particular, we consider the energy-aware flow control problem, derive an energy aware flow control solution, and investigate via ns-2 simulation its performance. The simulation results show that the proposed energy-aware flow control solution can achieve high utility and low delay without congesting the network.
2003
- Feedback control with queueing-theoretic prediction for relative delay guarantees in web serversY. Lu, T. Abdelzaher, C. Lu, L. Sha, and X. LiuReal-Time Technology and Applications - Proceedings, 2003
The use of feedback control theory for performance guarantees in QoS-aware systems has gained much attention in recent years. In this paper, we investigate merging, within a single framework, the predictive power of queueing theory with the reactive power of feedback control to produce software systems with a superior ability to achieve QoS specifications in highly unpredictable environments. The approach is applied to the problem of achieving relative delay guarantees in high-performance servers. Experimental evaluation of this approach on an Apache web server shows that the combined schemes perform significantly better in terms of keeping the relative delay on target compared to feedback control or queueing prediction alone. © 2003 IEEE.
- Automated verification of the dependability of object-oriented real-time systemsH. Ding, C. Zheng, G. Agha, and L. ShaProceedings - International Workshop on Object-Oriented Real-Time Dependable Systems, WORDS, 2003
We develop an effective approach to formally specify and automatically verify the dependability of object-oriented real-time systems based on the Actor model and Real-Time Maude. Our approach decomposes an application into functional components represented as concurrent objects or actors, and separately specifies the timing constraints using RTSynchronizer. It achieves the goal of automatically verifying the dependability and timing properties of the target system by implementing the operational semantics of Actor and RTSynchronizer in Real-Time Maude, which supports executable specification and various temporal model checking analysis. We demonstrate the effectiveness of our approach by an annotated case study of the Simplex architecture. © 2003 IEEE.
- Template-based real-time dwell scheduling with energy constraintC.-S. Shih, S. Gopalakrishnan, P. Ganti, M. Caccamo, and L. ShaReal-Time Technology and Applications - Proceedings, 2003
This paper addresses the scheduling problem of radar dwells in multi-function phase array radars. Well-known and new challenges make it difficult to provide predictable performance for real-time dwell scheduling. We developed the template-based scheduling algorithm to guarantee the performance requirement with low on-line overhead. Simulation results show that the template-based scheduling approach increases utilization and provides predictable performance. © 2003 IEEE.
- Radar dwell scheduling considering physical characteristics of phased array antennaC.-G. Lee, P.-S. Kang, C.-S. Shih, and L. ShaProceedings - Real-Time Systems Symposium, 2003
This paper proposes novel techniques for scheduling radar dwells in phased array radar systems. In order to handle complex physical characteristics such as dwell interleaving, transmitting duty cycle constraint, and energy constraint, we propose a notion of schedulability envelope. The schedulability envelope designed offline hides the details of complex radar dwell scheduling and provides a simple measure for the schedulability check. Using the schedulability envelope, the proposed technique can efficiently perform the admission control for dynamic target tracking tasks. The simulation results show that the proposed approach can significantly improve the system utilization by taking advantage of dwell interleaving while guaranteeing the schedulability and physical constraints.
- Scheduling real-time dwells using tasks with synthetic periodsC.-S. Shih, S. Gopalakrishnan, P. Ganti, M. Caccamo, and L. ShaProceedings - Real-Time Systems Symposium, 2003
This paper addresses the problem of scheduling real-time dwells in multi-function phase array radar systems. To keep track of targets, a radar system must meet its timing and energy constraints. We propose a new task model for radar dwells to accurately characterize their timing parameters. We develop an algorithm of transforming every dwell task as a semi-period task so the dwell task can meet its timing constraint and the interarrival times of the task may not a constant. We also develop an enhanced template-based scheduling algorithm to schedule such tasks to meet the timing and energy constraints. Simulation results show that this algorithm can significantly improve the resource utilization.
- Optimal QoS sampling frequency assignment for real-time wireless sensor networksX. Liu, Q. Wang, L. Sha, and W. HeProceedings - Real-Time Systems Symposium, 2003
How to allocate computing and communication resources in a way that maximizes the effectiveness of control and signal processing has been an important area of research. The characteristic of a multi-hop Real-Time Wireless Sensor Network raises new challenges. First, the constraints are more complicated and a new solution method is needed. Second, we need a distributed solution to achieve scalability. This paper presents solutions to both of the new challenges. The first solution to the optimal frequency allocation is a centralized solution that can handle the more general form of constraints as compared with prior research. The second solution is a distributed version for large networks using a pricing scheme. It is capable of incremental adjustment when utility functions change.
- Design and analysis of an MST-based topology control algorithmN. Li, J.C. Hou, and L. ShaProceedings - IEEE INFOCOM, 2003
In this paper, we present a minimum spanning tree (MST) based topology control algorithm, called local minimum spanning tree (LMST), for wireless multi-hop networks. In this algorithm, each node builds its local minimum spanning tree independently and only keeps on-tree nodes that are one-hop away as its neighbors in the final topology. We analytically prove several important properties of LMST: (1) the topology derived under LMST preserves the network connectivity; (2) the node degree of any node in the resulting topology is bounded by 6; and (3) the topology can be transformed into one with bidirectional links (without impairing the network connectivity) after removal of all uni-directional links. These results are corroborated in the simulation study. © 2003 IEEE.
- A Bluetooth loop scatternet formation algorithmH. Zhang, J.C. Hou, and L. ShaIEEE International Conference on Communications, 2003
Bluetooth is a promising new wireless technology that enables portable devices to form short-range wireless ad hoc networks. In this paper, we present a new, distributed Bluetooth scatternet formation algorithm, called loop scatternet formation, that forms scatternets with slave/slave bridges only. In addition to meeting the criteria of maintaining connectivity, minimizing the number of piconets and the maximum degree of devices, the proposed algorithm formalizes the notion of network diameter and node contention. The loop scatternet thus formed incurs a much smaller network diameter and the number of node pairs for which a device has to serve as a relay node is significantly smaller than that in the other types of scatternets. To validate the design, we derive the bounds of the number of piconets, the network diameter, and the maximum node contention. We also conduct ns-2 simulation to evaluate the performance of loop scatternets. Both analytical and simulation results validate the desirable features of loop scatternets.
- On the scheduling of flexible and reliable real-time control systemsR. Chandra, X. Liu, and L. ShaReal-Time Systems, 2003
Many real-time systems, such as manufacturing plants, have long life cycles. To enable the realization of technological innovations and to mitigate the risk and cost of bringing new control technologies into functioning systems, flexible and reliable real-time software architectures such as Simplex have been developed. There is also an emerging trend that integrates the design of controllers and schedulers. For example, algorithms that identify the optimal frequencies of control tasks subject to schedulability constraints have been developed, and the notion of feedback schedulers has been investigated. However, the optimization of the performance of flexible and reliable architectures with analytically redundant software controllers has remained an open problem. In fact, a direct application of the existing optimization methods would not yield the optimal frequencies. In this paper, we present a method that correctly finds the optimal frequencies for systems using analytically redundant controllers. We also show that the proposed method is robust against inaccuracies in the estimation of failure rates of the controllers.
- Enhanced processor budget for QoS management in multimedia systemsC.-G. Lee, and L. ShaProceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003, 2003
Resource reservation and QoS negotiation is a common way to guarantee timely progress of programs in distributed multimedia systems. For this, determining the available resource capacity, resource budget, is important. The resource budget depends on resource characteristics (e.g., processor, memory, disk, and network bandwidth) and scheduling algorithms. The paper provides an improved processor budget for the fixed-priority scheduling algorithm, which is most common in commercial real-time operating systems. The improvement is possible by noting that, in multimedia systems, there is a prefixed set of task periods for the finite set of QoS options and parameters. Our approach explicitly takes these periods into account and calculates the tight bound of the processor budget using the linear programming technique. This bound significantly improves Liu and Layland bound (Liu and Layland, 1973) and also it is proved to be better than any other bounds in the literature. We also show how this bound is effectively used for resource reservation and QoS re-negotiation for adapting to dynamic workload. © 2003 IEEE.
- Dynamic clustering for acoustic target tracking in wireless sensor networksW.-P. Chen, J.C. Hou, and L. ShaProceedings - International Conference on Network Protocols, ICNP, 2003
In the paper, we devise and evaluate a fully decentralized, light-weight, dynamic clustering algorithm for target tracking. Instead of assuming the same role for all the sensors, we envision a hierarchical sensor network that is composed of (a) a static backbone of sparsely placed high-capability sensors which assume the role of a cluster head (CH) upon triggered by certain signal events; and (b) moderately to densely populated low-end sensors whose function is to provide sensor information to CHs upon request. A cluster is formed and a CH becomes active, when the acoustic signal strength detected by the CH exceeds a pre-determined threshold. The active CH then broadcasts an information solicitation packet, asking sensors in its vicinity to join the cluster and provide their sensing information. We address and devise solution approaches (with the use of Voronoi diagram) to realize dynamic clustering: (I1) how CHs cooperate with one another to ensure that for the most of time only one CH (preferably the CH that is closest to the target) is active; (I2) when the active CH solicits for sensor information, instead of having all the sensors in its vicinity reply, only a sufficient number of sensors respond with non-redundant, essential information to determine the target location; and (I3) both packets with which sensors respond to their CHs and packets that CHs report to subscribers do not incur significant collision. Through both probabilistic analysis and ns-2 simulation, we show with the use of Voronoi diagram, the CH that is usually closest to the target is (implicitly) selected as the leader and that the proposed dynamic clustering algorithm effectively eliminates contention among sensors and renders more accurate estimates of target locations as a result of better quality data collected and less collision incurred. © 2003 IEEE.
- Upgrading real-time control software in the fieldIEEE, 2003
The new millennium heralds the convergence between computing, communication, and the intelligent control of our physical environments. Embedded systems often have a long life cycle. This paper reviews our research group’s work on how to upgrade embedded control systems without shutting them down, and how to protect the system from bugs and attacks that could be introduced by software upgrades. © 2003 IEEE.
- Online response time optimization of apache web serverX. Liu, L. Sha, Y. Diao, S. Froehlich, J.L. Hellerstein, and S. ParekhLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2003
Properly optimizing the setting of configuration parameters can greatly improve performance, especially in the presence of changing workloads. This paper explores approaches to online optimization of the Apache web server, focusing on the MaxClients parameter (which controls the maximum number of workers). Using both empirical and analytic techniques, we show that MaxClients has a concave upward effect on response time and hence hill climbing techniques can be used to find the optimal value of MaxClients. We investigate two optimizers that employ hill climbing - one based on Newton’s Method and the second based on fuzzy control. A third technique is a heuristic that exploits relationships between bottleneck utilizations and response time minimization. In all cases, online optimization reduces response times by a factor of 10 or more compared to using a static, default value. The trade-offs between the online schemes are as follows. Newton’s method is well known but does not produce consistent results for highly variable data such as response times. Fuzzy control is more robust, but converges slowly. The heuristic works well in our prototype system, but it may be difficult to generalize because it requires knowledge of bottleneck resources and an ability to measure their utilizations. © Springer-Verlag Berlin Heidelberg 2003.
- Acoustic target tracking using tiny wireless sensor devicesQ. Wang, W.-P. Chen, R. Zheng, K. Lee, and L. ShaLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2003
With the advancement of MEMS technologies, wireless networks consist of tiny sensor devices hold the promise of revolutionizing sensing in a wide range of application domains because of their flexibility, low cost and ease of deployment. However, the constrained computation power, battery power, storage capacity and communication bandwidth of the tiny devices pose challenging problems in the design and deployment of such systems. Target localization using acoustic signal with tiny wireless devices is a particularly difficult task due to the amount of signal processing and computation involved. In this paper, we provide an in-depth study of designing such wireless sensor networks for real-world acoustic tracking applications. We layout a cluster-based architecture to address the limitations of the tiny sensing devices. To achieve effective utilization of the scarce wireless bandwidth, a quality-driven paradigm to suppress redundant information and resolve contention is proposed. One instance of the quality-driven approach is implemented in the acoustic tracking system, where the quality of the tracking reports can be quantified numerically. We demonstrate the effectiveness of our proposed architecture and protocols using a sensor network testbed based on UCBerkeley mica motes. Considering the performance limitations of tiny sensor devices, the achieved acoustic target tracking accuracy is extraordinarily good. Our experimental study also shows that the acoustic target tracking quality can be indeed measured and used to assist resource allocation decisions. This application-driven design and implementation exercises also serve to identify important areas of further work in in-network processing and communications. © Springer-Verlag Berlin Heidelberg 2003.
- Reliable upgrade of group communication software in sensor networksP.V. Krishnan, L. Sha, and K. Mechitov1st IEEE International Workshop on Sensor Network Protocols and Applications, SNPA 2003, 2003
Communication is critical between nodes in wireless sensor networks. Upgrades to their communication software need to be done reliably because residual software errors in the new module can cause complete system failure. We present a software architecture, called cSimplex, which can reliably upgrade multicast-based group communication software in sensor networks. Errors in the new module are detected using statistical checks and a stability definition that we propose. Error recovery is done by switching to a well-tested, reliable safety module without any interruption in the functioning of the system. cSimplex has been implemented and demonstrated in a network of acoustic sensors with mobile robots functioning as base stations. Experimental results show that faults in the upgraded software can be detected with an accuracy of 99.71% on average. The architecture, which can be easily extended to other reliable upgrade problems, will facilitate a paradigm shift in system evolution from static design and extensive testing to reliable upgrades of critical communication components in networked systems, thus also enabling substantial savings in testing time and resources. © 2003 IEEE.
- Real-time communication and coordination in embedded sensor networksJ.A. Stankovic, T.F. Abdelzaher, C. Lu, L. Sha, and J.C. HouIEEE, 2003
Sensor networks can be considered distributed computing platforms with many severe constraints, including limited CPU speed, memory size, power, and bandwidth. Individual nodes in sensor networks are typically unreliable and the network topology dynamically changes, possibly frequently. Sensor networks also differ because of their tight interaction with the physical environment via sensors and actuators. Because of this interaction, we find that sensor networks are very data-centric. Due to all of these differences, many solutions developed for general distributed computing platforms and for ad-hoc networks cannot be applied to sensor networks. After discussing several motivating applications, this paper first discusses the state of the art with respect to general research challenges, then focuses on more specific research challenges that appear in the networking, operating system, and middleware layers. For some of the research challenges, initial solutions or approaches are identified. © 2003 IEEE.
- Asynchronous wakeup for ad hoc networksR. Zheng, J.C. Hou, and L. ShaInternational Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2003
Due to the slow advancement of battery technology, power management in wireless networks remains to be a critical issue. Asynchronous wakeup has the merits of not requiring global clock synchronization and being resilient to network dynamics. This paper presents a systematic approach to designing and implementing asynchronous wakeup mechanisms in ad hoc networks. The optimal wakeup schedule design can be formulated as a block design problem in combinatorics. We propose a neighbor discovery and schedule bookkeeping protocol that can operate on the optimal wakeup schedule derived. Two power management policies, i.e. slot-based power management and on-demand power management, are studied to overlay desirable communication schedule over the wakeup schedule mandated by the asynchronous wakeup mechanism. Simulation studies indicate that the proposed asynchronous wakeup protocol is quite effective under various traffic characteristics and loads: energy saving can be as high as 70%, while the packet delivery ratio is comparable to that without power management.
2002
- An implicit prioritized access protocol for wireless sensor networksM. Caccamo, L.Y. Zhang, L. Sha, and G. ButtazzoProceedings - Real-Time Systems Symposium, 2002
Recent advances in wireless technology have brought us closer to the vision of pervasive computing where sensors/actuators can be connected through a wireless network. Due to cost constraints and the dynamic nature of sensor networks, it is undesirable to assume the existence of base stations connected by a wired backbone. In this paper, we present a network architecture suitable for sensor networks along with a medium access control protocol based on Earliest Deadline First.
- Handling execution overruns in hard real-time control systemsM. Caccamo, G. Buttazzo, and L. ShaIEEE Transactions on Computers, 2002
In many real-time control applications, the task periods are typically fixed and worst-case execution times are used in schedulability analysis. With the advancement of robotics, flexible visual sensing using cameras has become a popular alternative to the use of embedded sensors. Unfortunately, the execution time of visual tracking varies greatly. In such environments, control tasks have a normally short computation time, but also an occasional long computation time; therefore, the use of worst-case execution time is inefficient for control performance optimization. Nevertheless, to maintain the control stability, we still need to guarantee the schedulability of the task set, even if the worst case arises. In this paper, we propose an integrated approach to control performance optimization and task scheduling for control applications where the execution time of each task can vary greatly. We present an innovative approach to overrun management that allows us to fully utilize the processor for optimizing the control performance and yet guaranteeing the schedulability of all tasks under worst-case conditions.
-
- Upgrading embedded software in the field: Dependability and survivabilityLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2002
The new millennium heralds the convergence between computing, communication and the intelligent control of our physical environments. Computers embedded in roads, bridges, buildings and vehicles tend to have a long life cycle. Application needs will change and computing, communication and control technologies willevolve rapidly. To keep systems modern, we need technologies to dependably and securely upgrade embedded software in the field. This paper provides a review of our work on how to upgrade embedded control systems without shutting them down, and how to protect the system from bugs and attacks that could be introduced by software upgrades. © Springer-Verlag Berlin Heidelberg 2002.
- Queueing model based network server performance controlL. Sha, X. Liu, Y. Lu, and T. AbdelzaherProceedings-Real-Time Systems Symposium, 2002
Controlling the timing performance of a network server is a challenging problem. This paper presents a Queueing Model Based Feedback Control approach to keep the timing performance of a network server close to the service level specification. We show that in an instrumented Apache server, combining feedback control with a queueing model leads to better tracking of QoS specifications than with feedback control alone or queueing model based feed forward control alone.
2001
- Trade-off analysis of real-time control performance and schedulabilityD. Seto, J.P. Lehoczky, L. Sha, and K.G. ShinReal-Time Systems, 2001
Most real-time computer-controlled systems are developed in two separate stages: controller design followed by its digital implementation. Computational tasks that implement the control algorithms are usually scheduled by treating their execution times and periods as unchangeable parameters. Task schedulability therefore depends only on the limited computing resources available. On the other hand, controller design is primarily based on the continuous-time dynamics of the physical system being controlled. The set of tasks resulting from this controller design may not be schedulable with the limited computing resources available. Even if the given set of tasks is schedulable, their overall performance may not be optimal in the sense that they do not make a full use of the computing resources. In this paper, we propose an integrated approach to controller design and task scheduling. Specifically, task frequencies (or periods) are allowed to vary within a certain range as long as such changes do not affect critical control functions such as the maintenance of system stability. We present an algorithm that determines the task frequencies such that a prescribed aspect of system performance is optimized subject to satisfaction of computing resource constraints. The tasks are then scheduled with the chosen frequencies. The proposed approach also addresses the issue of choosing controller processors.
- What are the top ten most influential parallel and distributed processing concepts of the past millenium?M.D. Theys, S. Ali, H.J. Siegel, M. Chandy, K. Hwang, K. Kennedy, L. Sha, K.G. Shin, M. Snir, L. Snyder, and T. SterlingJournal of Parallel and Distributed Computing, 2001
This is a report on a panel titled "What are the top ten most influential parallel and distributed processing concepts of the last millenium?" that was held at the IEEE Computer Society sponsored "14th International Parallel and Distributed Processing Symposium (IPDPS 2000)." The panelists were chosen to represent a variety of perspectives and technical areas. After the panelists had presented their choices for the top ten, an open discussion was held among the audience and panelists. At the end of the discussion, a ballot was distributed for the audience to vote on the top ten concepts (in arbitrary order). The voting identified the following ten most influential parallel and distributed processing concepts of the last millenium: (1) Amdahl’s law and scalability, (2) Arpanet and Internet, (3) pipelining, (4) divide and conquer approach, (5) multiprogramming, (6) synchronization (including semaphores), (7) load balancing, (8) message passing and packet switching, (9) cluster computing, and (10) multithreaded (lightweight) program execution. © 2001 Academic Press.
- Aperiodic servers with resource constraintsM. Caccamo, and L. ShaProceedings - Real-Time Systems Symposium, 2001
Integrating soft and hard activities in a real-time environment has been an active area of research both under fixed priority scheduling and dynamic priority scheduling. Most of the existing work, however, has been done under the assumption that soft real-time tasks and hard real-time tasks are independent. The paper presents an efficient method that allows soft real-time aperiodic tasks and hard real-time tasks to share resources.
- Service class based online QoS management in surveillance radar systemsC.-G. Lee, C.-S. Shih, and L. ShaProceedings - Real-Time Systems Symposium, 2001
Many application level qualities are functions of available computation resources. Recent studies have handled the computation resource allocation problem to maximize the overall application quality. However, such QoS problem is fundamentally a multi-dimensional optimization problem that requires extensive computation. Therefore, online usage of optimization procedures may significantly reduce the computation resource available for applications. This raises the question of how to best use the optimization procedures for dynamic real-time task sets. This paper proposes a method called service classes configuration to address the QoS problem with dynamic arrival and departure of tasks. A simplified radar application is used as an illustrative example.
- Scheduling tasks with variable deadlinesC.-S. Shih, L. Sha, and J. LiuReal-Time Technology and Applications - Proceedings, 2001
Traditional real-time scheduling algorithms typically assume that deadlines of tasks do not change with time. Task deadlines in many real-time command, control and communication applications change over time. Deadlines of some tasks in these applications depend on the states of the objects monitored and controlled by the applications. Because the states of the objects change with time, the deadlines may also change with time. This paper discusses the problem of scheduling tasks with variable deadlines and presents preliminary results and on-going work.
2000
- Elastic feedback controlM. Caccamo, G. Buttazzo, and L. ShaProceedings - Euromicro Conference on Real-Time Systems, 2000
In many real time control applications, the task periods are typically fixed and worst case execution times are used in schedulability analysis. With the advancement of robotics, flexible visual sensing using cameras has become a popular alternative to the use of embedded sensors. Unfortunately, the execution time of visual tracking varies greatly. In such environments, control tasks have a normally short computation time but also an occasional long computation time; therefore, the use of worst case execution time is inefficient for controlling performance optimization. Nevertheless, to maintain the control stability, we still need to guarantee the task set, even if the worst case arises. We propose an integrated approach to control performance optimization and task scheduling for control applications where the execution time of each task can vary greatly. We create an innovative approach to elastic control that allows us to fully utilize the processor to optimize the control performance and yet guarantee the schedulability of all tasks under worst case conditions. © 2000 IEEE.
- Capacity sharing for overrun controlMarco Caccamo, Giorgio Buttazzo, and Lui ShaProceedings - Real-Time Systems Symposium, 2000
In this paper, we present a general scheduling methodology for managing overruns in a real-time environment, where tasks may have different criticality and flexible timing constraints. The proposed method achieves isolation among tasks through a resource reservation mechanism which bounds the effects of task interference, but also performs efficient reclaiming of the unused computation times to relax the utilization constraints imposed by isolation. The enhancements achieved by the proposed approach resulted to be very effective with respect to classical reservation schemes. The performance has been evaluated by implementing the algorithm on a real-time kernel. The runtime overhead introduced by the scheduling mechanism has also been investigated with specific experiments, in order to be taken into account in the schedulability analysis. However, it resulted to be negligible in most practical cases.
- Integration of CORBA services with a dynamic real-time architectureA. Polze, J. Schwarz, K. Wehner, and L. ShaReal-Time Technology and Applications - Proceedings, 2000
The Common Object Request Broker Architecture (CORBA) is the most successful representative for an object based distributed computing architecture. Although CORBA simplifies the implementation of complex, distributed systems significantly, support of techniques for reliable fault-tolerant software, such as online-replacement or replication is not within the scope of today’s CORBA or Real-time CORBA. The Simplex architecture developed at the Software Engineering Institute at Carnegie Mellon University supports the online replacement of software components within a fault-tolerant, real time control application. Simplex middleware consists of both a hard real time publisher/subscriber service and a soft real time service to support human-in-the-loop supervisor control activities. We have replaced Simplex’ soft real time services with CORBA services while retaining its hard real time publication and subscription service. This composite approach allows us to take advantage of both the strength of CORBA in distributed computing and Simplex’ strength in hard real time control applications. The authors discuss the trade-off issues in this composite system approach and present the tele-laboratory experiment, which is based on the extended Simplex system. © 2000 IEEE.
- A distributed real time coordination protocolL. Sha, and D. SetoLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2000
When the communication channels are subject to interruptions such as jamming, the coordination of the real time motions of distributed autonomous vehicles becomes a challenging problem, that differs significantly with fault tolerance communication problems such as reliable broadcast. In this paper, we investigate the issues on the maintenance of the coordination in spite of arbitrarily long interruptions to the communication. © 2000 Springer-Verlag Berlin Heidelberg.
- Online control optimization using load driven schedulingL. Sha, X. Liu, M. Caccamo, and G. ButtazzoIEEE Conference on Decision and Control, 2000
In many real-time control applications, the task periods are typically fixed and worst-case execution times are used in schedulability analysis. With the advancement of robotics, flexible visual sensing using cameras becomes a popular alternative to the use of embedded sensors. Unfortunately, the execution time of visual tracking varies greatly. In this paper, we integrate load driven online scheduling with direct digital designs to optimize control performance as a function of varying workload.
- An introduction to control and scheduling co-designK.-E. Årzén, A. Cervin, J. Eker, and L. ShaIEEE Conference on Decision and Control, 2000
The paper presents the emerging field of integrated control and CPU-time scheduling, where more general scheduling models and methods that better suit the needs of control systems are developed. This creates possibilities for dynamic and flexible integrated control and scheduling frameworks, where the control design methodology takes the availability of computing resources into account during design and allows on-line trade-offs between control performance and computing resource utilization.
1999
- High assurance bio-medical device controlAnnual International Conference of the IEEE Engineering in Medicine and Biology - Proceedings, 1999
Software faults are a major reliability concern in medical devices. They are the result of logical complexity that exceeds developers’ ability to specify, design, develop and verify. An effective way is to use simple and reliable control software to ensure the essential functionality and integrity of medical devices controlled by complex software.
- Flexible bio-medical instrumentationAnnual International Conference of the IEEE Engineering in Medicine and Biology - Proceedings, 1999
The rapid advancement in the biomedical field leads to the need of frequently modifying instrumentation and control software. In applications where the restart of an experiment is costly, the ability to reinstrumentation as needed online was shown to be beneficial. A dynamic real time component binding architecture that permits the flexible addition or replacement of software modules is presented.
- On scheduling tasks in reliable real-time control systemsRamesh Chandra, and Lui ShaProceedings - Real-Time Systems Symposium, 1999
Feedback control is one of the most common applications of real time systems. However, the design of controller frequencies, task scheduling and reliability engineering are often done separately, resulting in suboptimal results. Here, an overview of an integrated approach to reliable real-time controller design by optimizing the system control performance subject to schedulability and software reliability constraints.
- A fault tolerant real-Time publisher/subscriber inter-process communication architectureX. He, and L. ShaProceedings - 6th International Conference on Real-Time Computing Systems and Applications, RTCSA 1999, 1999
Real-Time publication and subscription (P/S) is an important communication paradigm that can be used to support the flexible integration and upgrading of distributed real-Time systems. In this paper, we present a fault-Tolerant extension of a real-Time P/S service in the context of a duplex architecture. We also present the rationale for the key design decisions and show how to improve the response time of the real-Time P/S service with an acceptable impact on control tasks that share the processor with the P/S service. © 1999 IEEE.
1998
- Task period selection and schedulability in real-time systemsDanbing Seto, John P. Lehoczky, and Lui ShaProceedings - Real-Time Systems Symposium, 1998
In many real-time applications, especially those involving computer-controlled systems, the application tasks often have a maximal acceptable latency, and small latency is preferred to large. The interaction between choosing task periods to meet the individual latency requirements and scheduling the resulting task set was investigated in [4] using dynamic priority scheduling methods. In this paper, we present algorithms based on static priority scheduling methods to determine optimal periods for each task in the task set. The solution to the period selection problem optimizes a system-wide performance measure subject to meeting the maximal acceptable latency requirements of each task. This paper also contributes to a new aspect of rate monotonic scheduling, the optimal design of task periods in connection with application related timing specifications and task set schedulability.
- Dependable system upgradeProceedings - Real-Time Systems Symposium, 1998
The rate of innovations in technologies has far exceeded the rate of adopting them in at least the past 20 years. To fully realize the potential of innovations, a paradigm shift is needed, from a focus on enabling technologies for completely new installations to one which is designed to mitigate the risk and cost of bringing new technologies into functioning systems. In this paper, we show that real time control software can be dependably upgrade online via the use of analytically redundant controllers.
- The simplex architecture for safe on-line control system upgradesD. Seto, B. Krogh, L. Sha, and A. ChutinanAmerican Control Conference, 1998
In this paper, we describe the Simplex architecture, a real-time software technology which supports the safe, reliable introduction of control system upgrades while the system is running. We introduce its basic structure in control systems, discuss its fault-tolerance feature, and investigate the control issues when the technology is employed. Application of the Simplex architecture is demonstrated for a plasma-enhanced chemical vapor deposition (PECVD) system, a standard process in semiconductor manufacturing. We conclude the paper with a discussion of the potential impact that the Simplex architecture can make on future control applications. © 1998 AACC.
- Composite objects: Real-time programming with CORBAA. Polze, and L. ShaProceedings - 24th EUROMICRO Conference, EURMIC 1998, 1998
The Common Object Request Broker Architecture is a successful, standardized system integration framework based on distributed object technologies. An ongoing effort concerns extensions to CORBA to incorporate realtime computing needs. Although the use of objects in real-time computing is straightforward, the technical challenge lies in the replacement of static real-time computing infrastructures with a jexible real-time computing infrastructure, in which distributed real-time client and :sewer objects can be created and connected as needed during runtime. We propose the concept of "Composite Objects" for the integration of real-time and non-real-time computing into a single object-based framework. Within the papel; we present a methodology for creating objects which interface to CORBA without violating real-time assumptions. We present a example scenario which integrates a real-time computing architecture and CORBA components via "Composite Objects ". Finally, we discuss implementation-related issues based on our experiences with "Composite Objects". © 1998 IEEE.
1997
- Analysis of dual-link networks for real-time applicationsL. Sha, S.S. Sathaye, and J.K. StrosniderIEEE Transactions on Computers, 1997
Next-generation networks are expected to support a wide variety of services. Some services such as video, voice, and plant control traffic have explicit timing requirements on a per-message basis rather than on the average. In this paper, we develop a general model of dual-link networks to support real-time communication. We examine the desirable properties of this network and the difficulties in achieving these properties. We then introduce the concept of coherence and develop a theory of coherent dual-link networks. We show that a coherent dual-link network can be analyzed as though it is a centralized system. We then discuss practical considerations in implementing a dual-link network, and implications of this work to address problems observed in the IEEE 802.6 metropolitan area network standard. ©1997 IEEE.
1996
- On task schedulability in real-time control systemsDanbing Seto, John P. Lehoczky, Lui Sha, and Kang G. ShinProceedings - Real-Time Systems Symposium, 1996
Most real-time computer-controlled systems are built in two separate steps, each in isolation: controller design and its digital implementation. Computational tasks that realize the control algorithms are usually scheduled by treating their execution times and periods as unchangeable parameters. Task scheduling therefore depends only on the limited computing resources available. On the other hand, controller design is primarily based on the continuous-time dynamics of the physical system being controlled. The set of tasks resulting from this controller design may not be schedulable with the limited computing resources available. Even if the given set of tasks is schedulable, the overall control performance may not be optimal in the sense that they do not make a full use of the computing resource. In this paper, we propose an integrated approach to controller design and task scheduling. Specifically, task frequencies (or periods) are allowed to vary within a certain range as long as such a change doesn’t affect critical control functions such as maintenance of system stability. We present an algorithm that optimizes task frequencies and then schedules the resulting tasks with the limited computing resources available. The proposed approach is also applicable to failure recovery and reconfiguration in real-time control systems.
- Designing for evolvability: Building blocks for evolvable real-time systemsM. Gagliardi, R. Rajkumar, and L. ShaReal-Time Technology and Applications - Proceedings, 1996
Fielded real-time systems including many defense systems, manufacturing plants and commercial aircraft avionics typically have long lifetimes ranging from a few years to even a few decades. Available technologies, system needs and customer goals change over this lifetime, and changes to a deployed system become very desirable. We argue that such evolution must and can be supported with new system abstractions, and that real-time systems designed with these abstractions can be evolved and incrementally tested. We present two possible run-time abstractions which can act as basic building blocks to construct «evolvable real-time systems». These building blocks can be used to evolve deployed systems in general and real-time systems in particular. First, the replaceable unit abstraction alloys an existing software module to be replaced online by another module with similar or enhanced functionality. Such replacement is transparent to the rest of the system. Secondly, the «cell» abstraction represents a protected module which cannot be harmed by other modules. Based on this notion is an «extensible cell», which allows a deployed module to be extended functionally without the fear of hurting its (fully certified) functionality even when the extensions can fail in unexpected ways. These two abstractions have been implemented in a real-time POSIX testbed used in the Simplex architecture and our findings are reported. Both abstractions are built on the Real-Time Publisher/Subscriber communication model with modifications necessitated by safe evolutionary requirements. We conclude that guaranteed enforcement of the semantics of these two building blocks can only be provided using operating system enforced resource reservation and communication rights. © 1996 IEEE.
- Evolving dependable real-time systemsLui Sha, Ragunathan Rajkumar, and Michael GagliardiIEEE Aerospace Applications Conference Proceedings, 1996
The use of commercial off the shelf (COTS) components in the development of dependable real-time systems, while effective in keeping systems affordable, also introduces vendor-driven upgrade problem. The dependable real-time computing community therefore needs to develop approaches that can introduce new hardware and software components into deployed systems safely, reliably and easily, in spite of the inevitable bugs in some of the new COTS components. This paper gives an informal review of the Simplex Architecture, which has been developed to support safe and reliable online upgrade of dependable computing systems.
1995
- The Deferrable Server Algorithm for Enhanced Aperiodic Responsiveness in Hard Real-Time EnvironmentsJ.K. Strosnider, J.P. Lehoczky, and L. ShaIEEE Transactions on Computers, 1995
Most existing scheduling algorithms for hard realtime systems apply either to periodic tasks or aperiodic tasks but not to both. In practice, real-time systems require an integrated, consistent approach to scheduling that is able to simultaneously meet the timing requirements of hard deadline periodic tasks, hard deadline aperiodic (alert-class) tasks, and soft deadline aperiodic tasks. This paper introduces the Deferrable Server (DS) algorithm which will be shown to provide improved aperiodic response time performance over traditional background and polling approaches. Taking advantage of the fact that, typically, there is no benefit in early completion of the periodic tasks, the Deferrable Server (DS) algorithm assigns higher priority to the aperiodic tasks up until the point where the periodic tasks would start to miss their deadlines. Guaranteed alert-class aperiodic service and greatly reduced response times for soft deadline aperiodic tasks are important features of the DS algorithm, and both are obtained with the hard deadlines of the periodic tasks still being guaranteed. The results of a simulation study performed to evaluate the response time performance of the new algorithm against traditional background and polling approaches are presented. In all cases, the response times of aperiodic tasks are significantly reduced (often by an order of magnitude) while still maintaining guaranteed periodic task deadlines. © 1995 IEEE
- Real-time publisher/subscriber inter-process communication model for distributed real-time systems: design and implementationRagunathan Rajkumar, Mike Gagliardi, and Lui ShaReal-Time Technology and Applications - Proceedings, 1995
Distributed real-time systems are becoming more pervasive in many domains including process control, discrete manufacturing, defense systems, air traffic control, and on-line monitoring systems in medicine. The construction of such systems, however, is impeded by the lack of simple yet powerful programming models and the lack of efficient, scalable, dependable and analyzable interfaces and their implementations. We argue that these issues need to be resolved with powerful application-level toolkits similar to that provided by ISIS [2]. In this paper, we consider the inter-process communication requirements which form a fundamental block in the construction of distributed real-time systems. We propose the real-time publisher/subscriber model, a variation of group-based programming and anonymous communication techniques, as a model for distributed real-time inter-process communication which can address issues of programming ease, portability, scalability and analyzability. The model has been used successfully in building a software architecture for building upgradable real-time systems. We provide the programming interface, a detailed design and implementation details of this model along with some preliminary performance benchmarks. The results are encouraging in that the goals we seek look achievable.
1994
- Generalized Rate-Monotonic Scheduling Theory: A Framework for Developing Real Time SystemsL. Sha, R. Rajkumar, and S.S. SathayeIEEE, 1994
Real-time computing systems are used to control telecommunication systems, defense systems, avionics, and modern factories. Generalized rate-monotonic scheduling theory is a recent development that has had large impact on the development of real time systems and open standards. In this paper we provide an up-to-date and self-contained review of generalized rate-monotonic scheduling theory. We show how this theory can be applied in practical system development, where special attention must be given to facilitate concurrent development by geographically distributed programming teams and the reuse of existing hardware and software components. © 1994 IEEE
1993
- Control reconfiguration in the presence of software failuresM. Bodson, J. Lehoczky, R. Rajkumar, L. Sha, D. Soh, M. Smith, and J. StephanIEEE Conference on Decision and Control, 1993
In this paper, we discuss a special approach for software fault tolerance in control applications. A full-function, high-performance, but complex control system is complemented by an error-free implementation of a highly reliable control system of lower functionality. When the correctness of the high-performance controller is in doubt, the reliable control system takes over the execution of the task. An innovative feature of the approach is the disparity between the two control systems, which is used to exploit the relative advantages of the simple/reliable vs. complex/ high-performance systems. Another innovative feature is the fault detection mechanism, which is based on measures of performance and of safety of the control system. The example of a ball and beam system is used to illustrate the concepts, and experimental results obtained on a laboratory set-up are presented.
- Real-time issues in the design of the Data Management System for the Space Station FreedomS. Davari, Jr. Leibfried, S. Natarajan, D. Pruett, L. Sha, and W. ZhaoProceedings - IEEE Workshop on Real-Time Applications, RTA 1993, 1993
Space Station Freedom (SSF) faces several challenges which are unique in the NASA history of manned and unmanned spacecraft systems. The challenges include a 30 year life, and evolution in capability of the station as a science platform and potentially a transportation node for manned missions to the solar system. The Data Management System (DMS) is a key element of SSF. DMS is the computational infrastructure of SSF, responsible for integrating information onboard into a cooperative whole. Its primary role is to provide integrated data processing and communication services for both the core function and the payloads. The authors focus on some real-time issues in the design of DMS. © 1993 IEEE.
- A Systematic Approach to Designing Distributed Real-Time SystemsL. Sha, and S.S. SathayeComputer, 1993
Real-time computing systems are critical to the technological infrastructure ture of an industrialized nation. Modern telecommunication systems, factories, defense systems, aircraft and airports, spacecraft, and high-energy physics experiments cannot operate without them. Indeed, real-time computing systems control the very systems that keep us productive, safeguard our liberty, and enable us to explore new frontiers of science and engineering. In real-time applications, a computation’s correctness depends not only on its results but also on when its outputs are generated. The measures of merit in a real-time system include. © 1993 IEEE
1992
- Scheduling real-time communication on dual-link networksL. Sha, S.S. Sathaye, and J.K. StrosniderProceedings - Real-Time Systems Symposium, 1992
Next-generation networks are expected to support a wide variety of services. Some services such as video, voice, and plant control traffic have explicit timing requirements on a per-message basis rather than on the average. In this paper we develop a general model of reservation-based dual-link networks to support realtime communication. We examine the desirable properties of this network and the difficulties in achieving these properties. We then introduce the concept of coherence and develop a theory of coherent dual-link networks. Finally, we show that a coherent dual-link network can be analyzed as though it is a centralized system. © 1992 IEEE.
1991
- Real-time scheduling support in FuturebusLui Sha, Ragunathan Rajkumar, and John LehoczkyProceedings - Real-Time Systems Symposium, 1991
A simple but efficient architecture for building multiprocessors is to connect several processors to a common backplane bus. The backplane acts as a shared resource in this architecture and contention for its use by different bus modules must be resolved. In a real-time system, this backplane must also provide scheduling support such that the timing behavior of the resulting system is analyzable. In addition, the support primitives for real-time scheduling on a backplane bus must also be constrained by the economic considerations associated with a bus standard that is intended to support both time sharing and real-time applications. The authors review the design considerations to support real-time systems in the IEEE Futurebus+ backplane specification and describe how this backplane can be used to satisfy timing constraints in priority-driven real-time systems.
- Maintaining global time in futurebus+R.A. Volz, L. Sha, and D. WilcoxReal-Time Systems, 1991
IEEE 896 Futurebus+ is a major open standard that is supported by the VME International Trade Association, the MultiBus Manufacturer’s Group and the US Navy. Futurebus+ supports, among other things, the notion that each processor can have its own clock. In this paper, we present design considerations of the Futurebus+ clocks and their synchronization. © 1991 Kluwer Academic Publishers.
- A Real-Time Locking ProtocolL. Sha, R. Rajkumar, S.H. Son, and C.-H. ChangIEEE Transactions on Computers, 1991
When a database system is used in a real-time application, the concurrency control protocol must not only satisfy the consistency of shared data but also observe the priorities of the transactions. This is because priority scheduling is a commonly used method. In this paper, we examine a prioritydriven two-phase lock protocol called the read/write priority ceiling protocol. We show that this protocol leads to freedom from mutual deadlock. In addition, a high-priority transaction can be blocked by lower priority transactions for at most the duration of a single embedded transaction. These properties can be used by schedulability analysis to guarantee that a set of periodic transactions using this protocol can always meet their deadlines. Finally, we examine the performance of this protocol for randomly arriving transactions using simulation studies. © 1991 IEEE
1990
- Real-time scheduling support in Futurebus+L. Sha, R. Rajkumar, and J. LehoczkyProceedings - Real-Time Systems Symposium, 1990
As real-time applications become more demanding, multiprocessors are being called upon to meet their increasingly stringent requirements. A simple but efficient architecture for building multiprocessors is to connect several processors to a common backplane bus. The backplane acts as a shared resource in this architecture and contention for its use by different bus modules must be resolved. In a real-time system, this backplane must also provide scheduling support such that the timing behavior of the resulting system is analyzable. In addition, the support primitives for real-time scheduling on a backplane bus must also be constrained by the economic considerations associated with a bus standard that is intended to support both time sharing and real-time applications. In this paper, we review the design considerations to support realtime systems in the IEEE Futurebus+ backplane specification and describe how this backplane can be used to satisfy timing constraints in priority-driven realtime systems. © 1990 IEEE.
- Priority Inheritance Protocols: An Approach to Real-Time SynchronizationL. Sha, R. Rajkumar, and J.P. LehoczkyIEEE Transactions on Computers, 1990
A direct application of commonly used synchronization primitives such as semaphores, monitors, or the Ada rendezvous can lead to uncontrolled priority inversion, a situation in which a higher priority job is blocked by lower priority jobs for an indefinite period of time. In this paper, we investigate two protocols belonging to the class of priority inheritance protocols, called the basic priority inheritance protocol and the priority ceiling protocol. We show that both protocols solve this uncontrolled priority inversion problem. In particular, the priority ceiling protocol reduces the worst case task blocking time to at most the duration of execution of a single critical section of a lower priority task. In addition, this protocol prevents the formation of deadlocks. We also derive a set of sufficient conditions under which a set of periodic tasks using this protocol is schedulable. © 1990 IEEE
1989
- Mode change protocols for priority-driven preemptive schedulingL. Sha, R. Rajkumar, J. Lehoczky, and K. RamamrithamReal-Time Systems, 1989
In many real-time applications, the set of tasks in the system, as well as the characteristics of the tasks, change during system execution. Specifically, the system moves from one mode of execution to another as its mission progresses. A mode change is characterized by the deletion of some tasks, addition of new tasks, or changes in the parameters of certain tasks, for example, increasing the sampling rate to obtain a more accurate result. This paper discusses how mode changes can be accommodated within a given framework of priority driven real-time scheduling. © 1989 Kluwer Academic Publishers.
- Rate monotonic scheduling algorithm: Exact characterization and average case behaviorJohn Lehoczky, Lui Sha, and Ye DingProceedings - Real-Time Systems Symposium, 1989
An exact characterization of the ability of the rate monotonic scheduling algorithm to meet the deadlines of a periodic task set is represented. In addition, a stochastic analysis which gives the probability distribution of the breakdown utilization of randomly generated task sets is presented. It is shown that as the task set size increases, the task computation times become of little importance, and the breakdown utilization converges to a constant determined by the task periods. For uniformly distributed tasks, a breakdown utilization of 88% is a reasonable characterization. A case is shown in which the average-case breakdown utilization reaches the worst-case lower bound of C. L. Liu and J. W. Layland (1973).
- Aperiodic task scheduling for Hard-Real-Time systemsB. Sprunt, L. Sha, and J. LehoczkyReal-Time Systems: The International Journal of Time-Critical Computing Systems, 1989
A real-time system consists of both aperiodic and periodic tasks. Periodic tasks have regular arrival times and hard deadlines. Aperiodic tasks have irregular arrival times and either soft or hard deadlines. In this article, we present a new algorithm, the Sporadic Server algorithm, which greatly improves response times for soft deadline aperiodic tasks and can guarantee hard deadlines for both periodic and aperiodic tasks. The operation of the Sporadic Server algorithm, its performance, and schedulability analysis are discussed and compared with previously published aperiodic service algorithms. © 1989, Kluwer Academic Publishers. All rights reserved.
1988
- Exploiting unused periodic time for aperiodic service using the extended priority exchange algorithmBrinkley Sprunt, John Lehoczky, and Lui Sha1988
Real-time scheduling algorithms that provide responsive aperiodic service in the presence of hard real-time periodic tasks require the creation of a high-priority periodic server task for servicing aperiodic requests. The authors describe the extended priority exchange algorithm, which can provide better aperiodic response than previous aperiodic service algorithms, particularly for cases where the worst-case periodic load is high and little or no utilization is left for a server task. The extended-priority-exchange (EPE) algorithm attains better aperiodic responsiveness by exploiting unused time allocated to periodic tasks for aperiodic service. The average aperiodic response times for the EPE algorithm and four other aperiodic service algorithms (background, polling, deferrable server, and priority exchange) are compared for a range of periodic and aperiodic loads. Simulation results show that for a difference between the average and worst-case periodic load of only 12.5%, the EPE algorithm provides significantly better response times for aperiodic tasks.
- Real-time synchronization protocols for multiprocessorsRagunathan Rajkumar, Lui Sha, and John P. Lehoczky1988
The authors investigate the synchronization problem in the context of priority-driven preemptive scheduling on shared-memory multiprocessors. Unfortunately, a direct application of synchronization mechanisms such as the Ada rendezvous, semaphores, or monitors can lead to uncontrolled priority inversion: a high priority job being blocked by a lower priority job for an indefinite period of time. A task allocation scheme based on the generalized protocol is outlined.
- Priority inversion and its control: An experimental investigationD. Locke, L. Sha, R. Rajkurnar, J. Lehoczky, and G. Burns2nd International Workshop on Real-Time Ada Issues, IRTAW 1988, 1988
To successfully engineer a large scale real-time system project, we need a disciplined approach to both the logical complexity and the timing complexity inherent in these systems. The logical complexity is managed by the software engineering methodology embodied in Ada while the timing complexity is supported by formal real-time scheduling algorithms [3,4,5,6]. From a software engineering point of view, formal scheduling algorithms translate the complex timing constraints into simple resource utilization constraints. As long as the utilization constraints of CPU, I/O channel and communication media are observed, the deadlines of periodic tasks and the response time requirements of aperiodic tasks will both be met. There is considerable freedom to modify software provided that the resource utilization remains within specified bounds. Furthermore, should there be a transient overload, the number of tasks missing their deadlines will be a function of the magnitude of the overload and the order in which the tasks miss deadlines is pre-defined [7]. From an implementation point of view, these algorithms belong to the class of static priority algorithms which can be implemented efficiently. Task priorities are functions of the timing constraints, computation requirements and relative importance of tasks. The priorities need not be computed at run-time unless task parameters are modified. © 1988 ACM.
- Concurrency Control for Distributed Real-Time DatabasesL. Sha, R. Rajkumar, and J.P. LehooczkyACM SIGMOD Record, 1988
The concurrency control of transactions in a real-time database must satisfy not only the consistency constraints of the database but also the timing constraints of individual transactions. In this paper, we present a real-time concurrency control protocol that can be used in a distributed and decomposable real-time database. The protocol is based on the integration of a modular concurrency control theory with a real-time scheduling protocol called the priority ceiling protocol. This protocol supports the replication of data objects and avoids the formation of deadlocks. Finally, an analysis of the performance of this protocol is presented. © 1988, ACM. All rights reserved.
- Modular Concurrency Control and Failure RecoveryL. Sha, J.P. Lehoczky, and E.D. JensenIEEE Transactions on Computers, 1988
This paper presents an approach to concurrency control based on the decomposition of both the database and the individual transactions. This approach is a generalization of serializability theory in that the set of permissible transaction schedules contains all the serializable schedules. In addition to providing a higher degree of concurrency than that provided by serializability theory, this approach retains three important properties associated with serializability: the consistency of the database is preserved, the individual transactions are executed correctly, and the concurrency control approach is modular, a concept formalized in this paper. The associated failure recovery procedure is also presented as is the concept of failure safety. © 1988 IEEE
- PRIORITY-DRIVEN, PREEMPTIVE I/O CONTROLLERS FOR REAL-TIME SYSTEMS.Brinkley Sprunt, David Kirk, and Lui Sha1988
The effect of three I/O controller architectures on schedulable utilization, which is the highest attainable resource utilization at or below which all deadlines can be guaranteed, is examined. FIFO (first-in-first-out) request queuing, priority queuing, and priority queuing with preemptable service are simulated for a range of CPU computation to I/O traffic ratios. The results show that, for I/O-bound task sets and zero preemption costs, priority queuing with preemptable service can provide a level of schedulable utilization 35% higher than that attainable with FIFO queuing, and 20% higher than priority queuing and nonpreemptable service. Although the potential gain for priority queuing with preemptable service is large, further simulations that incorporate a time penalty for each preemption show that the gain is very sensitive to preemption cost. With preemption cost represented as a ratio of preemption time to the minimum-task period, the level of schedulable utilization for priority queuing with preemptable service degrades to that of priority queuing with nonpre-emptable service, for a preemption cost ratio of 0. 04. A high-level design of a preemptable I/O controller is described and the issues determining preemption cost are detailed, along with techniques for its minimization.
1987
- TASK SCHEDULING IN DISTRIBUTED REAL-TIME SYSTEMS.Lui Sha, John P. Lehoczky, and Ragunathan Rajkumar1987
A review of some practical problems associated with the use of static priority scheduling is given. An approach to stabilize the rate-monotonic algorithm in the presence of transient processor overloads is presented. Also presented is a class of algorithms to handle aperiodic tasks which improve response times while guaranteeing the deadlines of periodic tasks. The problem of integrated processor and data I/O scheduling is studied as well as the problem of scheduling of messages over a bus with insufficient priority levels and multiple buffers.
- ON COUNTERING THE EFFECTS OF CYCLE-STEALING IN A HARD REAL-TIME ENVIRONMENT.Ragunathan Rajkumar, Lui Sha, and John P. Lehoczky1987
The authors investigate the problem of cycle-stealing and its impact on the scheduling of tasks in processors with data input/output. They identify the parameters and policies that govern cycle-stealing and analyze its effects on real-time systems with hard deadlines. They study different policies to resolve memory across conflicts between the processor and I/O devices and compare their effectiveness in improving task schedulability. They investigate various memory interleaving schemes and the level of interleaving required to adequately counter the effects of cycle-stealing.
- ENHANCED APERIODIC RESPONSIVENESS IN HARD REAL-TIME ENVIRONMENTS.John P. Lehoczky, Lui Sha, and Jay K. Strosnider1987
Current systems either relegate aperiodic tasks to background status when aperiodic response time is not critical or use polling when it is. The authors develop a novel approach to this scheduling problem, which they characterize as bandwidth-preserving algorithms. The goal is to develop algorithms with superior performance that also have predictable behavior and can be used by a real-time-system developer in constructing and integrating systems. The authors develop a full analysis of the worst-case behavior and stability of this family of algorithms.
- Task Scheduling in Distributed Real-Time SystemsProceedings of SPIE - The International Society for Optical Engineering, 1987
In this paper, we give a comprehensive review of a number of practical problems associated with the use of static priority scheduling. We first present a new approach to stabilize the rate-monotonic algorithm in the presence of transient processor overloads. We also present a new class of algorithms to handle aperiodic tasks which improve the response times to aperiodic tasks while guaranteeing the deadlines of periodic tasks. We then study the problem of integrated processor and data I/O scheduling. Finally we review the problem of scheduling of messages over a bus with insufficient priority levels but with multiple buffers. © 1987 SPIE.
- Limitations of Ada® for real-time schedulingD. Cornhill, L. Sha, J.P. Lehoczky, R. Rajkumar, and H. Tokuda1st International Workshop on Real-Time Ada Issues, IRTAW 1987, 1987
The goal in real-time scheduling is to satisfy the timing requirements of application jobs which often have hard deadlines. There are two aspects to Ada’s scheduling policies which are detrimental to achieving this goal. First, Ada’s constraints on the language’s implementation limit the definition of priority and the task scheduling algorithm to preclude the use of the best algorithms for scheduling jobs with hard deadlines. Second, information about task priority is not used when selecting a task from an entry queue or when choosing among branches of a selective wait statement. Instead, FIFO and arbitrary disciplines are used, respectively, which can unnecessarily lead to missed deadlines, even for very low levels of processor utilization. We suggest some areas for change to make the language more suitable for building real-time systems. © 1987 ACM.
1986
- SOLUTIONS FOR SOME PRACTICAL PROBLEMS IN PRIORITIZED PREEMPTIVE SCHEDULING.Lui Sha, John P. Lehoczky, and Ragunathan Rajkumar1986
A comprehensive treatment of a number of practical problems associated with the use of the rate monotonic algorithm is presented. These include the scheduling of messages over a bus with insufficient priority levels but with multiple buffers. Methods to handle aperiodic tasks are reviewed and a new approach to stabilize the rate monotonic algorithm in the presence of transit processor overloads is presented. The problem of integrated processor and data I/O scheduling is addressed. The results clearly establish the importance of using an integrated approach to schedule both the processor and data I/O activities.
1983
- Distributed co-operating processes and transactionsL. Sha, E.D. Jensen, R.F. Rashid, and J.D. NorthcultSymposium on Communications Architectures and Protocols, SIGCOMM 1983, 1983
As part of our research in the Archons [Jensen 82] project on decentralized computers, we have developed a relational model of data consistency to replace the conventional serialization model for reasoning about the relationships among distributed system data objects in general and state variables in particular, We not only permit but encourage such relationships to be probabilistic, in the interest of efficiency. This model leads to a new formulation of co-operating processes, and thence to the notion ol cooperating transactions: co-operating processes whose actions are made atomic for the sake of reliability. We believe that cooperating processes are valuable in a computer network, but essential in a decentralized computer [Jensen 82] where the conceptually singular but physically dispersed global operating system requires a transaction facility in the kernel [Jensen 80], These ideas are illustrated by examples from our initial experience in applying the model to the Accent network operating system and other system software of the Spice personal computing network. This document is intended to be an overview of the synchronization effort in the Archons project, and future publications will elaborate on many of the individual points touched on here. © 1983 ACM.