My primary research interests are distributed systems and data management.
Recent Projects
-
Interaction History Management: In this project, we aim to determine design strategies for interaction management systems, that is, systems that capture and manage the sequences of user interactions (interaction histories) that determine the behavior of an interactive system. This work is supported by an NSF award.
-
Performance Modeling and Prediction for Concurrent Database Workloads: We study the problems of performance prediction and workload characterization that arise within the context of concurrent DBMS workloads.
-
Resource Management for Cloud Data Services: We study the problem of minimizing the operational cost of running data services (e.g., DBMS, stream processing, data flows) on IaaS cloud infrastructures. We propose resource provisioning and SLA management techniques that identify a collection of minimum-cost cloud resources that can collectively satisfy a predicted time-varying database workload within target QoS expectations.
-
Engineering Query Optimizers: Recent work has shown that the design of a query optimizer can benefit from the consideration of properties that distinguish the underlying data management system (e.g., distributed shared-nothing architecture rather than centralized, flash memory storage rather than disks, and column-based data layout rather than row-based). In this project we explore the design, development and evaluation of a development environment that facilitates the engineering of system-specific optimizer designs by supporting the rapid prototyping, evaluation and refinement of query optimizer components.
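As a rough illustration of the provisioning problem described under Resource Management for Cloud Data Services, the sketch below picks, for each interval of a predicted workload, the cheapest mix of instances whose combined capacity covers the demand. It is a toy example and not the technique used in the project: the instance catalog, the capacities and prices, and the names `CATALOG`, `cheapest_cover`, and `plan` are all assumptions made for illustration, and QoS is reduced to a simple capacity constraint.

```python
from typing import Dict, List, Tuple

# Hypothetical instance catalog (an assumption for this sketch):
# name -> (capacity in queries/sec, cost in cents per interval).
CATALOG: Dict[str, Tuple[int, int]] = {
    "small": (100, 10),
    "medium": (250, 22),
    "large": (600, 50),
}


def cheapest_cover(
    demand: int, catalog: Dict[str, Tuple[int, int]] = CATALOG
) -> Tuple[int, Dict[str, int]]:
    """Return (cost, instance counts) of the cheapest mix with capacity >= demand."""
    # best[d] holds the cheapest (cost, counts) that covers at least d queries/sec.
    best: List[Tuple[int, Dict[str, int]]] = [(0, {})]
    for d in range(1, demand + 1):
        options = []
        for name, (capacity, cost) in catalog.items():
            # Add one instance of this type on top of the best cover
            # for whatever demand remains after it.
            prev_cost, prev_counts = best[max(0, d - capacity)]
            counts = dict(prev_counts)
            counts[name] = counts.get(name, 0) + 1
            options.append((prev_cost + cost, counts))
        best.append(min(options, key=lambda option: option[0]))
    return best[demand]


def plan(workload: List[int]) -> List[Tuple[int, Dict[str, int]]]:
    """Provision each interval of a predicted workload independently."""
    return [cheapest_cover(demand) for demand in workload]


if __name__ == "__main__":
    predicted = [80, 350, 600]  # hypothetical predicted load per interval
    for demand, (cost, counts) in zip(predicted, plan(predicted)):
        print(demand, cost, counts)
```

Treating each interval independently ignores start-up delays and minimum billing periods; a more realistic model would couple adjacent intervals, which is where much of the difficulty of the actual problem lies.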
Past Projects
-
XPORT: XPORT is a general-purpose infrastructure that provides the core functionality of large-scale stream processing and dissemination applications. It can be extended to support diverse processing logic, stream types, and performance targets, and, given these specifications, it automatically creates and optimizes an overlay network for data stream acquisition and processing. Its optimization is driven by metric-independent operations, which refine the structure of the overlay network and distribute processing efficiently across it.
-
Sharing-Aware In-Network Stream Processing: In shared processing environments, run-time reconfigurations of the existing query deployment must be well coordinated in order to satisfy strict, and potentially conflicting, query QoS expectations. We designed a proactive approach in which nodes maintain and propagate metadata about alternative deployments of the registered queries. Whenever dynamic changes cause QoS violations, nodes validate the metadata and make fast operator placement decisions that can resolve the violations.
-
SemCast: SemCast investigates efficient content-based data filtering and dissemination over conventional multicast channels. SemCast splits input data streams into multiple pieces and spreads the pieces across multiple multicast channels for delivery. This approach eliminates the need for content-based filtering and routing at interior nodes of the overlay.
-
Pulse: Pulse is a framework for processing continuous queries over continuous-time data models. Pulse translates regular queries to work on continuous-time inputs, to reduce overhead and latency while meeting user-specified error bounds on query results.
-
Borealis: Borealis is a distributed stream processing engine developed by Brandeis University, Brown University, and MIT. It deploys a network of cooperating Borealis stream engines, distributes query processing across multiple machines, and maintains integrity and correct operation as the network is dynamically mutated.