COSI 228 Schedule

 




Date

Lecture Description

Presenter

28 Aug

Cloud Computing overview and course logistics

Olga

1 Sept

Introduction to database & distributed systems I

Olga

4 Sept

Introduction to database & distributed systems II

Olga

8 Sept

Introduction to database & distributed systems III

Olga

11 Sept

The Amazon Cloud

Cloud Architectures, J. Varia, Amazon White Paper, 2008
An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS, S. L. Garfinkel, Technical Report TR-08-07, Harvard University, 2007
Attend NEDS meeting at 4pm, Volen 101 (reception at 3pm)
No reviews due.

Charu

15 Sept

The Google and Yahoo Clouds

Web Search for a Planet: The Google Cluster Architecture, L. A. Barroso, J. Dean, U. Hölzle, IEEE Micro, Volume 23, 2003
Building a Cloud for Yahoo!, B. F. Cooper et al., IEEE Data Engineering Bulletin Special Issue on Data Management in Cloud Computing Platforms, 2009
No reviews due.

Sujith

18 Sept

Parallel programming I (MapReduce)

MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S. Ghemawat, OSDI, 2004 (also the 2008 Communications of the ACM version)

Hui

22 Sept

Parallel programming II (MapReduce)

Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters, H. Yang, A. Dasdan, R. Hsiao, D. Parker, SIGMOD 2007
Related: A Comparison of Approaches to Large-Scale Data Analysis, A. Pavlo, E. Paulson, A. Rasin, D. Abadi, D. DeWitt, S. Madden, M. Stonebraker (not for review)

Yuanzhe, Qing

25 Sept

Parallel programming III

HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Rasin, A. Silberschatz
Project proposal due

Josh

29 Sept

No class (Brandeis Monday)

 

2 Oct

Data analysis I

Interpreting the Data: Parallel Analysis with Sawzall, Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan, Scientific Programming Journal, 2005

Sowmya, Roland

6 Oct

Data analysis II

Pig Latin: A Not-So-Foreign Language for Data Processing, C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, SIGMOD 2008
Related: Building a High-Level Dataflow System on top of MapReduce: The Pig Experience, A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanam, C. Olston, B. Reed, S. Srinivasan, U. Srivastava, VLDB 2009 (not for review)

Emma, Diane

9 Oct

Parallel processing I

SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets, R. Chaiken, B. Jenkins, P. Larson, B. Ramsey, D. Shakib, S. Weaver, J. Zhou, VLDB 2008

Sowmya

13 Oct

Parallel processing II

Adaptively Parallelizing Distributed Range Queries, Y. Vigfusson, A. Silberstein, B. Cooper, R. Fonseca, VLDB 2009

Yuanzhe

16 Oct

Parallel processing III

Parallel Evaluation of Composite Aggregate Queries, Lei Chen, Christopher Olston, and Raghu Ramakrishnan, ICDE 2008.

Joel

20 Oct

Scalable data storage I

The Google File System, Ghemawat et al. SOSP 2003

Sujith

23 Oct

Scalable data storage II

Dynamo: Amazon's Highly Available Key-value Store, G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W. Vogels, SOSP, 2007

David

27 Oct

Scalable data storage III

Bigtable: A Distributed Storage System for Structured Data. Chang et al. OSDI 2006.

Joel

30 Oct

Databases in the Cloud I

Building a Database on S3. M. Brantner, D. Florescu, D. Graf, D. Kossmann, T. Kraska, SIGMOD 2008
Mid-term project report due

Hui

3 Nov

Databases in the Cloud II

Consistency Rationing in the Cloud: Pay only when it matters, T. Kraska, M. Hentschel, G. Alonso, D. Kossmann, VLDB 2009

Roland

6 Nov

Database as a Service I

PNUTS: Yahoo!'s Hosted Data Serving Platform. B. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver and R. Yerneni. VLDB 2008

Diane

10 Nov

Database as a Service II

Multi-Tenant Databases for Software as a Service: Schema-Mapping Techniques, S. Aulbach, T. Grust, D. Jacobs, A. Kemper, J. Rittinger, SIGMOD 2008

Dihan

13 Nov

Large Scale Data Analysis

Asynchronous View Maintenance for VLSD Databases. Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava and Raghu Ramakrishnan. SIGMOD 2009

Emma

17 Nov

Parallel Databases 

Parallel Database Systems: The Future of High Performance Database Systems, David J. DeWitt and Jim Gray, Communications of the ACM 35(6), 1992

Qing

20 Nov

Parallel programming IV

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks, M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly, EuroSys, 2007

Charu, Josh

24 Nov

Computation and data management in the cloud

Clustera: An Integrated Computation and Data Management System, David J. DeWitt, Eric Robinson, Srinath Shankar, Erik Paulson, Jeffrey Naughton, Andrew Krioukov, and Joshua Royalty, VLDB 2008

Dihan, David

1 Dec

Project Presentations

 

8 Dec

Final project report due