Business Intelligence and Data Warehousing

The course covers the foundations of Business Intelligence (Data Warehousing) from architectural, algorithmic and practical perspectives. In addition, it covers new trends such as Data Warehouse Appliances, Cloud Analytics, Column Stores and modern storage technologies. The course replaces its predecessor "Data Warehouses", expands on its content and adds a practical exercise.

Lecture contents: Motivation, BI Architectures, BI Modeling, Star Schema, Multi-Dimensional Models, Special DW/BI Operators, Optimization of DWs: partitioning, aggregates, histograms and other query optimization techniques, Special index methods, Smart implementation of operators, Back room operations, ETL processes: data extraction, data cleansing, data loading, Column-Oriented Databases in Business Intelligence, Data Warehousing Appliances, Cloud Data Analytics

Exercise contents: Loading Data (ETL), Indexing Data, Working with Multi-Dimensional Data (CUBE, Special Query Languages), Visualization, Finding Interesting Data Faster (Histograms, Skyline), Using the Cloud for Map-Reduce, Storing Data in Columns Instead of Rows, In-Memory Solutions

News

Exam dates for the winter semester 2015/2016 can be found on our main page.

The post-exam review for BI/DW 2015 will take place on Tuesday, 10.11. 10:00-12:00 in room E202.

The lecture for summer 2015 starts one week later, on 21. April.

Slides

Final lecture slides of summer 2015 (PDF, 3.8MB)
Available on the university network only.

Slides of the Column Store Tutorial (Harizopoulos, Abadi, and Boncz 2009) (PDF, 2.1MB)

Exercises

The exercise material will be published via Moodle. Because the exercises are practical, most of them require a computer. We will start working on them in the classroom, so please bring your laptop to the exercise session.

Papers

A list of papers discussed throughout the course. (Continuously updated!)

  1. Goetz Graefe. A survey of B-tree locking techniques. ACM Transactions on Database Systems, 35(5):16:1-16:26, July 2010. http://dl.acm.org/citation.cfm?id=1806908
  2. Ming-Chuan Wu and Alejandro P. Buchmann. 1998. Encoded Bitmap Indexing for Data Warehouses. In Proceedings of the Fourteenth International Conference on Data Engineering (ICDE '98). IEEE Computer Society, Washington, DC, USA, 220-230. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=655780
  3. Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. 1996. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. In Proceedings of the Twelfth International Conference on Data Engineering (ICDE '96), Stanley Y. W. Su (Ed.). IEEE Computer Society, Washington, DC, USA, 152-159. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=492099
  4. Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. 2005. C-Store: A column-oriented DBMS. In Proceedings of the 31st international conference on Very large data bases (VLDB '05). VLDB Endowment, 553-564. http://www.vldb2005.org/program/paper/thu/p553-stonebraker.pdf
  5. Yannis Sismanis, Antonios Deligiannakis, Nick Roussopoulos, and Yannis Kotidis. 2002. Dwarf: shrinking the PetaCube. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data (SIGMOD '02). ACM, New York, NY, USA, 464-475. http://dl.acm.org/citation.cfm?id=564745
  6. Viswanath Poosala, Venkatesh Ganti, Yannis E. Ioannidis. Approximate Query Answering using Histograms. IEEE Data Eng. Bull. 22(4): 5-14 (1999) http://sites.computer.org/debull/99dec/poosala.ps
  7. Yannis Ioannidis. 2003. The history of histograms (abridged). In Proceedings of the 29th international conference on Very large data bases - Volume 29 (VLDB '03). http://www.vldb.org/conf/2003/papers/S02P01.pdf
  8. Stephan Borzsonyi, Donald Kossmann, and Konrad Stocker. 2001. The Skyline Operator. In Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, Washington, DC, USA, 421-430. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=914855
    (You may also choose to consider the following additional references: "Progressive skyline computation in database systems" and "R-trees: a dynamic index structure for spatial searching".)
  9. Juchang Lee, Michael Muehle, Norman May, Franz Faerber, Vishal Sikka, Hasso Plattner, Jens Krueger, and Martin Grund. 2013. High-Performance Transaction Processing in SAP HANA. In IEEE Data Eng. Bull. 36(2): 28–33 (June 2013). http://sites.computer.org/debull/A13june/hana1.pdf

Additional References

You might find it useful to additionally consider the following OPTIONAL references (will be continuously updated!):
  • O'Reilly. SQL in a Nutshell. Chapter 4. This chapter highlights aspects of the OLAP operators and their implementation in different database systems.
  • Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger. 2005. Progressive skyline computation in database systems. ACM Trans. Database Syst. 30, 1 (March 2005), 41-82. (PDF, 890KB)
  • Antonin Guttman. 1984. R-trees: a dynamic index structure for spatial searching. SIGMOD Rec. 14, 2 (June 1984), 47-57 (PDF, 1.1MB)

Course Information

TUCaN-Link: 20-00-0594-iv
Lecture: Tue. 13:30-15:10 in S2|07 167
Exercise: Fri. 13:30-15:10 in S2|02 C120
CP (SWS): 6 (2+2)
Language: English
Exam: written exam, Wed. 18. March 2015, 13:00-15:00 in S2|06 030
Office hours: by arrangement
Forum: Fachschaftsforum

Organizers

Prof. Alejandro Buchmann
Daniel Bausch