CS 6303 – Big Data Analytics
Email: linh@uhd.edu
Office: S-717
Office Telephone: 713-221-2781
Office Hours: Tuesday & Thursday 1:00-2:30pm
Gatormail is the official student email of UHD. All email correspondence to and from you will use Gatormail or a personal email address of your choice.
Catalog Description: The course introduces concepts and techniques in managing and analyzing
large data sets for data discovery and modeling. Topics include big data
storage systems, parallel processing platforms, and scalable machine learning
algorithms.
Course Prerequisites: Credit in CS 5310 (Data Mining) or consent of the instructor.
Learning Objectives: After taking this course, students should be able to:
LO1. Explain how big data has become the new norm and what challenges it poses for building models and, ultimately, discovering knowledge.
LO2. Master the tools, algorithms, and platforms required to store and analyze big data. In particular, students will gain knowledge of parallel processing platforms such as Hadoop and Spark. They will learn several data storage methods, such as the Hadoop Distributed File System (HDFS), HBase, document databases, and graph databases. They will be introduced to scalable machine learning algorithms via software tools such as Apache Mahout.
LO3. Develop visualizations for large
datasets using JavaScript and D3 (Data-Driven Documents). Typically, these
web-based visual representations are designed to communicate important elements
of the data.
LO4. Design highly scalable systems for
storing and analyzing large volumes of unstructured data.
Required Textbook
· Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, ISBN: 978-1-107-07723-2.
References
· Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph, by David Loshin, 2013, 1st Edition, Morgan Kaufmann.
· Hadoop: The Definitive Guide, by Tom White, 2015, 4th Edition, O'Reilly Media.
· Mahout in Action, by Sean Owen, Robin Anil, Ted Dunning, 2011, 1st Edition, Manning Publications.
· Big Data in Omics and Imaging: Integrated Analysis and Causal Inference, by Momiao Xiong, 1st Edition, Chapman and Hall/CRC, ISBN: 9780815387107.
Course Topics: The following topics will be covered as time permits.
1. Introduction to Big Data Analytics
2. Big Data Analytics Platforms
3. Big Data Storage and Processing
4. Big Data Analytics Algorithms: Recommendations
5. Big Data Analytics Algorithms: Clustering
6. Big Data Analytics Algorithms: Classification
7. Big Data Visualization
Workload: 5-7 hours/week
Course Grade: Course grades will be determined as follows:
Assignment | Weight
Exam-1 | 20%
Exam-2 | 20%
Final Exam | 30%
Labs | 10%
Research Topic Review (10-page report 10% + 15-minute presentation 10%) | 20%
The Research Topic Review grade is split 50% writing and 50% presentation. The presentation will be peer-evaluated by anonymous survey. To ensure the validity of the survey, any evaluations missing because peers did not participate will be filled in with the instructor's evaluation.
Your final course grade will be determined by the standard college formula based on your course average:
90-100 → A
80-89 → B
70-79 → C
60-69 → D
0-59 → F
Topic Prerequisites: The course
is essentially self-contained. The necessary material from statistics is
integrated into the course.
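As an unofficial illustration of the Course Grade table and letter-grade scale above, the following Python sketch computes a weighted course average and maps it to a letter. The component names and sample scores are hypothetical, and the sketch applies the cutoffs to the unrounded average, an assumption this syllabus does not address.

```python
# Unofficial sketch of the course-grade calculation; component names are hypothetical.

# Weights from the Course Grade table, as fractions of the course average.
# The 20% Research Topic Review is split into report (10%) and presentation (10%).
WEIGHTS = {
    "exam1": 0.20,
    "exam2": 0.20,
    "final_exam": 0.30,
    "labs": 0.10,
    "research_report": 0.10,
    "research_presentation": 0.10,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%

# Letter-grade cutoffs, checked from highest to lowest.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def course_average(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_grade(average: float) -> str:
    """Map a course average to a letter; no rounding is applied (assumption)."""
    for cutoff, letter in CUTOFFS:
        if average >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    scores = {
        "exam1": 85,
        "exam2": 78,
        "final_exam": 82,
        "labs": 95,
        "research_report": 90,
        "research_presentation": 88,
    }
    avg = course_average(scores)
    print(f"{avg:.1f} -> {letter_grade(avg)}")  # prints "84.5 -> B"
```

Under the no-rounding assumption, an average of 89.6 maps to B rather than A.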
Course Format: This course is hybrid. Following the departmental guidelines on hybrid courses, we will use a mixture of synchronous face-to-face and asynchronous modes for teaching and learning.
For the synchronous face-to-face mode, we will meet every Tuesday and Thursday, 5:30-7:30pm, in the classroom. Exams will also be held during the face-to-face time slots.
For the asynchronous mode, we will assign materials for self-paced study as course work but will not schedule any meetings, either face-to-face or online.
The course work is split 50-50 between the synchronous and asynchronous modes.
A few must-know items that need your attention:
1. All assignments will be handled in Blackboard; only submissions made in Blackboard will be accepted and graded. Please do not submit any assignment through the Blackboard message box. In principle, late submissions are not accepted.
2. Please direct all email communication to my UHD email address: linh@uhd.edu. Please do not send any messages through Blackboard.
The
course syllabus, PPTs, and lecture recordings can be found in the Blackboard
course shell.
Online Course Support: The Blackboard system (https://bb.uhd.edu/) will be used for online course material. As the semester progresses, various materials will be posted there, including lecture notes, projects, and course announcements.
MAKE-UP POLICIES
· Course projects/homework assignments: These must be completed and turned in by the due date. For each late day, 15% of the total possible points will be deducted (a day ends at the due time). No work will be accepted more than 5 days late.
· Exams: Make-up exams will only be given in cases of documented emergencies. It is your responsibility to contact your instructor with documentation of your emergency as soon as possible.
· Quizzes: No make-ups for quizzes.
· All missed grades will be recorded as zeros.
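The late-submission rule for course projects and homework can be sketched in Python as well. This is an unofficial illustration that assumes any partial late day counts as a full day and that a penalized score never drops below zero; neither point is stated in this syllabus, and the function names are hypothetical.

```python
import math

PENALTY_PER_DAY = 0.15  # 15% of the total possible points per late day
MAX_LATE_DAYS = 5       # no work accepted more than 5 days late


def late_days(hours_late: float) -> int:
    """Count late days; any part of a 24-hour period counts as a full day (assumption)."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def penalized_score(raw: float, total: float, hours_late: float) -> float:
    """Apply the per-day deduction; work more than 5 days late earns zero."""
    days = late_days(hours_late)
    if days > MAX_LATE_DAYS:
        return 0.0  # not accepted, recorded as zero
    # Floor at zero so the deduction cannot produce a negative score (assumption).
    return max(0.0, raw - PENALTY_PER_DAY * days * total)


if __name__ == "__main__":
    # Hypothetical: 90/100 submitted 30 hours late counts as 2 late days (30 points off).
    print(penalized_score(90, 100, 30))  # prints 60.0
```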
CLASS POLICIES
· Academic Dishonesty: For this class, all work must be done individually; no group work is allowed. You are encouraged to discuss assignments in general terms with fellow students, but you may not copy their solutions or code. Doing so constitutes academic dishonesty, which will be sanctioned with a grade of F in the course. See http://www.uhd.edu/about/hr/PS03A19.pdf for more information on UHD's policy on academic dishonesty.
· Statement on Reasonable Accommodations: The University of Houston-Downtown complies with Section 504 of the Rehabilitation Act of 1973 and the Americans with Disabilities Act of 1990, pertaining to the provision of reasonable academic adjustments/auxiliary aids for students with a disability. In accordance with Section 504 and ADA guidelines, UHD strives to provide reasonable academic adjustments/auxiliary aids to students who request and require them. If you believe that you have a documented disability requiring academic adjustments/auxiliary aids, please contact the Office of Disability Services, One Main St., Suite 409-South, Houston, TX 77002.
Contact info: 713-226-5227, disabilityservices@uhd.edu, www.uhd.edu/disability/
CS 6303 - Course Schedule
(This schedule is subject to update. You should check the schedule regularly for assignments and due dates.)
Week | Monday (virtual) | Tuesday | Wednesday (virtual) | Thursday
1 | 7/10 | 7/11 Chapter 2 | 7/12 Chapter 2, Chapter 3 | 7/13 Chapter 3, Hadoop lab
2 | 7/17 Chapter 3 | 7/18 Chapter 3 review, Chapter 4 | 7/19 Chapter 4 | 7/20 Midterm Exam 1, Chapter 4
3 | 7/24 | 7/25 Chapter 5, Spark lab | 7/26 Chapter 5 | 7/27 Chapter 5
4 | 7/31 Chapter 5 | 8/1 Chapters 4, 5, 7 review, Chapter 7, SparkR lab | 8/2 Chapter 7 | 8/3 Midterm Exam 2, Chapter 9
5 | 8/7 Chapter 9 | 8/8 Writing project presentations | 8/9 | 8/10 Final Exam