University of Houston-Downtown

 

WELCOME TO

CS 5310 Data Mining

3 CREDIT HOURS

Spring 2021

 

Instructor: Hong Lin

Email: linh@uhd.edu

Office: S-717

Office Telephone: 713-221-2781

Office Hours: Tuesday 2-4pm Virtual

Class Location: Zoom Meeting Room 154 491 636, Link: https://uhd.zoom.us/j/154491636 (Password: !a2Ts9)

 

 

 

 

 

 

 

My preferred method of contact is email (linh@uhd.edu).  Responses to messages will be sent within 24-48 hours.  However, please note that emails sent over the weekend may not receive a reply until the next business day, so it is important to plan accordingly.  Repeated emails about a matter that has already been addressed may not receive a response.

 

Gatormail is the official student email of UHD.  All email correspondence from you and to you will occur using Gatormail or your personal email that you choose to use. 

 

Course Description:

The course introduces the fundamental techniques of data mining for extracting knowledge from data, and gives students the ability to apply these techniques to real-world problems using well-known data mining tools such as R. The topics include data exploration, model building, and model validation.

 

Prerequisites:

Credit or enrollment in CS 5318 (Database Management Systems), and STAT 5301 (Foundation of Data Analysis with SAS Program), or with consent of the instructor.

 

Learning Objectives:

After taking this course, students should be able to

LO1.   Identify the challenges and constraints of real-world data, and apply data exploration techniques to address these challenges.

LO2.   Design data mining models for classifying, grouping, associating, and detecting anomalies in data. The algorithms for data classification include decision trees, neural networks, and Bayesian classifiers. The algorithms for data clustering include K-means clustering, hierarchical clustering, and density-based clustering. The algorithms for data association include association rule-based learning.

LO3.   Evaluate the models’ performance via various validation techniques.

LO4.   Apply state-of-the-art data mining tools in Python for designing and evaluating the models.

LO5.   Analyze real-world datasets, from data exploration through to gaining knowledge from the data.

 

Textbook & Course Materials:

 

Required Text(s):

Machine Learning with R, 3rd Edition by Brett Lantz, Packt Publishing.

ISBN-13: 9781788295864

 

A student of this institution is not under any obligation to purchase a textbook from a university affiliated bookstore. The same textbook may also be purchased from an independent retailer, including an online retailer.

 

Recommended (Optional) Readings:

·       Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber, 2011, 3rd Edition, The Morgan Kaufmann Series in Data Management Systems, ISBN-13: 978-0123814791.

·       Python for Everybody – Exploring Data in Python 3, 1st edition, by Charles R. Severance, CreateSpace Independent Publishing, ISBN: 9781530051120.

·       Python for Data Analysis - Data Wrangling with Pandas, NumPy, and IPython, 2nd edition, by William McKinney, O'Reilly Media, ISBN-10: 1491957662, ISBN-13: 978-1491957660.

·       Data Analytics Using Microsoft® Excel®, 2nd Edition, by Joseph M. Manzo, FlatWorld, ISBN: 978-1-4533-9522-6. https://students.flatworldknowledge.com/course/2593605

 

Required Technology:  

We will use Python 3 for all programming work in this course. Anaconda 3 is strongly recommended as the environment for editing and testing Python code. Please download and install Anaconda 3 from https://anaconda.org on your computer.
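After installation, a short script along the following lines can confirm that the interpreter is Python 3 and report which commonly used data packages are importable. This is only a sketch: the package list and the function name check_environment are assumptions for illustration, not course requirements.

```python
import importlib.util
import sys

# Assumed package list for illustration; the syllabus does not name specific libraries.
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def check_environment(packages=PACKAGES):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Stop early if the script is run with an interpreter older than Python 3.
if sys.version_info.major < 3:
    raise SystemExit("Python 3 is required for this course.")

print("Python", sys.version.split()[0])
for name, found in check_environment().items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Any package reported as MISSING can usually be added through the Anaconda package manager before the first lab.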

To maximize your success in online courses at UHD, you should have access to a desktop or laptop computer running an up-to-date Windows or macOS operating system, using the latest Firefox or Chrome browsers. A built-in or add-on webcam is also often required in certain courses where multimedia tools (Zoom, VoiceThread, etc.) and/or exam proctoring tools (Lockdown Browser, Monitor, etc.) are used.  Chromebooks  and some other tablets are not compatible with test proctoring tools such as ProctorU or Lockdown Browser.  While the Blackboard App (e.g., on your phone) can be helpful for some course features, UHD recommends that you do not use it for working on or submitting graded activities. 

To avoid being disconnected at critical moments, we encourage you to access courses, in particular exams, on a computer that is hardwired to the Internet router (via Ethernet using a Cat 5 or Cat 6 cable) as opposed to depending on Wi-Fi whenever possible. Additionally, certain courses may require additional software downloads and installs, so you may need a machine with permission to do that.  For more information on taking Blackboard tests, see this guide. If you are experiencing challenges with technology, please communicate with your instructor in a timely manner and seek help from our UHD IT Support to identify possible solutions.

Course Format:

This course is fully on-line. Following the departmental guidelines about on-line courses, we will use a mixture of synchronous and asynchronous interaction modes to handle the teaching/learning.

For the synchronous mode, we will hold a virtual "face-to-face" meeting at a time determined at the beginning of the semester. We will meet in Zoom to go over the lectures, do closed labs, and have discussions. Exams will also be scheduled in the synchronous time slots. If for any reason you are unable to join the Zoom meeting, you can use the recorded lectures to help you study.

For the asynchronous mode, we will assign materials for self-paced study as course work, but will not schedule Zoom meetings.

The course work is split 50-50 between the synchronous and asynchronous modes.

A few must-know items that need your attention:

1. All assignments will be handled in Blackboard, which means only submissions in Blackboard will be accepted and graded. Please do not submit any assignment to the Blackboard message box. Late submissions are, in principle, not accepted.

2. Please direct all email communications to my UHD email address: linh@uhd.edu. Please do not send any messages in Blackboard.

The course syllabus, PPTs, and lecture recordings can be found in the Blackboard course shell.

Students new to online learning may find these resources particularly valuable for determining their readiness for an online course and understanding its general expectations:

·        Online Readiness Self-Assessment (Link): Complete this self-assessment to receive specific feedback based on your individual needs. This self-assessment has 22 questions, and it shouldn't take more than a few minutes for you to complete.

·        Realistic Preview of Online Learning (Video): In this brief video, hear from UHD students on what to expect in an online class and how to overcome common challenges.

·        Blackboard Orientation: After logging into Blackboard, students can complete an orientation on the foundations of Blackboard.

Teaching Philosophy:

This course aims to teach students machine learning and data mining. The main features of the course are:

1.      Students are expected to have basic knowledge of Python and feel comfortable coding.

2.      Every machine learning algorithm is accompanied by a case study and a programming assignment (also called “lab” in this course).

3.      Students need to complete the study of the theoretical part of the chapter before the synchronous class meeting. Emphasis is on the case study and the lab during the synchronous class meeting.

Course Requirements:

Course grades will be determined as follows:

Assignment                                    Weight
Exam-1 (ML Chapters 1-4)                      17.5%
Exam-2 (ML Chapters 5-6)                      17.5%
Final Exam (Comprehensive ML Chapters 1-9)    25%
Labs (about 12)                               40%

 

Grading Scale: 

Your final course grade will be determined by the standard college formula based on your course average:

90-100 → "A", 80-89 → "B", 70-79 → "C", 60-69 → "D", 0-59 → "F"
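As an illustration only, the weights and letter-grade cutoffs above can be sketched in Python. The function names and sample scores are hypothetical, and the sketch assumes the course average is compared to the cutoffs without rounding; the syllabus does not say how a fractional average such as 89.5 is treated.

```python
# Weights from the course requirements (fractions of the final grade).
WEIGHTS = {
    "exam1": 0.175,
    "exam2": 0.175,
    "final": 0.25,
    "labs": 0.40,
}


def course_average(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(average):
    """Map a course average to a letter grade.

    Assumption: the average is not rounded before the cutoffs are applied.
    """
    if average >= 90:
        return "A"
    if average >= 80:
        return "B"
    if average >= 70:
        return "C"
    if average >= 60:
        return "D"
    return "F"


# Hypothetical scores, for illustration only.
sample = {"exam1": 85, "exam2": 78, "final": 90, "labs": 95}
avg = course_average(sample)
print(f"course average {avg:.3f}, letter grade {letter_grade(avg)}")
```

Because the labs carry 40% of the grade, they outweigh any single exam in this calculation.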

 

Course Objectives Mapping:

Learning Objectives                                       Activities/Assessments
Data exploration                                          Case studies and labs include data exploration components
Design data mining models                                 Each chapter presents 1-2 machine learning algorithms
Evaluate the models’ performance                          Case studies and labs
Apply state-of-the-art data mining tools in Python        Case studies are illustrated using Python code. Labs are done in Python
Analyze real-world datasets                               Service learning project

 

Course Policies & Procedures:

 

Late Work:

All coursework, including programming assignments (or labs), must be submitted by the deadline. No late submissions will be accepted.

 

Make-Up Exams:

Make‐up exams will only be given in cases of documented emergencies. It is your responsibility to contact your instructor with documentation of your emergency at least 3 days before the exam date.

 

Feedback & Grading Policy:

Submitted coursework will be graded within one week. Multiple submissions are allowed before the deadline. However, resubmissions are not accepted after the deadline.

 

Any correspondence regarding your participation or grades can only be sent to your Gatormail or your personal email address that you have used in communicating with the instructor. Please note that all communications with the instructor must be directed to email address: linh@uhd.edu.

 

Participation Policy:

Student attendance in the synchronous class meetings is expected. For those who miss a class meeting, a video recording of the meeting will be placed in the Blackboard course shell for review. It is strongly recommended that students attend the closed lab sessions. Attendance in exams is mandatory.

 

Strategies for Student Success:

A successful student in this course should exhibit an in-depth understanding of machine learning and data mining, and demonstrate working knowledge of most of the rudimentary machine learning algorithms.  Students are encouraged to make use of relevant student services provided by the University or the CST College, including: Reading & Writing Center, Math & Stat Center, Supplemental Instruction, Library, Career Development Center, and Student Counseling Services.

UHD Student Support Services:

UHD has developed many resources to support your learning.  We have developed a website that will offer a “one stop shop” for access to many of the resources you might need this semester to support your educational goals.  Please access this website to get started:  https://tinyurl.com/SSR2020.  If you do not find the resource you need on this website, please contact your instructor, who will make every effort to connect you with the help you need.

Student Challenges & Emergencies:

In case of any emergency or challenge with personal matters that affects a student’s ability to succeed in the course, please contact the appropriate UHD services such as Student Counseling, Registrar, Financial Aid, etc., while notifying the instructor.  University policy allows an incomplete grade to be issued when a student has been performing well throughout the semester but is not able to complete the last portion of the coursework due to an emergency. Such an incomplete grade must be removed within a limited time frame.

 

University Requirements: Disruptions, COVID Reporting, and Safety

To address issues related to disruption of university functions, COVID reporting, and safety protocols, as well as mandatory engagement with classes by the 10th class day, UHD has prepared a general set of requirements that can be found here https://www.uhd.edu/administration/environmental-health-safety/Pages/General-Policy-Requirements.aspx

These requirements are part of the expectations for this course. Any updates to the website will be communicated to students via their Gatormail accounts.

University Policies:

All students are subject to the policies listed below as well as all other university-wide policies and procedures as set forth in the UHD University Catalog and Student Handbook.

 

Accessibility and Statement of Reasonable Accommodations: The University of Houston-Downtown (UHD) is committed to creating a learning environment that meets the needs of its diverse student population.  Accordingly, UHD strives to provide reasonable academic accommodations to students who request them and are eligible, as specified by Section 504 and ADA guidelines. Students with disabilities may work with the Office of Disability Services to discuss a range of options for removing barriers in this course, including official accommodations.   If you have a disability, or think you may have a disability, please contact the Office of Disability Services to begin this conversation or request an official accommodation.  Office of Disability Services, One Main St., Suite GSB 314, Houston, TX 77002.  (Office Phone) 713-221-5078 (Website) www.uhd.edu/disability/ (Email) disabilityservices@uhd.edu

It is important for students to understand that no accommodation can be made by an individual instructor for a student without specific direction from the Office of Disability Services.

 

Academic Integrity (PS 03.A.19 and UHD Student Handbook): The UHD Academic Honesty Policy states, "Students must be honest in all academic activities and must not tolerate dishonesty."  Students are responsible for doing their own work and avoiding all forms of academic dishonesty. The most common academic honesty violations are cheating and plagiarism.

Cheating includes, but is not limited to:

·       Submitting material that is not one's own
·       Submitting substantially similar material in more than one course, even if it is one’s own work, without the instructor’s permission
·       Using information or devices that are not allowed by the faculty member
·       Obtaining and/or using unauthorized material
·       Fabricating information
·       Violating procedures prescribed to protect the integrity of a test or other evaluation exercise
·       Collaborating with others on assignments without the faculty member's consent
·       Cooperating with or helping another student to cheat
·       Having another person take an examination in the student's place
·       Altering exam answers and requesting that the exam be re-graded
·       Communicating with any person during an exam, other than the faculty member or exam proctor

Plagiarism includes, but is not limited to, directly quoting the words of others without using quotation marks or indented format to identify them, using sources of information (published or unpublished) without identifying them, and/or paraphrasing materials or ideas of others without identifying the sources.

 

End-of-Course Student Surveys (IDEA):

During the last week of the course, you will be asked to complete an end of course survey.  Your thoughtful and honest responses to the survey are extremely important. We learn best what works, and what doesn’t, by listening to our students.  The survey is your chance to help us improve.

 

Syllabus Subject to Change:

This syllabus is tentative and subject to change. Changes, if any, will be updated at the course website: http://cms.dt.uh.edu/Faculty/LinH/courses/CS5310/index.htm.

 

Course Calendar:

The course calendar below contains only a general outline of the activities and assignments that you are responsible for each week. Specific instructions for each week are provided in Blackboard. Note that the course calendar is subject to updates during the semester. It is your responsibility to check this calendar before planning any course events.

 

Week    Tuesday (Synchronous)                 Thursday (Asynchronous)
1       1/19  Chapter 1                       1/21  Chapter 2
2       1/26  Chapter 2 Lab                   1/28  Chapter 3
3       2/2   Chapter 3 Lab                   2/4   Chapter 4
4       2/9   Chapter 4 Lab                   2/11  1st Test Review
5       2/16  1st Test                        2/18  Chapter 5 Part 1
6       2/23  Chapter 5 Part 1 Lab            2/25  Chapter 5 Part 2
7       3/2   Chapter 5 Part 2 Lab            3/4   Chapter 6 Part 1
8       3/9   Chapter 6 Part 1 Lab            3/11  Chapter 6 Part 2
        3/16  Spring break                    3/18  Spring break
9       3/23  Chapter 6 Part 2 Lab            3/25  2nd Test Review
10      3/30  2nd Test                        4/1   Chapter 7 Part 1
11      4/6   Chapter 7 Part 1 Lab            4/8   Chapter 7 Part 2
12      4/13  Chapter 7 Part 2 Lab            4/15  Chapter 8
13      4/20  Chapter 8 Lab                   4/22  Chapter 9
14      4/27  Chapter 9 Lab                   4/29  Chapter 10
15      5/4   Review                          5/6   Reading day
16      5/11  Final exam 5:30pm-7:30pm        5/13