WELCOME TO
CS 5310 Data Mining
3 CREDIT HOURS
Spring 2021
Email: linh@uhd.edu
Office: S-717
Office Telephone: 713-221-2781
Office Hours: Tuesday 2-4pm
Virtual Class Location: Zoom Meeting Room 154 491 636, Link: https://uhd.zoom.us/j/154491636 (Password: !a2Ts9)
My preferred method of contact is email (linh@uhd.edu). I will respond to messages
within 24-48 hours. However, please note that emails sent over
the weekend may not receive a reply until the next business day, so it is
important to plan accordingly. Repeated
emails about a matter that has already been addressed may not receive a response.
Gatormail is the official
student email of UHD. All email
correspondence from you and to you will occur using Gatormail
or your personal email that you choose to use.
The course introduces the
fundamental techniques of data mining for extracting knowledge from data, and
gives students the ability to apply these techniques to
real-world problems via well-known data mining tools such as R. The topics
include data exploration, model building, and model validation.
Prerequisites:
Credit or enrollment in CS
5318 (Database Management Systems), and STAT 5301 (Foundation of Data Analysis
with SAS Program), or with consent of the instructor.
Learning Objectives:
After taking this course, students should be able to
LO1. Identify the challenges and constraints of real-world data, and apply data exploration techniques to
address these challenges.
LO2. Design data mining models for classifying, grouping, associating, and
detecting anomalies in data. The algorithms for data classification include
decision trees, neural networks, and Bayesian classifiers. The algorithms for
data clustering include K-means clustering, hierarchical clustering, and
density-based clustering. The algorithms for data association include
association rule-based learning.
LO3. Evaluate the models’ performance via various validation
techniques.
LO4. Apply state-of-the-art data mining tools in Python for designing
and evaluating the models.
LO5. Analyze real-world datasets, from data exploration through to
gaining knowledge from the data.
Textbook & Course Materials:
Required Text(s):
Machine Learning with R, 3rd Edition, by Brett Lantz, Packt Publishing. ISBN-13: 9781788295864
A student of this
institution is not under any obligation to purchase a textbook from a
university affiliated bookstore. The same textbook may also be purchased from
an independent retailer, including an online retailer.
Recommended (Optional) Readings:
· Data Mining: Concepts and Techniques, 3rd Edition, by Jiawei Han and Micheline Kamber, 2011, The Morgan Kaufmann Series in Data Management Systems.
· Python for Everybody: Exploring Data in Python 3, 1st edition, by Charles R. Severance, CreateSpace Independent Publishing, ISBN: 9781530051120.
· Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd edition, by William McKinney, O'Reilly Media, ISBN-10: 1491957662, ISBN-13: 978-1491957660.
· Data Analytics Using Microsoft® Excel®, 2nd Edition, by Joseph M. Manzo, FlatWorld, ISBN: 978-1-4533-9522-6. https://students.flatworldknowledge.com/course/2593605
Required
Technology:
We will use Python 3 for all programming work in this course. Anaconda 3 is strongly recommended as the environment for editing and testing Python code. Please download and install Anaconda 3 from https://anaconda.org on your computer.
To maximize your success in online courses at UHD, you should have access to a desktop or laptop computer running an up-to-date Windows or macOS operating system, using the latest Firefox or Chrome browsers. A built-in or add-on webcam is also often required in certain courses where multimedia tools (Zoom, VoiceThread, etc.) and/or exam proctoring tools (Lockdown Browser, Monitor, etc.) are used. Chromebooks and some other tablets are not compatible with test proctoring tools such as ProctorU or Lockdown Browser. While the Blackboard App (e.g., on your phone) can be helpful for some course features, UHD recommends that you do not use it for working on or submitting graded activities.
To avoid being disconnected at critical moments, we
encourage you to access courses, in particular exams, on a computer that is
hardwired to the Internet router (via Ethernet using a Cat 5 or Cat 6 cable) as
opposed to depending on Wi-Fi whenever possible. Additionally, certain courses
may require additional software downloads and installs, so you may need a
machine with permission to do that. For more information on taking Blackboard
tests, see this guide. If you are experiencing challenges with
technology, please communicate with your instructor in a timely manner and seek
help from our UHD IT
Support to identify possible
solutions.
Course
Format:
This course is fully
on-line. Following the departmental guidelines about on-line courses, we will
use a mixture of synchronous and asynchronous interaction modes to handle the
teaching/learning.
For the synchronous
mode, we will hold a virtual "face-to-face" meeting at a meeting
time determined at the beginning of the semester. We will meet in Zoom to go
over the lectures, do closed labs, and have discussions. Exams will also be
scheduled in the synchronous time slots. If for any reason you are unable
to join the Zoom meeting, you can use the recorded lectures to catch up.
For the asynchronous
mode, we will assign materials for self-paced study as coursework, but will
not schedule Zoom meetings.
The coursework is split evenly (50-50) between the synchronous and
asynchronous modes.
A few must-know
items that need your attention:
1. All assignments
will be handled in Blackboard, which means only submissions in Blackboard will
be accepted and graded. Please do not submit any assignment through the Blackboard
message box. Late submissions are, in principle, not accepted.
2. Please direct all email
communications to my UHD email address: linh@uhd.edu. Please
do not send any messages in Blackboard.
The course syllabus,
PPTs, and lecture recordings can be found in the Blackboard course shell.
Students new to online courses may find
these resources particularly valuable for determining their readiness for, and
understanding the general expectations of, an online course:
· Online Readiness Self-Assessment (Link): Complete this self-assessment to receive specific feedback
based on your individual needs. This self-assessment has 22 questions,
and it shouldn't take more than a few minutes to complete.
· Realistic Preview of Online Learning (Video): In this brief
video, hear from UHD students on what to expect in an online class and how to
overcome common challenges.
· Blackboard Orientation: After logging into Blackboard, students can complete an
orientation on the foundations of Blackboard.
Teaching
Philosophy:
This course aims to teach students
machine learning and data mining. The main features of the course are:
1. Students are expected to have basic knowledge of
Python and to feel comfortable coding.
2. Every machine
learning algorithm is accompanied by a case study and a programming assignment
(also called a “lab” in this course).
3. Students need to
complete the study of the theoretical part of each chapter before the
synchronous class meeting. The synchronous class meeting emphasizes the
case study and the lab.
Course
Requirements:
Course grades will be determined as follows:
Assignment | Weight
Exam-1 (ML Chapters 1-4) | 17.5%
Exam-2 (ML Chapters 5-6) | 17.5%
Final Exam (Comprehensive ML Chapters 1-9) | 25%
Labs (about 12) | 40%
Grading
Scale:
Your final course grade will
be determined by the standard college formula based on your course
average:
90-100 → "A",
80-89 → "B",
70-79 → "C",
60-69 → "D",
0-59 → "F"
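As an illustration only, the weighted average and letter grade described in this syllabus could be estimated with a short Python sketch like the one below. The weights and the grading scale come from this syllabus. The function names, the choice to treat the lab component as a single average score on a 0-100 scale, and the handling of fractional averages (no rounding, so 89.9 is treated as a "B") are assumptions made for this example and are not official grading policy.

```python
# Weights taken from the Course Requirements section (fractions of the final grade).
WEIGHTS = {"exam1": 0.175, "exam2": 0.175, "final": 0.25, "labs": 0.40}


def course_average(scores):
    """Return the weighted course average for scores given on a 0-100 scale.

    Assumption: "labs" is the average of all lab scores.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(average):
    """Map a course average to a letter using the cutoffs in the Grading Scale.

    Assumption: no rounding is applied, so an average just below a cutoff
    receives the lower letter.
    """
    if average >= 90:
        return "A"
    if average >= 80:
        return "B"
    if average >= 70:
        return "C"
    if average >= 60:
        return "D"
    return "F"


# Hypothetical scores, for illustration only.
example_scores = {"exam1": 80, "exam2": 90, "final": 85, "labs": 95}
average = course_average(example_scores)
print(f"Course average: {average:.2f}, letter grade: {letter_grade(average)}")
```

With the hypothetical scores above, the four weighted parts are 14, 15.75, 21.25, and 38, which sum to an average of about 89, a "B" under these assumptions.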
Course
Objectives Mapping:
Learning Objectives | Activities/Assessments
Data exploration | Case studies and labs include data exploration components
Design data mining models | Each chapter presents 1-2 machine learning algorithms
Evaluate the models’ performance | Case studies and labs
Apply state-of-the-art data mining tools in Python | Case studies are illustrated using Python code. Labs are done in Python
Analyze real-world datasets | Service learning project
Course
Policies & Procedures:
Late
Work:
All coursework, including programming
assignments (or labs), must be submitted by the deadline. No late submissions
will be accepted.
Make-Up
Exams:
Make‐up exams will only be given in cases
of documented emergencies. It is your responsibility to contact your instructor
with documentation of your emergency at least 3 days before the exam date.
Feedback
& Grading Policy:
Submitted coursework will be graded within
one week. Multiple submissions are allowed before the deadline. However,
resubmissions are not accepted after the deadline.
Any correspondence regarding your
participation or grades can only be sent to your Gatormail
or your personal email address that you have used in communicating with the
instructor. Please note that all communications with the instructor must be
directed to email address: linh@uhd.edu.
Participation Policy:
Student attendance in the synchronous
class meetings is expected. For those who miss the class meeting, video
recording of the meeting will be placed in the Blackboard course shell to be
reviewed. It is strongly recommended that the students attend closed lab
sessions. Attendance in exams is mandatory.
Strategies
for Student Success:
A successful
student in this course should exhibit an in-depth understanding of machine
learning and data mining, and demonstrate working
knowledge of most of the rudimentary machine learning algorithms. Students are encouraged to make use of
relevant student services provided by the University or the CST College,
including: Reading & Writing Center, Math & Stat Center, Supplemental
Instruction, Library, Career Development Center, and Student Counseling
Services.
UHD
Student Support Services:
UHD has
developed many resources to support your learning. We have developed a website that will offer a
“one stop shop” for access to many of the resources you might need this
semester to support your educational goals.
Please access this website to get started: https://tinyurl.com/SSR2020. If you do not find the resource you need on
this website, please contact your instructor, who will make every effort to
connect you with the help you need.
Student
Challenges & Emergencies:
In case of any emergency or challenge with
personal matters that impact a student’s ability to succeed in the course,
please contact appropriate UHD services such as Student Counseling, Registrar,
Financial Aid, etc., while notifying the instructor. University policy allows an incomplete grade to be issued when a student has been performing well
throughout the semester but is not able to complete the last portion of the
coursework due to an emergency. Such an incomplete grade must be removed
within a limited time frame.
University
Requirements: Disruptions, COVID Reporting, and Safety
To address issues related to disruption of university
functions, COVID reporting, and safety protocols, as well as mandatory
engagement with classes by the 10th class day, UHD has prepared a
general set of requirements that can be found here https://www.uhd.edu/administration/environmental-health-safety/Pages/General-Policy-Requirements.aspx
These requirements are part of the expectations for this
course. Any updates to the website will be communicated to students via their Gatormail accounts.
University
Policies:
All students are subject to the policies listed below as well as all other university-wide policies and procedures as set forth in the UHD University Catalog and Student Handbook.
Accessibility and Statement of Reasonable
Accommodations: The University of
Houston-Downtown (UHD) is committed
to creating a learning environment that meets the needs of its diverse student
population. Accordingly, UHD strives to provide reasonable academic
accommodations to students who request them and are eligible, as specified by
Section 504 and ADA guidelines. Students with disabilities may work with the
Office of Disability Services to discuss a range of options for removing
barriers in this course, including official accommodations. If you
have a disability, or think you may have a disability, please contact the
Office of Disability Services to begin this conversation or request an
official accommodation. Office of
Disability Services, One Main St., Suite GSB 314, Houston, TX 77002. (Office Phone) 713-221-5078 (Website) www.uhd.edu/disability/ (Email) disabilityservices@uhd.edu
It is important for students
to understand that no accommodation can be made by an individual instructor for
a student without specific direction from the Office of Disability Services.
Academic
Integrity (PS
03.A.19 and UHD Student Handbook): The UHD Academic Honesty Policy states,
"Students must be honest in all academic activities and must not tolerate
dishonesty." Students are
responsible for doing their own work and avoiding all forms of academic
dishonesty. The most common academic honesty violations are cheating and
plagiarism. Cheating includes, but is not limited to: submitting material that is not one's own; submitting
substantially similar material in more than one course, even if it is one’s own
work, without the instructor’s permission; using information or devices that are not allowed by the faculty
member; obtaining and/or using unauthorized material; fabricating information;
violating procedures prescribed to protect the integrity of a test or other
evaluation exercise; collaborating with others on assignments without the
faculty member's consent; cooperating with or helping another student to cheat;
having another person take an examination in the student's place; altering exam
answers and requesting that the exam be re-graded; and communicating with any
person during an exam, other than the faculty member or exam proctor. Plagiarism includes, but
is not limited to: directly quoting the words of others without using quotation
marks or indented format to identify them; using sources of information
(published or unpublished) without identifying them; and/or paraphrasing
materials or ideas of others without identifying the sources.
End-of-Course Student Surveys (IDEA):
During the last week of the
course, you will be asked to complete an end of course survey. Your thoughtful and honest responses to the
survey are extremely important. We learn best what works, and what doesn’t, by
listening to our students. The survey is
your chance to help us improve.
Syllabus Subject to Change:
This syllabus is tentative and subject to
change. Changes, if any, will be posted on the course website: http://cms.dt.uh.edu/Faculty/LinH/courses/CS5310/index.htm.
Course
Calendar:
The Course calendar below contains only
the general outline of the activities and assignments that you are responsible
for each week. Specific instructions for each week are provided in Blackboard.
Note that the course calendar is subject to updates during the semester. It is
your responsibility to check this calendar prior to planning any course events.
Week | Tuesday (Synchronous) | Thursday (Asynchronous)
1 | 1/19 | 1/21
2 | 1/26 Chapter 2 Lab | 1/28
3 | 2/2 Chapter 3 Lab | 2/4
4 | 2/9 Chapter 4 Lab | 2/11 1st Test Review
5 | 2/16 1st Test | 2/18
6 | 2/23 Chapter 5 Part 1 Lab | 2/25
7 | 3/2 Chapter 5 Part 2 Lab | 3/4
8 | 3/9 Chapter 6 Part 1 Lab | 3/11
  | 3/16 Spring break | 3/18 Spring break
9 | 3/23 Chapter 6 Part 2 Lab | 3/25 2nd Test Review
10 | 3/30 | 4/1
11 | 4/6 Chapter 7 Part 1 Lab | 4/8
12 | 4/13 Chapter 7 Part 2 Lab | 4/15
13 | 4/20 Chapter 8 Lab | 4/22
14 | 4/27 Chapter 9 Lab | 4/29 Chapter 10
15 | 5/4 Review | 5/6 Reading day
16 | 5/11 Final exam 5:30pm-7:30pm | 5/13