Temple University - CIS 3715 Spring 2020 Data Science

We use Python 3 in all lab assignments (on the computers in our lab, use the Jupyter Notebook (tensorflow) version).

Lab Assignment 1 (Due on Jan. 20 9:00 am)

Description: This is a "Hello world!" lab in which you will go through a Python tutorial. You will learn about a couple of Python development environments, Python syntax, data types, control flow, and functions, and get advice on navigating Python. Please download the tutorial file CIS3715 (Temple - Spring 2020) PythonBasics - Lab1.ipynb, read it carefully, and run all of the code shown, trying to understand what is happening and why. Feel free to experiment with the code, follow the provided pointers, or use Google search to learn more. When you are done with the tutorial, save the resulting ipynb file as lab1-(your last name).ipynb and submit it through Canvas. This completes your first lab assignment.

Lab Assignment 2 (Due on Feb. 3 11:59pm)

Description: In this lab you will become familiar with using Python to perform Exploratory Data Analysis (EDA). The lab is in the form of a brief tutorial that illustrates how to load and visualize a tabular data set provided in the csv format. The data set is cars.csv, one of the popular benchmark data sets in data science. You will be asked to go through the provided document, CIS3715 (Temple - Spring 2020) Lab2.ipynb, run the code, and answer a number of questions. The first 12 questions relate to the pieces of code that were provided to you. In the final question, you will be expected to produce a 2-page document that combines text with plots produced in Python to tell a coherent story about the data set. Submit the two files (the modified .ipynb and your .pdf) through Canvas.
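As a rough idea of what the loading and summarizing steps of EDA look like, here is a minimal sketch using pandas. It is not the lab's code: the rows and column names below are made up for illustration, and the actual columns of cars.csv may differ.

```python
import io

import pandas as pd

# Made-up rows standing in for a CSV file; the column names are assumptions
csv_text = """name,mpg,cylinders,horsepower
car_a,18.0,8,130
car_b,15.0,8,165
car_c,24.0,4,95
car_d,27.0,4,88
"""

# For a file on disk you would call pd.read_csv("cars.csv") instead
cars = pd.read_csv(io.StringIO(csv_text))

print(cars.shape)       # number of rows and columns
print(cars.describe())  # summary statistics for the numeric columns

# Average mpg for each cylinder count
mean_mpg = cars.groupby("cylinders")["mpg"].mean()
print(mean_mpg)
```

Inside a notebook, a call such as cars.plot.scatter(x="horsepower", y="mpg") would be one way to turn the same table into a plot.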

Lab Assignment 3 (Deadline extended. Due on Feb. 17 9:00am)

Description: More EDA: improving expertise in loading, cleaning, and analyzing data. The objective of Lab 3 is for you to become more proficient in obtaining and working with different types of data. A particular emphasis will be on dealing with text data. Note: Please use Python 3 for this lab.

Lab Assignment 4 - Extra Credit (Due on Feb. 17 9:00am)

This is an extra-credit homework assignment worth 50% of a typical lab assignment. In this assignment, you will learn the basics of the Tableau data visualization software and apply your knowledge to create a web page that visualizes the Auto MPG Data. The total estimated effort for this assignment is 5 hours.

Task 0: Spend a few minutes browsing the Tableau gallery at https://public.tableau.com/en-us/s/gallery to get an idea of what kinds of web pages can be produced with Tableau.

Task 1: Go to the Tableau Public web page https://public.tableau.com/s/ and download the app. The app is available for Windows and Mac machines. This should not take more than a few minutes. Open the app.

Task 2: Go to the Tableau tutorial at https://public.tableau.com/en-us/s/resources. There are about 90 minutes of video lectures that will lead you all the way from loading different types of data into Tableau to publishing a web page with your own interactive data visualizations, similar to the examples you have seen in the Tableau gallery. Instead of just watching, you are asked to follow along by repeating everything you see in your own app. The total estimated time for this task is 3 hours.

Task 3: Using your knowledge, load the Auto MPG Data (cars.csv) you are already familiar with from your labs. Use what you learned to create at least 3 Tableau Sheets showing different views of the data. Then use those sheets to create a Tableau Dashboard and publish the dashboard as a web page. Provide 2-3 paragraphs explaining why you designed your dashboard the way you did and discussing what kinds of insights one could get from your visualization. The total estimated time for this task is 2 hours.

Deliverables: Submit to Blackboard a one-page document containing the link to your Tableau web page and the few paragraphs that describe it.

Lab Assignment 5 (Due on Feb. 24 9:00am)

Description: In this Lab 5, we will take our first steps in supervised learning. In particular, we will learn about the k-Nearest Neighbor (kNN) algorithm. kNN uses a simple idea: "you are what your neighbors are". This idea works quite well in data science. In the first part of the lab, we will cover some background needed to understand the kNN algorithm. In the second part, you will be asked to apply your knowledge to another data set.
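The "you are what your neighbors are" idea can be sketched in a few lines of plain Python. This is only an illustration, not the lab's code: the function name knn_predict and the toy data are made up for this example, and it assumes Euclidean distance with a simple majority vote.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two small groups of 2-D points
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # prints: a
print(knn_predict(points, labels, (5.1, 5.0)))  # prints: b
```

The choice of k matters: a small k follows the training data closely, while a larger k smooths the prediction over more neighbors.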

Lab Assignment 6 (The deadline is extended to Mar. 9th 9:00am)

Description: In this Lab 6, we will keep working on supervised learning. We will first learn how to train decision trees, and we will see that doing this with sklearn is not much different from running the kNN algorithm.
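The similarity between the two comes from sklearn's shared fit / predict interface. The sketch below illustrates this on made-up toy data; the data and parameter values are assumptions chosen for the example, not taken from the lab.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data (hypothetical): one feature, two well-separated classes
X = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Both estimators follow the same fit / predict pattern
models = {
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)  # train on the toy data
    predictions[name] = model.predict([[0.05], [1.25]]).tolist()
    print(name, predictions[name])
```

Swapping one classifier for the other only changes the line that constructs the model; the training and prediction calls stay the same.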

Lab Assignment 7 (Due on March 16th 9:00am)

Description: In this Lab 7 (Dataset: onetweet.json, smallNYC.json), we will learn how to read JSON files and how to perform exploratory analysis of Twitter data.
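Reading JSON in Python can be done with the standard json module. The sketch below parses a small JSON string; the field names are assumptions made for illustration, and the structure of the lab's files may differ.

```python
import json

# Stand-in for the contents of a JSON file; the field names are assumptions
raw = """
{
  "text": "Hello from Philadelphia",
  "user": {"screen_name": "example_user"},
  "entities": {"hashtags": [{"text": "datascience"}]}
}
"""

# json.loads parses a string; for a file on disk, use json.load(file_object)
tweet = json.loads(raw)

# Nested JSON objects become nested Python dicts and lists
text = tweet["text"]
author = tweet["user"]["screen_name"]
hashtags = [tag["text"] for tag in tweet["entities"]["hashtags"]]

print(text)
print(author)
print(hashtags)
```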

Lab Assignment 8 (Extended to March 30th 9:00am)

Description: In Lab 8-1 and Lab 8-2 (Extra Credit) (Dataset: shakespear200.h5), we will learn how to train a neural network on image and text data.

Lab Assignment 9 (Due on March 30th 9:00am)

Description: In this Lab 9, you will gain experience with clustering. In particular, you will learn how to use two of the most popular clustering algorithms: Hierarchical Clustering and K-Means Clustering. Then, you will be asked to apply this knowledge to a document data set. Your dataset can be found here: documents.csv, groupnames.csv, newsgroup.csv, wordlist.csv.
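To give a feel for the two algorithms, here is a minimal sketch that runs both on made-up 2-D points using sklearn. The data, the helper same_grouping, and the parameter values are assumptions for illustration only; the lab's document data set is much larger and higher-dimensional.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Toy data (hypothetical): two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# K-Means: iteratively assigns points to the nearest of k centroids
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative) clustering: repeatedly merges the closest clusters
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)


def same_grouping(labels):
    """Check that the first three and last three points form two distinct clusters."""
    return (len(set(labels[:3])) == 1
            and len(set(labels[3:])) == 1
            and labels[0] != labels[3])


print(kmeans_labels, same_grouping(kmeans_labels))
print(hier_labels, same_grouping(hier_labels))
```

Cluster numbers are arbitrary, so the two algorithms may name the groups differently even when they agree on which points belong together.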

Lab Assignment 10 (Extended to Apr. 13th 9am)

Description: In this Lab 10 (Datasets: d_temple, iris, documents, groupnames, newsgroups, wordlist), you will gain more experience with ranks and Singular Value Decomposition (SVD) and learn how to use SVD in data science.
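The link between rank and SVD can be sketched with NumPy: the number of non-zero singular values equals the rank of the matrix. The matrix and the tolerance below are chosen for illustration and are not taken from the lab.

```python
import numpy as np

# A 3x3 matrix of rank 2: the third row is the sum of the first two
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

# Decompose A into U, the singular values s, and V transposed
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Count singular values above a small tolerance to estimate the rank
rank = int(np.sum(s > 1e-10))
print("singular values:", s)
print("rank:", rank)

# Multiplying the factors back together recovers A
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A_rebuilt, A))

# Keeping only the largest singular value gives a rank-1 approximation of A
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
```

Truncating the decomposition in this way is the basis for using SVD to compress or denoise data.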
