MSAN 602 Announcements

Natural Language Processing

posted Oct 2, 2012, 4:01 PM by Unknown user

Some of you were interested in the visualization tools for natural
language processing of Twitter feeds that I showed today in the
bonus session (the grand finale).

The R code for producing wordclouds and undirected graphs based on
collocation of terms in each tweet of the Twitter corpus is uploaded
in the NLP sub-folder under R code. The graphs that I showed in class
are attached (you can make your own Christmas cards this year using
MSAN 602 analytics code). This code should run without any edits
and can be used for Q2 of Assignment 6 (which is a bonus question for
extra credit).
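
For anyone curious how term collocation turns into a graph, here is a minimal base-R sketch. It is not the code in the NLP sub-folder: the three "tweets" are made up, and the variable names (tweets, incidence, cooc) are only illustrative. It builds a term-by-tweet incidence matrix and multiplies it by its transpose to count how many tweets contain each pair of terms.

```r
# Toy stand-in for a tweet corpus (made-up text, not the class data).
tweets <- c("big data is big",
            "data science loves r",
            "r loves big data")

# Split each tweet into its set of distinct lowercase terms.
tokens <- lapply(strsplit(tolower(tweets), "[^a-z]+"), unique)
terms  <- sort(unique(unlist(tokens)))

# Binary term-by-tweet incidence matrix: 1 if the term occurs in the tweet.
incidence <- sapply(tokens, function(tk) as.integer(terms %in% tk))
rownames(incidence) <- terms

# Co-occurrence counts: entry (i, j) is the number of tweets containing
# both term i and term j.
cooc <- incidence %*% t(incidence)
term_freq <- diag(cooc)  # number of tweets each term appears in
diag(cooc) <- 0          # drop self-loops before treating it as a graph
```

The symmetric matrix cooc can be read as the weighted adjacency matrix of an undirected graph over terms, and term_freq is the kind of count a wordcloud scales its words by.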

The R code that I showed in class for creating directed graphs from
adjacency matrices (needed for Q1 of Assignment 6) is in the igraph
sub-folder under R code. This code also shows how to call the
PageRank algorithm, which we covered in class today together with the
HITS algorithm.
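
As a reminder of what PageRank computes, here is a small power-iteration sketch in base R. It deliberately does not use igraph and is not the class code: the four-node graph, the function name pagerank, and the damping value of 0.85 are all assumptions chosen for illustration.

```r
# Adjacency matrix of a small directed graph: adj[i, j] = 1 means an edge
# from node i to node j. The graph is made up for illustration.
adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 0,
                1, 0, 0, 0,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(LETTERS[1:4], LETTERS[1:4]))

pagerank <- function(adj, damping = 0.85, tol = 1e-10, max_iter = 1000) {
  n <- nrow(adj)
  out_degree <- rowSums(adj)
  # Row-normalise so each row is a probability distribution over out-links;
  # a node with no out-links is treated as linking uniformly to every node.
  trans <- adj / ifelse(out_degree == 0, 1, out_degree)
  trans[out_degree == 0, ] <- 1 / n
  rank <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    # With probability `damping` follow a link, otherwise jump at random.
    new_rank <- (1 - damping) / n + damping * as.vector(rank %*% trans)
    if (sum(abs(new_rank - rank)) < tol) break
    rank <- new_rank
  }
  setNames(new_rank, rownames(adj))
}

ranks <- pagerank(adj)
```

Node C collects links from A, B, and D, so it ends up with the largest score, while D, which nothing links to, keeps only the random-jump share (1 - damping) / n.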

Assignment 6

posted Oct 2, 2012, 4:01 PM by Unknown user

As requested, Assignment 6 has been updated so that the NLP question is just a bonus question. All the code for Q2 should run as is.

Assignment 5 Q1.3

posted Sep 30, 2012, 5:58 PM by Unknown user

You may get unexpected results when you plot the clusters on the map. Try adding each cluster incrementally with the following code (where k is the number of clusters):
 
library(maps)  # provides map() and map.cities()
map(database = "world")
for (i in 1:k) {  # add one cluster at a time, colored by cluster index
  map.cities(x = major.cities[which(major.cities.pam$clustering == i), ],
             country = "", label = NULL, minpop = 0, maxpop = Inf,
             capitals = 0, cex = par("cex"), projection = FALSE,
             parameters = NULL, orientation = NULL, pch = 1, col = i)
}
(you need to add the cluster centers too...)

Clustering

posted Sep 26, 2012, 10:05 AM by Unknown user

To assist with homework 5, I have uploaded Tutorial 5 on clustering (using Rattle and R) in the Tutorial folder.
Please review Section 7.4 in CT1 on the k-means and PAM algorithms. The R code that I demonstrated in class shows different uses of clustering and can be retrieved from the R-code folder.
 
Also for Thursday's class please review Section 7.5 in CT1 on hierarchical clustering methods.
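
If you want something quick to experiment with before class, the sketch below runs k-means and hierarchical clustering on the same synthetic data using only base R (kmeans, hclust, cutree). It is not the class code; the data, the seed, and the choice of complete linkage are arbitrary assumptions.

```r
set.seed(602)  # arbitrary seed so the toy data are reproducible

# Two well-separated 2-D blobs of 20 points each (synthetic data).
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))

# k-means with k = 2; nstart reruns from several random initialisations.
km <- kmeans(pts, centers = 2, nstart = 10)

# Agglomerative hierarchical clustering on Euclidean distances,
# then cut the dendrogram into 2 groups.
hc <- hclust(dist(pts), method = "complete")
hc_groups <- cutree(hc, k = 2)

# Cross-tabulate the two labelings to see whether they agree.
agreement <- table(kmeans = km$cluster, hclust = hc_groups)
```

Cluster labels are arbitrary, so the two methods agree when each row of agreement has a single non-zero cell, even if the label numbers differ. plot(hc) draws the dendrogram.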
 

Quiz on Ensemble Methods

posted Sep 18, 2012, 5:49 PM by Unknown user

There will be a short closed-book quiz next Tuesday (9/25/2012) at 10:15am on ensemble classifiers. Please carefully review the lecture notes (lecture 7) and the related material in the course textbooks: Chapter 7 of CT2 and Section 6.14 of CT1. The quiz will consist of some word-based questions and a numerical problem that you will have to solve using your knowledge of bagging and boosting. It is pen and paper only, with no coding involved. You may want to bring a calculator.

Ensemble Learning

posted Sep 17, 2012, 12:02 AM by Unknown user

The next class will cover ensemble learning algorithms. We touched on this briefly last class while we were discussing ROC curves. Please review Tutorial 4a and 4b before class.

Sampling with and without replacement

posted Sep 13, 2012, 6:00 PM by Unknown user

sample(x, size, replace = FALSE, prob = NULL)

By default, the sample command in R draws elements of x without replacement. Set replace = TRUE to sample with replacement (as in the bagging example today). BTW, 'bagging' is short for 'bootstrap aggregating', in case you were wondering about this obscure naming convention.
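
A quick illustration of the difference, as a minimal sketch (the vector x and the seed are arbitrary choices, not taken from the class example):

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible
x <- 1:10

# Without replacement (the default): a random permutation, no repeats.
perm <- sample(x)

# With replacement: a bootstrap sample of the same size; repeats are allowed.
boot <- sample(x, size = length(x), replace = TRUE)

# Asking for more items than exist only works with replacement,
# so this call fails and the handler returns "error".
too_many <- tryCatch(sample(x, size = 11), error = function(e) "error")
```

In bagging, each model in the ensemble is trained on its own bootstrap sample like boot, drawn from the row indices of the training set.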

Ensemble Learning Methods

posted Sep 13, 2012, 5:54 PM by Unknown user   [ updated Sep 17, 2012, 12:00 AM ]

We touched briefly on ensemble learning methods in class today. Tutorial 4a and 4b have been added to the Tutorials section of the site. They provide an introduction to and tutorial on random forests and boosting, respectively. In class, I introduced the 'decision stump' as a tree with one split and explained how this is a type of weak classifier. Boosting methods typically use weak classifiers. In general, random forest methods use full decision trees.
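
To make the decision stump concrete, here is a minimal base-R sketch on a single numeric feature, followed by one AdaBoost-style reweighting step. It is not taken from the tutorials: the function names fit_stump and predict_stump, the toy data, and the labels in {-1, +1} are all assumptions for illustration.

```r
# Fit a decision stump on one numeric feature: try every midpoint between
# sorted distinct values as a threshold, in both orientations, and keep the
# split with the lowest weighted misclassification error.
fit_stump <- function(x, y, w = rep(1 / length(x), length(x))) {
  xs <- sort(unique(x))
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
  best <- list(error = Inf)
  for (thr in thresholds) {
    for (dir in c(1, -1)) {
      pred <- ifelse(x > thr, dir, -dir)
      err <- sum(w[pred != y])
      if (err < best$error) best <- list(thr = thr, dir = dir, error = err)
    }
  }
  best
}

predict_stump <- function(stump, x) {
  ifelse(x > stump$thr, stump$dir, -stump$dir)
}

# Toy data with labels in {-1, +1}; no single threshold separates them.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(-1, -1, 1, -1, 1, 1)

stump <- fit_stump(x, y)
miss  <- predict_stump(stump, x) != y

# One AdaBoost-style reweighting step: misclassified points gain weight.
alpha <- 0.5 * log((1 - stump$error) / stump$error)
w <- rep(1 / 6, 6) * exp(-alpha * y * predict_stump(stump, x))
w <- w / sum(w)
```

The stump gets one of the six points wrong, which is why it counts as a weak classifier. After reweighting, that single misclassified point carries half of the total weight, so the next stump is pushed to classify it correctly.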

Uploading homework

posted Sep 8, 2012, 4:47 PM by Unknown user

Please upload homework 3 and all subsequent homework to your personal folder under http://msan602.analytics.usfca.edu/submissions/<surname> (all lower case), or navigate to your folder from the navigation panel (when signed in to the site).
 
 
 

Reading list for tomorrow's lecture on K-nearest neighbors

posted Sep 5, 2012, 5:46 PM by Unknown user

Section 6.91 of CT1
Chapter 2 of CT2
