PREFACE

OBJECTIVE

The objective of this book is to introduce data mining concepts, describe data mining methods from sampling to decision trees, demonstrate the features of user-friendly SAS data mining tools, and, above all, allow readers to download SAS macro-call files for data mining and use them to perform complete data mining analyses.  The user-friendly SAS macro approach integrates the statistical and graphical analysis tools available in SAS systems and provides complete data mining solutions without writing SAS program code or using the point-and-click approach.  Step-by-step instructions for using the SAS macros and interpreting the results are emphasized in each chapter.  Thus, by downloading the user-friendly SAS macros described in the book and following the step-by-step instructions, data analysts can perform complete data mining analyses quickly and effectively.

WHY USE SAS SOFTWARE?

SAS Institute, the industry leader in analytical and decision support solutions, offers a comprehensive data mining solution that allows you to explore large quantities of data and discover relationships and patterns that lead to intelligent decision-making.  Enterprise Miner, SAS Institute's data mining software, offers an integrated environment for businesses that need to conduct comprehensive data mining.  However, the annual licensing fee for Enterprise Miner is extremely high.  Thus, small businesses, non-profit institutions, and universities are often unable to use this powerful analytical tool for data mining.  Also, including complete SAS code in the book for performing comprehensive data mining is not very effective, because a majority of business and statistical analysts are not experienced SAS programmers.  Quick results from data mining are not feasible if analysts must work with SAS program code, since many hours of code modification and debugging are required.  An alternative to both the point-and-click menu interface modules and the highly priced SAS Enterprise Miner is the set of user-friendly SAS macro applications included in this book, which perform several data mining tasks.  This macro approach integrates the statistical and graphical tools available in SAS systems and provides user-friendly data analysis tools that allow data analysts to complete data mining tasks quickly, without writing SAS programs, by running the SAS macros in the background.

 

COVERAGE:

The following types of analyses can be performed using the user-friendly SAS macros.

·        Converting PC databases to SAS data,

·        Sampling techniques to create training and validation samples,

·        Exploratory graphical techniques:

o       Univariate analysis of continuous response

o       Frequency data analysis for categorical data

·        Unsupervised learning:

o       Principal component analysis

o       Factor and cluster analysis

o       K-means cluster analysis

o       Bi-plot display

·        Supervised learning: Prediction

o       Multiple regression models

§         Partial and VIF plots, plots for checking data and model problems

§         Lift charts

§         Scoring

§         Model validation techniques

o       Logistic Regression

§         Partial delta logit plots, ROC curves, false positive/negative plots

§         Lift charts

§         Model validation techniques

·        Supervised learning: Classification

o       Discriminant Analysis

§         Canonical Discriminant analysis – biplots

§         Parametric Discriminant analysis

§         Non-parametric Discriminant analysis

§         Model validation techniques

o       CHAID – decision tree methods

§         Model validation techniques

 

WHY DO I BELIEVE THE BOOK IS NEEDED?

During the last decade, there has been an explosion in the field of data warehousing and data mining for knowledge discovery.  The challenge of understanding data has led to the development of new data mining tools.  Data mining books that are currently available mainly address data mining principles but provide no instructions or explanations for carrying out a data mining project.  Also, many data analysts are interested in expanding their expertise in data mining and are looking for “how-to” books on data mining that do not require expensive software such as SAS Enterprise Miner.  Business school instructors teaching in MBA programs are currently incorporating data mining into their curricula and are looking for “how-to” books on data mining using the available software.  Therefore, this book on data mining using SAS macro applications fills the gap and complements the existing data mining book market.

KEY FEATURES OF THE BOOK

No SAS programming experience required:  This is an essential “how-to” guide, especially suitable for data analysts who want to practice data mining techniques for knowledge discovery.  Thirteen user-friendly SAS macros for performing data mining are described in the book.  Instructions are given in the book for downloading the macro-call file and running the macro from the book’s website.  No experience in modifying SAS macros or programming with SAS is needed to run these macros.

Complete analysis in less than 10 min:  Complete predictive modeling, including data exploration, model fitting, assumption checks, validation, and scoring new data, can be performed on SAS datasets in less than 10 minutes.

Expensive SAS Enterprise Miner not required: The user-friendly macros work with the standard SAS modules: BASE, STAT, GRAPH, and IML. Neither additional SAS modules nor SAS Enterprise Miner is required.

No experience in SAS ODS required: Options are available in the SAS macros included in the book to save data mining output and graphics in RTF, HTML, and PDF formats using the new SAS ODS features.

More than 100 figures included in this book: These data mining techniques stress the use of visualization to thoroughly study the structure of data and to check the validity of statistical models fitted to data.  This allows the reader to visualize the trends and patterns present in their database.

TEXTBOOK OR A SUPPLEMENTARY LAB GUIDE:

This book is suitable for adoption as a textbook for a statistical methods course in data mining and data analysis.  It provides instructions and tools for quickly performing complete exploratory statistical analysis, regression analysis, multivariate analysis, and classification analysis.  Thus, this book is ideal for graduate-level statistical methods courses that use SAS software.

Some examples of potential courses:

·        Advanced Business Statistics

·        Research Methods

·        Advanced Data Analysis

POTENTIAL AUDIENCE

 

ADDITIONAL RESOURCES:

            Book’s website: A website has been set up at http://www.ag.unr.edu/gf/dm. Users can find information there on downloading the sample data files used in the book and the necessary SAS macro-call files. Users are also encouraged to visit this page for information on any errors in the book, SAS macro updates, and links to additional resources.

            Companion CD-ROM:  For experienced SAS programmers, a companion CD-ROM is available for purchase. It contains sample datasets, macro-call files, SAS outputs generated by these SAS macros in PDF, HTML, and RTF formats, and the actual SAS macro source code files.  This allows programmers to modify the SAS code to suit their needs and to use it on different platforms. An active Internet connection is not required to run the SAS macros if you have the companion CD-ROM.