PREFACE
The objective of this book is to introduce data mining concepts, describe
data mining methods from sampling to decision trees, demonstrate the features
of user-friendly SAS data mining tools, and, above all, allow readers to
download the SAS data mining macro-call files and perform complete data
mining. The user-friendly SAS macro approach
integrates the statistical and graphical analysis tools available in SAS systems
and provides complete data mining solutions without writing SAS program code
or using the point-and-click approach.
Step-by-step instructions for using SAS macros and interpreting the
results are emphasized in each chapter. Thus, by following the step-by-step
instructions and downloading the user-friendly SAS macros described in the
book, data analysts can perform complete data mining
analysis quickly and effectively.
SAS Institute, the industry leader in analytical and decision
support solutions, offers a comprehensive data mining solution that allows
you to explore large quantities of data and discover relationships and patterns
that lead to intelligent decision-making.
Enterprise Miner, SAS Institute's data mining software, offers an
integrated environment for businesses that need to conduct comprehensive data
mining. However, the annual licensing
fee for Enterprise Miner is extremely high. Thus, small businesses, non-profit
institutions, and academic institutions are often unable to use this powerful
analytical tool for data mining. Also,
including complete SAS code in the book for performing comprehensive data
mining is not very effective because a majority of business and
statistical analysts are not experienced SAS programmers. Quick results from
data mining are not feasible when analysts must work with SAS program code,
since many hours of code modification and debugging are required. An
alternative to the point-and-click menu interface modules and the highly priced
SAS Enterprise Miner is the set of user-friendly SAS macro applications for
performing several data mining tasks, which are included in this book. This
macro approach integrates statistical
and graphical tools available in SAS systems and provides user-friendly data
analysis tools, which allow the data analysts to complete data mining tasks
quickly, without writing SAS programs, by running the SAS macros in the
background.
COVERAGE:
The following types of analyses can be performed using the user-friendly SAS
macros.
· Converting PC databases to SAS data
· Sampling techniques to create training and validation samples
· Exploratory graphical techniques:
    o Univariate analysis of continuous response
    o Frequency data analysis for categorical data
· Unsupervised learning:
    o Principal component analysis
    o Factor and cluster analysis
    o K-means cluster analysis
    o Bi-plot display
· Supervised learning: Prediction
    o Multiple regression models
        § Partial and VIF plots, plots for checking data and model problems
        § Lift charts
        § Scoring
        § Model validation techniques
    o Logistic regression
        § Partial delta logit plots, ROC curves, false positive/negative plots
        § Lift charts
        § Model validation techniques
· Supervised learning: Classification
    o Discriminant analysis
        § Canonical discriminant analysis – bi-plots
        § Parametric discriminant analysis
        § Non-parametric discriminant analysis
        § Model validation techniques
    o CHAID – decision tree methods
        § Model validation techniques
WHY DO I BELIEVE THE BOOK IS NEEDED?
During the last decade, there has been an explosion in the
field of data warehousing and data mining for knowledge discovery. The
challenge of understanding data has led to the development of new data mining
tools. Data mining books that are currently available mainly address
data mining principles but provide no instructions or explanations for carrying
out a data mining project. Also, many
existing data analysts are interested in expanding their expertise in the field
of data mining and are looking for “how-to” books on data mining that do not
require expensive software such as SAS Enterprise Miner. Business
school instructors teaching in MBA programs are currently incorporating data
mining into their curriculum and are looking for “how-to” books on data mining
using the available software. Therefore, this book on data mining, using SAS
macro applications, fills the gap and complements the existing data
mining book market.
KEY FEATURES OF THE BOOK
No SAS programming experience required: This is an essential “how-to” guide,
especially suitable for data analysts who want to practice data mining
techniques for knowledge discovery.
Thirteen user-friendly SAS macros to perform data mining are described in the
book. Instructions are given in the
book for downloading the macro-call file and running the macro from
the book’s website. No experience in
modifying SAS macros or programming with SAS is needed to run these macros.
Complete analysis in less than 10 minutes:
Complete predictive modeling, including data exploration, model fitting,
assumption checks, validation, and scoring new data, can be performed on SAS
datasets in less than 10 minutes.
Expensive SAS Enterprise Miner not required: The user-friendly macros work
with the standard SAS modules: BASE, STAT, GRAPH, and IML. Neither additional
SAS modules nor SAS Enterprise Miner is required.
No experience in SAS ODS required: Options are available in the SAS
macros included in the book to save data mining output and graphics in RTF,
HTML, and PDF formats using the new SAS ODS features.
More than 100 figures included in this book: These
data mining techniques stress the use of visualization to thoroughly study the
structure of data and to check the validity of statistical models fitted to
data. This allows the reader to
visualize the trends and patterns present in their database.
TEXTBOOK OR A SUPPLEMENTARY LAB GUIDE:
This book is suitable for adoption
as a textbook for a statistical methods course in data mining and data
analysis. This book provides
instructions and tools for quickly performing complete exploratory statistical
methods, regression analysis, multivariate methods, and classification
analysis. Thus, this book is ideal for graduate-level
statistical methods courses that use SAS software.
Some examples of potential courses:
· Advanced Business Statistics
ADDITIONAL RESOURCES:
Book’s website: A
website has been set up at http://www.ag.unr.edu/gf/dm.
Users can find information there on downloading the sample
data files used in the book and the necessary SAS macro-call files. Users are
also encouraged to visit this page for information on any errors in the book,
SAS macro updates, and links to additional resources.
Companion CD-ROM: For experienced SAS programmers,
there is a companion CD-ROM available for purchase that contains sample
datasets, macro-call files, SAS outputs generated by these SAS macros in PDF,
HTML, and RTF formats, and the actual SAS macro source code files. This allows
programmers to modify the SAS
code to suit their needs and to use it on different platforms. An active
Internet connection is also not required to run the SAS macros if you have the
companion CD-ROM.