000 06939cam a2200349 a 4500
003 OSt
005 20241101152512.0
008 100810s2011 enka b 001 0 eng
020 _a9780521875806
_q(hbk.)
020 _a0521875803
_q(hbk.)
020 _a9780521699099
_q(pbk.)
020 _a0521699096
_q(pbk.)
040 _aKUSN
_beng
_cKUSN
_erda
060 1 0 _aWA 950
_bM253 2011
100 1 _aMalley, James D.
_95416
245 1 0 _aStatistical learning for biomedical data /
_cJames D. Malley, Karen G. Malley, Sinisa Pajevic.
260 _aCambridge :
_bCambridge University Press,
_c2011.
300 _axii, 285 pages :
_billustrations ;
_c26 cm.
490 1 _aPractical guides to biostatistics and epidemiology
504 _aIncludes bibliographical references and index.
505 0 0 _gPart I.
_tIntroduction --
_g1.
_tPrologue --
_g1.1.
_tMachines that learn -- some recent history --
_g1.2.
_tTwenty canonical questions --
_g1.3.
_tOutline of the book --
_g1.4.
_tA comment about example datasets --
_g1.5.
_tSoftware --
_g2.
_tThe landscape of learning machines --
_g2.1.
_tIntroduction --
_g2.2.
_tTypes of data for learning machines --
_g2.3.
_tWill that be supervised or unsupervised? --
_g2.4.
_tAn unsupervised example --
_g2.5.
_tMore lack of supervision -- where are the parents? --
_g2.6.
_tEngines, complex and primitive --
_g2.7.
_tModel richness means what, exactly? --
_g2.8.
_tMembership or probability of membership? --
_g2.9.
_tA taxonomy of machines? --
_g2.10.
_tA note of caution -- one of many --
_g2.11.
_tHighlights from the theory --
_g3.
_tA mangle of machines --
_g3.1.
_tIntroduction --
_g3.2.
_tLinear regression --
_g3.3.
_tLogistic regression --
_g3.4.
_tLinear discriminant --
_g3.5.
_tBayes classifiers -- regular and naïve --
_g3.6.
_tLogic regression --
_g3.7.
_tk-Nearest neighbors --
_g3.8.
_tSupport vector machines --
_g3.9.
_tNeural networks --
_g3.10.
_tBoosting --
_g3.11.
_tEvolutionary and genetic algorithms --
_g4.
_tThree examples and several machines --
_g4.1.
_tIntroduction --
_g4.2.
_tSimulated cholesterol data --
_g4.3.
_tLupus data --
_g4.4.
_tStroke data --
_g4.5.
_tBiomedical means unbalanced --
_g4.6.
_tMeasures of machine performance --
_g4.7.
_tLinear analysis of cholesterol data --
_g4.8.
_tNonlinear analysis of cholesterol data --
_g4.9.
_tAnalysis of the lupus data --
_g4.10.
_tAnalysis of the stroke data --
_g4.11.
_tFurther analysis of the lupus and stroke data --
_gPart II.
_tA machine toolkit --
_g5.
_tLogistic regression --
_g5.1.
_tIntroduction --
_g5.2.
_tInside and around the model --
_g5.3.
_g5.4.
_tUsing logistic regression as a decision rule --
_g5.5.
_tLogistic regression applied to the cholesterol data --
_g5.6.
_tA cautionary note --
_g5.7.
_tAnother cautionary note --
_g5.8.
_tProbability estimates and decision rules --
_g5.9.
_tEvaluating the goodness-of-fit of a logistic regression model --
_g5.10.
_tCalibrating a logistic regression --
_g5.11.
_tBeyond calibration --
_g5.12.
_tLogistic regression and reference models --
_g6.
_tA single decision tree --
_g6.1.
_tIntroduction --
_g6.2.
_tDropping down trees --
_g6.3.
_tGrowing a tree --
_g6.4.
_tSelecting features, making splits --
_g6.5.
_tGood split, bad split --
_g6.6.
_tFinding good features for making splits --
_g6.7.
_tMisreading trees --
_g6.8.
_tStopping and pruning rules --
_g6.9.
_tUsing functions of the features --
_g6.10.
_tUnstable trees? --
_g6.11.
_tVariable importance -- growing on trees? --
_g6.12.
_tPermuting for importance --
_g6.13.
_tThe continuing mystery of trees --
_g7.
_tRandom Forests -- trees everywhere --
_g7.1.
_tRandom Forests in less than five minutes --
_g7.2.
_tRandom treks through the data --
_g7.3.
_tRandom treks through the features --
_g7.4.
_tWalking through the forest --
_g7.5.
_tWeighted and unweighted voting --
_g7.6.
_tFinding subsets in the data using proximities --
_g7.7.
_tApplying Random Forests to the Stroke data --
_g7.8.
_tRandom Forests in the universe of machines --
_gPart III.
_tAnalysis fundamentals --
_g8.
_tMerely two variables --
_g8.1.
_tIntroduction --
_g8.2.
_tUnderstanding correlations --
_g8.3.
_tHazards of correlations --
_g8.4.
_tCorrelations big and small --
_g9.
_tMore than two variables --
_g9.1.
_tIntroduction --
_g9.2.
_tTiny problems, large consequences --
_g9.3.
_tMathematics to the rescue? --
_g9.4.
_tGood models need not be unique --
_g9.5.
_tContexts and coefficients --
_g9.6.
_tInterpreting and testing coefficients in models --
_g9.7.
_tMerging models, pooling lists, ranking features --
_g10.
_tResampling methods --
_g10.1.
_tIntroduction --
_g10.2.
_tThe bootstrap --
_g10.3.
_tWhen the bootstrap works --
_g10.4.
_tWhen the bootstrap doesn't work --
_g10.5.
_tResampling from a single group in different ways --
_g10.6.
_tResampling from groups with unequal sizes --
_g10.7.
_tResampling from small datasets --
_g10.8.
_tPermutation methods --
_g10.9.
_tStill more on permutation methods --
_g11.
_tError analysis and model validation --
_g11.1.
_tIntroduction --
_g11.2.
_tErrors? What errors? --
_g11.3.
_tUnbalanced data, unbalanced errors --
_g11.4.
_tError analysis for a single machine --
_g11.5.
_tCross-validation error estimation --
_g11.6.
_tCross-validation or cross-training? --
_g11.7.
_tThe leave-one-out method --
_g11.8.
_tThe out-of-bag method --
_g11.9.
_tIntervals for error estimates for a single machine --
_g11.10.
_tTossing random coins into the abyss --
_g11.11.
_tError estimates for unbalanced data --
_g11.12.
_tConfidence intervals for comparing error values --
_g11.13.
_tOther measures of machine accuracy --
_g11.14.
_tBenchmarking and winning the lottery --
_g11.15.
_tError analysis for predicting continuous outcomes --
_gPart IV.
_tMachine strategies --
_g12.
_tEnsemble methods -- let's take a vote --
_g12.1.
_tPools of machines --
_g12.2.
_tWeak correlation with outcome can be good enough --
_g12.3.
_tModel averaging --
_g13.
_tSummary and conclusions --
_g13.1.
_tWhere have we been? --
_g13.2.
_tSo many machines --
_g13.3.
_tBinary decision or probability estimate? --
_g13.4.
_tSurvival machines? Risk machines? --
_g13.5.
_tAnd where are we going?
520 _a"This book is for anyone who has biomedical data and needs to identify variables that predict an outcome, for two-group outcomes such as tumor/not-tumor, survival/death, or response from treatment. Statistical learning machines are ideally suited to these types of prediction problems, especially if the variables being studied may not meet the assumptions of traditional techniques. Learning machines come from the world of probability and computer science but are not yet widely used in biomedical research."--Publisher's website
650 0 _aMedical statistics
_xData processing
_95417
650 0 _aBiometry
_xData processing
_95418
650 1 2 _aData Interpretation, Statistical
_95419
650 2 2 _aModels, Statistical
_95420
700 1 _aMalley, Karen G.
_95421
700 1 _aPajevic, Sinisa.
_95422
830 0 _aPractical guides to biostatistics and epidemiology
_95423
942 _2lcc
_cBK
999 _c3904
_d3904