What is the difference between SPSS and SAS?

Question added by طارق محمود محمد عبد اللطيف دراز, Agricultural Economics and Statistics Expert, Arab Organization for Agricultural Development
Publication date: 2013/10/06
Answer by ahmed abd el ghany, Statistics Research and Web Administrator, Fund of Drug Control and Treatment

The difference between the two programs lies in how they apply certain statistical procedures, especially logistic regression and probability estimates.

If the same procedures are applied to the same data, the results differ. The reason is that SAS, by default, uses the smaller value (zero) as the outcome whose probability is estimated, whereas SPSS, by default, uses the highest sorted value (one). This default has a serious effect on the signs of the estimated parameters, and therefore on the odds ratios as well as on the confidence intervals of the model parameters.
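A minimal Python sketch of this effect under the logit link, with hypothetical coefficients (b0, b1 and se_b1 are invented for illustration, not taken from any SAS or SPSS run): re-expressing the same model for the other outcome only flips the sign of every coefficient, so the odds ratios become reciprocals and the confidence limits swap.

```python
import math

def inv_logit(eta):
    """Logistic CDF: probability implied by a linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients for logit P(Y = 1) = b0 + b1 * x (SPSS-style default event)
b0, b1, se_b1 = -1.2, 0.8, 0.5
x = 2.5

# The same model written for the event Y = 0 (SAS-style default): every coefficient changes sign
p_event_one = inv_logit(b0 + b1 * x)
p_event_zero = inv_logit(-b0 - b1 * x)

or_one = math.exp(b1)     # odds ratio when Y = 1 is the event
or_zero = math.exp(-b1)   # odds ratio when Y = 0 is the event: the reciprocal
ci_one = (math.exp(b1 - 1.96 * se_b1), math.exp(b1 + 1.96 * se_b1))
ci_zero = (math.exp(-b1 - 1.96 * se_b1), math.exp(-b1 + 1.96 * se_b1))

print(p_event_one + p_event_zero)                        # the two codings describe one model
print(or_one * or_zero)                                  # reciprocal odds ratios
print(ci_zero[0] * ci_one[1], ci_zero[1] * ci_one[0])    # limits swap as reciprocals
```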

Answer by nzar bataineh, audit manager, Al kindy hospital

Differences Between Statistical Software Packages

(SAS, SPSS, and MINITAB)

As Applied to a Binary Response Variable

1. INTRODUCTION AND LITERATURE REVIEW

Several writers have recently reviewed statistical software for microcomputers and offered very useful comments to both users and vendors. Some of these reviews are comprehensive and general (Searle, S. R., 1989); others analyze specific program features and identify problem areas. For example, Gerard E. Dallal (1992) published a very concise paper in The American Statistician titled “The computer analysis of factorial experiments with nested factors”. Dallal used two computing packages, SAS and SPSS, to analyze unbalanced data from fixed models with nested factors. He found differences between the SAS and SPSS results, as well as some errors in the sums of squares calculated in the SPSS output. Following Dallal's paper, several commentaries were sent to the editors of The American Statistician trying to explain the discrepancies between the SAS and SPSS results. The controversy was settled by Searle, S. R. (1994), who presented a theoretical clarification of what could be the basic cause of the differences and errors. Searle ended his paper not with a conclusion but with a plea to all software houses, asking them to provide clearer, more detailed, and more specific descriptions of their calculations.

Okunade, A., and others (1993) compared the summary statistics of regression analysis produced by commonly used statistical and econometric packages such as SAS, SPSS, SHAZAM, TSP, and BMDP.

Oster, R. A. (1998) reviewed five statistical software packages (EPI INFO, EPICURE, EPILOG PLUS, STATA, and TRUE EPISTAT) according to criteria that are of most interest to epidemiologists, biostatisticians, and others involved in clinical research.

McCullough, B. D. (1998) proposed testing the accuracy of statistical software packages using Wilkinson's Statistics Quiz in three areas: linear and nonlinear estimation, random number generation, and statistical distributions. McCullough, B. D. (1999) then applied his methodology to the statistical packages SAS, SPSS, and S-Plus. He concluded that the reliability of statistical software cannot be taken for granted, because he found weak points in all the random number generators, in the S-Plus correlation procedures, and in the one-way ANOVA and nonlinear least squares routines of SAS and SPSS.

Zhou, X., and others (1999) reviewed five software packages that can fit a generalized linear mixed model for data with more than a two-level structure and multiple independent variables. These five packages are MLn, MLwiN, SAS Proc Mixed, HLM, and VARCL. The comparison was based on features such as data input and management, statistical modeling capabilities, output, user friendliness, and documentation.

Bergmann, R., and others (2000) compared 11 statistical packages on a real dataset: SigmaStat 2.03, SYSTAT 9, JMP 3.2.5, S-Plus 2000, STATISTICA 5.5, UNISTAT 4.53b, SPSS 8, Arcus Quickstat 1.2, Stata 6, SAS 6.12, and StatXact 4. They found that different packages could give very different outcomes for the Wilcoxon-Mann-Whitney test.

The purpose of this paper is to compare three statistical software packages when applied to a binary dependent variable. These packages are SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences, or Superior Performing Statistical Software as the SPSS company now calls it), and MINITAB. The three packages were chosen because they are well known and among the most frequently used by statisticians and others in commercial applications and scientific research. A real dataset from the field of medical treatment is used to test whether there is a significant difference between two alternative drugs, a test drug and a reference drug, in plasma levels of ciprofloxacin at different times. The binary response variable is “Drug”, which is zero for the test drug and one for the reference drug, and the plasma levels at times 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, and 8.0 hours are the predictor variables.

2. STATISTICAL TREATMENT OF BINARY RESPONSE VARIABLE

 

In many areas of social science research, one encounters dependent variables that take one of two possible values, such as the presence or absence of a particular disease, or whether a patient responds to a treatment within a period of time. Binary response analysis models the relationship between a binary response variable and one or more explanatory variables. For a binary response variable Y, it assumes:

$$g(p) = \mathbf{b}\mathbf{x} \qquad (1)$$

where $p = \Pr(Y = y_1)$ for $y_1$, one of the two ordered levels of Y,

b is the parameter vector,

x is the vector of explanatory variables,

and g is a function of p that is assumed to be linearly related to the explanatory variables.

The binary response model shares a common feature with the more general class of generalized linear models: a function g = g(μ) of the mean μ of the dependent variable is assumed to be linearly related to the explanatory variables. The function g(μ), often referred to as the link function, provides the link between the random (stochastic) component and the systematic (deterministic) component of the response variable.

To assess the relationship between one or more predictor variables and a categorical response variable, the following techniques are often employed:

(i) Logistic regression
(ii) Probit regression
(iii) Complementary log-log

2.1 Logistic regression

Logistic regression examines the relationship between one or more predictor variables and a binary response. The logistic equation can be used to examine how the probability of an event changes as the predictor variables change. Both logistic regression and least squares regression investigate the relationship between a response variable and one or more predictors. A practical difference between them is that logistic regression is used with categorical response variables, whereas linear regression is used with continuous response variables. Both methods estimate the model parameters so that the fit of the model is optimized: least squares minimizes the sum of squared errors, whereas logistic regression obtains maximum likelihood estimates of the parameters using an iteratively reweighted least squares algorithm (McCullagh, P., and Nelder, J. A., 1992).
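The iteratively reweighted least squares idea can be sketched in a few lines of Python for one predictor plus an intercept. This is a minimal illustration, not the algorithm of any particular package; the function name `fit_logistic_irls` and the toy data are hypothetical. For the logit link, each IRLS step coincides with a Newton-Raphson step, b_new = b + (X'WX)^(-1) X'(y - p) with weights w = p(1 - p).

```python
import math

def fit_logistic_irls(x, y, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by iteratively reweighted least squares (IRLS)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        s00 = s01 = s11 = 0.0   # entries of the 2x2 matrix X'WX
        g0 = g1 = 0.0           # entries of the score vector X'(y - p)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)   # IRLS weight for the logit link
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            g0 += yi - p
            g1 += (yi - p) * xi
        det = s00 * s11 - s01 * s01
        step0 = (s11 * g0 - s01 * g1) / det   # solve the 2x2 system by Cramer's rule
        step1 = (s00 * g1 - s01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

# Hypothetical toy data (not the ciprofloxacin data): one predictor, 0/1 response
x_demo = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_demo = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic_irls(x_demo, y_demo)
print(b0_hat, b1_hat)
```

At convergence the score equations, the sums of (y - p) and x(y - p), are zero, which is what characterizes the maximum likelihood estimates.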

For a binary response variable Y, the logistic regression has the form:

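$$\operatorname{Logit}(p) = \log_e\left[\frac{p}{1-p}\right] = \mathbf{b}\mathbf{x} \qquad (2)$$

or equivalently,

$$p = \frac{\exp(\mathbf{b}\mathbf{x})}{1 + \exp(\mathbf{b}\mathbf{x})} \qquad (3)$$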

The logistic regression models the logit transformation of the ith observation's event probability, $p_i$, as a linear function of the explanatory variables in the vector $\mathbf{x}_i$. The logistic regression model uses the logit as the link function.
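Equations (2) and (3) are inverses of each other. The short Python sketch below (function names are illustrative) maps a probability to the logit scale and back.

```python
import math

def logit(p):
    """Equation (2): log_e[p / (1 - p)]."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Equation (3): exp(eta) / (1 + exp(eta)), evaluated in an overflow-safe form."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

for p in (0.1, 0.5, 0.9):
    eta = logit(p)
    print(p, round(eta, 4), inv_logit(eta))
```

The logit is 0 at p = 0.5 and antisymmetric around it (logit(0.9) = -logit(0.1)), which is why swapping the two outcome codes only changes signs under this link.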

2.2 Probit regression

Probit regression can be employed as an alternative to logistic regression in binary response models. For a binary response variable Y, the probit regression model has the form:

$$\Phi^{-1}(p) = \mathbf{b}\mathbf{x} \qquad (4)$$

or equivalently,

$$p = \Phi(\mathbf{b}\mathbf{x}) \qquad (5)$$

where $\Phi^{-1}$ is the inverse of the cumulative standard normal distribution function, often referred to as the probit or normit, and $\Phi$ is the cumulative standard normal distribution function. The probit regression model can also be viewed as a special case of the generalized linear model whose link function is the probit.
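In Python's standard library, Φ and its inverse are available from `statistics.NormalDist`, so equations (4) and (5) can be sketched as follows (the wrapper names are illustrative).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def probit(p):
    """Equation (4): the inverse standard normal CDF, Phi^{-1}(p)."""
    return STD_NORMAL.inv_cdf(p)

def inv_probit(eta):
    """Equation (5): the standard normal CDF, Phi(eta)."""
    return STD_NORMAL.cdf(eta)

print(round(probit(0.975), 4))    # 1.96
print(round(inv_probit(0.0), 4))  # 0.5
```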

2.3 Complementary log-log

The complementary log-log transformation is the inverse, $F^{-1}(p)$, of a cumulative distribution function F. Like the logit and probit models, the complementary log-log transformation ensures that predicted probabilities lie in the interval [0, 1].

If the probability of success is expressed as a function of the unknown parameters, i.e.,

$$p_i = 1 - \exp\left\{-\exp\left(\textstyle\sum_k b_k x_{ik}\right)\right\} \qquad (6)$$

then the model is linear in the inverse of the cumulative distribution function, which is the log of the negative log of the complement of $p_i$, $\log\{-\log(1-p_i)\}$:

$$\log\{-\log(1-p_i)\} = \sum_k b_k x_{ik} \qquad (7)$$
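Equations (6) and (7) can be sketched in Python as a pair of inverse functions. The coefficients `b` and covariates `x_i` below are hypothetical values for a single observation i, chosen only to exercise the formulas.

```python
import math

def cloglog(p):
    """Equation (7): log{-log(1 - p)}."""
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    """Equation (6): p = 1 - exp{-exp(eta)}, with eta = sum_k b_k x_ik."""
    return 1.0 - math.exp(-math.exp(eta))

# Hypothetical coefficients b_k and covariates x_ik for one observation i
b = [0.4, -0.3]
x_i = [1.0, 2.0]
eta_i = sum(bk * xk for bk, xk in zip(b, x_i))   # linear predictor
p_i = inv_cloglog(eta_i)
print(round(p_i, 4), round(cloglog(p_i), 4))
```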

In general, three link functions can be used to fit a broad class of binary response models: (i) the logit, the inverse of the cumulative logistic distribution function; (ii) the normit (also called probit), the inverse of the cumulative standard normal distribution function; and (iii) the gompit (also called complementary log-log), the inverse of the Gompertz distribution function. The link functions and their corresponding distributions are summarized in Table-1:

TABLE-1

The Link Functions

Name                             Link function                            Distribution   Mean                           Variance
Logit                            $g(p_i) = \log_e\{p_i/(1-p_i)\}$          Logistic       0                              $\pi^2/3$
Normit (probit)                  $g(p_i) = \Phi^{-1}(p_i)$                 Normal         0                              1
Gompit (complementary log-log)   $g(p_i) = \log_e\{-\log_e(1-p_i)\}$       Gompertz       $-\gamma$ (Euler's constant)   $\pi^2/6$

We can choose a link function that results in a good fit to our data. Goodness-of-fit statistics can be used to compare fits using different link functions. An advantage of the logit link function is that it provides an estimate of the odds ratios.
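The Mean and Variance columns of Table-1 can be checked numerically. The Python sketch below integrates each latent density with a simple midpoint rule on a finite range (an assumption: the tails cut off are negligible for these densities). For the gompit row it uses the density of F(x) = 1 - exp(-exp(x)), the distribution whose inverse CDF is the complementary log-log link.

```python
import math

def logistic_pdf(x):
    """Standard logistic density (logit link); symmetric, so |x| avoids overflow."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def normal_pdf(x):
    """Standard normal density (normit/probit link)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gompertz_pdf(x):
    """Density of F(x) = 1 - exp(-exp(x)), the distribution behind the gompit link."""
    return math.exp(x - math.exp(x))

def mean_and_variance(pdf, lo, hi, n=100_000):
    """Midpoint-rule mean and variance of a density restricted to [lo, hi]."""
    h = (hi - lo) / n
    m0 = m1 = m2 = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        f = pdf(x)
        m0 += f
        m1 += x * f
        m2 += x * x * f
    mean = m1 / m0
    return mean, m2 / m0 - mean ** 2

logistic_moments = mean_and_variance(logistic_pdf, -60.0, 60.0)
normal_moments = mean_and_variance(normal_pdf, -40.0, 40.0)
gompertz_moments = mean_and_variance(gompertz_pdf, -60.0, 5.0)
print(logistic_moments, math.pi ** 2 / 3)
print(normal_moments)
print(gompertz_moments, -0.5772156649, math.pi ** 2 / 6)
```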

 

3. STATISTICAL APPLICATION WITH REAL DATA

 

Real data were obtained from “The Pharmacy Services Unit”, Faculty of Pharmacy, University of Alexandria. The dataset covers two drugs (test and reference), each containing ciprofloxacin, an antibiotic whose known side effects include nausea, vomiting, headache, skin rash, etc. The test drug is the Ciprone tablet, which contains 500 mg ciprofloxacin per tablet and is produced by Medical Union Pharmaceuticals Co., Abu Sultan-Ismailia, Egypt. The reference drug is the Ciprobay tablet, which contains 500 mg ciprofloxacin per tablet and is produced by Bayer AG, Germany. The data are plasma blood levels of ciprofloxacin (μg/ml) in 28 healthy male volunteers aged 20 to 40 years and weighing 61 to 85 kg. The volunteers were divided into two equal groups. The first group was administered a single 500 mg dose of ciprofloxacin as one Ciprone tablet (test product), while the second group was administered the same dose as one Ciprobay tablet (reference product). After a one-week washout period, the first group was administered one Ciprobay tablet (reference product) and the second group one Ciprone tablet (test product). Venous blood samples (5 ml) were taken from each volunteer 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, and 8.0 hours after each dose. These data can be represented as a binary response model in which the test drug (Ciprone) is coded 0 and the reference drug (Ciprobay) is coded 1:

$$\text{Drug} = \begin{cases} 0 & \text{if test drug (Ciprone)} \\ 1 & \text{if reference drug (Ciprobay)} \end{cases} \qquad (8)$$

Our goal is to test whether there is a significant difference between the test and reference drugs in plasma levels of ciprofloxacin at different times. The binary response variable is “Drug”, and the plasma levels at times 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, and 8.0 hours are the predictors. The dataset was analyzed on an IBM-compatible PC with a 700 MHz AMD processor. The three statistical software packages are the SAS System for Windows version 8.0, SPSS for Windows version 10, and MINITAB Release 13.2.

3.1 SAS OUTPUT

SAS has a variety of procedures that can be used to analyze data with a binary (dichotomous) response variable; each analysis is run with a PROC statement. The response variable Drug is coded 0 or 1 (this is not a limitation: the values can be numeric or character, as long as they are dichotomous), and the plasma levels at times 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, and 8.0 are the regressors of interest. They are named T05, T10, T15, T20, T25, T30, T35, T40, T60, and T80 in the INPUT statement, because SAS variable names cannot contain special characters such as a decimal point.

3.1.1 SAS Logistic regression

To fit a logistic regression, we can use the commands:

PROC LOGISTIC;
   /* LINK= can be LOGIT (the default), PROBIT (alias NORMIT), or CLOGLOG */
   MODEL DRUG = T05 T10 T15 T20 T25 T30 T35 T40 T60 T80 / LINK = LOGIT;
RUN;

The LINK= option selects the link function: LOGIT, PROBIT (or its alias NORMIT), or CLOGLOG (complementary log-log). By default, SAS PROC LOGISTIC models the probability of Drug = 0; in other words, SAS chooses the smaller value as the event whose probability is estimated. One way to model the probability of Drug = 1 instead is to specify the DESCENDING option in the PROC LOGISTIC statement, that is, to use PROC LOGISTIC DESCENDING. With the logit link function we get the following SAS output:

                         Testing Global Null Hypothesis: BETA=0

                              Intercept
                Intercept        and
  Criterion       Only       Covariates    Chi-Square for Covariates
  AIC            71.235        83.246          .
  SC             73.147       104.278          .
  -2 LOG L       69.235        61.246       7.989 with 10 DF (p=0.6299)
  Score             .             .         7.414 with 10 DF (p=0.6858)

                   Analysis of Maximum Likelihood Estimates

              Parameter  Standard     Wald       Pr >     Standardized    Odds
  Variable DF  Estimate    Error   Chi-Square Chi-Square    Estimate      Ratio
  INTERCPT  1    1.6756   1.5371     1.1883     0.2757          .           .
  T05       1   -0.8220   0.5594     2.1591     0.1417      -0.317686     0.440
  T10       1   -0.3446   0.4897     0.4951     0.4817      -0.154937     0.709
  T15       1   -0.1074   0.7071     0.0231     0.8793      -0.035235     0.898
  T20       1    0.4869   0.8078     0.3633     0.5467       0.179043     1.627
  T25       1   -0.3252   0.8270     0.1546     0.6941      -0.116906     0.722
  T30       1   -1.2505   1.0881     1.3208     0.2504      -0.336985     0.286
  T35       1    1.8015   1.3587     1.7581     0.1849       0.397790     6.059
  T40       1   -1.5482   2.0143     0.5908     0.4421      -0.314759     0.213
  T60       1    2.2656   2.6673     0.7215     0.3957       0.393059     9.637
  T80       1   -1.8445   2.1989     0.7037     0.4016      -0.309659     0.158

        Association of Predicted Probabilities and Observed Responses

                  Concordant = 70.4%          Somers' D = 0.407
                  Discordant = 29.6%          Gamma     = 0.407
                  Tied       =  0.0%          Tau-a     = 0.207
                  (624 pairs)                 c         = 0.704
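The model-fit rows of this output are linked by simple formulas: AIC = -2 log L + 2k and SC = -2 log L + k ln(n), where k is the number of estimated parameters (1 for the intercept-only model, 11 with the ten covariates), and the chi-square for covariates is the drop in -2 log L. A minimal Python check, assuming n = 50 observations were used (26 test plus 24 reference, as reported in Section 4.1.1):

```python
import math

# -2 log L values from the SAS logit output above
neg2logl_intercept_only = 69.235
neg2logl_full = 61.246

n_obs = 50               # assumed: 26 test + 24 reference observations used
k_intercept_only = 1     # intercept only
k_full = 11              # intercept + 10 slopes

def aic(neg2logl, k):
    """Akaike information criterion: -2 log L + 2k."""
    return neg2logl + 2 * k

def sc(neg2logl, k, n):
    """Schwarz criterion: -2 log L + k * ln(n)."""
    return neg2logl + k * math.log(n)

lr_chi_square = neg2logl_intercept_only - neg2logl_full

print(round(aic(neg2logl_intercept_only, k_intercept_only), 3), round(aic(neg2logl_full, k_full), 3))
print(round(sc(neg2logl_intercept_only, k_intercept_only, n_obs), 3), round(sc(neg2logl_full, k_full, n_obs), 3))
print(round(lr_chi_square, 3))
```

The reproduced values (71.235, 83.246, 73.147, 104.278, and 7.989) agree with the SAS table.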

With the normit link function (LINK=NORMIT), we get the following SAS output:

                   Testing Global Null Hypothesis: BETA=0

                              Intercept
                Intercept        and
  Criterion       Only       Covariates    Chi-Square for Covariates
  AIC            71.235        83.233          .
  SC             73.147       104.266          .
  -2 LOG L       69.235        61.233       8.001 with 10 DF (p=0.6287)
  Score             .             .         7.414 with 10 DF (p=0.6858)

                  Analysis of Maximum Likelihood Estimates

                Parameter   Standard      Wald         Pr >      Standardized
  Variable  DF   Estimate     Error    Chi-Square   Chi-Square     Estimate
  INTERCPT   1     0.9692    0.9284      1.0899      0.2965            .
  T05        1    -0.5121    0.3314      2.3886      0.1222      -0.358982
  T10        1    -0.2025    0.2945      0.4728      0.4917      -0.165154
  T15        1    -0.0534    0.4264      0.0157      0.9004      -0.031766
  T20        1     0.3011    0.4922      0.3741      0.5408       0.200794
  T25        1    -0.1921    0.5015      0.1466      0.7018      -0.125226
  T30        1    -0.7860    0.6491      1.4663      0.2259      -0.384215
  T35        1     1.1153    0.8084      1.9036      0.1677       0.446679
  T40        1    -0.9203    1.1923      0.5958      0.4402      -0.339380
  T60        1     1.3500    1.6172      0.6969      0.4038       0.424817
  T80        1    -1.0870    1.3372      0.6608      0.4163      -0.331001

        Association of Predicted Probabilities and Observed Responses

                  Concordant = 70.5%          Somers' D = 0.412
                  Discordant = 29.3%          Gamma     = 0.413
                  Tied       =  0.2%          Tau-a     = 0.210
                  (624 pairs)                 c         = 0.706

Results similar to those of the normit option can be obtained with the default (normal) distribution of PROC PROBIT:

PROC PROBIT;
   CLASS DRUG;
   MODEL DRUG = T05 T10 T15 T20 T25 T30 T35 T40 T60 T80;
RUN;

This procedure does not report odds ratios.

3.1.2 SAS Probit regression

PROC PROBIT can also fit a logistic regression if LOGISTIC is specified as the cumulative distribution type in the MODEL statement:

PROC PROBIT;
   CLASS DRUG;
   MODEL DRUG = T05 T10 T15 T20 T25 T30 T35 T40 T60 T80 / D = LOGISTIC;
RUN;

                             Probit Procedure

         Variable  DF   Estimate    Std Err  ChiSquare   Pr>Chi  Label/Value
         INTERCPT   1     1.6756   1.537092   1.188317   0.2757  Intercept
         T05        1    -0.8220   0.559442   2.159073   0.1417
         T10        1    -0.3446   0.489681   0.495117   0.4817
         T15        1    -0.1074   0.707068    0.02307   0.8793
         T20        1     0.4869   0.807787   0.363313   0.5467
         T25        1    -0.3252   0.827013   0.154631   0.6941
         T30        1    -1.2505   1.088066   1.320776   0.2505
         T35        1     1.8015   1.358686   1.758075   0.1849
         T40        1    -1.5482    2.01432   0.590745   0.4421
         T60        1     2.2656   2.667343   0.721467   0.3957
         T80        1    -1.8445   2.198877   0.703652   0.4016

 

Logistic regression can also be fitted as a generalized linear model with the GENMOD procedure, where the response probability distribution is binomial and the link function is the logit. The PROC GENMOD code for a logistic regression is:

PROC GENMOD;
   MODEL DRUG = T05 T10 T15 T20 T25 T30 T35 T40 T60 T80 / DIST = BINOMIAL LINK = LOGIT;
RUN;

Another option is the SAS CATMOD (CATegorical data MODeling) procedure, which fits a logistic regression as follows:

PROC CATMOD;
   DIRECT T05 T10 T15 T20 T25 T30 T35 T40 T60 T80;
   RESPONSE LOGITS;
   MODEL DRUG = T05 T10 T15 T20 T25 T30 T35 T40 T60 T80;
RUN;

Here the regressors are continuous quantitative variables and must therefore be listed in the DIRECT statement. These procedures give the same results as PROC LOGISTIC, but without odds ratios in the output.

3.1.3 Complementary log-log

If we use PROC LOGISTIC with LINK=CLOGLOG (complementary log-log), we get the following portion of SAS output:

                 Analysis of Maximum Likelihood Estimates

                Parameter  Standard     Wald        Pr >     Standardized
  Variable  DF   Estimate    Error   Chi-Square  Chi-Square    Estimate
  INTERCPT   1     0.5370   1.0284     0.2727     0.6015           .
  T05        1    -0.5959   0.4189     2.0235     0.1549      -0.325696
  T10        1    -0.1646   0.3349     0.2417     0.6230      -0.104700
  T15        1    -0.1784   0.4831     0.1364     0.7119      -0.082784
  T20        1     0.4836   0.5566     0.7551     0.3849       0.251503
  T25        1    -0.1630   0.5680     0.0823     0.7742      -0.082846
  T30        1    -0.9015   0.7196     1.5698     0.2102      -0.343593
  T35        1     1.2004   0.8937     1.8040     0.1792       0.374853
  T40        1    -1.0825   1.4928     0.5259     0.4684      -0.311252
  T60        1     1.4476   1.8657     0.6020     0.4378       0.355162
  T80        1    -0.9800   1.5312     0.4096     0.5222      -0.232675

3.2 SPSS OUTPUT

Unlike SAS, the SPSS LOGISTIC REGRESSION procedure models the probability of Drug = 1 (the higher sorted value) by default. In other words, SPSS estimates the probability of the higher value, whereas SAS uses the smaller value.

3.2.1 SPSS Logistic regression

To fit a logistic regression in SPSS, we can use either the Binary Logistic or the Ordinal Regression menu.

Binary Logistic is available from the Analyze menu: select Regression, then Binary Logistic. In the Binary Logistic dialog box, select the variable Drug as the dependent variable and the times T0.5, T1.0, T1.5, T2.0, T2.5, T3.0, T3.5, T4.0, T6.0, and T8.0 as covariates, which gives the following portion of SPSS output:

 

PLUM - Ordinal Regression

Ordinal regression (the SPSS PLUM procedure) can be used to model the dependence of a polytomous ordinal response on a set of predictors, which can be factors or covariates. It is available from the Analyze menu: select Regression, then Ordinal Regression. In the Ordinal Regression dialog box, select the variable Drug as the dependent variable and the times T0.5, T1.0, T1.5, T2.0, T2.5, T3.0, T3.5, T4.0, T6.0, and T8.0 as covariates, and choose Logit from the options to get the following SPSS output:

 

 

3.2.2 SPSS Probit regression

To fit SPSS Probit regression, we can use the menu of ORDINAL REGRESSION as before with the selection of Probit from the options to get the following SPSS OUTPUT:

 

3.2.3 SPSS Complementary log-log

In a similar way, we can use the menu of ORDINAL REGRESSION as before with the selection of Complementary log-log from the options to get the following SPSS OUTPUT:

However, if we use the same Ordinal Regression menu but select the Negative log-log option, we get the following SPSS output:

 

3.3 MINITAB OUTPUT

Minitab provides three link functions that can be used to fit binary response models: the logit (the default), the normit (probit), and the gompit (complementary log-log). They are available from the Stat menu by selecting Binary Logistic Regression. In the Binary Logistic Regression dialog box, choose the variable Drug as the response variable and, in the Model box, select the times T0.5, T1.0, T1.5, T2.0, T2.5, T3.0, T3.5, T4.0, T6.0, and T8.0 as the covariates. To specify the link function, select it in the Options box. This gives the following Minitab output:

3.3.1 Minitab Logistic regression

Selecting the logit link function gives the following portion of the Minitab Binary Logistic Regression output:

Logistic Regression Table

                                                     Odds      95% CI
Predictor        Coef    SE Coef       Z       P    Ratio   Lower    Upper
Constant       -1.676     1.537    -1.09   0.276
T.5            0.8220    0.5594     1.47   0.142     2.28    0.76     6.81
T1.0           0.3446    0.4897     0.70   0.482     1.41    0.54     3.69
T1.5           0.1074    0.7071     0.15   0.879     1.11    0.28     4.45
T2.0          -0.4869    0.8078    -0.60   0.547     0.61    0.13     2.99
T2.5           0.3252    0.8270     0.39   0.694     1.38    0.27     7.00
T3.0            1.250     1.088     1.15   0.250     3.49    0.41    29.46
T3.5           -1.802     1.359    -1.33   0.185     0.17    0.01     2.37
T4.0            1.548     2.014     0.77   0.442     4.70    0.09   243.77
T6.0           -2.266     2.667    -0.85   0.396     0.10    0.00    19.34
T8.0            1.845     2.199     0.84   0.402     6.32    0.08   470.73

Log-Likelihood = -30.623
Test that all slopes are zero: G = 7.989, DF = 10, P-Value = 0.630

Goodness-of-Fit Tests

Method             Chi-Square    DF       P
Pearson                49.795    39   0.115
Deviance               61.246    39   0.013
Hosmer-Lemeshow         5.820     8   0.667

Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs          Number   Percent   Summary Measures
Concordant        438     70.2%   Somers' D               0.41
Discordant        184     29.5%   Goodman-Kruskal Gamma   0.41
Ties                2      0.3%   Kendall's Tau-a         0.21
Total             624    100.0%
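The Minitab logit coefficients are the SAS estimates with reversed signs, because Minitab models P(Drug = 1) while SAS models P(Drug = 0). The Python sketch below rebuilds the Minitab odds ratio and 95% confidence interval columns from the SAS estimates and standard errors alone, assuming Minitab uses Wald intervals exp(b ± z·SE) with z the 97.5% normal quantile; small differences come from rounding in the printed tables.

```python
import math
from statistics import NormalDist

# SAS PROC LOGISTIC (logit link) estimates and standard errors; these refer to P(Drug = 0)
sas_estimates = {
    "T05": (-0.8220, 0.5594), "T10": (-0.3446, 0.4897), "T15": (-0.1074, 0.7071),
    "T20": (0.4869, 0.8078), "T25": (-0.3252, 0.8270), "T30": (-1.2505, 1.0881),
    "T35": (1.8015, 1.3587), "T40": (-1.5482, 2.0143), "T60": (2.2656, 2.6673),
    "T80": (-1.8445, 2.1989),
}
# Odds ratio, lower and upper 95% limits as printed by Minitab, which models P(Drug = 1)
minitab_printed = {
    "T05": (2.28, 0.76, 6.81), "T10": (1.41, 0.54, 3.69), "T15": (1.11, 0.28, 4.45),
    "T20": (0.61, 0.13, 2.99), "T25": (1.38, 0.27, 7.00), "T30": (3.49, 0.41, 29.46),
    "T35": (0.17, 0.01, 2.37), "T40": (4.70, 0.09, 243.77), "T60": (0.10, 0.00, 19.34),
    "T80": (6.32, 0.08, 470.73),
}
Z = NormalDist().inv_cdf(0.975)   # about 1.96 (assumed Wald interval)

def reversed_event_row(estimate, se):
    """Coefficient, odds ratio and 95% Wald CI after switching the event from 0 to 1."""
    b = -estimate                  # switching the modeled event flips the sign
    return b, math.exp(b), math.exp(b - Z * se), math.exp(b + Z * se)

for name, (est, se) in sas_estimates.items():
    b, odds, lo, hi = reversed_event_row(est, se)
    print(f"{name}: coef {b:+.4f}  OR {odds:.2f}  CI ({lo:.2f}, {hi:.2f})  Minitab {minitab_printed[name]}")
```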

 

3.3.2 Probit regression

Binary Logistic Regression with the normit link function gives the following part of the Minitab output:

Logistic Regression Table

Predictor        Coef    SE Coef       Z       P
Constant      -0.9692     0.9284   -1.04   0.296
T.5            0.5121     0.3314    1.55   0.122
T1.0           0.2025     0.2945    0.69   0.492
T1.5           0.0534     0.4264    0.13   0.900
T2.0          -0.3011     0.4922   -0.61   0.541
T2.5           0.1921     0.5015    0.38   0.702
T3.0           0.7860     0.6491    1.21   0.226
T3.5          -1.1153     0.8084   -1.38   0.168
T4.0            0.920      1.192    0.77   0.440
T6.0           -1.350      1.617   -0.83   0.404
T8.0            1.087      1.337    0.81   0.416

 

3.3.3 Complementary log-log

Binary Logistic Regression with the gompit link function gives the following portion of the Minitab output:

Logistic Regression Table

Predictor        Coef    SE Coef       Z       P
Constant       -1.736      1.101   -1.58   0.115
T.5            0.5724     0.3516    1.63   0.104
T1.0           0.3230     0.3373    0.96   0.338
T1.5          -0.0893     0.4937   -0.18   0.856
T2.0          -0.1943     0.5687   -0.34   0.733
T2.5           0.2597     0.5657    0.46   0.646
T3.0           0.9555     0.7655    1.25   0.212
T3.5          -1.3859     0.9642   -1.44   0.151
T4.0            1.135      1.218    0.93   0.351
T6.0           -1.884      1.885   -1.00   0.318
T8.0            1.704      1.559    1.09   0.275
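Unlike the logit and normit tables, these gompit coefficients are not simply the SAS complementary log-log estimates with reversed signs (compare the intercepts, 0.5370 in SAS and -1.736 in Minitab). A likely explanation, sketched below in Python, is that the complementary log-log link is not symmetric. For the logit and probit links F(-η) = 1 - F(η), so switching the modeled event only flips signs; for the complementary log-log link it does not. Instead, modeling P(Drug = 1) with the complementary log-log link is equivalent to modeling P(Drug = 0) with the log-log link -log(-log p) (the Negative log-log option in SPSS) with negated coefficients. The η values used below are arbitrary test points.

```python
import math
from statistics import NormalDist

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    return NormalDist().cdf(eta)

def inv_cloglog(eta):
    """Complementary log-log: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def inv_loglog(eta):
    """Log-log link -log(-log p) = eta, i.e. p = exp(-exp(-eta))."""
    return math.exp(-math.exp(-eta))

ETAS = (-2.0, -0.5, 0.0, 0.7, 1.5)   # arbitrary test points

def is_symmetric(inv_link):
    """True if F(-eta) = 1 - F(eta), so switching the event only flips coefficient signs."""
    return all(abs(inv_link(-e) - (1.0 - inv_link(e))) < 1e-12 for e in ETAS)

for name, f in [("logit", inv_logit), ("probit", inv_probit), ("cloglog", inv_cloglog)]:
    print(name, is_symmetric(f))

# Under cloglog, P(Y = 0) = 1 - F(eta) follows the log-log link with negated coefficients
print(all(abs((1.0 - inv_cloglog(e)) - inv_loglog(-e)) < 1e-12 for e in ETAS))
```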

 

 

4. INTERPRETATION OF THE STATISTICAL FINDINGS

Estimating the three specified models (logistic regression, probit regression, and complementary log-log) with the three statistical software packages SAS, SPSS, and Minitab gave the following results:

4.1 SAS RESULTS

SAS gives three different sets of results for the three link functions: logit, normit, and complementary log-log.

4.1.1 Logit Link Function

The output for the logit link function can be obtained either with PROC LOGISTIC (the default link) or by requesting the logistic distribution or logit link in PROC PROBIT, PROC GENMOD, or PROC CATMOD. The Response Information section reports 6 missing observations and the number of observations in each of the two response categories: 26 for the test drug and 24 for the reference drug. Next, the -2 log-likelihood (-2 LOG L) from the maximum likelihood iterations is displayed along with the likelihood-ratio chi-square statistic, which is the difference between the intercept-only and the full-model -2 LOG L (69.235 - 61.246 = 7.989). This statistic tests the null hypothesis that all the coefficients associated with the predictors equal zero, against the alternative that they are not all zero. For the plasma blood level data, χ² = 7.989 with 10 degrees of freedom and a p-value of 0.6299, indicating that there is not sufficient evidence that any of the coefficients differs from zero; that is, there is no significant difference in plasma blood levels of ciprofloxacin between the test and reference drugs at the specified times.
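The p-values of these global tests can be reproduced from the chi-square values alone. For an even number of degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2) Σ_{k=0}^{df/2-1} (x/2)^k / k!, which the Python sketch below uses for the 10-df tests reported by SAS (the logit likelihood-ratio and score tests, and the normit and complementary log-log likelihood-ratio tests).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with an even number of df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

lr_logit = 69.235 - 61.246        # intercept-only minus full-model -2 LOG L
tests = {
    "logit likelihood ratio": lr_logit,
    "score": 7.414,
    "normit likelihood ratio": 8.001,
    "cloglog likelihood ratio": 7.721,
}
for label, stat in tests.items():
    print(f"{label}: chi-square = {stat:.3f}, p = {chi2_sf_even_df(stat, 10):.4f}")
```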

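According to the SAS output, the logit link function

$$\operatorname{Logit}(p) = B_0 + B_1 T_{0.5} + B_2 T_{1.0} + B_3 T_{1.5} + B_4 T_{2.0} + B_5 T_{2.5} + B_6 T_{3.0} + B_7 T_{3.5} + B_8 T_{4.0} + B_9 T_{6.0} + B_{10} T_{8.0} \qquad (9)$$

is estimated as

$$
\begin{aligned}
\operatorname{Logit}(\hat p) = {}& 1.676 - \underset{(0.142)}{0.822}\,T_{0.5} - \underset{(0.482)}{0.345}\,T_{1.0} - \underset{(0.879)}{0.107}\,T_{1.5} + \underset{(0.547)}{0.487}\,T_{2.0} \\
& - \underset{(0.694)}{0.325}\,T_{2.5} - \underset{(0.250)}{1.251}\,T_{3.0} + \underset{(0.185)}{1.802}\,T_{3.5} - \underset{(0.442)}{1.548}\,T_{4.0} + \underset{(0.396)}{2.266}\,T_{6.0} - \underset{(0.402)}{1.845}\,T_{8.0}
\end{aligned}
\qquad (10)
$$

where p is the probability of the test drug, p = Prob(Drug = 0), and the p-values are shown in parentheses.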

From the Analysis of Maximum Likelihood Estimates table we obtain the estimated coefficients (parameter estimates), their standard errors, Wald chi-square values, p-values, standardized estimates, and odds ratios. To test the null hypothesis that each coefficient equals zero, i.e., H0: Bi = 0 for i = 1, 2, ..., 10, each p-value is compared with α = 5%. No p-value is below 5%, so none of the predictors is significant.

Each estimated coefficient represents the change in the log odds for a one-unit increase in the corresponding predictor (the plasma level at that time). The odds ratio is the ratio of the odds for a one-unit change in that predictor and is computed by exponentiating the coefficient, EXP(estimated coefficient): EXP(-0.822) = 0.440 for T0.5, EXP(-0.3446) = 0.709 for T1.0, and so on.
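The Wald chi-square, p-value and odds ratio columns of the SAS logit table all follow from the estimate and its standard error: chi-square = (estimate / SE)², the 1-df p-value is erfc(sqrt(chi-square / 2)), and the odds ratio is EXP(estimate). The Python sketch below recomputes them from the printed estimates and standard errors; small differences reflect the four-decimal rounding of the inputs.

```python
import math

# (estimate, standard error, printed Wald chi-square, printed p-value, printed odds ratio)
sas_logit = {
    "T05": (-0.8220, 0.5594, 2.1591, 0.1417, 0.440),
    "T10": (-0.3446, 0.4897, 0.4951, 0.4817, 0.709),
    "T15": (-0.1074, 0.7071, 0.0231, 0.8793, 0.898),
    "T20": (0.4869, 0.8078, 0.3633, 0.5467, 1.627),
    "T25": (-0.3252, 0.8270, 0.1546, 0.6941, 0.722),
    "T30": (-1.2505, 1.0881, 1.3208, 0.2504, 0.286),
    "T35": (1.8015, 1.3587, 1.7581, 0.1849, 6.059),
    "T40": (-1.5482, 2.0143, 0.5908, 0.4421, 0.213),
    "T60": (2.2656, 2.6673, 0.7215, 0.3957, 9.637),
    "T80": (-1.8445, 2.1989, 0.7037, 0.4016, 0.158),
}

def wald(estimate, se):
    """Wald chi-square (1 df), its p-value, and the odds ratio exp(estimate)."""
    chi_sq = (estimate / se) ** 2
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))   # P(chi-square with 1 df > chi_sq)
    return chi_sq, p_value, math.exp(estimate)

for name, (est, se, *_printed) in sas_logit.items():
    chi_sq, p_value, odds = wald(est, se)
    print(f"{name}: Wald {chi_sq:.4f}  p {p_value:.4f}  OR {odds:.3f}")
```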

The last table of the output gives the association of predicted probabilities and observed responses. The numbers of concordant, discordant, and tied pairs are calculated by pairing observations with different response values. Here we have 26 observations for the test drug and 24 for the reference drug, giving 26 × 24 = 624 pairs with different response values. In these data, 70.4% of the pairs are concordant and 29.6% are discordant. Somers' D, Goodman-Kruskal Gamma, Kendall's Tau-a, and the c statistic are summarized in the same table. The c statistic lies between 0 and 1 and the other three measures lie between -1 and 1; for all of them, larger values indicate better predictive ability. Here the measures are 0.407, 0.407, 0.207, and 0.704, respectively, which implies less than desirable predictive ability.
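These summary measures are simple functions of the pair counts: Somers' D = (nc - nd)/t, Gamma = (nc - nd)/(nc + nd), Tau-a = (nc - nd)/(N(N - 1)/2), and c = (nc + 0.5 × ties)/t, where t is the number of pairs with different responses and N the number of observations. The Python sketch below uses the pair counts Minitab reports for the same logit fit (438 concordant, 184 discordant, 2 tied). SAS prints slightly different percentages for these pairs, but Somers' D, Tau-a, and c agree with its table to three decimals.

```python
# Pair counts reported by Minitab for the same logit fit (Section 3.3.1)
concordant, discordant, tied = 438, 184, 2
n_test, n_reference = 26, 24

pairs = n_test * n_reference        # 624 pairs with different responses
n_obs = n_test + n_reference        # 50 observations
assert concordant + discordant + tied == pairs

somers_d = (concordant - discordant) / pairs
gamma = (concordant - discordant) / (concordant + discordant)
tau_a = (concordant - discordant) / (n_obs * (n_obs - 1) / 2)
c_stat = (concordant + 0.5 * tied) / pairs

print(f"Somers' D {somers_d:.3f}  Gamma {gamma:.3f}  Tau-a {tau_a:.3f}  c {c_stat:.3f}")
```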

4.1.2 Normit Link Function

The normit link function is the inverse of the cumulative standard normal distribution function and is obtained with the LINK=NORMIT option in the MODEL statement of PROC LOGISTIC. The Response Information is the same as for the logit output. The chi-square statistic for testing the null hypothesis that all the coefficients associated with the predictors equal zero is χ² = 8.001 with 10 degrees of freedom and a p-value of 0.6287, indicating that there is not sufficient evidence that any of the coefficients differs from zero; that is, there is no significant difference in plasma blood levels of ciprofloxacin between the test and reference drugs at the specified times.

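The estimated normit link function is:

$$
\begin{aligned}
\Phi^{-1}(\hat p) = {}& 0.969 - \underset{(0.122)}{0.512}\,T_{0.5} - \underset{(0.492)}{0.203}\,T_{1.0} - \underset{(0.900)}{0.053}\,T_{1.5} + \underset{(0.541)}{0.301}\,T_{2.0} \\
& - \underset{(0.702)}{0.192}\,T_{2.5} - \underset{(0.226)}{0.786}\,T_{3.0} + \underset{(0.168)}{1.115}\,T_{3.5} - \underset{(0.440)}{0.920}\,T_{4.0} + \underset{(0.404)}{1.350}\,T_{6.0} - \underset{(0.416)}{1.087}\,T_{8.0}
\end{aligned}
\qquad (11)
$$

where p is the probability of the test drug, p = Prob(Drug = 0), and the p-values are shown in parentheses.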

The table of maximum likelihood estimates has the same layout as before: estimated coefficients, standard errors of the coefficients, Wald's chi-square values, p-values, and standardized estimates, but no odds ratios, since exponentiated coefficients have an odds-ratio interpretation only under the logit link. Testing the null hypothesis that each coefficient equals zero, i.e., H0: Bi = 0 for i = 1, 2, ..., 10, gives similar results: no coefficient has a p-value below α = 5%, so none of the predictors is significant.

The association of predicted probabilities and observed responses is again given in the last table of the output, based on the same 624 pairs with different response values: 70.5% of the pairs are concordant and 29.3% are discordant. Somers' D, Goodman-Kruskal Gamma, Kendall's Tau-a, and c-correlation are summarized in the same table of the SAS output; they are 0.412, 0.413, 0.210, and 0.706, respectively, which means that this model does not have a very strong predictive ability.
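Because Somers' D, Gamma, Tau-a, and c are all simple functions of the concordant and discordant proportions, the reported values can be checked by hand. The sketch below, with a hypothetical helper name, reproduces the normit figures from the reported 70.5% and 29.3%, the 624 pairs, and the 26 + 24 = 50 observations. Note that Tau-a divides by all N(N - 1)/2 pairs rather than only the pairs with different responses, and that tied pairs enter c with weight one half.

```python
def measures_from_percentages(pct_concordant, pct_discordant, n_pairs, n_obs):
    """Somers' D, Gamma, Tau-a and c from reported concordance percentages."""
    conc = pct_concordant / 100
    disc = pct_discordant / 100
    tied = 1 - conc - disc               # proportion of tied pairs
    all_pairs = n_obs * (n_obs - 1) / 2  # Tau-a uses every pair of observations
    return {
        "somers_d": conc - disc,
        "gamma": (conc - disc) / (conc + disc),
        "tau_a": (conc - disc) * n_pairs / all_pairs,
        "c": conc + 0.5 * tied,
    }


# SAS normit output: 70.5% concordant, 29.3% discordant, 624 pairs, 26 + 24 observations
normit_measures = measures_from_percentages(70.5, 29.3, n_pairs=624, n_obs=26 + 24)
print({name: round(value, 3) for name, value in normit_measures.items()})
```

Small mismatches in the last digit can occur for other tables because the published percentages are themselves rounded.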

4.1.3 The Complementary log-log Link Function

The complementary log-log (gompit, cloglog) link function is requested with the LINK=CLOGLOG option of the MODEL statement in PROC LOGISTIC. The response information is the same as for the logit and normit output. The chi-square test statistic for the null hypothesis that all coefficients associated with the predictors equal zero is χ² = 7.721 with 10 degrees of freedom and a p-value of 0.6560, so there is insufficient evidence that any coefficient differs from zero; that is, there is no significant difference between the effects of the test (Ciprone) and reference (Ciprobay) drugs on plasma blood levels of ciprofloxacin at the specified times. The estimated complementary log-log link function is:

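$$
\begin{aligned}
\operatorname{cloglog}(p) = {} & 0.5370 - 0.596\,T_{0.5} - 0.165\,T_{1.0} - 0.174\,T_{1.5} + 0.484\,T_{2.0} \\
& - 0.163\,T_{2.5} - 0.902\,T_{3.0} + 1.200\,T_{3.5} - 1.083\,T_{4.0} + 1.448\,T_{6.0} - 0.980\,T_{8.0}
\end{aligned}
\tag{12}
$$

where p is the probability of the test drug, p = Prob(Drug = 0). The p-values of the coefficients of T0.5 through T8.0 are 0.155, 0.623, 0.712, 0.385, 0.774, 0.210, 0.179, 0.468, 0.438, and 0.522, respectively.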
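The p-values of the global chi-square tests (for example χ² = 7.721 with 10 degrees of freedom, p = 0.6560, for the complementary log-log link) can be reproduced from the statistic and its degrees of freedom. The sketch below is a hypothetical standard-library helper, not the routine used by SAS, SPSS, or MINITAB: it computes the chi-square upper-tail probability from the series expansion of the regularized incomplete gamma function, which is adequate for moderate statistics and degrees of freedom such as those reported here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution with df degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2, and returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    # Terms z^n / (a (a+1) ... (a+n)) first grow, then shrink; stop when negligible
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


print(round(chi2_sf(7.721, 10), 4))  # global test, complementary log-log link
print(round(chi2_sf(8.001, 10), 4))  # global test, normit link
```

Because 1 - P(a, z) is computed by subtraction, this sketch loses relative accuracy for very small p-values; a continued-fraction evaluation would be preferable there.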

From the table of maximum likelihood estimates we can read the estimated coefficients (parameter estimates), standard errors of the coefficients, Wald's chi-square values, p-values, and standardized estimates. Testing the null hypothesis that each coefficient equals zero, i.e., H0: Bi = 0 for i = 1, 2, ..., 10, gives results similar to the previous cases: every p-value is greater than 5%, so none of the predictors is significant.

The association of predicted probabilities and observed responses is again based on the 624 pairs with different response values: 71.0% of the pairs are concordant and 28.8% are discordant. Somers' D, Goodman-Kruskal Gamma, Kendall's Tau-a, and c-correlation are 0.421, 0.422, 0.215, and 0.711, respectively, which means that this model does not have a very strong predictive ability either.

4.2 SPSS RESULTS

Like SAS, SPSS gives three different sets of results for three link functions: logit, normit, and complementary log-log.

4.2.1 Logit Link Function

The logit output can be obtained either from the Binary Logistic Regression menu, where the logit is the default, or by selecting the logit link in the Ordinal Regression menu. The main advantage of the Binary Logistic Regression command is that it reports odds ratios in addition to the regular output. The case processing summary of the Binary Logistic Regression output shows 56 cases, 6 of which are missing. The initial classification table contains 26 cases for the test drug and 24 for the reference drug. The omnibus test of model coefficients gives a chi-square statistic for the null hypothesis that all coefficients associated with the predictors equal zero of χ² = 7.989 with 10 degrees of freedom and a p-value of 0.630, the same result obtained by SAS. The classification table of the SPSS output shows that 74% of cases are correctly classified.

From the Variables in the Equation table we can read the estimated coefficients (B), standard errors of the coefficients (SE), Wald's chi-square values, degrees of freedom (df), p-values (Sig.), and odds ratios (Exp(B)). The estimated SPSS logit link function is:

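$$
\begin{aligned}
\operatorname{Logit}(p) = {} & -1.676 + 0.822\,T_{0.5} + 0.345\,T_{1.0} + 0.107\,T_{1.5} - 0.487\,T_{2.0} \\
& + 0.325\,T_{2.5} + 1.251\,T_{3.0} - 1.802\,T_{3.5} + 1.548\,T_{4.0} - 2.266\,T_{6.0} + 1.845\,T_{8.0}
\end{aligned}
\tag{13}
$$

The p-values of the coefficients of T0.5 through T8.0 are 0.142, 0.482, 0.879, 0.547, 0.694, 0.250, 0.185, 0.442, 0.396, and 0.402, respectively.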

The difference between Equation (10) of the SAS output and Equation (13) of the SPSS output is that corresponding coefficients have opposite signs. This is because, by default, SAS models p = Prob(Drug = 0), the probability of the test drug, while SPSS models p = Prob(Drug = 1), the probability of the reference drug. For the same reason, each odds ratio in the SPSS output is the reciprocal of the corresponding odds ratio in the SAS output. With SAS, the odds ratio for T0.5 is EXP(-0.822) = 0.440, while with SPSS it is EXP(0.822) = 2.275 = 1/EXP(-0.822) = 1/0.440. Likewise, for T1.0 the odds ratio is EXP(-0.345) = 0.709 with SAS and EXP(0.345) = 1.411 = 1/EXP(-0.345) = 1/0.709 with SPSS, and so on for the other odds ratios.
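A small numerical sketch shows why switching the modeled response value flips every sign under the logit link. It fits the same one-predictor logistic model to a hypothetical toy data set twice, once modeling P(y = 1) and once modeling P(y = 0) by recoding y to 1 - y. The Newton-Raphson fitter `fit_logistic` is an illustrative helper, not the estimation routine of any of the three packages.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson maximum likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data (not the ciprofloxacin data)
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)                   # models P(y = 1)
c0, c1 = fit_logistic(x, [1 - v for v in y])  # models P(y = 0)
print(f"P(y=1): b0 = {b0:.4f}, b1 = {b1:.4f}, odds ratio = {math.exp(b1):.4f}")
print(f"P(y=0): b0 = {c0:.4f}, b1 = {c1:.4f}, odds ratio = {math.exp(c1):.4f}")
```

Both the constant and the slope change sign, and the two odds ratios multiply to 1, which is the reciprocal relationship described above.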

SPSS provides additional output when the logit link function is selected. Goodness-of-fit information is given for the Pearson and Deviance tests: χ² = 49.795 with 39 degrees of freedom and a p-value of 0.115 for the Pearson test, and χ² = 61.248 with 39 degrees of freedom and a p-value of 0.013 for the Deviance test. A 95% confidence interval is also provided for every parameter. Only according to the Pearson test can we conclude that the model fits the data adequately, because its p-value of 11.5% is not less than 5%; the Deviance test (p = 0.013) points to lack of fit at the 5% level.
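The Pearson and Deviance statistics compare observed and fitted counts over covariate patterns. The sketch below, with a hypothetical function name and made-up input, implements the usual formulas for grouped binary data. Assuming each of the 50 observations forms its own covariate pattern and 11 parameters are estimated (a constant plus ten slopes), the residual degrees of freedom would be 50 - 11 = 39, which is consistent with the df reported above. In that one-observation-per-pattern situation both statistics are known to be unreliable guides to fit, which is one motivation for the Hosmer-Lemeshow test.

```python
import math


def pearson_and_deviance(patterns):
    """Pearson X^2 and deviance G^2 for grouped binary data.

    patterns: iterable of (n_j, y_j, pi_j) = number of trials, observed number
              of events and fitted event probability for covariate pattern j.
    """
    pearson = deviance = 0.0
    for n, y, pi in patterns:
        expected = n * pi
        pearson += (y - expected) ** 2 / (n * pi * (1 - pi))
        # Deviance contributions; terms with a zero count are 0 (0 * log 0 = 0)
        if y > 0:
            deviance += 2 * y * math.log(y / expected)
        if n - y > 0:
            deviance += 2 * (n - y) * math.log((n - y) / (n - expected))
    return pearson, deviance


# Hypothetical single covariate pattern: 3 events in 10 trials, fitted probability 0.5
print(pearson_and_deviance([(10, 3, 0.5)]))
```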

4.2.2 Normit Link Function

In SPSS, the normit link function is obtained by selecting the probit link in the Ordinal Regression menu; it is the inverse of the cumulative standard normal distribution function. From the model fitting information, the chi-square test statistic for the null hypothesis that all coefficients associated with the predictors equal zero is χ² = 8.001 with 10 degrees of freedom and a p-value of 0.629, so we fail to reject the null hypothesis. The SPSS parameter estimates for the normit link function are:

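$$
\begin{aligned}
\operatorname{Normit}(p) = {} & 0.969 + 0.512\,T_{0.5} + 0.203\,T_{1.0} + 0.053\,T_{1.5} - 0.301\,T_{2.0} \\
& + 0.192\,T_{2.5} + 0.786\,T_{3.0} - 1.115\,T_{3.5} + 0.920\,T_{4.0} - 1.350\,T_{6.0} + 1.087\,T_{8.0}
\end{aligned}
\tag{14}
$$

The p-values of the coefficients of T0.5 through T8.0 are 0.122, 0.491, 0.900, 0.541, 0.702, 0.226, 0.168, 0.440, 0.404, and 0.416, respectively.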

Equation (14) of SPSS is the same as Equation (11) of SAS, but the coefficients of the predictors have opposite signs, because by default SPSS models p = Prob(Drug = 1), the probability of the reference drug. Goodness-of-fit information is given for the Pearson test, χ² = 49.506 with df = 39 and a p-value of 0.121, and for the Deviance test, χ² = 61.233 with df = 39 and a p-value of 0.013.

4.2.3 The Complementary log-log Link Function

The complementary log-log link function is obtained by selecting it in the Ordinal Regression menu. The model fitting information table shows χ² = 7.721 with 10 degrees of freedom and a p-value of 0.6560, so there is insufficient evidence that any coefficient differs from zero, the same result as SAS. The estimated complementary log-log link function is:

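$$
\begin{aligned}
\operatorname{cloglog}(p) = {} & 0.5370 + 0.596\,T_{0.5} + 0.165\,T_{1.0} + 0.174\,T_{1.5} - 0.484\,T_{2.0} \\
& + 0.163\,T_{2.5} + 0.902\,T_{3.0} - 1.200\,T_{3.5} + 1.083\,T_{4.0} - 1.448\,T_{6.0} + 0.980\,T_{8.0}
\end{aligned}
\tag{15}
$$

The p-values of the coefficients of T0.5 through T8.0 are 0.155, 0.623, 0.712, 0.385, 0.774, 0.210, 0.179, 0.468, 0.438, and 0.522, respectively.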

Equation (15) of SPSS is the same as Equation (12) of SAS, but again the coefficients of the predictors have opposite signs, because by default SPSS models p = Prob(Drug = 1), the probability of the reference drug. Goodness-of-fit information is given by the chi-square statistics χ² = 48.936 with df = 39 and a p-value of 0.132 for the Pearson test, and χ² = 61.513 with df = 39 and a p-value of 0.012 for the Deviance test. A 95% confidence interval is also provided for every parameter. It is worth noting that, unlike SAS, SPSS does not provide any information about the association of predicted probabilities and observed responses.

4.3 MINITAB RESULTS

Minitab gives separate sets of results for the three link functions, the logit (the default), the normit (probit), and the gompit (complementary log-log), all obtained by selecting Binary Logistic Regression from the Stat menu.

4.3.1 Logit Link Function

Minitab output looks like a combination of SAS and SPSS output: for the logit link function it includes a response information table exactly as in SAS, a logistic regression table very similar to SPSS, a goodness-of-fit table similar to SPSS, and measures of association very similar to SAS. The response information table shows 26 events for the reference drug and 24 for the test drug. The logistic regression table provides the estimated coefficients (Coef), standard errors of the coefficients (SE Coef), Z values, p-values, odds ratios, and 95% CIs for the B's. The estimated Minitab logit link function is exactly Equation (13) of the SPSS output. The null hypothesis that all slopes are zero is tested with a G test, which gives the same results as SPSS. Testing H0: Bi = 0 for i = 1, 2, ..., 10 leads to the same conclusions as SPSS and SAS, although Minitab uses the Z test based on the normal approximation.

A 95% confidence interval is provided for every parameter. These intervals differ from those of SPSS because Minitab computes them with the normal approximation and the standard normal Z table, while SPSS uses chi-square tables. The odds ratios calculated by Minitab are exactly the same as the SPSS results.
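A Wald interval for a coefficient is B ± z·SE, and exponentiating its limits gives the interval for the odds ratio. Since the standard errors are not quoted in this section, the hypothetical helper below recovers an approximate SE from a coefficient and its two-sided p-value, assuming that p-value comes from the Wald test z = B/SE; the result is therefore only as precise as the rounded p-value. It is applied to the T0.5 coefficient of Equation (13), B = 0.822 with p = 0.142.

```python
import math
from statistics import NormalDist


def wald_interval(b, p_value, level=0.95):
    """Approximate Wald interval for a coefficient and for its odds ratio.

    The standard error is recovered from the two-sided p-value, assuming that
    p-value comes from the Wald test z = b / SE.
    """
    z_obs = NormalDist().inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    se = abs(b) / z_obs
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower, upper = b - z_crit * se, b + z_crit * se
    return se, (lower, upper), (math.exp(lower), math.exp(upper))


# T0.5 coefficient of Equation (13): B = 0.822 with p = 0.142
se, ci_b, ci_or = wald_interval(0.822, 0.142)
print(f"SE ~ {se:.3f}, CI for B: ({ci_b[0]:.3f}, {ci_b[1]:.3f}), "
      f"CI for odds ratio: ({ci_or[0]:.3f}, {ci_or[1]:.3f})")
```

Because p = 0.142 exceeds 0.05, the interval for B contains 0 and the interval for the odds ratio contains 1.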

Like SPSS, Minitab provides the Pearson and Deviance goodness-of-fit tests, and in addition it calculates the Hosmer-Lemeshow test. The chi-square test statistics are χ² = 49.795 with df = 39 and a p-value of 0.115 for the Pearson test, χ² = 61.248 with df = 39 and a p-value of 0.013 for the Deviance test, and χ² = 5.820 with df = 8 and a p-value of 0.667 for the Hosmer-Lemeshow test.
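The Hosmer-Lemeshow statistic groups observations by predicted probability and compares observed and expected event counts within each group. The sketch below is a simplified version with a hypothetical function name: it splits the sorted observations into groups of nearly equal size and ignores ties, whereas statistical packages may apply their own grouping rules, which can change the statistic somewhat. With the usual 10 groups the degrees of freedom are 10 - 2 = 8, matching the df reported by MINITAB.

```python
def hosmer_lemeshow(probs, y, groups=10):
    """Simplified Hosmer-Lemeshow statistic and its degrees of freedom.

    Observations are sorted by predicted probability and split into `groups`
    groups of (nearly) equal size; ties and package-specific grouping rules
    are ignored. Each group must have 0 < expected events < group size.
    """
    ordered = sorted(zip(probs, y))
    size = len(ordered)
    stat = 0.0
    for g in range(groups):
        chunk = ordered[g * size // groups:(g + 1) * size // groups]
        n = len(chunk)
        expected = sum(p for p, _ in chunk)      # expected number of events
        observed = sum(obs for _, obs in chunk)  # observed number of events
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat, groups - 2


# Hypothetical eight-observation example split into four groups
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 1, 1, 0, 1, 0, 1, 1]
print(hosmer_lemeshow(probs, y, groups=4))
```

The statistic is then referred to a chi-square distribution with groups - 2 degrees of freedom.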

As in SAS, the association of predicted probabilities and observed responses is given in the last table of the Minitab output. Of the 624 pairs, 70.2% are concordant and 29.5% are discordant. Somers' D, Goodman-Kruskal Gamma, and Kendall's Tau-a are summarized in one table of the Minitab output; they are 0.41, 0.41, and 0.21, respectively.

4.3.2 Normit Link Function

The normit link function is obtained through the probit option in Minitab. The response information table is exactly as in the SAS output. The logistic regression table provides the estimated coefficients, their standard errors, and the Z and p-values for every estimate. The estimated normit link function is exactly Equation (14) of the SPSS output, with one exception: the constant term has a negative sign, opposite to the SPSS result. The test that all slopes are zero gives exactly the same result as SAS and SPSS. The goodness-of-fit output is similar to SPSS but adds the Hosmer-Lemeshow test, χ² = 5.927 with df = 8 and a p-value of 0.655, which means that the model fits the data adequately.
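One way to see why the constant term, and not only the slopes, can change sign: the normit link is symmetric, since Φ⁻¹(1 - p) = -Φ⁻¹(p), so modeling the other response value negates the whole linear predictor, constant included. The sketch below checks this identity with the standard library and evaluates the two complementary probabilities implied by a constant of ±0.969 when all predictors are zero. It illustrates the identity only and is not a reconstruction of either package's output.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def probit(p):
    """Normit (probit) link: inverse standard normal CDF."""
    return STD_NORMAL.inv_cdf(p)


# Symmetry of the link: probit(1 - p) = -probit(p)
for p in (0.2, 0.5, 0.83):
    print(p, round(probit(p), 4), round(probit(1 - p), 4))

# With all predictors at zero, a constant of +0.969 or -0.969 gives
# complementary probabilities for the two response values.
p_plus = STD_NORMAL.cdf(0.969)
p_minus = STD_NORMAL.cdf(-0.969)
print(round(p_plus, 4), round(p_minus, 4), round(p_plus + p_minus, 12))
```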

4.3.3 The Complementary log-log Link Function

Surprisingly, the Minitab output for the complementary log-log link function is completely different from the corresponding output of both SAS and SPSS. The estimated complementary log-log link function is:

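$$
\begin{aligned}
\operatorname{cloglog}(p) = {} & -1.736 + 0.572\,T_{0.5} + 0.323\,T_{1.0} - 0.089\,T_{1.5} - 0.194\,T_{2.0} \\
& + 0.260\,T_{2.5} + 0.956\,T_{3.0} - 1.386\,T_{3.5} + 1.135\,T_{4.0} - 1.884\,T_{6.0} + 1.704\,T_{8.0}
\end{aligned}
\tag{16}
$$

The p-values of the coefficients of T0.5 through T8.0 are 0.104, 0.338, 0.856, 0.733, 0.646, 0.212, 0.151, 0.351, 0.318, and 0.275, respectively.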
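The contrast with SAS and SPSS is consistent with the fact that, unlike the logit and normit links, the complementary log-log link is not symmetric: cloglog(1 - p) ≠ -cloglog(p) in general, so modeling the other response value does not simply flip the signs of the estimates. What does hold is that cloglog(p) = log(-log(1 - p)) equals minus the negative log-log link, -log(-log(q)), evaluated at q = 1 - p. The sketch below checks both facts numerically; the function names are illustrative and the probabilities are arbitrary.

```python
import math


def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1 - p))


def neg_loglog(p):
    """Negative log-log link: -log(-log(p))."""
    return -math.log(-math.log(p))


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    # Not symmetric: this sum would be 0 for the logit or normit link
    asymmetry = cloglog(1 - p) + cloglog(p)
    # cloglog of P(y = 1) equals minus the negative log-log of P(y = 0) = 1 - p
    residual = cloglog(p) + neg_loglog(1 - p)
    print(f"p = {p}: cloglog(1-p) + cloglog(p) = {asymmetry:.4f}, identity residual = {residual:.1e}")
```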

Consequently, all goodness-of-fit tests and measures of association differ from those of SAS and SPSS. The G test that all slopes are zero gives G = 8.685 with df = 10 and a p-value of 0.562. The goodness-of-fit chi-square statistics are χ² = 50.284 with df = 39 and a p-value of 0.106 for the Pearson test, χ² = 60.550 with df = 39 and a p-value of 0.015 for the Deviance test, and χ² = 6.427 with df = 8 and a p-value of 0.600 for the Hosmer-Lemeshow test. For the association of predicted probabilities and observed responses, 71.5% of the 624 pairs are concordant and 28.2% are discordant. Somers' D, Goodman-Kruskal Gamma, and Kendall's Tau-a are 0.43, 0.43, and 0.22, respectively.

It is worth noting that these Minitab results for the complementary log-log link function can be reproduced exactly in SPSS by selecting the Negative log-log link option instead.

5. CONCLUSIONS AND RECOMMENDATIONS

Applying the three software packages to binary response data gave some similar and some different results for the three link functions: logit, normit, and complementary log-log. Table-2 summarizes the main differences and similarities between SAS, SPSS, and MINITAB.

(1) The most important difference among the three packages is the default probability modeled for the binary dependent (response) variable: SAS uses the smaller value (zero) by default, while SPSS and MINITAB use the higher sorted value (one). This default has a serious effect on the signs of the estimated parameters, and consequently on the odds ratios as well as the confidence intervals for the model parameters.

(2) Hence, SPSS and MINITAB give the same signs for the estimated parameters, while SAS gives the opposite sign for every corresponding estimated parameter, which leads to a very different interpretation of the results.

(3) Also, the odds ratio in the SAS output is EXP(B) for every predictor, while in the SPSS and MINITAB output it is the reciprocal value, i.e., 1/EXP(B) = EXP(-B), for every corresponding predictor.

(4) Although SPSS and MINITAB give the same values of the estimated parameters, their 95% confidence interval bounds are not equal, because SPSS uses Wald's chi-square values while MINITAB uses the standard normal approximation. SAS does not provide confidence intervals for the model parameters by default.

(5) MINITAB is the best at providing goodness-of-fit tests: the Pearson, Deviance, and Hosmer-Lemeshow chi-square tests are all available by default. The SPSS output includes only the first two tests, while SAS provides none of them by default.

TABLE-2

Comparison between SAS, SPSS, and MINITAB

| Criterion | SAS | SPSS | MINITAB |
|---|---|---|---|
| Model fitting: testing all B's = 0 | Same result | Same result | Same result |
| Values of the estimated parameters | Same values | Same values | Same values |
| Signs of the estimated parameters | Opposite signs | Same signs | Same signs |
| Odds ratio | EXP(Bi) | 1/EXP(Bi) | 1/EXP(Bi) |
| C.I.'s for the B's | X | Calculated using Wald's χ² | Calculated using Z-values |
| Goodness-of-fit tests | X | Pearson test; Deviance test | Pearson test; Deviance test; Hosmer-Lemeshow test |
| Measures of association | Concordant & discordant pairs; Somers' D; Gamma; Kendall's Tau-a; c | X | Concordant & discordant pairs; Somers' D; Gamma; Kendall's Tau-a (c: X) |
| Default for the binary response variable y | P(y = 0) | P(y = 1) | P(y = 1) |
| Command (menu): logit link function | PROC LOGISTIC | Binary Logistic | Binary Logistic |
| Command (menu): normit link function | NORMIT option | Ordinal Regression / Probit | Binary Logistic / Probit |
| Command (menu): complementary log-log | CLOGLOG option | Ordinal Regression / Complementary log-log | Binary Logistic / Negative log-log |

(X) means not available by default.

(6) SAS is the best at providing measures of association between the response variable and the predicted probabilities: the numbers of concordant, discordant, and tied pairs, Somers' D, Goodman-Kruskal Gamma, Kendall's Tau-a, and c-correlation. MINITAB also provides all of them except the c-correlation value, while SPSS provides none of these measures.

(7) It is also worth noting that MINITAB and SPSS are user-friendly packages, while SAS, a very powerful statistical package, requires more effort and experience to write its programs.

(8) This paper urges users of statistical software to be aware of the default settings of these packages, because the interpretation of the results is entirely influenced by them. This paper also agrees with Searle (1994), who asked software houses to provide clearer and more detailed descriptions of their calculations.

(9) The results of this paper suggest using binary response models as an alternative approach for testing the difference between the effects of a test drug and a reference drug in pharmaceutical or medical studies: nonsignificant estimated parameters mean that the corresponding predictor variables cannot distinguish between the test and reference drugs, i.e., the data show no significant difference between the medical effects of the two drugs.

REFERENCES

Agresti, A. (1990), “Categorical Data Analysis,” John Wiley & Sons, Inc.

Bergmann, R., Ludbrook, J., and Spooren, W. (2000), “Different Outcomes of the Wilcoxon-Mann-Whitney Test From Different Statistical Packages,” The American Statistician, 54, 72-77.

Dallal, G. E. (1992), “The Computer Analysis of Factorial Experiments With Nested Factors,” The American Statistician, 46, 240.

Hauck, W., and Donner, A. (1977), “Wald's Test As Applied to Hypotheses in Logit Analysis,” Journal of the American Statistical Association, 72, 851-853.

Hoffman, D. L. (1991), “Comparisons of Four Correspondence Analysis Programs for the IBM PC,” The American Statistician, 39, 279-285.

McCullough, B. D. (1998), “Assessing the Reliability of Statistical Software: Part I,” The American Statistician, 52, 358-366.

McCullough, B. D. (1999), “Assessing the Reliability of Statistical Software: Part II,” The American Statistician, 53, 149-159.

McCullagh, P., and Nelder, J. A. (1992), “Generalized Linear Models,” Chapman & Hall.

Okunade, A., Chang, C., and Evans, R. (1993), “Comparative Analysis of Regression Output Summary Statistics in Common Statistical Packages,” The American Statistician, 47, 298-303.

Oster, R. A. (1998), “An Examination of Five Statistical Software Packages for Epidemiology,” The American Statistician, 52, 267-280.

Press, S., and Wilson, S. (1978), “Choosing Between Logistic Regression and Discriminant Analysis,” Journal of the American Statistical Association, 73, 699-705.

Searle, S. R. (1989), “Statistical Computing Packages: Some Words of Caution,” The American Statistician, 43, 189-190.

Searle, S. R. (1994), “Analysis of Variance Computing Package Output for Unbalanced Data From Fixed Effects Models with Nested Factors,” The American Statistician, 48, 148-153.

Uyar, B., and Erdem, O. (1990), “Regression Procedures in SAS: Problems?” The American Statistician, 44, 296-301.

Zhou, X., Perkins, A., and Hui, S. (1999), “Comparisons of Software Packages for Generalized Linear Multilevel Models,” The American Statistician, 53, 282-290.
