Linear Regression:

It is a hypothetical model of the relationship between two variables. Here the model is linear.
The two variables involved are called the predictor (independent) variable and the response (dependent) variable.

Fit of the model: The fit of a model shows how well it captures the association between the variables.

Analysing the Regression Model:

R²:

R² is the proportion of variance in the outcome (response) variable that can be explained by the predictor (independent) variables. It indicates how useful the model is.
An R² of 1 means the predictors explain 100% of the variance in the response variable. Conversely, an R² of 0 means they explain none of it.
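
As a minimal sketch of the definition above, R² can be computed by hand and compared with the value reported by `summary()`. The data here are simulated purely for illustration and are not from the student dataset.

```r
# Simulated data for illustration only (not the student dataset)
set.seed(42)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)

fit <- lm(y ~ x)

# R^2 by hand: 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)
r2_manual <- 1 - ss_res / ss_tot

# The same quantity as reported by summary()
r2_summary <- summary(fit)$r.squared

print(c(manual = r2_manual, summary = r2_summary))
```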


R² vs Adjusted R²
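
As background for this comparison (a standard definition, not taken from the notes themselves): adjusted R² penalises R² for the number of predictors, so it only increases when a new predictor improves the model more than would be expected by chance. With n observations and p predictors it is usually written as:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

Because (n - 1)/(n - p - 1) is greater than 1 whenever p is at least 1, adjusted R² is always below R² for a model that does not fit perfectly.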


Analysis of Variance (ANOVA):

It shows whether the regression model predicts the response variable significantly better than simply using the mean of the response.
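
One way to see this comparison, sketched with simulated data (the variable names are hypothetical), is to fit an intercept-only model, which predicts the mean for every case, and compare it with the regression model using `anova()`.

```r
# Simulated data for illustration only
set.seed(1)
x <- rnorm(80)
y <- 1 + 0.5 * x + rnorm(80)

null_model <- lm(y ~ 1)  # predicts the mean of y for every case
full_model <- lm(y ~ x)  # regression model with one predictor

# F-test: does the regression model reduce the residual sum of squares
# significantly compared with using the mean alone?
comparison <- anova(null_model, full_model)
print(comparison)

f_stat  <- comparison$F[2]
p_value <- comparison$`Pr(>F)`[2]
```

For a comparison against the intercept-only model, this F statistic is the same overall F statistic that `summary(full_model)` reports.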


Beta Values:

A beta value gives the change in the response variable for a one-unit change in the predictor variable.
Standardized Beta Values: These give the same change in units of standard deviation. Useful when comparing models.
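
A minimal sketch of the difference, using simulated data with hypothetical variable names: the standardized beta is what you get by fitting the same model on z-scored variables, or equivalently by rescaling the raw beta by the ratio of standard deviations.

```r
# Simulated data for illustration only
set.seed(7)
hours <- rnorm(100, mean = 50, sd = 10)
score <- 5 + 0.8 * hours + rnorm(100, sd = 5)

# Raw (unstandardized) beta: change in score per one-unit change in hours
raw_fit <- lm(score ~ hours)
b_raw <- coef(raw_fit)[["hours"]]

# Standardized beta: fit the model on z-scores of both variables
z_hours <- as.numeric(scale(hours))
z_score <- as.numeric(scale(score))
std_fit <- lm(z_score ~ z_hours)
b_std <- coef(std_fit)[["z_hours"]]

# Equivalent shortcut: rescale the raw beta by sd(predictor) / sd(response)
b_std_shortcut <- b_raw * sd(hours) / sd(score)

print(c(raw = b_raw, standardized = b_std, shortcut = b_std_shortcut))
```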

Linear Regression equation:

Equation: y = b_0 + b_1 X
where b_0 is the intercept and b_1 is the regression coefficient.

Coefficients:

The coefficients give the values needed to construct the linear equation (i.e. the intercept and the regression coefficient).

Standardized Coefficients:

These are the coefficients expressed in standard deviation units, which can be used to compare different models.

Multiple Linear Regression:

The relationship is described using a variation of the equation of a straight line.

Equation: y = b_0 + b_1 X_1 + b_2 X_2 + ... + b_n X_n

where b_0 is the intercept and b_1 to b_n are the regression coefficients for variables 1 to n.
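
The equation can be illustrated with a short sketch on simulated data (three hypothetical predictors x1 to x3): the fitted values from `lm()` are exactly the intercept plus each coefficient times its predictor.

```r
# Simulated data for illustration only
set.seed(11)
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x3 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)
b <- coef(fit)  # b_0 (intercept) and b_1 to b_3
print(b)

# Rebuild the predictions from the equation y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3
y_hat_manual <- b[["(Intercept)"]] + b[["x1"]] * x1 + b[["x2"]] * x2 + b[["x3"]] * x3
```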

Dummy Variables:

Categorical variables are hard to use as predictors because they have no numeric scale, so a change in them cannot be interpreted the way a change in a continuous variable can.
So we transform the categorical variable into a series of dummy variables, each indicating whether a particular case has a particular characteristic. They are also called indicator variables.
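
A minimal sketch of dummy coding in base R. The values below are made up for illustration; `higher_edu` follows the 0/1 coding used later in the report, while `study_group` is a hypothetical three-level variable.

```r
# Two-level categorical variable: one 0/1 dummy is enough
higher <- factor(c("no", "yes", "yes", "no", "yes"))
higher_edu <- ifelse(higher == "yes", 1, 0)
print(higher_edu)

# Three-level categorical variable (hypothetical): k levels need k - 1 dummies.
# model.matrix() builds them; the first level ("A") is the reference category.
study_group <- factor(c("A", "B", "C", "A", "C"))
dummies <- model.matrix(~ study_group)[, -1]  # drop the intercept column
print(dummies)
```

Passing a factor directly to `lm()` applies the same coding automatically, so explicit dummies are mainly needed when building interaction terms by hand.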


Reporting Multiple Linear Regression (Model 3):

Multiple regression analysis was conducted to predict the student's final math grade (mG3). Marks obtained in second grade (mG2) and going for higher studies (higher) were used as predictor variables. In order to include higher education in the regression model, it was recoded as the dummy variable higher_edu (0 for no, 1 for yes), and an interaction term was introduced by multiplying (inthigher * mG2). Examination of the histogram, the normal P-P plot of standardised residuals and the scatterplot of the dependent variable (mG3) against standardised residuals showed that some outliers existed. However, examination of the standardised residuals showed that none could be considered to have undue influence (95% within limits of -1.96 to +1.96 and none with Cook's distance > 1, as outlined in Field (2013)). Examination for multicollinearity showed that the tolerance and variance inflation factor (VIF) measures were outside acceptable levels (tolerance < 0.4, VIF > 2.5) as outlined in Tarling (2008).
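
The checks mentioned in this report can be sketched in base R as follows. This is an illustration on simulated data that only borrows the variable names from the report; it is not the original analysis, and `interaction_term` and `vif_manual` are hypothetical names. VIF is computed by hand as 1 / (1 - R²) from regressing each predictor on the others, and tolerance is its reciprocal.

```r
# Simulated data for illustration only (names mirror the report)
set.seed(123)
n <- 200
mG2 <- round(runif(n, 0, 20))
higher_edu <- rbinom(n, 1, 0.8)
interaction_term <- higher_edu * mG2
mG3 <- 1 + 0.9 * mG2 + 0.5 * higher_edu + rnorm(n, sd = 2)

fit <- lm(mG3 ~ mG2 + higher_edu + interaction_term)

# Share of standardised residuals inside +/- 1.96 (expected to be about 95%)
std_res <- rstandard(fit)
share_within <- mean(abs(std_res) < 1.96)

# Largest Cook's distance (values above 1 would suggest undue influence)
max_cook <- max(cooks.distance(fit))

# VIF for each predictor: regress it on the other predictors
vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  })
}
vif <- vif_manual(fit)
tolerance <- 1 / vif

print(round(cbind(vif, tolerance), 3))
```

An interaction term built as a product of two predictors is usually strongly correlated with them, which is one common reason for tolerance and VIF values outside the acceptable range.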

Logistic Regression:

ROC curve:

Research Question 6:

Can we predict whether a student will go for higher studies (higher.m) from various predictors using logistic regression?

Report for Binary Logistic Regression Model:

Logistic regression analysis was conducted with higher studies as the outcome variable and sex and romantic (is the student in a relationship?) as predictors. The data met the assumption of independent observations. Examination for multicollinearity showed that the tolerance and variance inflation factor (VIF) measures were within acceptable levels (tolerance > 0.4, VIF < 2.5) as outlined in Tarling (2008). The Hosmer-Lemeshow goodness-of-fit statistic did not indicate any issues with the assumption of linearity between the independent variables and the log odds of the model (χ²(n = 1) = 0.008, p = 0.92).
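
A model of this form can be sketched with `glm()`. The data below are simulated and only reuse the variable names from the report, so the coefficients are illustrative and not the reported results.

```r
# Simulated data for illustration only (names mirror the report)
set.seed(2024)
n <- 300
sex      <- factor(sample(c("F", "M"), n, replace = TRUE))
romantic <- factor(sample(c("no", "yes"), n, replace = TRUE))

# Assumed true model on the log-odds scale (made up for this sketch)
log_odds <- 2 - 0.5 * (sex == "M") - 0.7 * (romantic == "yes")
higher   <- rbinom(n, 1, plogis(log_odds))

# Binary logistic regression: binomial family with the default logit link
logit_fit <- glm(higher ~ sex + romantic, family = binomial)
print(summary(logit_fit)$coefficients)

# Exponentiated coefficients are odds ratios
odds_ratios <- exp(coef(logit_fit))

# Predicted probability of going for higher studies for each case
pred_prob <- predict(logit_fit, type = "response")
```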

Dimension Reduction

Factor Analysis (FA) vs Principal Component Analysis (PCA)
FA tries to achieve parsimony by explaining the maximum amount of common variance in a correlation matrix using the smallest number of explanatory constructs. These ‘explanatory constructs’ are called factors.
PCA tries to explain the maximum amount of total variance in a correlation matrix. It does this by transforming the original variables into a set of linear components.

Loadings:

A loading is the correlation between a specific observed variable and a specific factor.
Higher absolute values mean a closer relationship. Loadings are equivalent to standardised regression coefficients (β weights) in multiple regression.
Loadings with a magnitude above 0.3 (irrespective of sign) are considered high.
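
For an unrotated PCA on a correlation matrix, the idea that a loading is a correlation can be checked directly. This sketch uses base R and simulated data with hypothetical item names; it is not the `psych` workflow used later in these notes.

```r
# Simulated items for illustration only: a, b and d share one underlying trait
set.seed(10)
n <- 300
trait <- rnorm(n)
dat <- data.frame(
  a = trait + rnorm(n, sd = 0.5),
  b = trait + rnorm(n, sd = 0.5),
  c = rnorm(n),
  d = -trait + rnorm(n, sd = 1)
)

# PCA via the eigen-decomposition of the correlation matrix
e <- eigen(cor(dat))

# Loadings = eigenvectors scaled by the square root of their eigenvalues
loadings <- e$vectors %*% diag(sqrt(e$values))
dimnames(loadings) <- list(colnames(dat), paste0("PC", 1:4))

# Component scores, and their correlations with the observed variables
scores <- scale(dat) %*% e$vectors
cor_check <- cor(dat, scores)

print(round(loadings, 2))
# Flag loadings on the first component that exceed the 0.3 cut-off in magnitude
print(abs(loadings[, "PC1"]) > 0.3)
```

When all components are kept, the squared loadings of each variable sum to 1, which is why the communalities of a full PCA solution are all 1.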

Research Question 7:

Can we do dimension reduction for the "studentpIusepersonality" dataset?
  • Step 4: Decide which components to retain (PRINCIPAL COMPONENTS ANALYSIS)
    Code:
    ##Step 4: Decide which components to retain (PRINCIPAL COMPONENTS ANALYSIS)
    #Create the scree plot
    plot(pc1$values, type = "b") 
    #Print the variance explained by each component
    pc1$Vaccounted 
    #Print the Eigenvalues
    pc1$values
    
    #Visualize the Eigenvalues
    factoextra::fviz_eig(pcf, addlabels = TRUE, ylim = c(0, 50))
    factoextra::fviz_pca_var(pcf, col.var = "black")
    factoextra::fviz_pca_var(pcf, col.var = "cos2",
                                gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), 
                                repel = TRUE # Avoid text overlapping
    )
    
    #Print the loadings above the level of 0.3
    psych::print.psych(pc1, cut = 0.3, sort = TRUE)
    #create a diagram showing the components and how the manifest variables load
    psych::fa.diagram(pc1)
    #Show the loadings of variables on to components
    psych::fa.sort(pc1$loadings)
    #Output the communalities of variables across components (will be one for PCA since all the variance is used)
    pc1$communality 
    #Visualize contribution of variables to each component
    var <- factoextra::get_pca_var(pcf)
    corrplot::corrplot(var$contrib, is.corr=FALSE) 
    
    # Contributions of variables to PC1
    factoextra::fviz_contrib(pcf, choice = "var", axes = 1, top = 10)
    # Contributions of variables to PC2
    factoextra::fviz_contrib(pcf, choice = "var", axes = 2, top = 10)
                        
    Output:
    > pc1$Vaccounted 
                            PC1       PC2        PC3       PC4        PC5       PC6       PC7        PC8        PC9       PC10       PC11       PC12       PC13
    SS loadings           4.6301443 2.5118390 1.64789162 1.6142941 1.14270649 1.0190601 0.8782421 0.83430414 0.78119412 0.74092686 0.67291164 0.52329329 0.51061967
    Proportion Var        0.2315072 0.1255919 0.08239458 0.0807147 0.05713532 0.0509530 0.0439121 0.04171521 0.03905971 0.03704634 0.03364558 0.02616466 0.02553098
    Cumulative Var        0.2315072 0.3570992 0.43949374 0.5202084 0.57734377 0.6282968 0.6722089 0.71392408 0.75298379 0.79003013 0.82367571 0.84984038 0.87537136
    Proportion Explained  0.2315072 0.1255919 0.08239458 0.0807147 0.05713532 0.0509530 0.0439121 0.04171521 0.03905971 0.03704634 0.03364558 0.02616466 0.02553098
    Cumulative Proportion 0.2315072 0.3570992 0.43949374 0.5202084 0.57734377 0.6282968 0.6722089 0.71392408 0.75298379 0.79003013 0.82367571 0.84984038 0.87537136
                            PC14       PC15       PC16       PC17       PC18       PC19       PC20
    SS loadings           0.48783859 0.40717924 0.39507044 0.35229355 0.31002368 0.29697834 0.24318891
    Proportion Var        0.02439193 0.02035896 0.01975352 0.01761468 0.01550118 0.01484892 0.01215945
    Cumulative Var        0.89976329 0.92012225 0.93987578 0.95749045 0.97299164 0.98784055 1.00000000
    Proportion Explained  0.02439193 0.02035896 0.01975352 0.01761468 0.01550118 0.01484892 0.01215945
    Cumulative Proportion 0.89976329 0.92012225 0.93987578 0.95749045 0.97299164 0.98784055 1.00000000
    
    >pc1$values
     [1] 4.6301443 2.5118390 1.6478916 1.6142941 1.1427065 1.0190601 0.8782421 0.8343041 0.7811941 0.7409269 0.6729116 0.5232933 0.5106197 0.4878386 0.4071792 0.3950704
    [17] 0.3522935 0.3100237 0.2969783 0.2431889
    
    
    > psych::print.psych(pc1, cut = 0.3, sort = TRUE)
    Principal Components Analysis
    Call: principal(r = corr_df, nfactors = length(corr_df), rotate = "none")
    Standardized loadings (pattern matrix) based upon correlation matrix
        item   PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9  PC10  PC11  PC12  PC13  PC14  PC15  PC16  PC17  PC18  PC19  PC20 h2       u2 com
    D1     1  0.71 -0.41                                                                                                        0.35  1  0.0e+00 3.2
    E1    11  0.70  0.34                                                                                0.34                          1 -2.2e-16 3.7
    E2    12  0.63  0.47        0.36                                                                                      0.42        1  0.0e+00 3.9
    D2     2  0.62                                                             -0.36                                                  1 -3.1e-15 5.2
    E4    14  0.61  0.47                                                                                            0.32              1  1.1e-16 4.8
    D5     5  0.60 -0.44                          0.35                          0.32                                                  1 -3.8e-15 5.1
    D7     7 -0.59  0.43                                                                                                              1 -2.4e-15 5.5
    D9     9 -0.51  0.44                    0.33              0.33                               -0.38                                1 -1.3e-15 6.5
    D3     3  0.45        0.41              0.31                                                                                      1 -1.1e-15 8.8
    E3    13  0.37  0.51       -0.49                                                                          0.33                    1  8.9e-16 6.1
    E8    18 -0.48 -0.32  0.51                                                                                                        1  3.3e-16 6.7
    E7    17                    0.65                    0.36                                                                          1  3.3e-16 4.7
    E9    19              0.46  0.50                   -0.44                                                                          1  3.3e-16 6.4
    E5    15  0.44  0.37       -0.47                                                                                                  1  1.2e-15 7.5
    D8     8                          0.50  0.45             -0.44                                                                    1  3.3e-16 6.4
    D4     4  0.37 -0.33  0.36       -0.38  0.48                                                                                      1 -1.6e-15 7.3
    E6    16  0.38  0.47                   -0.47                                     -0.32                                            1  1.4e-15 6.9
    E10   20              0.36        0.30       -0.51  0.52                                                                          1  0.0e+00 5.6
    D6     6  0.40        0.43                                     -0.53 -0.34                                                        1  0.0e+00 6.0
    D10   10 -0.38  0.38              0.34                               -0.52                                                        1 -2.2e-16 6.8
    
                           PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18 PC19 PC20
    SS loadings           4.63 2.51 1.65 1.61 1.14 1.02 0.88 0.83 0.78 0.74 0.67 0.52 0.51 0.49 0.41 0.40 0.35 0.31 0.30 0.24
    Proportion Var        0.23 0.13 0.08 0.08 0.06 0.05 0.04 0.04 0.04 0.04 0.03 0.03 0.03 0.02 0.02 0.02 0.02 0.02 0.01 0.01
    Cumulative Var        0.23 0.36 0.44 0.52 0.58 0.63 0.67 0.71 0.75 0.79 0.82 0.85 0.88 0.90 0.92 0.94 0.96 0.97 0.99 1.00
    Proportion Explained  0.23 0.13 0.08 0.08 0.06 0.05 0.04 0.04 0.04 0.04 0.03 0.03 0.03 0.02 0.02 0.02 0.02 0.02 0.01 0.01
    Cumulative Proportion 0.23 0.36 0.44 0.52 0.58 0.63 0.67 0.71 0.75 0.79 0.82 0.85 0.88 0.90 0.92 0.94 0.96 0.97 0.99 1.00
    
    Mean item complexity =  5.9
    Test of the hypothesis that 20 components are sufficient.
    
    The root mean square of the residuals (RMSR) is  0 
     with the empirical chi square  0  with prob <  NA 
    
    Fit based upon off diagonal values = 1
    
    > fa.sort(pc1$loading)
    
    Loadings:
        PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8    PC9    PC10   PC11   PC12   PC13   PC14   PC15   PC16   PC17   PC18   PC19   PC20  
    D1   0.712 -0.414                0.285                              0.110         0.155                       0.150  0.119                0.353
    E1   0.696  0.335                             -0.254  0.101 -0.188  0.144                      -0.173 -0.199  0.342 -0.215               -0.122
    E2   0.630  0.468         0.359                                    -0.115                              0.116         0.101         0.418       
    D2   0.623 -0.189  0.235  0.143  0.289         0.256         0.237         0.154 -0.362 -0.168  0.109 -0.127 -0.124 -0.204 -0.125              
    E4   0.609  0.465 -0.144  0.291         0.139         0.101  0.173                      -0.104  0.182  0.231        -0.114  0.320 -0.156       
    D5   0.598 -0.435                0.184         0.348  0.111         0.202         0.322  0.179                              0.112        -0.230
    D7  -0.594  0.434  0.261        -0.106         0.241  0.158  0.174         0.140  0.105  0.230         0.263        -0.142 -0.264              
    D9  -0.509  0.436  0.225        -0.176  0.332  0.138  0.150  0.331                                    -0.381                0.149              
    D3   0.450 -0.127  0.407        -0.280  0.307 -0.276 -0.299         0.280 -0.253         0.190  0.241        -0.173                            
    E3   0.373  0.511  0.256 -0.486  0.203                             -0.108  0.152                0.217         0.134  0.325        -0.119 -0.135
    E8  -0.479 -0.319  0.509 -0.289        -0.176               -0.167  0.153  0.157                0.229         0.240 -0.134  0.162  0.195       
    E7  -0.216 -0.230  0.227  0.648 -0.180 -0.105         0.356 -0.282  0.214        -0.189         0.123                0.220        -0.116       
    E9  -0.122         0.459  0.500  0.108 -0.211 -0.232 -0.436  0.192 -0.238  0.185         0.156 -0.216                       0.103              
    E5   0.441  0.367  0.279 -0.475                       0.122 -0.279               -0.240  0.283 -0.230        -0.170         0.143         0.112
    D8  -0.284  0.221  0.224  0.239  0.501  0.446               -0.435 -0.134  0.125  0.201 -0.140               -0.148                            
    D4   0.372 -0.334  0.359        -0.379  0.484                                           -0.276 -0.287  0.171  0.106  0.101                     
    E6   0.384  0.465  0.193        -0.286 -0.473                       0.203  0.159  0.265 -0.316               -0.200                            
    E10 -0.133 -0.277  0.361 -0.213  0.302        -0.507  0.519  0.280                                                                             
    D6   0.397 -0.140  0.432        -0.230 -0.167  0.281  0.160 -0.100 -0.534 -0.338  0.120               -0.104                                   
    D10 -0.378  0.375  0.297  0.145  0.338         0.223 -0.110         0.280 -0.523        -0.120 -0.170                0.107                     
    
                     PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9  PC10  PC11  PC12  PC13  PC14  PC15  PC16  PC17  PC18  PC19  PC20
    SS loadings    4.630 2.512 1.648 1.614 1.143 1.019 0.878 0.834 0.781 0.741 0.673 0.523 0.511 0.488 0.407 0.395 0.352 0.310 0.297 0.243
    Proportion Var 0.232 0.126 0.082 0.081 0.057 0.051 0.044 0.042 0.039 0.037 0.034 0.026 0.026 0.024 0.020 0.020 0.018 0.016 0.015 0.012
    Cumulative Var 0.232 0.357 0.439 0.520 0.577 0.628 0.672 0.714 0.753 0.790 0.824 0.850 0.875 0.900 0.920 0.940 0.957 0.973 0.988 1.000
    
    > pc1$communality 
     D1  D2  D3  D4  D5  D6  D7  D8  D9 D10  E1  E2  E3  E4  E5  E6  E7  E8  E9 E10 
      1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
    
    
    Graphs:

    Scree Plot:

    • Our sample size is > 200, so the scree plot is a reasonably reliable guide
    • Point of inflexion: only factors/components to the left of (above) this point are retained

    Loadings (PA1=Dim1=Component 1)

    • Item loadings above 0.3 are acceptable.
    • In our case almost all variables have loadings > 0.3
    • A factor/component with fewer than 3 items is typically weak.
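
As a small sketch of the retention decision, Kaiser's criterion (keep components with eigenvalue > 1) can be applied directly to the eigenvalues printed by `pc1$values` above. The numbers are copied from that output; the variable names are hypothetical.

```r
# Eigenvalues copied from the pc1$values output above
eigenvalues <- c(4.6301443, 2.5118390, 1.6478916, 1.6142941, 1.1427065,
                 1.0190601, 0.8782421, 0.8343041, 0.7811941, 0.7409269,
                 0.6729116, 0.5232933, 0.5106197, 0.4878386, 0.4071792,
                 0.3950704, 0.3522935, 0.3100237, 0.2969783, 0.2431889)

# Kaiser's criterion: number of components with eigenvalue above 1
n_kaiser <- sum(eigenvalues > 1)

# Proportion and cumulative proportion of variance per component
prop_var <- eigenvalues / sum(eigenvalues)
cum_var  <- cumsum(prop_var)

print(n_kaiser)
print(round(cum_var[n_kaiser], 4))  # variance explained by the retained components
```

On these eigenvalues the criterion retains six components, matching the Cumulative Var row of the output (about 0.63 at PC6).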

  • Step 5: Factor Rotation:
    Code:
    #Step 5: Rotation
    #Apply rotation to try to refine the component structure
    #Extract 4 components with varimax (orthogonal) rotation
    pc2 <- psych::principal(corr_df, nfactors = 4, rotate = "varimax")
    #output the components
    psych::print.psych(pc2, cut = 0.3, sort = TRUE)
    #output the communalities
    pc2$communality
                
    Output:
    > psych::print.psych(pc2, cut = 0.3, sort = TRUE)
    Principal Components Analysis
    Call: principal(r = corr_df, nfactors = 4, rotate = "varimax")
    Standardized loadings (pattern matrix) based upon correlation matrix
        item   RC3   RC1   RC2   RC4   h2   u2 com
    E2    12  0.82                   0.75 0.25 1.2
    E4    14  0.81                   0.69 0.31 1.1
    E8    18 -0.75                   0.67 0.33 1.4
    E1    11  0.64        0.32       0.61 0.39 2.0
    E6    16  0.47              0.30 0.40 0.60 2.7
    E10   20 -0.44                   0.27 0.73 1.7
    D7     7        0.75             0.61 0.39 1.2
    D9     9        0.68             0.50 0.50 1.2
    D1     1       -0.64  0.47       0.68 0.32 2.1
    D10   10        0.63             0.39 0.61 1.0
    D5     5       -0.60  0.41       0.55 0.45 1.9
    D8     8        0.46             0.24 0.76 1.2
    D6     6              0.60       0.37 0.63 1.1
    D3     3              0.59       0.38 0.62 1.2
    D2     2       -0.32  0.57       0.50 0.50 2.0
    D4     4              0.55       0.39 0.61 1.6
    E3    13                    0.77 0.70 0.30 1.4
    E5    15              0.31  0.71 0.63 0.37 1.5
    E7    17                   -0.69 0.57 0.43 1.4
    E9    19        0.34  0.39 -0.46 0.49 0.51 2.8
    
                            RC3  RC1  RC2  RC4
    SS loadings           2.96 2.92 2.49 2.03
    Proportion Var        0.15 0.15 0.12 0.10
    Cumulative Var        0.15 0.29 0.42 0.52
    Proportion Explained  0.28 0.28 0.24 0.20
    Cumulative Proportion 0.28 0.57 0.80 1.00
    
    Mean item complexity =  1.6
    Test of the hypothesis that 4 components are sufficient.
    
    The root mean square of the residuals (RMSR) is  0.07 
        with the empirical chi square  707.62  with prob <  1.2e-85 
    
    Fit based upon off diagonal values = 0.91
    > #output the communalities
    > pc2$communality
            D1        D2        D3        D4        D5        D6        D7        D8        D9       D10        E1 
    0.6842388 0.4996312 0.3848837 0.3885244 0.5486848 0.3683538 0.6141653 0.2368362 0.5028372 0.3937291 0.6067317 
            E2        E3        E4        E5        E6        E7        E8        E9       E10 
    0.7452121 0.7021189 0.6919276 0.6327764 0.4028394 0.5714281 0.6733961 0.4857667 0.2700874 
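
What an orthogonal rotation does can be sketched with base R's `stats::varimax()` on simulated data (hypothetical item names; this is not the `psych::principal` call used above). The rotation redistributes variance between the retained components to give a simpler loading pattern, but each variable's communality is unchanged.

```r
# Simulated items for illustration only: two groups driven by two traits
set.seed(5)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.6), a2 = f1 + rnorm(n, sd = 0.6), a3 = f1 + rnorm(n, sd = 0.6),
  b1 = f2 + rnorm(n, sd = 0.6), b2 = f2 + rnorm(n, sd = 0.6), b3 = f2 + rnorm(n, sd = 0.6)
)

# Unrotated loadings for the first k principal components
e <- eigen(cor(dat))
k <- 2
unrotated <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
rownames(unrotated) <- colnames(dat)

# Varimax (orthogonal) rotation of the retained loadings
rot <- varimax(unrotated)
rotated <- unclass(rot$loadings)

# Communalities (row sums of squared loadings) before and after rotation
h2_before <- rowSums(unrotated^2)
h2_after  <- rowSums(rotated^2)

print(round(cbind(unrotated, rotated), 2))
print(round(cbind(h2_before, h2_after), 3))
```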
    
  • Repeating Steps 3 and 4 (Factor Analysis)
    Code:
    #Step3,4 
    #Factor Analysis - here we use principal axis factoring (fm = "pa")
    #If we know our data going in is normally distributed we can use maximum likelihood (fm = "ml")
    facsol <- psych::fa(raqMatrix, nfactors=4, obs=NA, n.iter=1, rotate="varimax", fm="pa")
    
    #Create your scree plot
    plot(facsol$values, type = "b") #scree plot
    
    #Print the Variance accounted for by each factor/component
    facsol$Vaccounted
    #Output the Eigenvalues
    facsol$values 
    
    #Print the components with loadings
    psych::print.psych(facsol,cut=0.3, sort=TRUE)
    
    #Print sorted list of loadings
    psych::fa.sort(facsol$loadings)
    
    #create a diagram showing the factors and how the manifest variables load
    psych::fa.diagram(facsol)
                    
    Output:
    > facsol$Vaccounted
                                PA1       PA3        PA2        PA4
    SS loadings           3.0940065 2.1994046 1.95278566 0.99801692
    Proportion Var        0.1547003 0.1099702 0.09763928 0.04990085
    Cumulative Var        0.1547003 0.2646706 0.36230984 0.41221068
    Proportion Explained  0.3752943 0.2667816 0.23686742 0.12105665
    Cumulative Proportion 0.3752943 0.6420759 0.87894335 1.00000000
    > #Output the Eigenvalues
    > facsol$values 
    [1]  4.135553164  2.010156002  1.133269200  0.965235315  0.457706517  0.329607713  0.245619359  0.160758646
    [9]  0.084517526  0.030566186  0.004509943 -0.007025198 -0.052657167 -0.081713520 -0.114231623 -0.141303473
    [17] -0.157426012 -0.197364036 -0.252625654 -0.309851679
    > #Print the components with loadings
    > psych::print.psych(facsol,cut=0.3, sort=TRUE)
    Factor Analysis using method =  pa
    Call: psych::fa(r = raqMatrix, nfactors = 4, n.iter = 1, rotate = "varimax", 
    fm = "pa", obs = NA)
    Standardized loadings (pattern matrix) based upon correlation matrix
        item   PA1   PA3   PA2   PA4   h2   u2 com
    D1     1  0.79                   0.67 0.33 1.1
    D7     7 -0.71                   0.56 0.44 1.2
    D5     5  0.68                   0.48 0.52 1.1
    D9     9 -0.62                   0.40 0.60 1.1
    D2     2  0.52                   0.42 0.58 2.2
    D10   10 -0.46                   0.25 0.75 1.3
    D4     4  0.40                   0.24 0.76 2.0
    D3     3  0.34        0.31       0.25 0.75 2.5
    D6     6  0.31                   0.23 0.77 2.9
    D8     8 -0.31                   0.12 0.88 1.6
    E2    12        0.75  0.34       0.72 0.28 1.6
    E8    18       -0.73             0.60 0.40 1.3
    E4    14        0.73             0.63 0.37 1.4
    E1    11        0.52  0.43       0.53 0.47 2.5
    E10   20       -0.32             0.11 0.89 1.2
    E3    13              0.75       0.65 0.35 1.3
    E5    15              0.66       0.49 0.51 1.2
    E6    16        0.32  0.41       0.28 0.72 1.9
    E7    17                    0.54 0.38 0.62 1.5
    E9    19                    0.47 0.23 0.77 1.1
    
                           PA1  PA3  PA2  PA4
    SS loadings           3.09 2.20 1.95 1.00
    Proportion Var        0.15 0.11 0.10 0.05
    Cumulative Var        0.15 0.26 0.36 0.41
    Proportion Explained  0.38 0.27 0.24 0.12
    Cumulative Proportion 0.38 0.64 0.88 1.00
    
    Mean item complexity =  1.6
    Test of the hypothesis that 4 factors are sufficient.
    
    The degrees of freedom for the null model are  190  and the objective function was  6.38
    The degrees of freedom for the model are 116  and the objective function was  0.92 
    
    The root mean square of the residuals (RMSR) is  0.04 
    The df corrected root mean square of the residuals is  0.05 
    
    Fit based upon off diagonal values = 0.97
    Measures of factor score adequacy             
                                                       PA1  PA3  PA2  PA4
    Correlation of (regression) scores with factors   0.92 0.90 0.88 0.79
    Multiple R square of scores with factors          0.85 0.82 0.77 0.63
    Minimum correlation of possible factor scores     0.70 0.63 0.54 0.26
    > #Print sorted list of loadings
    > fa.sort(facsol$loading)
    
    Loadings:
        PA1    PA3    PA2    PA4   
    D1   0.792  0.144  0.121       
    D7  -0.714 -0.120         0.178
    D5   0.681                     
    D9  -0.620                0.119
    D2   0.522  0.154  0.276  0.226
    D10 -0.464                0.172
    D4   0.401 -0.109  0.202  0.158
    D3   0.341         0.313  0.182
    D6   0.306         0.275  0.248
    D8  -0.305                0.172
    E2   0.129  0.748  0.344  0.171
    E8  -0.159 -0.732         0.192
    E4   0.132  0.728  0.278       
    E1   0.278  0.516  0.431       
    E10        -0.318              
    E3          0.108  0.746 -0.271
    E5                 0.663 -0.194
    E6          0.319  0.413       
    E7                -0.282  0.544
    E9                        0.472
    
                     PA1   PA3   PA2   PA4
    SS loadings    3.094 2.199 1.953 0.998
    Proportion Var 0.155 0.110 0.098 0.050
    Cumulative Var 0.155 0.265 0.362 0.412
    > #create a diagram showing the factors 
    
    Graphs:

    Scree Plot:

    • Our sample size is > 200, so the scree plot is a reasonably reliable guide
    • Point of inflexion: only factors/components to the left of (above) this point are retained

    Loadings (PA1=Dim1=Component 1)

    • Item loadings above 0.3 are acceptable.
    • In our case almost all variables have loadings > 0.3
    • A factor/component with fewer than 3 items is typically weak.

  • Step 6: Reliability Analysis
    Code:
    psych::alpha(d_data)
                    
    Output:
    > psych::alpha(d_data)
    Some items ( D7 D8 D9 D10 ) were negatively correlated with the total scale and 
    probably should be reversed.  
    To do this, run the function again with the 'check.keys=TRUE' option
    Reliability analysis   
    Call: psych::alpha(x = d_data)
    
        raw_alpha std.alpha G6(smc) average_r  S/N   ase mean   sd median_r
            0.26      0.26    0.49     0.034 0.35 0.057  3.1 0.39   -0.097
    
        lower alpha upper     95% confidence boundaries
    0.15 0.26 0.37 
    
        Reliability if an item is dropped:
        raw_alpha std.alpha G6(smc) average_r  S/N alpha se var.r  med.r
    D1       0.20      0.19    0.40     0.026 0.24    0.063 0.075 -0.101
    D2       0.13      0.13    0.40     0.016 0.15    0.069 0.094 -0.101
    D3       0.16      0.16    0.43     0.021 0.20    0.067 0.107 -0.101
    D4       0.15      0.15    0.41     0.019 0.17    0.068 0.104 -0.113
    D5       0.19      0.18    0.41     0.024 0.23    0.064 0.085 -0.101
    D6       0.17      0.17    0.45     0.022 0.20    0.066 0.110 -0.109
    D7       0.38      0.37    0.53     0.062 0.59    0.046 0.083  0.060
    D8       0.28      0.30    0.52     0.044 0.42    0.055 0.112  0.060
    D9       0.34      0.34    0.52     0.054 0.52    0.050 0.090  0.055
    D10      0.33      0.31    0.52     0.047 0.45    0.050 0.103  0.060
    
        Item statistics 
            n raw.r std.r  r.cor r.drop mean   sd
    D1  382 0.434 0.440  0.474  0.178  3.4 1.05
    D2  382 0.552 0.535  0.509  0.280  3.3 1.19
    D3  382 0.499 0.484  0.383  0.233  3.5 1.13
    D4  382 0.517 0.510  0.445  0.254  3.4 1.13
    D5  382 0.438 0.453  0.446  0.205  3.6 0.96
    D6  382 0.467 0.481  0.352  0.236  3.6 0.97
    D7  382 0.083 0.078 -0.102 -0.201  2.8 1.11
    D8  382 0.215 0.253  0.010 -0.013  2.1 0.88
    D9  382 0.154 0.153 -0.014 -0.114  2.9 1.03
    D10 382 0.244 0.223  0.019 -0.067  2.5 1.19
    
    Non missing response frequency for each item
            0    1    2    3    4    5 miss
    D1  0.00 0.04 0.18 0.21 0.44 0.13    0
    D2  0.01 0.07 0.18 0.24 0.35 0.14    0
    D3  0.01 0.03 0.17 0.18 0.44 0.16    0
    D4  0.01 0.05 0.18 0.17 0.46 0.13    0
    D5  0.00 0.02 0.11 0.20 0.51 0.15    0
    D6  0.02 0.01 0.09 0.28 0.48 0.12    0
    D7  0.00 0.12 0.32 0.27 0.23 0.06    0
    D8  0.01 0.22 0.55 0.13 0.07 0.01    0
    D9  0.00 0.08 0.30 0.30 0.27 0.04    0
    D10 0.01 0.22 0.36 0.17 0.19 0.05    0 
    
    Result:
    • Cronbach’s alpha is a measure of internal consistency: it tells you how consistently the variables behave as a scale
    • If it’s high (say .80 or .90), then we probably have one factor/component
    • Based on George and Mallery's rule of thumb for Cronbach's alpha, our variables have an unacceptable value (raw alpha = 0.26 for the D items)
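
The statistic itself can be sketched in a few lines of base R. This is a hand-rolled illustration of the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), on simulated 1 to 5 responses; `cronbach_alpha` and the item names are hypothetical, and `psych::alpha` reports considerably more than this. The example also shows why the warning above about negatively correlated items matters: reverse-scoring a negatively keyed item raises alpha.

```r
# Cronbach's alpha from the item variances and the variance of the total score
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_var  <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Simulated 1-5 responses for illustration only; q4 is negatively keyed
set.seed(99)
n <- 382
trait <- rnorm(n)
likert <- function(z) pmin(5, pmax(1, round(3 + z)))
items <- data.frame(
  q1 = likert(trait + rnorm(n, sd = 0.8)),
  q2 = likert(trait + rnorm(n, sd = 0.8)),
  q3 = likert(trait + rnorm(n, sd = 0.8)),
  q4 = likert(-trait + rnorm(n, sd = 0.8))
)

alpha_raw <- cronbach_alpha(items)

# Reverse-score the negatively keyed item (1 <-> 5, 2 <-> 4) and recompute
items_rev <- items
items_rev$q4 <- 6 - items_rev$q4
alpha_rev <- cronbach_alpha(items_rev)

print(c(raw = alpha_raw, reversed = alpha_rev))
```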
  • Report for Dimension Reduction:

    A principal component analysis (PCA) was conducted on the 20 items with orthogonal rotation (varimax). Bartlett’s test of sphericity, χ²(190) = 2381.718, p < 0.001, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. Six components had eigenvalues over Kaiser’s criterion of 1 and in combination explained 62.83% of the variance. The scree plot was slightly ambiguous and showed inflexions that would justify retaining either 2 or 4 components. Group D had a low reliability, Cronbach's α = 0.26, and group E also had low reliability, Cronbach’s α = 0.54.