Increased graduation rates in CSU system over past 10 years

CSU_GradRatesComparison_10yr

Overview

In this project we acquired graduation rate data for campuses in the California State University (CSU) system from the Integrated Postsecondary Education Data System (IPEDS), developed by the National Center for Education Statistics. Specifically, we obtained the rates of students who completed a bachelor's degree within 150% of normal time (six years) for the years 2009 and 2019.

Below we investigate whether graduation rates among CSU campuses changed by a statistically significant amount between 2009 and 2019.

Preliminaries: load required modules

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
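import nonparametric_stats as nps  # author's helper module (histPlotter, qqPlotter_normal); not shown here
import sklearn.svm as svm          # not used in the cells below
import sklearn.metrics as mt       # not used in the cells below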

Build data frames

In [32]:
DATAFILE_2019 = '/home/thugwithyoyo/Documents/NullExitProjects/GradRateData/CSU_GradRates/CSV_2019.csv'
DATAFILE_2009 = '/home/thugwithyoyo/Documents/NullExitProjects/GradRateData/CSU_GradRates/CSV_2009.csv'
GR2019_df = pd.read_csv(DATAFILE_2019, header=0)
GR2009_df = pd.read_csv(DATAFILE_2009, header=0)

Check column headers

In [3]:
GR2019_df.columns
Out[3]:
Index(['unitid', 'institution name', 'year',
       'GR200_19.4-year Graduation rate - bachelor's degree within 100% of normal time',
       'GR200_19.6-year Graduation rate - bachelor's degree within 150% of normal time',
       'GR200_19.8-year Graduation rate - bachelor's degree within 200% of normal time'],
      dtype='object')
In [33]:
GR2009_df.columns
Out[33]:
Index(['unitid', 'institution name', 'year',
       'GR200_09_RV.4-year Graduation rate - bachelor's degree within 100% of normal time',
       'GR200_09_RV.6-year Graduation rate - bachelor's degree within 150% of normal time',
       'GR200_09_RV.8-year Graduation rate - bachelor's degree within 200% of normal time'],
      dtype='object')

Inspect dataframe

In [66]:
GR2009_df
Out[66]:
    unitid  institution name                              year  4-yr (100%)  6-yr (150%)  8-yr (200%)  coded names
0   110486  California State University-Bakersfield       2009         14.0         40.0         46.0  B
1   110495  California State University-Stanislaus        2009         19.0         52.0         57.0  Stan
2   110510  California State University-San Bernardino    2009         10.0         38.0         45.0  SB
3   110538  California State University-Chico             2009         15.0         52.0         57.0  C
4   110547  California State University-Dominguez Hills   2009          5.0         28.0         37.0  DH
5   110556  California State University-Fresno            2009         14.0         48.0         55.0  Fres
6   110565  California State University-Fullerton         2009         15.0         50.0         57.0  Full
7   110574  California State University-East Bay          2009         11.0         40.0         46.0  EB
8   110583  California State University-Long Beach        2009         10.0         47.0         57.0  LB
9   110592  California State University-Los Angeles       2009         10.0         31.0         39.0  LA
10  110608  California State University-Northridge        2009          9.0         41.0         49.0  N
11  110617  California State University-Sacramento        2009         10.0         42.0         50.0  Sac
12  111188  California State University Maritime Academy  2009         43.0         62.0         64.0  Mar
13  366711  California State University-San Marcos        2009         12.0         44.0         47.0  SM
14  409698  California State University-Monterey Bay      2009         12.0         36.0         41.0  MB
15  441937  California State University-Channel Islands   2009          NaN          NaN          NaN  CI
(Rate column headers are shortened here; the full names are listed in Out[33] above.)
In [6]:
GR2019_df  # display the full 2019 table
Out[6]:
    unitid  institution name                              year  4-yr (100%)  6-yr (150%)  8-yr (200%)
0   110486  California State University-Bakersfield       2019           14           41           47
1   110495  California State University-Stanislaus        2019           11           53           60
2   110510  California State University-San Bernardino    2019           12           54           62
3   110538  California State University-Chico             2019           26           66           69
4   110547  California State University-Dominguez Hills   2019            6           43           49
5   110556  California State University-Fresno            2019           15           56           63
6   110565  California State University-Fullerton         2019           22           66           72
7   110574  California State University-East Bay          2019           10           42           49
8   110583  California State University-Long Beach        2019           16           69           75
9   110592  California State University-Los Angeles       2019            6           47           56
10  110608  California State University-Northridge        2019           13           51           57
11  110617  California State University-Sacramento        2019            9           48           58
12  111188  California State University Maritime Academy  2019           47           64           68
13  366711  California State University-San Marcos        2019           14           53           58
14  409698  California State University-Monterey Bay      2019           23           60           62
15  441937  California State University-Channel Islands   2019           26           59           63
(Rate column headers are shortened here; the full names are listed in Out[3] above.)
In [10]:
colName_2009 = GR2009_df.columns[4]  # index 4 is the 6-year (150%) rate column
colName_2009
Out[10]:
"GR200_09_RV.6-year Graduation rate - bachelor's degree within 150% of normal time"
In [11]:
colName_2019 = GR2019_df.columns[4]  # index 4 is the 6-year (150%) rate column
colName_2019
Out[11]:
"GR200_19.6-year Graduation rate - bachelor's degree within 150% of normal time"
In [22]:
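# Ratio of 2019 to 2009 six-year rates; rows are paired by their shared index
grRatios = GR2019_df[colName_2019] / GR2009_df[colName_2009]
# Drop campuses with no 2009 value (Channel Islands)
grRatios = grRatios[~np.isnan(grRatios)]
fullInstNames = GR2019_df["institution name"]
fullInstNames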
Out[22]:
0          California State University-Bakersfield
1           California State University-Stanislaus
2       California State University-San Bernardino
3                California State University-Chico
4      California State University-Dominguez Hills
5               California State University-Fresno
6            California State University-Fullerton
7             California State University-East Bay
8           California State University-Long Beach
9          California State University-Los Angeles
10          California State University-Northridge
11          California State University-Sacramento
12    California State University Maritime Academy
13          California State University-San Marcos
14        California State University-Monterey Bay
15     California State University-Channel Islands
Name: institution name, dtype: object
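
The ratio above pairs campuses by row position, which works here because both files list the campuses in the same order. A slightly more defensive sketch joins the two tables on `unitid` before dividing. It uses three rows transcribed from the tables above, and the short column names `rate_2009` and `rate_2019` are illustrative stand-ins for the long IPEDS headers, not names from the original files.

```python
import numpy as np
import pandas as pd

# Three rows transcribed from the tables above; column names are illustrative.
gr2009 = pd.DataFrame({
    "unitid": [110486, 110495, 441937],
    "rate_2009": [40.0, 52.0, np.nan],   # Channel Islands has no 2009 value
})
gr2019 = pd.DataFrame({
    "unitid": [441937, 110495, 110486],  # deliberately in a different order
    "rate_2019": [59, 53, 41],
})

# Pair campuses by ID rather than by row position, then drop incomplete pairs.
paired = gr2009.merge(gr2019, on="unitid", how="inner", validate="one_to_one")
paired = paired.dropna(subset=["rate_2009", "rate_2019"])
paired["ratio"] = paired["rate_2019"] / paired["rate_2009"]
print(paired)
```

With the join in place, a reordered or filtered CSV can no longer silently pair one campus's 2019 rate with another campus's 2009 rate.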
In [30]:
d = np.array(["B", "Stan",  "SB", "C",  "DH", "Fres", "Full", "EB",
             "LB", "LA", "N", "Sac", "Mar", "SM", "MB", "CI"])
codedInstNames = pd.Series(name="coded names", data=d)
codedInstNames
Out[30]:
0        B
1     Stan
2       SB
3        C
4       DH
5     Fres
6     Full
7       EB
8       LB
9       LA
10       N
11     Sac
12     Mar
13      SM
14      MB
15      CI
Name: coded names, dtype: object
In [37]:
GR2009_df['coded names'] = codedInstNames.values
GR2009_df
In [38]:
GR2019_df['coded names'] = codedInstNames.values

Plot 2019 rates against 2009 rates, with the identity line for reference

In [71]:
xLims = np.array([20, 75])
yLims = xLims
fig, ax = plt.subplots(nrows=1, ncols=1)
fig.set_size_inches(9,9)
ax.plot([xLims[0], xLims[1]], [yLims[0], yLims[1]], '--', 
        color='gray', alpha=0.5)
ax.scatter(GR2009_df[colName_2009], GR2019_df[colName_2019], marker='.')
for i, txt in enumerate(GR2009_df["coded names"]):
    ax.annotate(txt, 
                (GR2009_df[colName_2009][i], GR2019_df[colName_2019][i]),
                size=15
               )
    
ax.set_aspect(1)
ax.set_xlabel('2009 Graduation rate (%)',size=17)
ax.set_xlim(xLims)
ax.set_ylabel('2019 Graduation rate (%)',size=17)
ax.set_ylim(yLims)
ax.set_title("CSU Graduation Rates by Campus over 10-year span (2009, 2019)\nBachelor\'s Degree within 150% of normal time\n", size=20)
Out[71]:
Text(0.5, 1.0, "CSU Graduation Rates by Campus over 10-year span (2009, 2019)\nBachelor's Degree within 150% of normal time\n")

Generate histogram and Q-Q plot of rate ratios

In [80]:
fig2, ax2 = plt.subplots(nrows=1, ncols=2)
fig2.set_size_inches(11,5)
#ax2[0].hist(grRatios)
nps.histPlotter(10, grRatios, axes=ax2[0])
ax2[0].set_xlabel('rate ratio', size=17)
ax2[0].set_ylabel('count', size=17)
ax2[0].set_title('Distrib. of Grad. Rate Ratios', size=20)

nps.qqPlotter_normal(grRatios, 10, axes=ax2[1])
ax2[1].set_xlabel('data quantiles', size=17)
ax2[1].set_ylabel('theoretical normal quantiles', size=17)
ax2[1].set_title('Q-Q comparison plot', size=20)
Out[80]:
Text(0.5, 1.0, 'Q-Q comparison plot')
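
The `nps.qqPlotter_normal` helper is not shown in this post. As a rough idea of what a normal Q-Q comparison computes, the sketch below pairs sorted data with quantiles of a normal distribution fitted to the sample. The function name `normal_qq_points`, the plotting positions `(i - 0.5) / n`, and the sample values are assumptions for illustration, not the contents of `nonparametric_stats`.

```python
from statistics import NormalDist

import numpy as np


def normal_qq_points(x):
    """Return (theoretical normal quantiles, sorted data) for a Q-Q plot."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Plotting positions (i - 0.5) / n avoid the infinite quantiles at 0 and 1.
    probs = (np.arange(1, n + 1) - 0.5) / n
    # Reference normal with the sample's own mean and standard deviation.
    ref = NormalDist(mu=x.mean(), sigma=x.std(ddof=1))
    theoretical = np.array([ref.inv_cdf(p) for p in probs])
    return theoretical, x


# Illustrative values only, not the campus ratios.
theoretical, data = normal_qq_points([1.2, 1.0, 1.5, 1.3])
# To draw the plot: ax.scatter(theoretical, data), plus a y = x reference line.
```

Points that fall close to the line y = x suggest the sample is roughly normal; systematic curvature suggests skew or heavy tails.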

From the plots above, we are not convinced that the graduation rate ratios are normally distributed. To assess the statistical significance of the observed increase in recent rates, a better approach may be a non-parametric bootstrap, using either a studentized hypothesis test or bootstrap confidence limits.
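
A minimal sketch of the bootstrap idea mentioned above, assuming NumPy only. It builds a studentized (bootstrap-t) confidence interval for the mean 2019/2009 ratio, using the 6-year rates transcribed from the tables above (Channel Islands omitted, since it has no 2009 value). The function name `bootstrap_t_interval`, the number of resamples, and the seed are illustrative choices, not the analysis the notebook actually ran.

```python
import numpy as np

# 6-year (150%) rates transcribed from the tables above, in row order,
# omitting Channel Islands (no 2009 value).
rates_2009 = np.array([40, 52, 38, 52, 28, 48, 50, 40, 47, 31, 41, 42, 62, 44, 36], dtype=float)
rates_2019 = np.array([41, 53, 54, 66, 43, 56, 66, 42, 69, 47, 51, 48, 64, 53, 60], dtype=float)
ratios = rates_2019 / rates_2009


def bootstrap_t_interval(x, n_boot=10000, alpha=0.05, seed=0):
    """Studentized (bootstrap-t) confidence interval for the mean of x."""
    rng = np.random.default_rng(seed)
    n = x.size
    mean = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)

    # Resample campuses with replacement; each row is one bootstrap sample.
    samples = rng.choice(x, size=(n_boot, n), replace=True)
    boot_means = samples.mean(axis=1)
    boot_ses = samples.std(axis=1, ddof=1) / np.sqrt(n)

    # Studentize each resample around the observed mean,
    # skipping degenerate resamples with zero spread.
    ok = boot_ses > 0
    t_star = (boot_means[ok] - mean) / boot_ses[ok]

    # The upper t* quantile sets the lower limit, and vice versa.
    t_lo, t_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    return mean - t_hi * se, mean - t_lo * se


lower, upper = bootstrap_t_interval(ratios)
print(f"mean ratio = {ratios.mean():.3f}, 95% bootstrap-t CI = ({lower:.3f}, {upper:.3f})")
```

Under this sketch, an interval lying entirely above 1 would support the claim that rates increased. Note that it treats the campuses as an independent sample, which is itself an assumption worth questioning for a fixed system of campuses.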
