In my last post, I talked about how to programmatically process and clean up NJASK data. In this post, we'll extend the NJASK functions to the High School Proficiency Assessment (HSPA) and to the old Grade Eight Proficiency Assessment (GEPA). With functions that can access each of those data sources, we'll be ready to write a general wrapper that simplifies access to relevant state assessment data.

HSPA

Much like the NJASK data in posts 1 and 2, we're going to read from a fixed-width file on the state website, use a layout file to name the variables, and do some post-processing. I also wrote up how to process the HSPA metadata, if data processing is your thing.

Load in those processed files:

library(readr)
library(dplyr)
library(magrittr)
load(file = 'datasets/hspa_layout.rda')
load(file = 'datasets/hspa2010_layout.rda')
head(layout_hspa)
##   field_start_position field_end_position field_length data_type
## 1                    1                  9            9      Text
## 2                    1                  9            9      Text
## 3                    1                  2            2      Text
## 4                    3                  6            4      Text
## 5                    7                  9            3      Text
## 6                   10                 59           50      Text
##     description                                              comments
## 1    RECORD KEY                                                      
## 2      CDS Code                                                      
## 3   County Code                                                      
## 4 District Code Applicable only for district and school aggregations.
## 5   School Code              Applicable only for school aggregations.
## 6   County Name                                                      
##                                                                                                                                                                                               valid_values
## 1                                                                                                                                                                                                         
## 2 CDS codes for schools and districts\nTHE FIRST TWO POSITIONS WILL INCLUDE THE FOLLOWING AGGREGATION CODES: NS = Non-Special Needs; SN = Special Needs; ST = State; A = DFG A; B = DFG B; CD= DFG CD\x85.
## 3                                                                                                       01, 03, 05, 07, 09, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 80
## 4                                                                                                                                                                                      0100 to 9999, blank
## 5                                                                                                                                                                                        001 to 999, blank
## 6                                                                                                                                     A to Z, blank;  Applicable only for district and school aggregations
##   spanner1 spanner2    final_name
## 1                      RECORD_KEY
## 2                        CDS_Code
## 3                     County_Code
## 4                   District_Code
## 5                     School_Code
## 6                     County_Name
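
To make the role of the layout concrete, here is a small base-R sketch of the slicing that a fixed-width reader does with those start/end positions. The three-row layout mirrors the County/District/School rows above, but `toy_layout` and `toy_record` are made up for illustration; this is not how readr is implemented, just the idea behind it.

```r
# Toy layout and record, invented for illustration (not the real layout file)
toy_layout <- data.frame(
  field_start_position = c(1, 3, 7),
  field_end_position   = c(2, 6, 9),
  final_name           = c("County_Code", "District_Code", "School_Code"),
  stringsAsFactors = FALSE
)
toy_record <- "250100070"

# Cut the record at each start/end pair, then label the pieces
fields <- substring(
  toy_record,
  first = toy_layout$field_start_position,
  last  = toy_layout$field_end_position
)
names(fields) <- toy_layout$final_name

fields
```

The nine-character code splits into a two-character county code, a four-character district code, and a three-character school code, which is exactly what the start and end columns of the layout describe.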

Use the layout file to process an example HSPA data file:

hspa_url <- 'http://www.state.nj.us/education/schools/achievement/14/hspa/state_summary.txt'

hspa_ex <- readr::read_fwf(
  file = hspa_url,
  col_positions = readr::fwf_positions(
    start = layout_hspa$field_start_position,
    end = layout_hspa$field_end_position,
    col_names = layout_hspa$final_name
  ),
  na = "*"
)
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   DFG = col_integer(),
##   Special_Needs = col_integer(),
##   TOTAL_POPULATION_Number_Enrolled_LAL = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_Number_Enrolled_LAL = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Partially_Proficient_Percentage = col_integer()
##   # ... with 202 more columns
## )
## See spec(...) for full column specifications.
## Warning: 1484 parsing failures.
## row                                                                        col    expected      actual
##   1 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars     0          
##   1 NA                                                                         559 columns 555 columns
##   2 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars     0          
##   2 NA                                                                         559 columns 555 columns
##   3 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars     0          
## ... .......................................................................... ........... ...........
## See problems(...) for more details.
hspa_ex %>% as.data.frame() %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
##   CDS_Code County_Code District_Code School_Code County_Name
## 1                                                           
## 2                                                           
## 3                                                           
## 4                                                           
## 5 MONMOUTH                                       ASBURY PARK
## 6 MONMOUTH                                       ASBURY PARK
##      District_Name     School_Name DFG Special_Needs
## 1                      98240   507  14             8
## 2                      83656   342  12             0
## 3                      14584   165   2             8
## 4                      11988   150   2             9
## 5                  A Y    73     1  NA             1
## 6 ASBURY PARK H.S. A Y    73     1  NA             1
##   TOTAL_POPULATION_Number_Enrolled_LAL
## 1                                97091
## 2                                82776
## 3                                14315
## 4                                11750
## 5                                   71
## 6                                   71
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1                                              67 6
## 2                                              46 5
## 3                                             190 7
## 4                                             203 7
## 5                                             366 6
## 6                                             366 6
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1                                         04 328
## 2                                         86 367
## 3                                         08 103
## 4                                         08  88
## 5                                         34  00
## 6                                         34  00
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1                                                      2368 9
## 2                                                      2396 8
## 3                                                      2201 1
## 4                                                      2187 1
## 5                                                        2039
## 6                                                        2039
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1                                                           8240
## 2                                                           3656
## 3                                                           4584
## 4                                                           1988
## 5                                                             73
## 6                                                             73
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1                                                    6
## 2                                                    4
## 3                                                    2
## 4                                                    2
## 5                                                   NA
## 6                                                   NA
##   TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1                                                            63
## 2                                                            42
## 3                                                            21
## 4                                                            10
## 5                                                             1
## 6                                                             1
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1                                             129
## 2                                             110
## 3                                              19
## 4                                              16
## 5                                               0
## 6                                               0

That gets us to about the same place we reached with the NJASK data: all the columns are identified, but the data still needs post-processing, especially the percentage columns, which the layout's comments field marks as 'One implied decimal'.
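
As a quick illustration of what 'one implied decimal' means in practice: the file stores a percentage like 82.4 as the digits 824, and it's up to us to put the decimal point back. The sketch below uses invented raw values and a hypothetical helper name, `shift_implied_decimal`; it only shows the arithmetic.

```r
# Hypothetical raw values as they might sit in a fixed-width field
raw <- c("0824", " 073", "1000", NA)

# Keep digits only, convert to numeric, and move the decimal one place left
shift_implied_decimal <- function(x) {
  as.numeric(gsub("[^0-9]", "", x)) / 10
}

shifted <- shift_implied_decimal(raw)
shifted
```

Leading zeros and padding disappear in the conversion, and missing values stay missing.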

We can take the function we wrote to process NJASK data frames and generalize it, so that it can handle both NJASK and HSPA data.

process_nj_assess <- function(df, layout) {
  #build a mask
  mask <- layout$comments == 'One implied decimal'

  #keep the names to put back in the same order
  all_names <- names(df)

  #make sure df is a plain data frame (not a dplyr tbl) so that normal subsetting works
  df <- as.data.frame(df)

  #get name of last column and kill \n characters
  last_col <- names(df)[ncol(df)]
  df[, last_col] <- gsub('\n', '', df[, last_col], fixed = TRUE)

  #put some columns aside
  ignore <- df[, !mask, drop = FALSE]

  implied_decimal_fix <- function(x) {
    #strip out anything that's not a number.
    x <- as.numeric(gsub("[^\\d]+", "", x, perl=TRUE))
    x / 10
  }

  #process the columns that have an implied decimal
  #(drop = FALSE keeps a data frame even if only one column matches)
  processed <- df[, mask, drop = FALSE] %>%
    dplyr::mutate(
      dplyr::across(dplyr::everything(), implied_decimal_fix)
    )

  #put back together
  final <- cbind(ignore, processed)

  #reorder to the original column order and return
  final %>%
    dplyr::select(
      dplyr::one_of(all_names)
    )
}

process_nj_assess(hspa_ex, layout_hspa) %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
##   CDS_Code County_Code District_Code School_Code County_Name
## 1                                                           
## 2                                                           
## 3                                                           
## 4                                                           
## 5 MONMOUTH                                       ASBURY PARK
## 6 MONMOUTH                                       ASBURY PARK
##      District_Name     School_Name DFG Special_Needs
## 1                      98240   507  14             8
## 2                      83656   342  12             0
## 3                      14584   165   2             8
## 4                      11988   150   2             9
## 5                  A Y    73     1  NA             1
## 6 ASBURY PARK H.S. A Y    73     1  NA             1
##   TOTAL_POPULATION_Number_Enrolled_LAL
## 1                                97091
## 2                                82776
## 3                                14315
## 4                                11750
## 5                                   71
## 6                                   71
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1                                              67 6
## 2                                              46 5
## 3                                             190 7
## 4                                             203 7
## 5                                             366 6
## 6                                             366 6
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1                                         04 328
## 2                                         86 367
## 3                                         08 103
## 4                                         08  88
## 5                                         34  00
## 6                                         34  00
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1                                                      2368 9
## 2                                                      2396 8
## 3                                                      2201 1
## 4                                                      2187 1
## 5                                                        2039
## 6                                                        2039
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1                                                          824.0
## 2                                                          365.6
## 3                                                          458.4
## 4                                                          198.8
## 5                                                            7.3
## 6                                                            7.3
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1                                                  0.6
## 2                                                  0.4
## 3                                                  0.2
## 4                                                  0.2
## 5                                                   NA
## 6                                                   NA
##   TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1                                                           6.3
## 2                                                           4.2
## 3                                                           2.1
## 4                                                           1.0
## 5                                                           0.1
## 6                                                           0.1
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1                                            12.9
## 2                                            11.0
## 3                                             1.9
## 4                                             1.6
## 5                                             0.0
## 6                                             0.0

Yep, that works: the implied-decimal columns are now numeric and divided by 10. Following the NJASK example, we'll write a function to simplify fetching the HSPA data, and a final wrapper around the fetch/process steps.

get_raw_hspa <- function(year, layout=layout_hspa) {
  require(readr)

  #url folders are inconsistent: two-digit test year for 2013 and 2014,
  #four-digit following year for 2012 and earlier
  years <- list(
    "2014"="14", "2013"="13", "2012"="2013", "2011"="2012", "2010"="2011", "2009"="2010", 
    "2008"="2009", "2007"="2008", "2006"="2007", "2005"="2006", "2004"="2005"
  )
  parsed_year <- years[[as.character(year)]]
  if (is.null(parsed_year)) stop("no HSPA url known for year ", year)

  #filenames are screwy
  parsed_filename <- if(year > 2005) {
    "state_summary.txt"
  } else if (year == 2005) {
    "2005hspa_state_summary.txt" 
  } else if (year == 2004) {
    "hspa04state_summary.txt"
  }

  #build url
  target_url <- paste0(
    "http://www.state.nj.us/education/schools/achievement/", parsed_year, 
    "/hspa/", parsed_filename
  )

  #read_fwf
  df <- readr::read_fwf(
    file = target_url,
    col_positions = readr::fwf_positions(
      start = layout$field_start_position,
      end = layout$field_end_position,
      col_names = layout$final_name
    ),
    na = "*"
  )

  #return df
  return(df)

}

#final wrapper
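fetch_hspa <- function(year) {
  #2011 and later use layout_hspa; 2004-2010 use the older layout_hspa2010
  if (year >= 2011) {
    hspa_df <- get_raw_hspa(year) %>% process_nj_assess(layout=layout_hspa)
  } else if (year >= 2004) {
    hspa_df <- get_raw_hspa(year, layout=layout_hspa2010) %>% process_nj_assess(layout=layout_hspa2010) 
  } else {
    #fail loudly instead of returning an undefined object
    stop("fetch_hspa only handles 2004 and later; got ", year)
  }

  return(hspa_df)
}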

fetch_hspa(2010) %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   DFG = col_integer(),
##   Special_Needs = col_integer(),
##   TOTAL_POPULATION_Number_Enrolled_LAL = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_Number_Enrolled_LAL = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Partially_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Proficient_Percentage = col_integer()
##   # ... with 189 more columns
## )
## See spec(...) for full column specifications.
##   CDS_Code County_Code District_Code School_Code County_Name
## 1                                                           
## 2                                                           
## 3                                                           
## 4                                                           
## 5 MONMOUTH                                       ASBURY PARK
## 6 MONMOUTH                                       ASBURY PARK
##      District_Name     School_Name DFG Special_Needs
## 1                      99548   551  19             9
## 2                      85256   356  12             6
## 3                      14292   195   7             3
## 4                      11673   179   6             4
## 5                  A Y   107     1  NA             0
## 6 ASBURY PARK H.S. A Y   107     1  NA             0
##   TOTAL_POPULATION_Number_Enrolled_LAL
## 1                                98257
## 2                                84337
## 3                                13920
## 4                                11347
## 5                                  103
## 6                                  103
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1                                             129 6
## 2                                              92 7
## 3                                             351 6
## 4                                             374 5
## 5                                             524 4
## 6                                             524 4
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1                                         87 184
## 2                                         00 208
## 3                                         06  43
## 4                                         89  37
## 5                                         76  00
## 6                                         76  00
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1                                                      2273 9
## 2                                                      2309 8
## 3                                                      2053 1
## 4                                                      2035 1
## 5                                                        1867
## 6                                                        1867
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1                                                          954.8
## 2                                                          525.6
## 3                                                          429.2
## 4                                                          167.3
## 5                                                           10.7
## 6                                                           10.7
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1                                                  0.6
## 2                                                  0.4
## 3                                                  0.2
## 4                                                  0.2
## 5                                                   NA
## 6                                                   NA
##   TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1                                                           7.0
## 2                                                           0.7
## 3                                                           6.3
## 4                                                           3.0
## 5                                                           0.4
## 6                                                           0.4
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1                                            13.9
## 2                                            11.3
## 3                                             2.6
## 4                                             2.2
## 5                                             0.0
## 6                                             0.0

Nice! NJASK and HSPA down, GEPA data to go.
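
Before moving on: the folder and filename rules buried in get_raw_hspa are the part most likely to break, and they can be checked without touching the network. Here is a minimal sketch that rebuilds just the URL. `build_hspa_url` is a hypothetical helper (it isn't part of the functions above), and it assumes the same 2004-2014 range and the same naming rules as the lookup list in get_raw_hspa.

```r
# Hypothetical helper: mirrors the folder and filename rules in get_raw_hspa
build_hspa_url <- function(year) {
  stopifnot(year >= 2004, year <= 2014)

  # 2013 and 2014 use a two-digit folder; earlier years use the following year
  folder <- if (year >= 2013) {
    substr(as.character(year), 3, 4)
  } else {
    as.character(year + 1)
  }

  # the two oldest files have their own names
  filename <- if (year > 2005) {
    "state_summary.txt"
  } else if (year == 2005) {
    "2005hspa_state_summary.txt"
  } else {
    "hspa04state_summary.txt"
  }

  paste0(
    "http://www.state.nj.us/education/schools/achievement/",
    folder, "/hspa/", filename
  )
}

urls <- sapply(c(2014, 2010, 2004), build_hspa_url)
urls
```

Printing the URLs for a recent year, a middle year, and the oldest year makes it easy to eyeball each against the state website before downloading anything.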

GEPA

Load in the processed GEPA layout file, and the old NJASK layout file.

load(file = 'datasets/gepa_layout.rda')
load(file = 'datasets/njask05_layout.rda')

head(layout_gepa)
##   field_start_position field_end_position field_length data_type
## 1                    1                  9            9      Text
## 2                    1                  2            2      Text
## 3                    3                  6            4      Text
## 4                    7                  9            3      Text
## 5                   10                 59           50      Text
## 6                   60                109           50      Text
##     description comments
## 1      CDS Code         
## 2   County Code         
## 3 District Code         
## 4   School Code         
## 5   County Name         
## 6 District Name         
##                                                                                                                                                                                               valid_values
## 1 CDS codes for schools and districts\nTHE FIRST TWO POSITIONS WILL INCLUDE THE FOLLOWING AGGREGATION CODES: NS = Non-Special Needs; SN = Special Needs; ST = State; A = DFG A; B = DFG B; CD= DFG CD\x85.
## 2                                                                        01, 03, 05, 07, 09, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 80, ST, A, B, CD, DE, FG, GH, I, J, R, NS, SN
## 3                                                                                                                                      0100 to 9999;  Applicable only for district and school aggregations
## 4                                                                                                                                                     001 to 999;  Applicable only for school aggregations
## 5                                                                                                                                     A to Z, blank;  Applicable only for district and school aggregations
## 6                                                                                                                                     A to Z, blank;  Applicable only for district and school aggregations
##   spanner1 spanner2    final_name
## 1                        CDS_Code
## 2                     County_Code
## 3                   District_Code
## 4                     School_Code
## 5                     County_Name
## 6                   District_Name

A function to get GEPA data:

get_raw_gepa <- function(year, layout=layout_gepa) {
  require(readr)

  #the folder on the state site is named for the year after the test year
  years <- list(
    "2007"="2008", "2006"="2007", "2005"="2006", "2004"="2005"
  )
  parsed_year <- years[[as.character(year)]]

  filename <- list(
    "2007"="state_summary.txt", "2006"="state_summary.txt",
    "2005"="2005njgepa_state_summary.txt", "2004"="gepa04state_summary.txt"   
  )
  parsed_filename <- filename[[as.character(year)]]

  #build url
  target_url <- paste0(
    "http://www.state.nj.us/education/schools/achievement/", parsed_year, 
    "/gepa/", parsed_filename
  )

  #read_fwf
  df <- readr::read_fwf(
    file = target_url,
    col_positions = readr::fwf_positions(
      start = layout$field_start_position,
      end = layout$field_end_position,
      col_names = layout$final_name
    ),
    na = "*"
  )

  #return df
  return(df)

}

gepa_ex <- get_raw_gepa(2007)
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Special_Needs_(Abbott)_district_flag` = col_integer(),
##   MALE_SCIENCE_Scale_Score_Mean = col_integer(),
##   MIGRANT_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   MIGRANT_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   PACIFIC_ISLANDER_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   `NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean` = col_integer()
## )
## See spec(...) for full column specifications.
## Warning: 2448 parsing failures.
## row                                                              col    expected      actual
##   1 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
##   1 NA                                                               486 columns 484 columns
##   2 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
##   2 NA                                                               486 columns 484 columns
##   3 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
## ... ................................................................ ........... ...........
## See problems(...) for more details.
gepa_ex %>% as.data.frame() %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
##    CDS_Code County_Code District_Code School_Code County_Name
## 1        ST                                                  
## 2        NS                                                  
## 3        SN                                                  
## 4         A                                                  
## 5 250100000          MO          NMOU          TH   ASBURY PA
## 6 250100070          MO          NMOU          TH   ASBURY PA
##                                        District_Name
## 1                                                   
## 2                                                   
## 3                                                   
## 4                                                   
## 5                                                 RK
## 6 RK                                       ASBURY PA
##                                      School_Name DFG_Flag
## 1                                         108474       00
## 2                                         087774       00
## 3                                         020700       00
## 4                                         017401       00
## 5                                          A Y *     <NA>
## 6 RK M.S.                                  A Y *     <NA>
##   Special_Needs_(Abbott)_district_flag TOTAL_POPULATION_Number_Enrolled
## 1                                    0                           667001
## 2                                    0                           371000
## 3                                    0                           296000
## 4                                    0                           264000
## 5                                   NA                             <NA>
## 6                                   NA                             <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1                                            262000
## 2                                            668000
## 3                                            594000
## 4                                            509000
## 5                                              <NA>
## 6                                              <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1                                         680105
## 2                                         545086
## 3                                         135019
## 4                                         102016
## 5                                           <NA>
## 6                                           <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_APA
## 1                                       865026
## 2                                       190020
## 3                                       675052
## 4                                       526053
## 5                                          071
## 6                                          071
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1                                                      406240
## 2                                                      506630
## 3                                                      204500
## 4                                                      104420
## 5                                                      502780
## 6                                                      502780
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1                                                           1132
## 2                                                           1322
## 3                                                           0281
## 4                                                           0271
## 5                                                           0071
## 6                                                           0071
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1                                                 1490
## 2                                                 1930
## 3                                                 9530
## 4                                                 9460
## 5                                                  842
## 6                                                  842
##   TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1                                                          0066
## 2                                                          0038
## 3                                                          0027
## 4                                                          0024
## 5                                                          <NA>
## 6                                                          <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1                                            1000
## 2                                            5000
## 3                                            6000
## 4                                            5000
## 5                                            <NA>
## 6                                            <NA>

Can we process the GEPA df using our existing function?

process_nj_assess(gepa_ex, layout_gepa) %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
##    CDS_Code County_Code District_Code School_Code County_Name
## 1        ST                                                  
## 2        NS                                                  
## 3        SN                                                  
## 4         A                                                  
## 5 250100000          MO          NMOU          TH   ASBURY PA
## 6 250100070          MO          NMOU          TH   ASBURY PA
##                                        District_Name
## 1                                                   
## 2                                                   
## 3                                                   
## 4                                                   
## 5                                                 RK
## 6 RK                                       ASBURY PA
##                                      School_Name DFG_Flag
## 1                                         108474       00
## 2                                         087774       00
## 3                                         020700       00
## 4                                         017401       00
## 5                                          A Y *     <NA>
## 6 RK M.S.                                  A Y *     <NA>
##   Special_Needs_(Abbott)_district_flag TOTAL_POPULATION_Number_Enrolled
## 1                                    0                           667001
## 2                                    0                           371000
## 3                                    0                           296000
## 4                                    0                           264000
## 5                                   NA                             <NA>
## 6                                   NA                             <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1                                            262000
## 2                                            668000
## 3                                            594000
## 4                                            509000
## 5                                              <NA>
## 6                                              <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1                                         680105
## 2                                         545086
## 3                                         135019
## 4                                         102016
## 5                                           <NA>
## 6                                           <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_APA
## 1                                       865026
## 2                                       190020
## 3                                       675052
## 4                                       526053
## 5                                          071
## 6                                          071
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1                                                      406240
## 2                                                      506630
## 3                                                      204500
## 4                                                      104420
## 5                                                      502780
## 6                                                      502780
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1                                                          113.2
## 2                                                          132.2
## 3                                                           28.1
## 4                                                           27.1
## 5                                                            7.1
## 6                                                            7.1
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1                                                149.0
## 2                                                193.0
## 3                                                953.0
## 4                                                946.0
## 5                                                 84.2
## 6                                                 84.2
##   TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1                                                           6.6
## 2                                                           3.8
## 3                                                           2.7
## 4                                                           2.4
## 5                                                            NA
## 6                                                            NA
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1                                             100
## 2                                             500
## 3                                             600
## 4                                             500
## 5                                              NA
## 6                                              NA

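Yes, totally: the percentage and scale score columns come back as numbers with their implied decimal applied (1132 becomes 113.2).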
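To make the before/after difference concrete, here is a minimal sketch of the implied-decimal conversion on a toy data frame. The helper name `implied_decimal` and the column names are hypothetical, chosen for illustration; the real work happens inside process_nj_assess, whose internals may differ.

```r
# Toy stand-in for two raw fields, which arrive as zero-padded text.
raw <- data.frame(
  pct_partially_proficient = c("1132", "0281", "0071"),
  pct_advanced_proficient  = c("0066", "0027", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not the author's code):
# convert text with `places` implied decimal places to a number.
implied_decimal <- function(x, places = 1) {
  as.numeric(x) / 10^places
}

# Apply the conversion to every column, keeping the data frame shape.
processed <- raw
processed[] <- lapply(raw, implied_decimal)
processed
```

Missing values pass through as NA, which matches what the processed output shows for rows with no reported value.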
Final step: write all of that into a single wrapper function:

# final wrapper: fetch the raw GEPA file for a year, then apply the shared processing
fetch_gepa <- function(year) {
  get_raw_gepa(year) %>% process_nj_assess(layout = layout_gepa)
}

fetch_gepa(2007) %>% as.data.frame() %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Special_Needs_(Abbott)_district_flag` = col_integer(),
##   MALE_SCIENCE_Scale_Score_Mean = col_integer(),
##   MIGRANT_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   MIGRANT_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   PACIFIC_ISLANDER_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   `NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean` = col_integer()
## )
## See spec(...) for full column specifications.
## Warning: 2448 parsing failures.
## row                                                              col    expected      actual
##   1 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
##   1 NA                                                               486 columns 484 columns
##   2 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
##   2 NA                                                               486 columns 484 columns
##   3 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
## ... ................................................................ ........... ...........
## See problems(...) for more details.
##    CDS_Code County_Code District_Code School_Code County_Name
## 1        ST                                                  
## 2        NS                                                  
## 3        SN                                                  
## 4         A                                                  
## 5 250100000          MO          NMOU          TH   ASBURY PA
## 6 250100070          MO          NMOU          TH   ASBURY PA
##                                        District_Name
## 1                                                   
## 2                                                   
## 3                                                   
## 4                                                   
## 5                                                 RK
## 6 RK                                       ASBURY PA
##                                      School_Name DFG_Flag
## 1                                         108474       00
## 2                                         087774       00
## 3                                         020700       00
## 4                                         017401       00
## 5                                          A Y *     <NA>
## 6 RK M.S.                                  A Y *     <NA>
##   Special_Needs_(Abbott)_district_flag TOTAL_POPULATION_Number_Enrolled
## 1                                    0                           667001
## 2                                    0                           371000
## 3                                    0                           296000
## 4                                    0                           264000
## 5                                   NA                             <NA>
## 6                                   NA                             <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1                                            262000
## 2                                            668000
## 3                                            594000
## 4                                            509000
## 5                                              <NA>
## 6                                              <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1                                         680105
## 2                                         545086
## 3                                         135019
## 4                                         102016
## 5                                           <NA>
## 6                                           <NA>
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_APA
## 1                                       865026
## 2                                       190020
## 3                                       675052
## 4                                       526053
## 5                                          071
## 6                                          071
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1                                                      406240
## 2                                                      506630
## 3                                                      204500
## 4                                                      104420
## 5                                                      502780
## 6                                                      502780
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1                                                          113.2
## 2                                                          132.2
## 3                                                           28.1
## 4                                                           27.1
## 5                                                            7.1
## 6                                                            7.1
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1                                                149.0
## 2                                                193.0
## 3                                                953.0
## 4                                                946.0
## 5                                                 84.2
## 6                                                 84.2
##   TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1                                                           6.6
## 2                                                           3.8
## 3                                                           2.7
## 4                                                           2.4
## 5                                                            NA
## 6                                                            NA
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1                                             100
## 2                                             500
## 3                                             600
## 4                                             500
## 5                                              NA
## 6                                              NA
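
The warning in that output reports parsing failures and points to problems(). As a small sketch of what that inspection looks like, the example below uses readr's parse_integer on a toy vector instead of the real file; the values are made up for illustration. For the GEPA data, the assumption is that you would call problems() on the raw readr result, before any further processing drops the attribute.

```r
library(readr)

# Toy stand-in for a numeric field that contains a non-numeric marker.
# parse_integer() warns about the failure and returns NA in that slot.
scores <- parse_integer(c("215", "198", "*", "230"))

# problems() returns one row per failure: row, col, expected, actual.
failures <- problems(scores)
failures
```

Looking at the `expected` and `actual` columns is usually the quickest way to tell whether a failure is a harmless suppression marker or a sign that the layout widths do not match the file.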

In the next post in this series, we'll take these individual NJASK, HSPA, and GEPA functions and write one wrapper to rule them all, allowing data to be easily fetched for any year/grade.