In my last post, I showed how to extend the functions I developed for NJASK data to other state assessments. In this post, I'll tie everything together and write some general functions that bring a wide variety of NJ state assessment data into R.

Roughly speaking, we're trying to write a function that returns data given a year and a grade. Here are the big things that need to happen:

Check that the call we made is a valid grade/year combination (raising an informative error if not).
Map the grade/year call to the correct fetch function (NJASK? HSPA? GEPA?).
Fetch, clean, and return the data frame.

Let's tackle each of these in turn.

valid calls

Before we do anything, let's source in the functions created in the previous two posts, including fetch_hspa() and fetch_gepa().

knitr::knit('05_njask-data-2.rmd', tangle=TRUE)

## [1] "05_njask-data-2.R"

source('05_njask-data-2.R')

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   TOTAL_POPULATION_Number_Enrolled_ELA = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Not_Present = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Number_Enrolled_Science = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Number_Not_Present = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_Number_Enrolled_ELA = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Number_Not_Present = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Not_Present = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Number_Enrolled_Science = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Number_Not_Present = col_integer()
##   # ... with 240 more columns
## )

## See spec(...) for full column specifications.

## Warning: 2578 parsing failures.
## row                                                            col    expected      actual
##   1 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     0          
##   1 NA                                                             551 columns 549 columns
##   2 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     0          
##   2 NA                                                             551 columns 549 columns
##   3 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     2          
## ... .............................................................. ........... ...........
## See problems(...) for more details.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   TOTAL_POPULATION_Number_Enrolled_ELA = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_Number_Enrolled_ELA = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Number_Not_Present = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Not_Present = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   SPECIAL_EDUCATION_Number_Enrolled_ELA = col_integer(),
##   SPECIAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   SPECIAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   SPECIAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   SPECIAL_EDUCATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   SPECIAL_EDUCATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   LIMITED_ENGLISH_PROFICIENT_current_and_former_Number_Enrolled_ELA = col_integer()
##   # ... with 147 more columns
## )
## See spec(...) for full column specifications.

## Warning: 3010 parsing failures.
## row                                                            col    expected      actual
##   1 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     0          
##   1 NA                                                             551 columns 549 columns
##   2 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     0          
##   2 NA                                                             551 columns 549 columns
##   3 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     2          
## ... .............................................................. ........... ...........
## See problems(...) for more details.

## Error: NA column indexes not supported

knitr::knit('06_njask-data-3.rmd', tangle=TRUE)

## [1] "06_njask-data-3.R"

source('06_njask-data-3.R')

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   DFG = col_integer(),
##   Special_Needs = col_integer(),
##   TOTAL_POPULATION_Number_Enrolled_LAL = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_Number_Enrolled_LAL = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Partially_Proficient_Percentage = col_integer()
##   # ... with 202 more columns
## )
## See spec(...) for full column specifications.

## Warning: 1484 parsing failures.
## row                                                                        col    expected      actual
##   1 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars     0          
##   1 NA                                                                         559 columns 555 columns
##   2 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars     0          
##   2 NA                                                                         559 columns 555 columns
##   3 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars     0          
## ... .......................................................................... ........... ...........
## See problems(...) for more details.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   DFG = col_integer(),
##   Special_Needs = col_integer(),
##   TOTAL_POPULATION_Number_Enrolled_LAL = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_Number_Enrolled_LAL = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Partially_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Proficient_Percentage = col_integer()
##   # ... with 189 more columns
## )
## See spec(...) for full column specifications.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Special_Needs_(Abbott)_district_flag` = col_integer(),
##   MALE_SCIENCE_Scale_Score_Mean = col_integer(),
##   MIGRANT_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   MIGRANT_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   PACIFIC_ISLANDER_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   `NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean` = col_integer()
## )

## See spec(...) for full column specifications.

## Warning: 2448 parsing failures.
## row                                                              col    expected      actual
##   1 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
##   1 NA                                                               486 columns 484 columns
##   2 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
##   2 NA                                                               486 columns 484 columns
##   3 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
## ... ................................................................ ........... ...........
## See problems(...) for more details.

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Special_Needs_(Abbott)_district_flag` = col_integer(),
##   MALE_SCIENCE_Scale_Score_Mean = col_integer(),
##   MIGRANT_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
##   MIGRANT_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
##   PACIFIC_ISLANDER_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
##   `NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean` = col_integer()
## )
## See spec(...) for full column specifications.

## Warning: 2448 parsing failures.
## row                                                              col    expected      actual
##   1 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
##   1 NA                                                               486 columns 484 columns
##   2 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
##   2 NA                                                               486 columns 484 columns
##   3 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars     1          
## ... ................................................................ ........... ...........
## See problems(...) for more details.

This function tests whether a year/grade call is valid.

valid_call <- function(year, grade) {
  #data for 2015 school year doesn't exist yet
  #common core transition started in 2015 (njask is no more)
  if (year > 2014) {
    is_valid <- FALSE
  #assessment coverage 3:8 from 2006 on.
  #NJASK fully implemented in 2008
  } else if (year >= 2006) {
    is_valid <- grade %in% c(3:8, 11)
  #2004 and 2005: grades 3, 4, 8 and 11 only
  } else if (year >= 2004) {
    is_valid <- grade %in% c(3, 4, 8, 11)
  #nothing before 2004
  } else {
    is_valid <- FALSE
  }

  return(is_valid)
}
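
As an aside, the same rules can be written as a small lookup table instead of a chain of if/else branches. This is only a sketch: grade_coverage and valid_call_tbl() are names I made up for illustration, and the year spans are copied from the branches of valid_call() above.

```r
# Hypothetical lookup table: which grades were tested in which span of years.
# The spans mirror the branches of valid_call().
grade_coverage <- list(
  list(years = 2004:2005, grades = c(3, 4, 8, 11)),
  list(years = 2006:2014, grades = c(3:8, 11))
)

# Returns TRUE when the year falls in a known span and the grade was
# tested in that span; FALSE for any year outside 2004-2014.
valid_call_tbl <- function(year, grade) {
  for (span in grade_coverage) {
    if (year %in% span$years) {
      return(grade %in% span$grades)
    }
  }
  FALSE
}

valid_call_tbl(2014, 6)  # grade 6 was tested in 2014
valid_call_tbl(2005, 6)  # grade 6 was not tested in 2005
valid_call_tbl(2015, 4)  # no data after 2014
```

The upside of the table is that adding a new span of years means adding one list entry rather than another branch.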

map for retrieval

This function handles the standard case: NJASK for grades 3-8 and HSPA for grade 11.

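standard_assess <- function(year, grade) {
  if (grade %in% c(3:8)) {
    assess_data <- fetch_njask(year, grade)
  } else if (grade == 11) {
    assess_data <- fetch_hspa(year)
  } else {
    #any other grade has no standard assessment
    stop("no standard assessment for grade ", grade)
  }

  return(assess_data)
}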

Here is a mapping function that calls the correct retrieval method, given grade and year.

fetch_nj_assess <- function(year, grade) {
  require(ensurer)

  #only allow valid calls
  valid_call(year, grade) %>%
    ensure_that(
      all(.) ~ "invalid grade/year parameter passed")

  #everything from 2008 on has the same grade coverage
  if (year >= 2008) {
    assess_data <- standard_assess(year, grade)

  #2006 and 2007: NJASK 3rd-7th, GEPA 8th, HSPA 11th
  } else if (year %in% c(2006, 2007)) {
    if (grade %in% c(3:7)) {
      assess_data <- standard_assess(year, grade)  
    } else if (grade == 8) {
      assess_data <- fetch_gepa(year)
    } else if (grade == 11) {
      assess_data <- fetch_hspa(year)
    }

  #2004 and 2005:  NJASK 3rd & 4th, GEPA 8th, HSPA 11th
  } else if (year %in% c(2004, 2005)) {
    if (grade %in% c(3:4)) {
      assess_data <- standard_assess(year, grade)  
    } else if (grade == 8) {
      assess_data <- fetch_gepa(year)
    } else if (grade == 11) {
      assess_data <- fetch_hspa(year)
    }

  } else {
    #if we ever reached this block, there's a problem with our `valid_call()` function
    stop("unable to match your grade/year parameters to the appropriate function.")
  }

  return(assess_data)
}
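
Because fetch_nj_assess() downloads data, it's slow to check the routing by actually calling it. One way to sanity check the branches is a tiny helper that only names the assessment a year/grade pair would be routed to. This is a sketch: which_assessment() is a hypothetical name, not something from the earlier posts, and it assumes the pair has already passed valid_call().

```r
# Hypothetical helper: names the assessment that a (valid) year/grade pair
# is routed to, following the same branches as fetch_nj_assess() but
# without fetching anything.
which_assessment <- function(year, grade) {
  if (grade == 11) {
    # grade 11 is always HSPA
    "HSPA"
  } else if (grade == 8 && year <= 2007) {
    # before 2008, 8th grade took GEPA
    "GEPA"
  } else {
    # every other valid grade/year pair is NJASK
    "NJASK"
  }
}

which_assessment(2007, 8)   # "GEPA"
which_assessment(2008, 8)   # "NJASK"
which_assessment(2010, 11)  # "HSPA"
```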

try it out:

fetch_nj_assess(2014, 6) %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()

## Loading required package: ensurer

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   TOTAL_POPULATION_Number_Enrolled_ELA = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_Number_Enrolled_ELA = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Number_Not_Present = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Not_Present = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   SPECIAL_EDUCATION_Number_Enrolled_ELA = col_integer(),
##   SPECIAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   SPECIAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   SPECIAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   SPECIAL_EDUCATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   SPECIAL_EDUCATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   LIMITED_ENGLISH_PROFICIENT_current_and_former_Number_Enrolled_ELA = col_integer()
##   # ... with 147 more columns
## )

## See spec(...) for full column specifications.

## Warning: 3010 parsing failures.
## row                                                            col    expected      actual
##   1 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     0          
##   1 NA                                                             551 columns 549 columns
##   2 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     0          
##   2 NA                                                             551 columns 549 columns
##   3 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     2          
## ... .............................................................. ........... ...........
## See problems(...) for more details.

## Error: NA column indexes not supported

all together

Finally, as a convenience, let's write a function that brings down all of the NJ assessment data (NJASK, GEPA and HSPA) for all years and grades.

fetch_all_nj <- function() {

  #make the df of years and grades to iterate over  
  post2006_years <- c(2006:2014)
  post2006_grades <- c(3:8, 11)

  pre2006_years <- c(2004, 2005)
  pre2006_grades <- c(3, 4, 8, 11)

  #subset just for testing
  #post2006_years <- c(2006)
  #post2006_grades <- c(8, 11)
  #pre2006_grades <- c(4)

  df <- data.frame(
    year = vector(mode="numeric", length=0),
    grade = vector(mode="numeric", length=0)
  )

  for (i in post2006_years) {
    #use R recycling to make df
    int_df <- data.frame(
      year = i,
      grade = post2006_grades
    )

    df <- rbind(df, int_df)
  }

  for (j in pre2006_years) {
    #use R recycling to make df
    int_df <- data.frame(
      year = j,
      grade = pre2006_grades
    )

    df <- rbind(df, int_df)
  }

  #sort the df
  df <- df %>% dplyr::arrange(
    desc(year), grade
  )

  #to hold the results
  results <- list()

  #iterate over the df and get the data
  for (i in seq_len(nrow(df))) {
    this_row <- df[i, ]
    #be verbose
    row_key <- paste0('nj', this_row$year, 'gr', this_row$grade)
    print(row_key)

    #call this grade/year and attach to results list
    results[[row_key]] <-  fetch_nj_assess(this_row$year, this_row$grade)
  }

  return(results)  
}
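
A loop like this stops at the first error, so one bad file means no results at all. Here is a rough sketch of a more forgiving variant: it builds the same year/grade grid with expand.grid() and wraps each fetch in tryCatch() so a failing pair is reported and skipped. The names make_grid(), fetch_all_safely() and fake_fetch() are made up for this example; in practice fetch_fun would be fetch_nj_assess.

```r
# Build every year/grade pair for one span of years.
make_grid <- function(years, grades) {
  expand.grid(year = years, grade = grades, KEEP.OUT.ATTRS = FALSE)
}

calls <- rbind(
  make_grid(2006:2014, c(3:8, 11)),
  make_grid(2004:2005, c(3, 4, 8, 11))
)
# newest year first, then ascending grade
calls <- calls[order(-calls$year, calls$grade), ]

# fetch_fun is a stand-in argument (e.g. fetch_nj_assess).
# Pairs that raise an error are reported and left out of the result.
fetch_all_safely <- function(calls, fetch_fun) {
  results <- list()
  for (i in seq_len(nrow(calls))) {
    key <- paste0('nj', calls$year[i], 'gr', calls$grade[i])
    results[[key]] <- tryCatch(
      fetch_fun(calls$year[i], calls$grade[i]),
      error = function(e) {
        message(key, ' failed: ', conditionMessage(e))
        NULL  # assigning NULL adds nothing to the list
      }
    )
  }
  results
}

# Fake fetcher for demonstration only: fails for grade 11.
fake_fetch <- function(year, grade) {
  if (grade == 11) stop('no data')
  data.frame(year = year, grade = grade)
}

demo <- fetch_all_safely(head(calls, 7), fake_fetch)
names(demo)
```

With the fake fetcher, the first seven pairs are the 2014 grades, and only the grade 11 pair is dropped, so the rest of the results survive.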

test it:

all_nj <- fetch_all_nj()

## [1] "nj2014gr3"

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   TOTAL_POPULATION_Number_Enrolled_ELA = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present = col_integer(),
##   TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   TOTAL_POPULATION_MATHEMATICS_Number_Not_Present = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_Number_Enrolled_ELA = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Number_Not_Present = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
##   GENERAL_EDUCATION_MATHEMATICS_Number_Not_Present = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
##   GENERAL_EDUCATION_SCIENCE_Scale_Score_Mean = col_integer(),
##   SPECIAL_EDUCATION_Number_Enrolled_ELA = col_integer(),
##   SPECIAL_EDUCATION_LANGUAGE_ARTS_Number_Not_Present = col_integer(),
##   SPECIAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
##   SPECIAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
##   SPECIAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer()
##   # ... with 156 more columns
## )

## See spec(...) for full column specifications.

## Warning: 3924 parsing failures.
## row                                                            col    expected      actual
##   1 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     0          
##   1 NA                                                             551 columns 549 columns
##   2 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     0          
##   2 NA                                                             551 columns 549 columns
##   3 SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_SCIENCE_Scale_Score_Mean 4 chars     2          
## ... .............................................................. ........... ...........
## See problems(...) for more details.

## Error: NA column indexes not supported

length(all_nj)

## Error in eval(expr, envir, enclos): object 'all_nj' not found

Getting NJ assessment data into R: part 4 in a series.
