In my last post, I talked about how to programmatically process and clean up NJASK data. In this post, we'll extend the NJASK functions to the High School Proficiency Assessment (HSPA), and to the old Grade Eight Proficiency Assessment (GEPA). With functions that can access each of those data sources, we'll be ready to write a general wrapper that simplifies access to relevant state assessment data.
HSPA
Much like the NJASK data in posts 1 and 2, we're going to read from a fixed-width file on the state website, use a layout file to name the variables, and do some post-processing. I also wrote up how to process the HSPA metadata, if data processing is your thing.
Load in those processed files:
library(readr)
library(dplyr)
library(magrittr)
load(file = 'datasets/hspa_layout.rda')
load(file = 'datasets/hspa2010_layout.rda')
head(layout_hspa)
## field_start_position field_end_position field_length data_type
## 1 1 9 9 Text
## 2 1 9 9 Text
## 3 1 2 2 Text
## 4 3 6 4 Text
## 5 7 9 3 Text
## 6 10 59 50 Text
## description comments
## 1 RECORD KEY
## 2 CDS Code
## 3 County Code
## 4 District Code Applicable only for district and school aggregations.
## 5 School Code Applicable only for school aggregations.
## 6 County Name
## valid_values
## 1
## 2 CDS codes for schools and districts\nTHE FIRST TWO POSITIONS WILL INCLUDE THE FOLLOWING AGGREGATION CODES: NS = Non-Special Needs; SN = Special Needs; ST = State; A = DFG A; B = DFG B; CD= DFG CD\x85.
## 3 01, 03, 05, 07, 09, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 80
## 4 0100 to 9999, blank
## 5 001 to 999, blank
## 6 A to Z, blank; Applicable only for district and school aggregations
## spanner1 spanner2 final_name
## 1 RECORD_KEY
## 2 CDS_Code
## 3 County_Code
## 4 District_Code
## 5 School_Code
## 6 County_Name
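To make the role of the layout concrete, here's a toy sketch in base R of what the start and end positions are doing. Everything in it is made up for illustration (the `toy_` names, the three-row layout, the two records), and `substring()` is only standing in for the slicing that `read_fwf` handles for us.

```r
# Toy layout with the same position/name columns as the real layout files
# (the rows and values here are invented for illustration)
toy_layout <- data.frame(
  field_start_position = c(1, 3, 7),
  field_end_position   = c(2, 6, 9),
  final_name           = c("County_Code", "District_Code", "School_Code"),
  stringsAsFactors = FALSE
)

# Two made-up fixed-width records, 9 characters each
toy_lines <- c("250100070", "250100000")

# Slice every line at each start/end pair: one column per layout row
toy_cols <- lapply(seq_len(nrow(toy_layout)), function(i) {
  substring(
    toy_lines,
    toy_layout$field_start_position[i],
    toy_layout$field_end_position[i]
  )
})

# Name the columns from the layout and assemble a data frame
names(toy_cols) <- toy_layout$final_name
toy_df <- as.data.frame(toy_cols, stringsAsFactors = FALSE)
toy_df
```

Each row of the layout becomes one named column, which is why getting the layout file right matters so much: a position that is off by one shifts every value in that column.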
Use the layout file to process an example HSPA data file:
hspa_url <- 'http://www.state.nj.us/education/schools/achievement/14/hspa/state_summary.txt'
hspa_ex <- readr::read_fwf(
  file = hspa_url,
  col_positions = readr::fwf_positions(
    start = layout_hspa$field_start_position,
    end = layout_hspa$field_end_position,
    col_names = layout_hspa$final_name
  ),
  na = "*"
)
## Parsed with column specification:
## cols(
## .default = col_character(),
## DFG = col_integer(),
## Special_Needs = col_integer(),
## TOTAL_POPULATION_Number_Enrolled_LAL = col_integer(),
## TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage = col_integer(),
## TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
## TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
## TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
## TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
## TOTAL_POPULATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
## TOTAL_POPULATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
## TOTAL_POPULATION_SCIENCE_Proficient_Percentage = col_integer(),
## TOTAL_POPULATION_SCIENCE_Advanced_Proficient_Percentage = col_integer(),
## TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
## GENERAL_EDUCATION_Number_Enrolled_LAL = col_integer(),
## GENERAL_EDUCATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
## GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
## GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
## GENERAL_EDUCATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
## GENERAL_EDUCATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
## GENERAL_EDUCATION_SCIENCE_Partially_Proficient_Percentage = col_integer()
## # ... with 202 more columns
## )
## See spec(...) for full column specifications.
## Warning: 1484 parsing failures.
## row col expected actual
## 1 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars 0
## 1 NA 559 columns 555 columns
## 2 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars 0
## 2 NA 559 columns 555 columns
## 3 SCIENCE_SPECIAL_EDUCATION_WITH_ACCOMMODATIONS_Number_of_Valid_Scale_Scores 6 chars 0
## ... .......................................................................... ........... ...........
## See problems(...) for more details.
hspa_ex %>% as.data.frame() %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
## CDS_Code County_Code District_Code School_Code County_Name
## 1
## 2
## 3
## 4
## 5 MONMOUTH ASBURY PARK
## 6 MONMOUTH ASBURY PARK
## District_Name School_Name DFG Special_Needs
## 1 98240 507 14 8
## 2 83656 342 12 0
## 3 14584 165 2 8
## 4 11988 150 2 9
## 5 A Y 73 1 NA 1
## 6 ASBURY PARK H.S. A Y 73 1 NA 1
## TOTAL_POPULATION_Number_Enrolled_LAL
## 1 97091
## 2 82776
## 3 14315
## 4 11750
## 5 71
## 6 71
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1 67 6
## 2 46 5
## 3 190 7
## 4 203 7
## 5 366 6
## 6 366 6
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1 04 328
## 2 86 367
## 3 08 103
## 4 08 88
## 5 34 00
## 6 34 00
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1 2368 9
## 2 2396 8
## 3 2201 1
## 4 2187 1
## 5 2039
## 6 2039
## TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1 8240
## 2 3656
## 3 4584
## 4 1988
## 5 73
## 6 73
## TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1 6
## 2 4
## 3 2
## 4 2
## 5 NA
## 6 NA
## TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1 63
## 2 42
## 3 21
## 4 10
## 5 1
## 6 1
## TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1 129
## 2 110
## 3 19
## 4 16
## 5 0
## 6 0
That gets us to roughly where we were with the NJASK data: all the columns are identified, but they still need post-processing, especially the percentage columns, which the layout flags as 'One implied decimal'.
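A quick worked example of what 'one implied decimal' means, using invented values rather than anything from the state files: a value of 87.5 is stored as the digits 875, so the fix is to strip anything that isn't a digit and divide by 10.

```r
# Made-up raw values as they might come out of a fixed-width column
raw_pct <- c("0875", "1000", " 042", NA)

# Drop everything that isn't a digit (padding, stray characters)
digits_only <- gsub("[^0-9]+", "", raw_pct)

# Shift the decimal point one place to the left
fixed_pct <- as.numeric(digits_only) / 10
fixed_pct
```

Missing values pass straight through as `NA`, which is what we want for suppressed cells.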
We can take the function we wrote to process NJASK data frames and generalize it, so that it can handle both NJASK and HSPA data.
process_nj_assess <- function(df, layout) {
  #build a mask: TRUE for columns the layout flags as having an implied decimal
  mask <- layout$comments == 'One implied decimal'
  #make sure df is a plain data frame (not a dplyr tbl) so that normal subsetting works
  df <- as.data.frame(df)
  #keep the names to put back in the same order
  all_names <- names(df)
  #get name of last column and kill \n characters
  last_col <- names(df)[ncol(df)]
  df[, last_col] <- gsub('\n', '', df[, last_col], fixed = TRUE)
  #put some columns aside (drop = FALSE keeps a data frame even for one column)
  ignore <- df[, !mask, drop = FALSE]
  implied_decimal_fix <- function(x) {
    #strip out anything that's not a digit, then shift the decimal one place
    x <- as.numeric(gsub("[^\\d]+", "", x, perl=TRUE))
    x / 10
  }
  #process the columns that have an implied decimal (across() needs dplyr >= 1.0)
  processed <- df[, mask, drop = FALSE] %>%
    dplyr::mutate(
      dplyr::across(dplyr::everything(), implied_decimal_fix)
    )
  #put back together
  final <- cbind(ignore, processed)
  #reorder and return
  final %>%
    dplyr::select(
      dplyr::all_of(all_names)
    )
}
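The mask is the part of this function that does the real work, so here's a stripped-down sketch of just that idea in base R. The `mask_layout` and `mask_df` objects are hypothetical stand-ins with invented values, not the real layout or data, and the rescaling is simplified to a plain divide-by-10.

```r
# Hypothetical two-row layout: only the second column carries the flag
mask_layout <- data.frame(
  final_name = c("School_Name", "Proficient_Percentage"),
  comments   = c("", "One implied decimal"),
  stringsAsFactors = FALSE
)

# Hypothetical data frame whose columns line up with the layout rows
mask_df <- data.frame(
  School_Name = c("A", "B"),
  Proficient_Percentage = c("0842", "1000"),
  stringsAsFactors = FALSE
)

# TRUE where the layout says the column has an implied decimal
mask <- mask_layout$comments == "One implied decimal"

# Rescale only the masked columns; everything else is left alone
mask_df[mask] <- lapply(mask_df[mask], function(x) as.numeric(x) / 10)
mask_df
```

Because the mask comes from the layout rather than from hard-coded column names, the same function can serve any assessment whose layout uses the same comment text.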
process_nj_assess(hspa_ex, layout_hspa) %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
## CDS_Code County_Code District_Code School_Code County_Name
## 1
## 2
## 3
## 4
## 5 MONMOUTH ASBURY PARK
## 6 MONMOUTH ASBURY PARK
## District_Name School_Name DFG Special_Needs
## 1 98240 507 14 8
## 2 83656 342 12 0
## 3 14584 165 2 8
## 4 11988 150 2 9
## 5 A Y 73 1 NA 1
## 6 ASBURY PARK H.S. A Y 73 1 NA 1
## TOTAL_POPULATION_Number_Enrolled_LAL
## 1 97091
## 2 82776
## 3 14315
## 4 11750
## 5 71
## 6 71
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1 67 6
## 2 46 5
## 3 190 7
## 4 203 7
## 5 366 6
## 6 366 6
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1 04 328
## 2 86 367
## 3 08 103
## 4 08 88
## 5 34 00
## 6 34 00
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1 2368 9
## 2 2396 8
## 3 2201 1
## 4 2187 1
## 5 2039
## 6 2039
## TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1 824.0
## 2 365.6
## 3 458.4
## 4 198.8
## 5 7.3
## 6 7.3
## TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1 0.6
## 2 0.4
## 3 0.2
## 4 0.2
## 5 NA
## 6 NA
## TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1 6.3
## 2 4.2
## 3 2.1
## 4 1.0
## 5 0.1
## 6 0.1
## TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1 12.9
## 2 11.0
## 3 1.9
## 4 1.6
## 5 0.0
## 6 0.0
Yep, that totally works. Following from the NJASK example, we'll write a function to simplify fetching the HSPA data, and a final wrapper around the fetch/process steps.
get_raw_hspa <- function(year, layout=layout_hspa) {
  #an unknown year would silently produce a broken url, so fail early
  if (!year %in% 2004:2014) {
    stop("get_raw_hspa only covers 2004-2014, got: ", year)
  }
  #url folder is test year + 1 through 2012, then the two-digit test year
  years <- list(
    "2014"="14", "2013"="13", "2012"="2013", "2011"="2012", "2010"="2011", "2009"="2010",
    "2008"="2009", "2007"="2008", "2006"="2007", "2005"="2006", "2004"="2005"
  )
  parsed_year <- years[[as.character(year)]]
  #filenames are screwy
  parsed_filename <- if (year > 2005) {
    "state_summary.txt"
  } else if (year == 2005) {
    "2005hspa_state_summary.txt"
  } else {
    "hspa04state_summary.txt"
  }
  #build url
  target_url <- paste0(
    "http://www.state.nj.us/education/schools/achievement/", parsed_year,
    "/hspa/", parsed_filename
  )
  #read_fwf
  df <- readr::read_fwf(
    file = target_url,
    col_positions = readr::fwf_positions(
      start = layout$field_start_position,
      end = layout$field_end_position,
      col_names = layout$final_name
    ),
    na = "*"
  )
  #return df
  return(df)
}
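The URL logic is the fiddly part of this function, and it's the only part that can be checked without hitting the state website. One option, sketched here, is a pure helper that only builds the string. `hspa_url_for` is a hypothetical name, and the folder and filename rules are simply copied from the lookup in `get_raw_hspa`.

```r
# Hypothetical helper: builds the summary-file URL without downloading anything
hspa_url_for <- function(year) {
  stopifnot(year %in% 2004:2014)
  # folder is test year + 1 through 2012, then the two-digit test year
  folder <- if (year >= 2013) {
    substr(as.character(year), 3, 4)
  } else {
    as.character(year + 1)
  }
  # the two earliest years use their own filenames
  filename <- if (year > 2005) {
    "state_summary.txt"
  } else if (year == 2005) {
    "2005hspa_state_summary.txt"
  } else {
    "hspa04state_summary.txt"
  }
  paste0(
    "http://www.state.nj.us/education/schools/achievement/",
    folder, "/hspa/", filename
  )
}

hspa_url_for(2014)
hspa_url_for(2004)
```

Keeping the string-building separate from the download makes it easy to eyeball every year's URL in one go with `sapply(2004:2014, hspa_url_for)`.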
#final wrapper
fetch_hspa <- function(year) {
  #2011 and later use the current layout; 2004-2010 use the older one
  if (year >= 2011) {
    layout <- layout_hspa
  } else if (year >= 2004) {
    layout <- layout_hspa2010
  } else {
    stop("no HSPA layout for years before 2004, got: ", year)
  }
  get_raw_hspa(year, layout=layout) %>% process_nj_assess(layout=layout)
}
fetch_hspa(2010) %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
## Parsed with column specification:
## cols(
## .default = col_character(),
## DFG = col_integer(),
## Special_Needs = col_integer(),
## TOTAL_POPULATION_Number_Enrolled_LAL = col_integer(),
## TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage = col_integer(),
## TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
## TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
## TOTAL_POPULATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
## TOTAL_POPULATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
## TOTAL_POPULATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
## TOTAL_POPULATION_SCIENCE_Proficient_Percentage = col_integer(),
## TOTAL_POPULATION_SCIENCE_Scale_Score_Mean = col_integer(),
## GENERAL_EDUCATION_Number_Enrolled_LAL = col_integer(),
## GENERAL_EDUCATION_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
## GENERAL_EDUCATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage = col_integer(),
## GENERAL_EDUCATION_LANGUAGE_ARTS_Scale_Score_Mean = col_integer(),
## GENERAL_EDUCATION_MATHEMATICS_Number_Enrolled_Math = col_integer(),
## GENERAL_EDUCATION_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
## GENERAL_EDUCATION_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
## GENERAL_EDUCATION_SCIENCE_Partially_Proficient_Percentage = col_integer(),
## GENERAL_EDUCATION_SCIENCE_Proficient_Percentage = col_integer()
## # ... with 189 more columns
## )
## See spec(...) for full column specifications.
## CDS_Code County_Code District_Code School_Code County_Name
## 1
## 2
## 3
## 4
## 5 MONMOUTH ASBURY PARK
## 6 MONMOUTH ASBURY PARK
## District_Name School_Name DFG Special_Needs
## 1 99548 551 19 9
## 2 85256 356 12 6
## 3 14292 195 7 3
## 4 11673 179 6 4
## 5 A Y 107 1 NA 0
## 6 ASBURY PARK H.S. A Y 107 1 NA 0
## TOTAL_POPULATION_Number_Enrolled_LAL
## 1 98257
## 2 84337
## 3 13920
## 4 11347
## 5 103
## 6 103
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1 129 6
## 2 92 7
## 3 351 6
## 4 374 5
## 5 524 4
## 6 524 4
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1 87 184
## 2 00 208
## 3 06 43
## 4 89 37
## 5 76 00
## 6 76 00
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1 2273 9
## 2 2309 8
## 3 2053 1
## 4 2035 1
## 5 1867
## 6 1867
## TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1 954.8
## 2 525.6
## 3 429.2
## 4 167.3
## 5 10.7
## 6 10.7
## TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1 0.6
## 2 0.4
## 3 0.2
## 4 0.2
## 5 NA
## 6 NA
## TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1 7.0
## 2 0.7
## 3 6.3
## 4 3.0
## 5 0.4
## 6 0.4
## TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1 13.9
## 2 11.3
## 3 2.6
## 4 2.2
## 5 0.0
## 6 0.0
Nice! NJASK and HSPA down, GEPA data to go.
GEPA
Load in the processed GEPA layout file, and the old NJASK layout file.
load(file = 'datasets/gepa_layout.rda')
load(file = 'datasets/njask05_layout.rda')
head(layout_gepa)
## field_start_position field_end_position field_length data_type
## 1 1 9 9 Text
## 2 1 2 2 Text
## 3 3 6 4 Text
## 4 7 9 3 Text
## 5 10 59 50 Text
## 6 60 109 50 Text
## description comments
## 1 CDS Code
## 2 County Code
## 3 District Code
## 4 School Code
## 5 County Name
## 6 District Name
## valid_values
## 1 CDS codes for schools and districts\nTHE FIRST TWO POSITIONS WILL INCLUDE THE FOLLOWING AGGREGATION CODES: NS = Non-Special Needs; SN = Special Needs; ST = State; A = DFG A; B = DFG B; CD= DFG CD\x85.
## 2 01, 03, 05, 07, 09, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 80, ST, A, B, CD, DE, FG, GH, I, J, R, NS, SN
## 3 0100 to 9999; Applicable only for district and school aggregations
## 4 001 to 999; Applicable only for school aggregations
## 5 A to Z, blank; Applicable only for district and school aggregations
## 6 A to Z, blank; Applicable only for district and school aggregations
## spanner1 spanner2 final_name
## 1 CDS_Code
## 2 County_Code
## 3 District_Code
## 4 School_Code
## 5 County_Name
## 6 District_Name
A function to get GEPA data:
get_raw_gepa <- function(year, layout=layout_gepa) {
  #an unknown year would silently produce a broken url, so fail early
  if (!year %in% 2004:2007) {
    stop("get_raw_gepa only covers 2004-2007, got: ", year)
  }
  #url folder is the test year + 1
  years <- list(
    "2007"="2008", "2006"="2007", "2005"="2006", "2004"="2005"
  )
  parsed_year <- years[[as.character(year)]]
  #filenames are screwy here too
  filename <- list(
    "2007"="state_summary.txt", "2006"="state_summary.txt",
    "2005"="2005njgepa_state_summary.txt", "2004"="gepa04state_summary.txt"
  )
  parsed_filename <- filename[[as.character(year)]]
  #build url
  target_url <- paste0(
    "http://www.state.nj.us/education/schools/achievement/", parsed_year,
    "/gepa/", parsed_filename
  )
  #read_fwf
  df <- readr::read_fwf(
    file = target_url,
    col_positions = readr::fwf_positions(
      start = layout$field_start_position,
      end = layout$field_end_position,
      col_names = layout$final_name
    ),
    na = "*"
  )
  #return df
  return(df)
}
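One thing worth knowing about the named-list lookup used for the year and filename: asking an R list for a name it doesn't have returns `NULL` rather than raising an error, and `paste0()` treats `NULL` as an empty string. The sketch below shows the effect with a year outside the list. The shortened path and the `missing_folder` and `bad_path` names are just for illustration.

```r
# Same year-to-folder lookup as in get_raw_gepa
years <- list("2007" = "2008", "2006" = "2007", "2005" = "2006", "2004" = "2005")

# A year outside the list comes back as NULL, with no error or warning
missing_folder <- years[[as.character(2003)]]
is.null(missing_folder)

# paste0() treats NULL as empty, so the path ends up with a doubled slash
bad_path <- paste0("achievement/", missing_folder, "/gepa/")
bad_path
```

That's why it's worth checking the year before building the URL: the failure otherwise shows up later as a confusing download error instead of a clear message about an unsupported year.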
gepa_ex <- get_raw_gepa(2007)
## Parsed with column specification:
## cols(
## .default = col_character(),
## `Special_Needs_(Abbott)_district_flag` = col_integer(),
## MALE_SCIENCE_Scale_Score_Mean = col_integer(),
## MIGRANT_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
## MIGRANT_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
## PACIFIC_ISLANDER_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
## `NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean` = col_integer()
## )
## See spec(...) for full column specifications.
## Warning: 2448 parsing failures.
## row col expected actual
## 1 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars 1
## 1 NA 486 columns 484 columns
## 2 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars 1
## 2 NA 486 columns 484 columns
## 3 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars 1
## ... ................................................................ ........... ...........
## See problems(...) for more details.
gepa_ex %>% as.data.frame() %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
## CDS_Code County_Code District_Code School_Code County_Name
## 1 ST
## 2 NS
## 3 SN
## 4 A
## 5 250100000 MO NMOU TH ASBURY PA
## 6 250100070 MO NMOU TH ASBURY PA
## District_Name
## 1
## 2
## 3
## 4
## 5 RK
## 6 RK ASBURY PA
## School_Name DFG_Flag
## 1 108474 00
## 2 087774 00
## 3 020700 00
## 4 017401 00
## 5 A Y * <NA>
## 6 RK M.S. A Y * <NA>
## Special_Needs_(Abbott)_district_flag TOTAL_POPULATION_Number_Enrolled
## 1 0 667001
## 2 0 371000
## 3 0 296000
## 4 0 264000
## 5 NA <NA>
## 6 NA <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1 262000
## 2 668000
## 3 594000
## 4 509000
## 5 <NA>
## 6 <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1 680105
## 2 545086
## 3 135019
## 4 102016
## 5 <NA>
## 6 <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_APA
## 1 865026
## 2 190020
## 3 675052
## 4 526053
## 5 071
## 6 071
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1 406240
## 2 506630
## 3 204500
## 4 104420
## 5 502780
## 6 502780
## TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1 1132
## 2 1322
## 3 0281
## 4 0271
## 5 0071
## 6 0071
## TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1 1490
## 2 1930
## 3 9530
## 4 9460
## 5 842
## 6 842
## TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1 0066
## 2 0038
## 3 0027
## 4 0024
## 5 <NA>
## 6 <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1 1000
## 2 5000
## 3 6000
## 4 5000
## 5 <NA>
## 6 <NA>
Can we process the GEPA df using our existing function?
process_nj_assess(gepa_ex, layout_gepa) %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
## CDS_Code County_Code District_Code School_Code County_Name
## 1 ST
## 2 NS
## 3 SN
## 4 A
## 5 250100000 MO NMOU TH ASBURY PA
## 6 250100070 MO NMOU TH ASBURY PA
## District_Name
## 1
## 2
## 3
## 4
## 5 RK
## 6 RK ASBURY PA
## School_Name DFG_Flag
## 1 108474 00
## 2 087774 00
## 3 020700 00
## 4 017401 00
## 5 A Y * <NA>
## 6 RK M.S. A Y * <NA>
## Special_Needs_(Abbott)_district_flag TOTAL_POPULATION_Number_Enrolled
## 1 0 667001
## 2 0 371000
## 3 0 296000
## 4 0 264000
## 5 NA <NA>
## 6 NA <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1 262000
## 2 668000
## 3 594000
## 4 509000
## 5 <NA>
## 6 <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1 680105
## 2 545086
## 3 135019
## 4 102016
## 5 <NA>
## 6 <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_APA
## 1 865026
## 2 190020
## 3 675052
## 4 526053
## 5 071
## 6 071
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1 406240
## 2 506630
## 3 204500
## 4 104420
## 5 502780
## 6 502780
## TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1 113.2
## 2 132.2
## 3 28.1
## 4 27.1
## 5 7.1
## 6 7.1
## TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1 149.0
## 2 193.0
## 3 953.0
## 4 946.0
## 5 84.2
## 6 84.2
## TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1 6.6
## 2 3.8
## 3 2.7
## 4 2.4
## 5 NA
## 6 NA
## TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1 100
## 2 500
## 3 600
## 4 500
## 5 NA
## 6 NA
Yes, totally.
Final step: wrap the fetch and process steps in a single function:
#final wrapper
fetch_gepa <- function(year) {
  get_raw_gepa(year) %>% process_nj_assess(layout=layout_gepa)
}
fetch_gepa(2007) %>% as.data.frame() %>% select(CDS_Code:TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean) %>% head()
## Parsed with column specification:
## cols(
## .default = col_character(),
## `Special_Needs_(Abbott)_district_flag` = col_integer(),
## MALE_SCIENCE_Scale_Score_Mean = col_integer(),
## MIGRANT_MATHEMATICS_Number_of_Valid_Scale_Scores = col_integer(),
## MIGRANT_SCIENCE_Number_of_Valid_Scale_Scores = col_integer(),
## PACIFIC_ISLANDER_LANGUAGE_ARTS_Proficient_Percentage = col_integer(),
## `NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean` = col_integer()
## )
## See spec(...) for full column specifications.
## Warning: 2448 parsing failures.
## row col expected actual
## 1 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars 1
## 1 NA 486 columns 484 columns
## 2 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars 1
## 2 NA 486 columns 484 columns
## 3 NON-ECONOMICALLY_DISADVANTAGED_Non-Econ_SCIENCE_Scale_Score_Mean 4 chars 1
## ... ................................................................ ........... ...........
## See problems(...) for more details.
## CDS_Code County_Code District_Code School_Code County_Name
## 1 ST
## 2 NS
## 3 SN
## 4 A
## 5 250100000 MO NMOU TH ASBURY PA
## 6 250100070 MO NMOU TH ASBURY PA
## District_Name
## 1
## 2
## 3
## 4
## 5 RK
## 6 RK ASBURY PA
## School_Name DFG_Flag
## 1 108474 00
## 2 087774 00
## 3 020700 00
## 4 017401 00
## 5 A Y * <NA>
## 6 RK M.S. A Y * <NA>
## Special_Needs_(Abbott)_district_flag TOTAL_POPULATION_Number_Enrolled
## 1 0 667001
## 2 0 371000
## 3 0 296000
## 4 0 264000
## 5 NA <NA>
## 6 NA <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_Not_Present
## 1 262000
## 2 668000
## 3 594000
## 4 509000
## 5 <NA>
## 6 <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Voids
## 1 680105
## 2 545086
## 3 135019
## 4 102016
## 5 <NA>
## 6 <NA>
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_APA
## 1 865026
## 2 190020
## 3 675052
## 4 526053
## 5 071
## 6 071
## TOTAL_POPULATION_LANGUAGE_ARTS_Number_of_Valid_Scale_Scores
## 1 406240
## 2 506630
## 3 204500
## 4 104420
## 5 502780
## 6 502780
## TOTAL_POPULATION_LANGUAGE_ARTS_Partially_Proficient_Percentage
## 1 113.2
## 2 132.2
## 3 28.1
## 4 27.1
## 5 7.1
## 6 7.1
## TOTAL_POPULATION_LANGUAGE_ARTS_Proficient_Percentage
## 1 149.0
## 2 193.0
## 3 953.0
## 4 946.0
## 5 84.2
## 6 84.2
## TOTAL_POPULATION_LANGUAGE_ARTS_Advanced_Proficient_Percentage
## 1 6.6
## 2 3.8
## 3 2.7
## 4 2.4
## 5 NA
## 6 NA
## TOTAL_POPULATION_LANGUAGE_ARTS_Scale_Score_Mean
## 1 100
## 2 500
## 3 600
## 4 500
## 5 NA
## 6 NA
In the next post in this series, we'll take these individual NJASK, HSPA, and GEPA functions and write one wrapper to rule them all, allowing data to be easily fetched for any year/grade.