NJ Fall Enrollment data, explained

Every tool on this site is built on one public dataset: New Jersey's annual Fall Enrollment Reports. This is everything we learned wrangling 28 years of it — the format eras, the categories that changed and exactly when, the traps, and the shape of the database we normalized it all into.

1 · The source

The data comes from the New Jersey Department of Education Fall Enrollment Reports, published at nj.gov/education/doedata/enr. It's an annual fall headcount of every student in every public school and district in the state, broken out by grade, race/ethnicity, and economic and program status. The state posts one ZIP file per school year, and the series runs from 1998–99 through 2025–26 — 28 years.

Each ZIP holds a single file (CSV or Excel) whose name and shape depend on the year. The grain underneath is consistent in spirit — numbers for each school — but the container, the column names, and even which categories exist have all changed over the years. That's the whole challenge.

2 · Three format eras

The state rebuilt the report format twice. There are three distinct generations, and a parser has to detect which one it's looking at before it can read a row.

Era	Years	Container	Shape
A	1998–99 → 2008–09 (11 yrs)	One CSV, `STAT_ENR.CSV`	One row per school × program/grade. Race is split into sex-coded columns — `WHM`/`WHF` (white male/female), `BLM`/`BLF`, `HIM`/`HIF`, and so on. You sum male+female to get a race count.
B	2009–10 → 2018–19 (10 yrs)	Excel "long" workbook. 2009–10 and 2010–11 are legacy binary `.xls` (must be converted, e.g. via LibreOffice, before a modern library will open them); 2011–12 onward are `.xlsx`. Filenames drift: `enr.xls`, `enr.xlsx`, `EnrollmentReport.xlsx`.	A long "SQL Results"–style sheet: one row per school × grade × program, with race still as race × gender columns. Header naming shifts (`WhiteM` vs `WHM`), and some years mislabel the grade column "Total" on every row, keeping the real grade in a separate `PRGCODE`.
C	2019–20 → 2025–26 (7 yrs)	Modern `.xlsx` with separate State / District / School sheets.	One row per school, with plainly-named columns (`White`, `Black`, `Hispanic`…) and a percentage column next to each count. Read the count, skip the percent.
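
The "sum male + female" step for Eras A and B can be sketched as below. This is a minimal illustration, assuming each source row has already been read into a dict keyed by header name. Only `WHM`/`WHF`, `BLM`/`BLF`, `HIM`/`HIF`, and `WhiteM` are named above; `WhiteF` is an assumed counterpart, and the real variant list is longer.

```python
from typing import Optional

# (male, female) header pairs per race. WhiteF is an assumption; the
# other names come from the era table above.
SEX_CODED = {
    "white": [("WHM", "WHF"), ("WhiteM", "WhiteF")],
    "black": [("BLM", "BLF")],
    "hispanic": [("HIM", "HIF")],
}

def race_count(row: dict, race: str) -> Optional[float]:
    """Sum the male and female columns for one race.

    Returns None when no known header pair is present, or when either
    half is missing, so a suppressed cell never turns into a zero.
    """
    for male_col, female_col in SEX_CODED[race]:
        if male_col in row and female_col in row:
            male, female = row[male_col], row[female_col]
            if male is None or female is None:
                return None
            return float(male) + float(female)
    return None
```

Returning None when only one half is present is a design choice for this sketch; it keeps a partial count from passing as a complete one.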

Two practical notes that cost us time. First, the header row is not always the first row: scan the first several rows for a known label, such as a district-code column. Second, the file extension cannot separate Era B from Era C, because both ship as .xlsx from 2011–12 onward; look at the sheet names instead. If there are separate "School" and "District" sheets, it's Era C; otherwise treat it as an Era B long workbook.
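
Both checks can be sketched without committing to a spreadsheet library. This is a hedged sketch: it assumes the caller has already listed the workbook's sheet names and read the first rows into lists, that the Era C sheets are literally named "School" and "District", and that the label passed to `find_header_row` (for example "District Code") is one the caller knows to be in that year's header.

```python
from typing import Optional, Sequence

def detect_era(filename: str, sheet_names: Sequence[str] = ()) -> str:
    """Return 'A', 'B' or 'C' from the file extension and sheet names."""
    if filename.lower().endswith(".csv"):
        return "A"
    names = {name.strip().lower() for name in sheet_names}
    # Separate School and District sheets mark the modern workbook.
    if "school" in names and "district" in names:
        return "C"
    # Any other workbook is treated as the Era B long format.
    return "B"

def find_header_row(rows: Sequence[Sequence[object]],
                    labels: Sequence[str],
                    max_scan: int = 10) -> Optional[int]:
    """Index of the first row (within max_scan) containing a known label."""
    wanted = {label.strip().lower() for label in labels}
    for index, row in enumerate(rows[:max_scan]):
        cells = {str(cell).strip().lower() for cell in row if cell is not None}
        if cells & wanted:
            return index
    return None
```

Legacy `.xls` files fall through to Era B here; converting them before reading is a separate step.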

3 · What's counted, and when it changed

The single most important thing to understand: the race/ethnicity categories are not constant across the 28 years.

Five categories run the entire series: White, Black, Hispanic, Asian, and Native American. Four of them keep the same meaning throughout; Asian narrows in 2006–07. Two more appear partway through. Here's exactly where:

Category	Present	Note
White, Black, Hispanic, Native American	1998–99 → 2025–26 (all 28 yrs)	Stable throughout.
Asian	1998–99 → 2025–26 (all 28 yrs)	But its meaning narrows in 2006–07 — see below.
Native Hawaiian / Pacific Islander	2006–07 → 2025–26 (20 yrs)	Split out as its own category. Before 2006–07 these students were counted inside Asian.
Two or More Races	2006–07 → 2025–26 (20 yrs)	New multiracial category. Before 2006–07 there was no way to be recorded as multiracial; students were assigned a single race.

The break is 2006–07. That year New Jersey moved from a 5-category race scheme to a 7-category one, adding Native Hawaiian/Pacific Islander (carved out of Asian) and Two or More Races. This happened inside the CSV era — it's a separate axis from the format eras above, and it's the change that most often trips up year-over-year comparisons.

Non-race dimensions phase in at different times, too

Dimension	Coverage
Free / reduced-price lunch (economic disadvantage)	Back to 1998–99, but blank ~2019–20 → 2021–22 in the source.
Multilingual / English learners (labeled LEP, then ELL, then ML over time — same dimension)	From ~2005–06.
Migrant	From ~2005–06.
Military-connected, Homeless	Modern only — 2022–23 onward.
Gender (male/female by race)	Exists in the legacy source files (Eras A and B, 1998–99 → 2018–19), but we do not load it.

4 · Comparing years fairly

Because 2006–07 reshuffled the categories, you cannot naively compare raw race shares across that line. Two rules make any two years comparable:

Fold Pacific Islander back into Asian. Since Asian included Pacific Islander before 2006–07, the consistent measure across all 28 years is Asian + Native Hawaiian/Pacific Islander.

Only set aside "Two or More Races" when a comparison reaches back before it existed. Because the category (and a separate Pacific Islander count) didn't exist before 2006–07, including it in a comparison that crosses that line would count a group that one side structurally couldn't have. So the rule is best-available: when both chosen years are 2006–07 or later, use all seven groups; when the range reaches earlier, fall back to the five consistent ones (White, Black, Hispanic, Asian including Pacific Islander, and Native American), with "Two or More Races" set aside. Either way, the shares are renormalized over the groups in use so they sum to 100%.
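
One reading of the two rules, sketched in Python. The column names follow the `enrollment` schema; the function name and the dict-of-counts input are assumptions made for illustration, and categories that are NULL because they did not yet exist simply contribute nothing.

```python
CONSISTENT_FIVE = ("white", "black", "hispanic", "asian", "native_american")
FIRST_SEVEN_CATEGORY_YEAR = "2006-07"

def comparable_shares(counts_a: dict, year_a: str, counts_b: dict, year_b: str):
    """Return percentage shares for both years over a group set both support."""
    # 'YYYY-YY' labels sort chronologically as plain strings.
    both_modern = min(year_a, year_b) >= FIRST_SEVEN_CATEGORY_YEAR

    def shares(counts: dict) -> dict:
        groups = {g: counts.get(g) or 0.0 for g in CONSISTENT_FIVE}
        pacific = counts.get("hawaiian_pi") or 0.0
        if both_modern:
            # Both years have all seven categories: keep them separate.
            groups["hawaiian_pi"] = pacific
            groups["two_or_more"] = counts.get("two_or_more") or 0.0
        else:
            # Range crosses 2006-07: fold Pacific Islander into Asian
            # and set Two or More Races aside.
            groups["asian"] += pacific
        denominator = sum(groups.values())
        if not denominator:
            return {}
        return {g: 100.0 * value / denominator for g, value in groups.items()}

    return shares(counts_a), shares(counts_b)
```

Dividing by the sum of the groups in use, rather than by the school's reported total, is what makes the shares sum to 100% after a group is set aside.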

That is exactly what the demographic-shift explorer does. Its Demographic Shift Index is ½ · Σ |Δ share| over whichever group set the chosen years support: the share of students who would have to be a different race to turn the earlier year's mix into the later one's.
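
The index itself is a few lines. A minimal sketch, assuming both inputs are dicts of percentage shares over the same group set (the function name is ours for illustration, not necessarily the explorer's):

```python
def demographic_shift_index(earlier: dict, later: dict) -> float:
    """Half the sum of absolute share changes, in percentage points."""
    groups = set(earlier) | set(later)
    return 0.5 * sum(abs(later.get(g, 0.0) - earlier.get(g, 0.0)) for g in groups)

# White falls 10 points and Hispanic rises 10: 10% of students would
# have to be a different race to turn one mix into the other.
example = demographic_shift_index(
    {"white": 50.0, "black": 30.0, "hispanic": 20.0},
    {"white": 40.0, "black": 30.0, "hispanic": 30.0},
)  # 10.0
```

The factor of ½ is there because every point one group loses is a point another group gains, so the raw sum counts each shifted student twice.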

5 · Gotchas we hit the hard way

Totals vs. grade rows — don't double-count

In the CSV and long-workbook eras, each school has many rows: an authoritative "Total" row plus per-grade rows (and sometimes special-education-by-disability rows that are already inside the Total). Take a school's total, race, and lunch counts from the Total row; take per-grade counts from the grade rows; never add the two together. Only fall back to summing grade rows when a Total row is genuinely absent.
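
The fallback order can be sketched as follows. This assumes each of a school's rows has already been tagged with a hypothetical `kind` field ("total", "grade", or "special_ed") and a parsed `count`; neither field name comes from the source files.

```python
from typing import Iterable, Optional

def school_total(rows: Iterable[dict]) -> Optional[float]:
    """Prefer the Total row; sum grade rows only when no Total row exists."""
    rows = list(rows)
    totals = [r["count"] for r in rows if r["kind"] == "total"]
    if totals:
        # Authoritative figure. Special-education rows are already inside it.
        return totals[0]
    grade_counts = [r["count"] for r in rows
                    if r["kind"] == "grade" and r["count"] is not None]
    return sum(grade_counts) if grade_counts else None
```

Special-education rows are never summed in either branch, since they overlap the grade rows as well as the Total.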

Fractional counts are real

Shared-time students (those who split attendance between schools) are reported as halves. Counts are not always integers — there are roughly 4,400 fractional totals in the series. Preserve them as-is; don't round on import.

Rollup rows masquerading as schools

The files embed aggregate rows. A school code of 999, or a school name containing "TOTAL," is the district total, not a building. A district code of 9999, or names like "County Total" / "State Total," are county/state rollups. Separate the district aggregates into their own table and drop the county/state rollups, or they'll inflate everything.
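
A sketch of that routing, assuming codes have been read as strings. Which column carries "County Total" or "State Total" is not specified above, so this version checks both name fields; the function name and return labels are illustrative.

```python
def classify_row(district_code: str, school_code: str,
                 school_name: str = "", district_name: str = "") -> str:
    """Return 'rollup' (drop), 'district' (district table) or 'school'."""
    names = f"{district_name} {school_name}".upper()
    # County and state rollups are checked first so that a state row with
    # school code 999 is not mistaken for a district total.
    if (district_code.strip() == "9999"
            or "COUNTY TOTAL" in names or "STATE TOTAL" in names):
        return "rollup"
    if school_code.strip() == "999" or "TOTAL" in school_name.upper():
        return "district"
    return "school"
```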

Suppressed and blank cells

Missing or suppressed values show up as ., *, N, -, or empty. Treat all of them as null, not zero.
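
A single cell parser can enforce both this rule and the fractional-count rule above. A minimal sketch; accepting a lowercase "n" and stripping thousands separators are assumptions beyond what the source markers list.

```python
from typing import Optional

SUPPRESSED = {"", ".", "*", "N", "-"}

def parse_count(cell) -> Optional[float]:
    """Convert a raw cell to a float count, or None when missing/suppressed."""
    if cell is None:
        return None
    if isinstance(cell, (int, float)):
        return float(cell)
    text = str(cell).strip()
    if text.upper() in SUPPRESSED:
        return None
    # Keep fractions such as 12.5 (shared-time students); never round.
    return float(text.replace(",", ""))
```

A real zero stays a zero: `parse_count("0")` returns `0.0`, while `parse_count("*")` returns `None`.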

Codes are the identity; names drift

Normalize the codes — district to 4 digits, school to 3 — and key schools on district_code + school_code, which is globally unique in New Jersey and stable across years. Do not key on county_code; it's unreliable in some legacy files. School names, meanwhile, drift constantly and sometimes change outright. One building in South Orange-Maplewood (district 4900, school 090) reads:

JEFFERSON                        1998-99 → 2009-10
Jefferson E.S.                   2010-11
Jefferson Elementary School      2011-12 → 2021-22
Delia Bolden Elementary School   2022-23 → 2025-26

Same code throughout, four names. Track the history (we keep a school_name_history table) so a rename doesn't read as a school appearing and disappearing.
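
Code normalization and name tracking can be sketched together. This assumes codes arrive as strings or integers and that records are `(school_year, district_code, school_code, school_name)` tuples; the function names and the hyphenated key format are illustrative, not the build script's.

```python
from collections import defaultdict

def school_key(district_code, school_code) -> str:
    """Zero-pad district to 4 digits and school to 3, then join them."""
    district = str(district_code).strip().zfill(4)
    school = str(school_code).strip().zfill(3)
    return f"{district}-{school}"

def name_history(records):
    """Map each school key to [(name, first_year, last_year), ...].

    Names are listed in order of first appearance.
    """
    spans = defaultdict(dict)
    # 'YYYY-YY' labels sort chronologically as plain strings.
    for year, district, school, name in sorted(records, key=lambda r: r[0]):
        key = school_key(district, school)
        first_year, _ = spans[key].get(name, (year, year))
        spans[key][name] = (first_year, year)
    return {key: [(name, first, last) for name, (first, last) in names.items()]
            for key, names in spans.items()}
```

Because the key is built from codes alone, the four names above collapse onto a single school with four history rows.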

6 · The database we built

All 28 years normalize into one SQLite file with three tables. Counts are stored as REAL (to preserve the fractional shared-time values), and missing values are NULL.

Table	Grain	Rows
`enrollment`	one row per school × year	~69,300
`district_enrollment`	one row per district × year (authoritative district total; synthesized from member schools where the source omits it, and flagged)	~18,400
`school_name_history`	one row per code × name, with first/last year — catches renames	—

Coverage: 744 districts, ~3,038 schools, all 21 NJ counties (plus charter / state-operated groupings), 1998–99 → 2025–26.

The enrollment schema:

CREATE TABLE enrollment (
  school_year TEXT, county_code TEXT, county_name TEXT,
  district_code TEXT, district_name TEXT, school_code TEXT, school_name TEXT,
  total REAL,
  -- race / ethnicity (hawaiian_pi & two_or_more only populated from 2006-07)
  white REAL, black REAL, hispanic REAL, asian REAL,
  native_american REAL, hawaiian_pi REAL, two_or_more REAL,
  -- grades
  pk REAL, k REAL, g1 REAL, g2 REAL, g3 REAL, g4 REAL, g5 REAL, g6 REAL,
  g7 REAL, g8 REAL, g9 REAL, g10 REAL, g11 REAL, g12 REAL,
  -- economic / program
  free_lunch REAL, reduced_lunch REAL, ml_learners REAL,
  migrant REAL, military REAL, homeless REAL,
  PRIMARY KEY (school_year, county_code, district_code, school_code)
);
-- district_enrollment mirrors these columns (no school_code/name; adds
-- synthesized INTEGER), keyed (school_year, county_code, district_code).

An example — every school's economic-disadvantage rate for a given year:

SELECT school_year, district_name, school_name,
       ROUND(100.0 * (free_lunch + reduced_lunch) / total, 1) AS pct_frl
FROM enrollment
WHERE school_year = '2024-25' AND total >= 100 AND free_lunch IS NOT NULL
ORDER BY pct_frl DESC;
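
The same query can be run from Python's standard `sqlite3` module. The sketch below uses an in-memory database with a trimmed copy of the schema and four invented rows, purely to show how the filters behave; none of the names or numbers are real data.

```python
import sqlite3

QUERY = """
SELECT school_year, district_name, school_name,
       ROUND(100.0 * (free_lunch + reduced_lunch) / total, 1) AS pct_frl
FROM enrollment
WHERE school_year = ? AND total >= 100 AND free_lunch IS NOT NULL
ORDER BY pct_frl DESC
"""

def frl_rates(conn: sqlite3.Connection, school_year: str) -> list:
    """Return (year, district, school, pct_frl) rows, highest rate first."""
    return conn.execute(QUERY, (school_year,)).fetchall()

# Invented rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (school_year TEXT, district_name TEXT, "
    "school_name TEXT, total REAL, free_lunch REAL, reduced_lunch REAL)"
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2024-25", "Example District", "Example School A", 400.0, 80.0, 20.0),
        ("2024-25", "Example District", "Example School B", 250.5, 100.0, 0.5),
        ("2024-25", "Example District", "Example School C", 300.0, None, None),
        ("2024-25", "Example District", "Example School D", 60.0, 30.0, 5.0),
    ],
)
rows = frl_rates(conn, "2024-25")
```

School C drops out because its lunch counts are NULL rather than zero, and School D because it has fewer than 100 students, which is why missing values have to be stored as NULL for the filter to work.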

7 · Reproducing it

A single build script reproduces the whole database from the public source. For each year it: downloads the ZIP from the state archive; detects the era (CSV → Era A; an Excel workbook with School + District sheets → Era C; any other workbook → the Era B long format, converting legacy .xls first); reads each row by matching column names against the known naming variants; applies the Total-row / grade-row / rollup rules above; normalizes codes; and writes the three tables. Everything in sections 2–5 is the spec it implements — given the raw ZIPs, that's enough to rebuild the same schema from scratch.

8 · How we checked the work

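The pipeline's output matches figures the South Orange-Maplewood district published itself (from its Nov 2025 integration forum). Every figure agrees to within 0.1 percentage point or to the district's own rounding, and the checks span the oldest and newest format eras, which validates the parse end to end: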

Figure	Our extract	District-published
SOMSD White %, 1998–99	43.8%	44%
SOMSD % free/reduced lunch, 1998–99	17.0%	16.9%
South Mountain White %, 1998–99	56.1%	56%
SOMSD White %, 2019–20	55.4%	55.4%
SOMSD % free/reduced lunch, 2024–25	14.1%	14.1%

Source data: New Jersey Department of Education, Fall Enrollment Reports, 1998–99 through 2025–26 (nj.gov/education/doedata/enr).
