Lab 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Shaurya Chauhan

Published

February 10, 2026

Assignment Overview

Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences

Part 1: Healthcare Access for Vulnerable Populations

Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.

Required Analysis Steps

Complete the following analysis, documenting each step with code and brief explanations:

Step 1: Data Collection (5 points)

Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts

Your Task:

# Load required packages
library(sf)
library(tidyverse)
library(tigris)
library(tidycensus)
library(scales)
library(patchwork)
library(here)

getwd()

[1] "C:/Users/bhanu/OneDrive/Desktop/Post Grad Work/Sem 2/PPA/PortfolioPPA/labs/lab_2"

# Load spatial data
hospitals <- st_read(here("data/hospitals.geojson"))

Reading layer `hospitals' from data source 
  `C:\Users\bhanu\OneDrive\Desktop\Post Grad Work\Sem 2\PPA\PortfolioPPA\data\hospitals.geojson' 
  using driver `GeoJSON'
Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.49621 ymin: 39.75163 xmax: -74.86704 ymax: 42.13403
Geodetic CRS:  WGS 84

pa_tracts <- tracts(state = "PA", cb = TRUE)


  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%

pa_counties <- st_read(here("data/Pennsylvania_County_Boundaries.shp"))

Reading layer `Pennsylvania_County_Boundaries' from data source 
  `C:\Users\bhanu\OneDrive\Desktop\Post Grad Work\Sem 2\PPA\PortfolioPPA\data\Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator

# Check that all data loaded correctly
pa_counties

Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
   OBJECTID MSLINK COUNTY_NAM COUNTY_NUM FIPS_COUNT COUNTY_ARE COUNTY_PER
1       336     46 MONTGOMERY         46        091       <NA>       <NA>
2       337      8   BRADFORD         08        015       <NA>       <NA>
3       338      9      BUCKS         09        017       <NA>       <NA>
4       339     58      TIOGA         58        117       <NA>       <NA>
5       340     59      UNION         59        119       <NA>       <NA>
6       341     60    VENANGO         60        121       <NA>       <NA>
7       342     62 WASHINGTON         62        125       <NA>       <NA>
8       343     63      WAYNE         63        127       <NA>       <NA>
9       344     42     MCKEAN         42        083       <NA>       <NA>
10      345     43     MERCER         43        085       <NA>       <NA>
   NUMERIC_LA COUNTY_N_1 AREA_SQ_MI SOUND SPREAD_SHE IMAGE_NAME NOTE_FILE VIDEO
1           5         46   487.4271  <NA>       <NA>   poll.bmp      <NA>  <NA>
2           2          8  1161.3379  <NA>       <NA>   poll.bmp      <NA>  <NA>
3           5          9   622.0836  <NA>       <NA>   poll.bmp      <NA>  <NA>
4           2         58  1137.2480  <NA>       <NA>   poll.bmp      <NA>  <NA>
5           2         59   319.1893  <NA>       <NA>   poll.bmp      <NA>  <NA>
6           3         60   683.3676  <NA>       <NA>   poll.bmp      <NA>  <NA>
7           1         62   862.1077  <NA>       <NA>   poll.bmp      <NA>  <NA>
8           2         63   750.8286  <NA>       <NA>   poll.bmp      <NA>  <NA>
9           1         42   985.2700  <NA>       <NA>   poll.bmp      <NA>  <NA>
10          3         43   682.3598  <NA>       <NA>   poll.bmp      <NA>  <NA>
   DISTRICT_N PA_CTY_COD MAINT_CTY_ DISTRICT_O                       geometry
1          06         46          4        6-4 MULTIPOLYGON (((-8398884 48...
2          03         08          9        3-9 MULTIPOLYGON (((-8558633 51...
3          06         09          1        6-1 MULTIPOLYGON (((-8367360 49...
4          03         59          7        3-7 MULTIPOLYGON (((-8558633 51...
5          03         60          8        3-8 MULTIPOLYGON (((-8562865 49...
6          01         61          5        1-5 MULTIPOLYGON (((-8870781 50...
7          12         63          4       12-4 MULTIPOLYGON (((-8935296 48...
8          04         64          6        4-6 MULTIPOLYGON (((-8368003 50...
9          02         42          5        2-5 MULTIPOLYGON (((-8705967 51...
10         01         43          4        1-4 MULTIPOLYGON (((-8956031 50...

pa_tracts

Simple feature collection with 3445 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -80.51989 ymin: 39.7198 xmax: -74.68952 ymax: 42.26986
Geodetic CRS:  NAD83
First 10 features:
   STATEFP COUNTYFP TRACTCE              GEOIDFQ       GEOID   NAME
1       42      001  031101 1400000US42001031101 42001031101 311.01
2       42      013  100400 1400000US42013100400 42013100400   1004
3       42      013  100500 1400000US42013100500 42013100500   1005
4       42      013  100800 1400000US42013100800 42013100800   1008
5       42      013  101900 1400000US42013101900 42013101900   1019
6       42      011  011200 1400000US42011011200 42011011200    112
7       42      011  000200 1400000US42011000200 42011000200      2
8       42      011  011500 1400000US42011011500 42011011500    115
9       42      011  000600 1400000US42011000600 42011000600      6
10      42      011  001900 1400000US42011001900 42011001900     19
              NAMELSAD STUSPS   NAMELSADCO   STATE_NAME LSAD   ALAND AWATER
1  Census Tract 311.01     PA Adams County Pennsylvania   CT 3043185      0
2    Census Tract 1004     PA Blair County Pennsylvania   CT  993724      0
3    Census Tract 1005     PA Blair County Pennsylvania   CT 1130204      0
4    Census Tract 1008     PA Blair County Pennsylvania   CT  996553      0
5    Census Tract 1019     PA Blair County Pennsylvania   CT  573726      0
6     Census Tract 112     PA Berks County Pennsylvania   CT 1539365   9308
7       Census Tract 2     PA Berks County Pennsylvania   CT 1949529 159015
8     Census Tract 115     PA Berks County Pennsylvania   CT 1978380  12469
9       Census Tract 6     PA Berks County Pennsylvania   CT 1460473      0
10     Census Tract 19     PA Berks County Pennsylvania   CT  182420      0
                         geometry
1  MULTIPOLYGON (((-77.03108 3...
2  MULTIPOLYGON (((-78.42478 4...
3  MULTIPOLYGON (((-78.41661 4...
4  MULTIPOLYGON (((-78.41067 4...
5  MULTIPOLYGON (((-78.40836 4...
6  MULTIPOLYGON (((-75.95433 4...
7  MULTIPOLYGON (((-75.96071 4...
8  MULTIPOLYGON (((-75.99913 4...
9  MULTIPOLYGON (((-75.91528 4...
10 MULTIPOLYGON (((-75.91819 4...

hospitals

Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.49621 ymin: 39.75163 xmax: -74.86704 ymax: 42.13403
Geodetic CRS:  WGS 84
First 10 features:
                  CHIEF_EXEC              CHIEF_EX_1
1              Peter J Adamo               President
2           Autumn DeShields Chief Executive Officer
3               Shawn Parekh Chief Executive Officer
4                DIANE HRITZ Chief Executive Officer
5             Tim Harclerode Chief Executive Officer
6  Richard McLaughlin MD MBA Chief Executive Officer
7             Laura Murnyack             Interim CEO
8                  Adam Beck Chief Executive Officer
9                Pamela Keen Chief Executive Officer
10              Mark Papalia           President/CEO
                                 FACILITY_U LONGITUDE       COUNTY
1              https://www.phhealthcare.org -79.91131   Washington
2                 https://www.malvernbh.com -75.17005 Philadelphia
3            https://roxboroughmemorial.com -75.20963 Philadelphia
4                https://www.ashospital.net -80.27907   Washington
5                 https://www.conemaugh.org -79.02513     Somerset
6                   https://towerhealth.org -75.61213   Montgomery
7         https://bucktailmedicalcenter.org -77.73649      Clinton
8  https://www.selectspecialtyhospitals.com -76.88013      Dauphin
9          https://www.childrenshomepgh.org -79.93736    Allegheny
10    https://www.kanecommunityhospital.com -78.81705       McKean
                                                  FACILITY_N
1                                  Penn Highlands Mon Valley
2                                  MALVERN BEHAVIORAL HEALTH
3                               Roxborough Memorial Hospital
4                                 ADVANCED SURGICAL HOSPITAL
5                    DLP Conemaugh Meyersdale Medical Center
6                                    Pottstown Hospital, LLC
7                                    Bucktail medical Center
8  SELECT SPECIALTY HOSPITAL CENTRAL PENNSYLVANIA HARRISBURG
9                          The Children's Home of Pittsburgh
10              University of Pittsburgh Medical Center Kane
                           STREET   CITY_OR_BO LATITUDE   TELEPHONE_ ZIP_CODE
1          1163 Country Club Road  Monongahela 40.18193 724-258-1000    15063
2  1930 South Broad Street Unit 4 Philadelphia 39.92619 610-480-8919    19145
3               5800 Ridge Avenue Philadelphia 40.02869 215-483-9900    19128
4        100 TRICH DRIVE\nSUITE 1   WASHINGTON 40.15655   7248840710    15301
5              200 Hospital Drive   Meyersdale 39.80913 814-634-5911    15552
6           1600 East High Street    Pottstown 40.24273   6103277000    19464
7                1001 Pine Street       Renovo 41.32789 570-923-1000    17764
8          111 South Front Street   Harrisburg 40.25841 717-724-6604    17110
9                5324 Penn Avenue   Pittsburgh 40.46424 412-441-4884    15224
10                4372 US Route 6         Kane 41.67188   8148378585    16735
                     geometry
1  POINT (-79.91131 40.18193)
2   POINT (-75.17005 39.9262)
3  POINT (-75.20963 40.02869)
4  POINT (-80.27907 40.15655)
5  POINT (-79.02513 39.80913)
6  POINT (-75.61213 40.24273)
7  POINT (-77.73649 41.32789)
8  POINT (-76.88013 40.25842)
9  POINT (-79.93736 40.46424)
10 POINT (-78.81705 41.67188)

st_crs(pa_counties)

Coordinate Reference System:
  User input: WGS 84 / Pseudo-Mercator 
  wkt:
PROJCRS["WGS 84 / Pseudo-Mercator",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            MEMBER["World Geodetic System 1984 (G2296)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["Popular Visualisation Pseudo-Mercator",
        METHOD["Popular Visualisation Pseudo Mercator",
            ID["EPSG",1024]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Web mapping and visualisation."],
        AREA["World between 85.06°S and 85.06°N."],
        BBOX[-85.06,-180,85.06,180]],
    ID["EPSG",3857]]

st_crs(pa_tracts)

Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]

st_crs(hospitals)

Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

pa_tracts <- st_transform(pa_tracts, st_crs(pa_counties))

Questions to answer: - How many hospitals are in your dataset? 223 Hospitals.

How many census tracts? 3445 Census Tracts.
What coordinate reference system is each dataset in? Counties and Hospitals are in WGS 84, Tracts are in NAD 83.

Step 2: Get Demographic Data

Use tidycensus to download tract-level demographic data for Pennsylvania.

Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)

Your Task:

# Get demographic data from ACS
tract_demographics <- get_acs(
  geography = "tract",
  state = "PA",
  year = 2022,
  survey = "acs5",
  variables = c(
    total_pop = "B01003_001",
    median_income = "B19013_001",
    
    male_65_66 = "B01001_020",
    male_67_69 = "B01001_021",
    male_70_74 = "B01001_022",
    male_75_79 = "B01001_023",
    male_80_84 = "B01001_024",
    male_85_plus = "B01001_025",
    
    female_65_66 = "B01001_044",
    female_67_69 = "B01001_045",
    female_70_74 = "B01001_046",
    female_75_79 = "B01001_047",
    female_80_84 = "B01001_048",
    female_85_plus = "B01001_049"
  ),
  output = "wide"
)
 #65 plus population summing
tract_demographics <- tract_demographics %>%
  mutate(
    pop_65_plus =
      male_65_66E + male_67_69E + male_70_74E +
      male_75_79E + male_80_84E + male_85_plusE +
      female_65_66E + female_67_69E + female_70_74E +
      female_75_79E + female_80_84E + female_85_plusE
  )


# Join to tract boundaries
pa_tracts_demographics <- pa_tracts %>%
  left_join(
    tract_demographics,
    by = "GEOID"
  )

sum(is.na(pa_tracts_demographics$median_incomeE))

[1] 62

median(pa_tracts_demographics$median_incomeE, na.rm = TRUE)

[1] 70188

Questions to answer: - What year of ACS data are you using? 2022

How many tracts have missing income data? 62 tracts
What is the median income across all PA census tracts? It is $70,188 —

Step 3: Define Vulnerable Populations

Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)

Your Task:

# Filter for vulnerable tracts based on your criteria
# Calculate the percentage of elderly residents in each tract. This converts the elderly population count into a comparable rate

pa_tracts_demographics <- pa_tracts_demographics %>%
  mutate(
    pct_elderly = (pop_65_plus / total_popE) * 100
  )

# Define the income threshold
# Using the median tract-level income across Pennsylvania as a relative benchmark for "low income"
income_threshold <- median(
  pa_tracts_demographics$median_incomeE,
  na.rm = TRUE
)

# Define the elderly population threshold
# Tracts above the median percentage elderly are considered to have a significant elderly population
elderly_threshold <- median(
  pa_tracts_demographics$pct_elderly,
  na.rm = TRUE
)

# Filter for vulnerable tracts
# A tract is considered vulnerable if it meets BOTH conditions:- median income below the income threshold & percentage elderly above the elderly threshold
vulnerable_tracts <- pa_tracts_demographics %>%
  filter(
    median_incomeE < income_threshold,
    pct_elderly > elderly_threshold
  )

# Checking how many tracts meet the vulnerability criteria
nrow(vulnerable_tracts)

[1] 822

# Calculate the percentage of all PA census tracts that are classified as vulnerable under this definition
round(
  nrow(vulnerable_tracts) / nrow(pa_tracts_demographics) * 100,
  1
)

[1] 23.9

Questions to answer: - What income threshold did you choose and why? I used the median household income across Pennsylvania census tracts as the threshold. This provides a relative, state-specific benchmark for identifying economically vulnerable neighborhoods without relying on arbitrary cutoffs.

What elderly population threshold did you choose and why? I used the median percentage of residents aged 65 and over across Pennsylvania census tracts. Using a percentage (rather than a count) allows comparison across tracts of different sizes and highlights areas with above-average healthcare needs.
How many tracts meet your vulnerability criteria? 822 census tracts meet both vulnerability criteria (low income and high elderly population).
What percentage of PA census tracts are considered vulnerable by your definition? Approximately 23.9% of Pennsylvania census tracts are classified as vulnerable under this definition.

Step 4: Calculate Distance to Hospitals

For each vulnerable tract, calculate the distance to the nearest hospital.

Your Task:

# Transform to appropriate projected CRS
# Project vulnerable tracts to a distance-preserving CRS
vulnerable_tracts_proj <- st_transform(vulnerable_tracts, 3365)

# Project hospitals to the same CRS
hospitals_proj <- st_transform(hospitals, 3365)

# Calculate distance from each tract centroid to nearest hospital

# Calculate centroids of vulnerable census tracts
vulnerable_centroids <- st_centroid(vulnerable_tracts_proj)

# Calculate distance matrix between tract centroids and hospitals
distance_matrix <- st_distance(
  vulnerable_centroids,
  hospitals_proj
)

# For each tract, find distance to nearest hospital (in miles)
vulnerable_centroids <- vulnerable_centroids %>%
  mutate(
    dist_to_nearest_hospital_miles =
      as.numeric(apply(distance_matrix, 1, min)) / 5280
  )


avg_distance <- mean(
  vulnerable_centroids$dist_to_nearest_hospital_miles,
  na.rm = TRUE
)
avg_distance

[1] 5.764526

max_distance <- max(
  vulnerable_centroids$dist_to_nearest_hospital_miles,
  na.rm = TRUE
)
max_distance

[1] 29.11587

n_far_tracts <- sum(
  vulnerable_centroids$dist_to_nearest_hospital_miles > 15,
  na.rm = TRUE
)
n_far_tracts

[1] 50

Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection

Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts? The average distance from vulnerable census tracts to the nearest hospital is approximately 5.7 miles.

What is the maximum distance? The maximum distance from a vulnerable census tract to the nearest hospital is approximately 29 miles, indicating extreme geographic isolation in some areas.
How many vulnerable tracts are more than 15 miles from the nearest hospital? 50

Step 5: Identify Underserved Areas

Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.

Your Task:

# Create underserved variable

# Create an underserved indicator
vulnerable_centroids <- vulnerable_centroids %>%
  mutate(
    underserved = dist_to_nearest_hospital_miles > 15
  )


n_underserved <- sum(vulnerable_centroids$underserved, na.rm = TRUE)
n_underserved

[1] 50

pct_underserved <- round(
  n_underserved / nrow(vulnerable_centroids) * 100,
  1
)
pct_underserved

[1] 6.1

Questions to answer: - How many tracts are underserved? 50 tracts.

What percentage of vulnerable tracts are underserved? 6.1% are underserved.
Does this surprise you? Why or why not? This result is not surprising, as Pennsylvania’s large hospital density mgiht mask rural access gaps that disproportionately affect a small but important subset of vulnerable communities.

Step 6: Aggregate to County Level

Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.

Your Task:

pa_counties_proj <- st_transform(pa_counties, 3365)

# Spatial join tracts to counties
tracts_with_county <- vulnerable_centroids %>%
  st_join(pa_counties_proj %>% select(COUNTY_NAM))

# Aggregating statistics by county
county_summary <- tracts_with_county %>%
  st_drop_geometry() %>%  # removing geometry for clean aggregation
  group_by(COUNTY_NAM) %>%
  summarize(
    # Number of vulnerable tracts
    n_vulnerable_tracts = n(),
    
    # Number of underserved tracts
    n_underserved_tracts = sum(underserved, na.rm = TRUE),
    
    # Percentage of vulnerable tracts underserved
    pct_underserved = round(
      n_underserved_tracts / n_vulnerable_tracts * 100,
      1
    ),
    
    # Average distance to nearest hospital
    avg_distance_miles = round(
      mean(dist_to_nearest_hospital_miles, na.rm = TRUE),
      2
    ),
    
    # Total vulnerable population
    total_vulnerable_population = sum(pop_65_plus, na.rm = TRUE)
  ) %>%
  arrange(desc(pct_underserved))

county_summary

# A tibble: 68 × 6
   COUNTY_NAM n_vulnerable_tracts n_underserved_tracts pct_underserved
   <chr>                    <int>                <int>           <dbl>
 1 CAMERON                      2                    2           100  
 2 FOREST                       1                    1           100  
 3 SULLIVAN                     3                    3           100  
 4 PIKE                         6                    4            66.7
 5 JUNIATA                      4                    2            50  
 6 SNYDER                       4                    2            50  
 7 CLEARFIELD                  16                    7            43.8
 8 HUNTINGDON                   7                    2            28.6
 9 POTTER                       7                    2            28.6
10 BRADFORD                    11                    3            27.3
# ℹ 58 more rows
# ℹ 2 more variables: avg_distance_miles <dbl>,
#   total_vulnerable_population <dbl>

county_summary %>%
  arrange(desc(n_underserved_tracts)) %>%
  select(COUNTY_NAM, n_underserved_tracts) %>%
  head(5)

# A tibble: 5 × 2
  COUNTY_NAM n_underserved_tracts
  <chr>                     <int>
1 CLEARFIELD                    7
2 PIKE                          4
3 SULLIVAN                      3
4 BRADFORD                      3
5 CAMERON                       2

# For counties with most vulnerable people
county_population_summary <- tracts_with_county %>%
  st_drop_geometry() %>%
  group_by(COUNTY_NAM) %>%
  summarize(
    vulnerable_pop_far = sum(
      ifelse(underserved, pop_65_plus, 0),
      na.rm = TRUE
    )
  ) %>%
  arrange(desc(vulnerable_pop_far))

head(county_population_summary, 5)

# A tibble: 5 × 2
  COUNTY_NAM vulnerable_pop_far
  <chr>                   <dbl>
1 CLEARFIELD               5856
2 BRADFORD                 3385
3 PIKE                     2127
4 LANCASTER                1857
5 SNYDER                   1839

Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population

Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? Cameron, Forest, SUllivan, Pike and Juniata counties have the highest % od underserved tracts.

Which counties have the most vulnerable people living far from hospitals? The counties with the highest number of vulnerable residents living more than 15 miles from a hospital are Clearfield, Bradford, Pike, Lancaster and Snyder.
Are there any patterns in where underserved counties are located? The counties with the highest percentage of underserved vulnerable tracts are primarily small, rural counties in northern and central Pennsylvania (e.g., Cameron, Forest, Sullivan). These counties have relatively few hospitals and large geographic areas, which increases travel distances. Meanwhile, counties with the highest number of vulnerable residents living far from hospitals (e.g., Clearfield, Bradford, Pike) tend to be geographically large rural counties with dispersed populations. This indicates that geographic isolation and low facility density are the main drivers of healthcare access inequities. —

Step 7: Create Summary Table

Create a professional table showing the top 10 priority counties for healthcare investment.

Your Task:

# Creating and formatting priority counties table
library(knitr)

priority_table <- county_population_summary %>%
  left_join(county_summary, by = "COUNTY_NAM") %>%
  arrange(desc(vulnerable_pop_far)) %>%
  slice_head(n = 10) %>%
  select(
    County = COUNTY_NAM,
    `Vulnerable Residents >15 Miles` = vulnerable_pop_far,
    `Vulnerable Tracts` = n_vulnerable_tracts,
    `Underserved Tracts` = n_underserved_tracts,
    `% Underserved` = pct_underserved,
    `Avg Distance (Miles)` = avg_distance_miles
  )

# Formatting and displaying table
kable(
  priority_table,
  caption = "Top 10 Priority Counties for Healthcare Investment Based on Vulnerable Populations with Limited Hospital Access",
  format.args = list(big.mark = ","),
  align = "lrrrrr"
)

Top 10 Priority Counties for Healthcare Investment Based on Vulnerable Populations with Limited Hospital Access
County	Vulnerable Residents >15 Miles	Vulnerable Tracts	Underserved Tracts	% Underserved	Avg Distance (Miles)
CLEARFIELD	5,856	16	7	43.8	14.15
BRADFORD	3,385	11	3	27.3	10.62
PIKE	2,127	6	4	66.7	17.47
LANCASTER	1,857	10	2	20.0	7.04
SNYDER	1,839	4	2	50.0	15.40
HUNTINGDON	1,692	7	2	28.6	12.43
CRAWFORD	1,682	16	2	12.5	7.86
TIOGA	1,589	10	2	20.0	10.67
SULLIVAN	1,306	3	3	100.0	21.52
CAMERON	1,269	2	2	100.0	19.07

Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)

Counties were prioritized based on the total number of vulnerable residents living more than 15 miles from the nearest hospital. This metric captures both socioe-conomic vulnerability and geographic isolation, ensuring that investment targets areas with the greatest concentration of healthcare access inequities.

Part 2: Comprehensive Visualization

Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.

Map 1: County-Level Choropleth

Create a choropleth map showing healthcare access challenges at the county level.

Your Task:

# Create county-level access map

library(ggplot2)

# Joining summary statistics back to county geometry
counties_for_map <- pa_counties_proj %>%
  left_join(county_summary, by = "COUNTY_NAM")

# Creating choropleth map
ggplot() +
  # County polygons filled by % underserved
  geom_sf(
    data = counties_for_map,
    aes(fill = pct_underserved),
    color = "white",
    size = 0.2
  ) +
  
  # Hospital locations
  geom_sf(
    data = hospitals_proj,
    color = "black",
    size = 1,
    alpha = 0.7
  ) +
  
  # Color scale
  scale_fill_viridis_c(
    option = "magma",
    name = "% Vulnerable\nTracts Underserved",
    labels = function(x) paste0(x, "%"),
    na.value = "grey90"
  ) +
  
  labs(
    title = "Healthcare Access Challenges in Pennsylvania",
    subtitle = "Percentage of Vulnerable Census Tracts Located More Than 15 Miles from a Hospital",
    caption = "Source: ACS 2018–2022, Pennsylvania Hospital Data | Analysis by Shaurya Chauhan"
  ) +
  
  theme_void() +
  
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    legend.position = "right"
  )

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels

Map 2: Detailed Vulnerability Map

Create a map highlighting underserved vulnerable tracts.

Your Task:

# Create detailed tract-level map

# Separating underserved and served tracts for clarity
underserved_tracts <- vulnerable_centroids %>%
  filter(underserved == TRUE)

served_tracts <- vulnerable_centroids %>%
  filter(underserved == FALSE)

ggplot() +
  
  # County boundaries for context
  geom_sf(
    data = pa_counties_proj,
    fill = NA,
    color = "grey70",
    size = 0.3
  ) +
  
  # Served vulnerable tracts (light color)
  geom_sf(
    data = served_tracts,
    color = "lightblue",
    size = 1,
    alpha = 0.6
  ) +
  
  # Underserved vulnerable tracts (highlighted)
  geom_sf(
    data = underserved_tracts,
    color = "red",
    size = 1.5,
    alpha = 0.9
  ) +
  
  # Hospital locations
  geom_sf(
    data = hospitals_proj,
    color = "black",
    size = 1,
    alpha = 0.8
  ) +
  
  labs(
    title = "Underserved Vulnerable Census Tracts in Pennsylvania",
    subtitle = "Red = Vulnerable tracts >15 miles from a hospital | Light blue = Vulnerable tracts within 15 miles",
    caption = "Source: ACS 2018–2022, Pennsylvania Hospital Data | Analysis by Shaurya Chauhan"
  ) +
  
  theme_void() +
  
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11)
  )

Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle

Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

Your Task:

# Create distribution visualization

ggplot(vulnerable_centroids, 
       aes(x = dist_to_nearest_hospital_miles)) +
  
  geom_histogram(
    bins = 30,
    fill = "steelblue",
    color = "white",
    alpha = 0.8
  ) +
  
  geom_vline(
    xintercept = 15,
    color = "red",
    linetype = "dashed",
    size = 1
  ) +
  
  labs(
    title = "Distribution of Distance to Nearest Hospital\nAmong Vulnerable Census Tracts",
    x = "Distance to Nearest Hospital (Miles)",
    y = "Number of Vulnerable Census Tracts",
    caption = "Red dashed line indicates 15-mile underserved threshold"
  ) +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(size = 12)
  )

Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size

Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)

Part 3: Bring Your Own Data Analysis

Choose your own additional spatial dataset and conduct a supplementary analysis.

Challenge Options

Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).

Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question

Education & Youth Services

Option A: Educational Desert Analysis - Data: Schools, Libraries, Recreation Centers, Census tracts (child population) - Question: “Which neighborhoods lack adequate educational infrastructure for children?” - Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density - Policy relevance: School district planning, library placement, after-school program siting

Option B: School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement

Environmental Justice

Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment —

Public Safety & Justice

Option D: Crime & Community Resources - Data: Crime Incidents, Recreation Centers, Libraries, Street Lights - Question: “Are high-crime areas underserved by community resources?” - Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis - Policy relevance: Community investment, violence prevention strategies —

Infrastructure & Services

Option E: Polling Place Accessibility - Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates) - Question: “Are polling places accessible for elderly and disabled voters?” - Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access - Policy relevance: Voting rights, election infrastructure, ADA compliance

Health & Wellness

Option F: Recreation & Population Health - Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics) - Question: “Is lack of recreation access associated with vulnerable populations?” - Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators - Policy relevance: Public health investment, recreation programming, obesity prevention

Emergency Services

Option G: EMS Response Coverage - Data: Fire Stations, EMS stations, Population density, High-rise buildings - Question: “Are population-dense areas adequately covered by emergency services?” - Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas - Policy relevance: Emergency preparedness, station siting decisions

Arts & Culture

Option H: Cultural Asset Distribution - Data: Public Art, Museums, Historic sites/markers, Neighborhoods - Question: “Do all neighborhoods have equitable access to cultural amenities?” - Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups - Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity

Data Sources

OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.

Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries

Recommended Starting Points

If you’re feeling confident: Choose an advanced challenge with multiple data layers. If you are a beginner, choose something more manageable that helps you understand the basics

If you have a different idea: Propose your own question! Just make sure: - You can access the spatial data - You can perform at least 2 spatial operations

Your Analysis

Your Task:

Find and load additional data
- Document your data source
- Check and standardize the CRS
- Provide basic summary statistics

# Load your additional dataset

library(sf)
library(here)

# Load parks shapefile
philly_parks <- st_read(here("labs/lab_2/PPR_Properties/PPR_Properties.shp"))

Reading layer `PPR_Properties' from data source 
  `C:\Users\bhanu\OneDrive\Desktop\Post Grad Work\Sem 2\PPA\PortfolioPPA\labs\lab_2\PPR_Properties\PPR_Properties.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 504 features and 25 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8380525 ymin: 4847239 xmax: -8344359 ymax: 4885123
Projected CRS: WGS 84 / Pseudo-Mercator

# Basic inspection of parks dataset
nrow(philly_parks)                 # Number of park features

[1] 504

st_geometry_type(philly_parks)[1]  # Geometry type

[1] MULTIPOLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

st_crs(philly_parks)               # CRS

Coordinate Reference System:
  User input: WGS 84 / Pseudo-Mercator 
  wkt:
PROJCRS["WGS 84 / Pseudo-Mercator",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            MEMBER["World Geodetic System 1984 (G2296)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["Popular Visualisation Pseudo-Mercator",
        METHOD["Popular Visualisation Pseudo Mercator",
            ID["EPSG",1024]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Web mapping and visualisation."],
        AREA["World between 85.06°S and 85.06°N."],
        BBOX[-85.06,-180,85.06,180]],
    ID["EPSG",3857]]

names(philly_parks)                # Column names

 [1] "public_nam" "parent_nam" "nested"     "official_n" "label"     
 [6] "alias"      "dpp_asset_" "address_91" "zip_code"   "address_br"
[11] "alias_addr" "acreage"    "property_c" "ppr_use"    "ppr_prog_d"
[16] "ppr_ops_di" "council_di" "police_dis" "city_scale" "local_scal"
[21] "program_si" "comments"   "objectid"   "Shape__Are" "Shape__Len"
[26] "geometry"

# Transform parks to PA State Plane South
philly_parks_proj <- st_transform(philly_parks, 3365)
st_crs(philly_parks_proj)

Coordinate Reference System:
  User input: EPSG:3365 
  wkt:
PROJCRS["NAD83(HARN) / Pennsylvania South (ftUS)",
    BASEGEOGCRS["NAD83(HARN)",
        DATUM["NAD83 (High Accuracy Reference Network)",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4152]],
    CONVERSION["SPCS83 Pennsylvania South zone (US survey foot)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39.3333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-77.75,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",40.9666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",39.9333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",1968500,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - Pennsylvania - counties of Adams; Allegheny; Armstrong; Beaver; Bedford; Berks; Blair; Bucks; Butler; Cambria; Chester; Cumberland; Dauphin; Delaware; Fayette; Franklin; Fulton; Greene; Huntingdon; Indiana; Juniata; Lancaster; Lawrence; Lebanon; Lehigh; Mifflin; Montgomery; Northampton; Perry; Philadelphia; Schuylkill; Snyder; Somerset; Washington; Westmoreland; York."],
        BBOX[39.71,-80.53,41.18,-74.72]],
    ID["EPSG",3365]]

Questions to answer: - What dataset did you choose and why? I selected the Philadelphia Parks & Recreation (PPR) Properties dataset- which contains the spatial boundaries of park properties and land managed by Philadelphia Parks & Recreation. I chose this dataset because access to publicly managed green space is closely linked to public health, environmental justice and neighborhood quality of life. Evaluating spatial access to park land in relation to vulnerable populations allows for a meaningful equity-focused analysis relevant to urban planning and public investment decisions.

What is the data source and date? The dataset was obtained from OpenDataPhilly, provided by Philadelphia Parks & Recreation (PPR). Source: Philadelphia Parks & Recreation via OpenDataPhilly (2018 dataset).
How many features does it contain? The dataset contains 504 multipolygon features, where each feature represents a park property or managed land boundary within Philadelphia.
What CRS is it in? Did you need to transform it? The dataset is originally projected in WGS 84 / Pseudo-Mercator (EPSG:3857), which uses meters as units and is commonly used for web mapping applications. Because accurate buffering and area calculations require a locally appropriate projected coordinate system, I transformed the dataset to NAD83 / Pennsylvania South State Plane (EPSG:3365), which uses feet as linear units and is better suited for distance-based spatial analysis within Pennsylvania.

Pose a research question

Are census tracts in Philadelphia with below-median income and above-median elderly population disproportionately lacking walkable access (within 0.25 miles) to publicly managed park land- using tract centroids as a proxy for residential location?

Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation

Your Task:

# Your spatial analysis


# Extract Philadelphia Census Tracts (FIPS 42101)
philly_tracts <- pa_tracts %>%
  filter(substr(GEOID, 1, 5) == "42101")


# Transform BOTH layers to PA State Plane South (EPSG 3365, feet)
philly_tracts_proj <- st_transform(philly_tracts, 3365)
philly_parks_proj  <- st_transform(philly_parks, 3365)


# Join Demographic Data
philly_tracts_proj <- philly_tracts_proj %>%
  left_join(
    pa_tracts_demographics %>%
      st_drop_geometry() %>%
      select(GEOID, median_incomeE, pop_65_plus, total_popE),
    by = "GEOID"
  )


# Calculate elderly percentage
philly_tracts_proj <- philly_tracts_proj %>%
  mutate(
    pct_elderly = (pop_65_plus / total_popE) * 100
  )


# Define Vulnerability (Philly thresholds)
philly_income_threshold <- median(
  philly_tracts_proj$median_incomeE,
  na.rm = TRUE
)

philly_elderly_threshold <- median(
  philly_tracts_proj$pct_elderly,
  na.rm = TRUE
)

philly_tracts_proj <- philly_tracts_proj %>%
  mutate(
    vulnerable = median_incomeE < philly_income_threshold &
                 pct_elderly > philly_elderly_threshold
  )


# 0.25 miles = 1320 feet buffer
park_buffers <- philly_parks_proj %>%
  st_buffer(dist = 1320)

# Merge overlapping buffers
park_coverage_area <- st_union(park_buffers)



# Use TRACT CENTROIDS for Access Test
tract_centroids <- st_centroid(philly_tracts_proj)

# Determine whether centroid falls inside park buffer
tract_centroids <- tract_centroids %>%
  mutate(
    has_park_access =
      st_intersects(geometry, park_coverage_area, sparse = FALSE)[,1]
  )


# Compare Park Access by Vulnerability
park_equity_summary <- tract_centroids %>%
  st_drop_geometry() %>%
  group_by(vulnerable) %>%
  summarize(
    total_tracts = n(),
    tracts_with_access = sum(has_park_access),
    tracts_without_access = sum(!has_park_access),
    pct_without_access =
      (tracts_without_access / total_tracts) * 100
  )



# Joining park access back to tract polygons
philly_tracts_proj <- philly_tracts_proj %>%
  left_join(
    tract_centroids %>%
      st_drop_geometry() %>%
      select(GEOID, has_park_access),
    by = "GEOID"
  )



# Creating simple equity category for mapping
philly_tracts_proj <- philly_tracts_proj %>%
  mutate(
    equity_flag = case_when(
      vulnerable & !has_park_access ~ "Vulnerable – No Access",
      vulnerable & has_park_access ~ "Vulnerable – Has Access",
      TRUE ~ "Other Tracts"
    )
  )



ggplot() +
  
  # Park polygons (light background)
  geom_sf(
    data = philly_parks_proj,
    fill = "lightgreen",
    color = NA,
    alpha = 0.3
  ) +
  
  # Census tracts colored by equity category
  geom_sf(
    data = philly_tracts_proj,
    aes(fill = equity_flag),
    color = "white",
    size = 0.1
  ) +
  
  scale_fill_manual(
    values = c(
      "Vulnerable – No Access" = "red",
      "Vulnerable – Has Access" = "orange",
      "Other Tracts" = "grey80"
    ),
    name = "Tract Type"
  ) +
  
  labs(
    title = "Walkable Park Access in Philadelphia",
    subtitle = "Access defined as centroid within 0.25 miles of park land",
    caption = "Source: ACS 2018–2022 and PPR Properties"
  ) +
  
  theme_void()

Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)

Your interpretation: The results indicate a measurable equity gap in walkable park access across Philadelphia. Approximately 29% of socioeconomically vulnerable census tracts lack access to park land within a 0.25-mile walking distance, compared to 21% of non-vulnerable tracts. This 8-percentage-point difference suggests that neighborhoods with lower incomes and higher elderly populations face systematically reduced proximity to public green space. While Philadelphia overall has relatively dense park coverage, access at a shorter walking threshold reveals localized disparities. These findings suggest that targeted investments in small neighborhood parks, green corridors, or pedestrian infrastructure may be warranted in vulnerable areas.

Finally - A few comments about your incorporation of feedback!

Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you received after your first assignment.

Submission Requirements

What to submit:

Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
- Use embed-resources: true in YAML so it’s a single file
- All code should run without errors
- All maps and charts should display correctly
Submit the correct and working links of your assignment on Canvas