To-Do: Build Business History Dataset

Purpose

Build a business-change dataset that can support the parking minimum study at two levels:

1. an **aggregated baseline** using NaNDA tract-level annual business counts and employment

2. a later **business-level POI history dataset** using Yelp, Foursquare, OSM, and manual validation

The main idea is to start with a tract-based corridor proxy so we can learn the data, test corridor comparisons, and get an early read on whether there is any detectable signal before committing to the harder business-level pipeline.


Why This Revision Makes Sense

The original plan jumped quickly into business-level longitudinal reconstruction.

That is still a good long-run goal, but the NaNDA dataset gives us a low-effort, high-reward first step in that direction:

Examples from the downloaded NaNDA files:

The tract CSVs appear to contain fields like:

That makes NaNDA a good baseline for:

It does **not** solve true corridor-only attribution by itself, because tracts are larger than the corridor buffers we care about. But it gives us a practical first dataset.


Framing

We should treat this as a two-stage business-history strategy.

Stage A

Build a **corridor-proxy aggregated panel** using:

Stage B

Build a **business-level corridor panel** using:

Stage A helps us learn:


Revised Goal

Construct a staged longitudinal business dataset aligned with the study objective of detecting corridor-level change before and after parking minimum removal.

Immediate goal

Create an **aggregated corridor-year panel** using NaNDA tract-level business and employment measures.

Later goal

Create a **business-location panel** for selected corridors after the aggregated baseline is working.


Phase I: Aggregated Baseline With NaNDA

1. Define The Analysis Frame

Output:


2. Define The Geographic Proxy Strategy

Because NaNDA is tract-level, we need a transparent way to map corridor buffers to tracts.

Recommended default:

Recommended rules to support:

My recommendation:

This will not perfectly isolate corridor-only activity, but it is a defensible first-pass corridor proxy.
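The proxy rules are not spelled out above, so this sketch only illustrates how a few candidate rules could be encoded once buffer-tract overlap areas are known. The rule names (`any_overlap`, `min_share`, `area_weighted`), the 10% default threshold, and the `TractOverlap` record are hypothetical placeholders, not decisions from this plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TractOverlap:
    """Overlap between one corridor buffer and one census tract (hypothetical record)."""
    tract_geoid: str
    tract_area: float    # full tract area, in projected units (e.g. square meters)
    overlap_area: float  # portion of that area lying inside the corridor buffer

    @property
    def tract_share(self) -> float:
        """Fraction of the tract's area that falls inside the buffer."""
        return self.overlap_area / self.tract_area if self.tract_area > 0 else 0.0


def tract_weights(overlaps, rule="area_weighted", min_share=0.10):
    """Return {tract_geoid: weight} under one hypothetical proxy rule.

    any_overlap   -> every tract touching the buffer gets weight 1.0
    min_share     -> only tracts with >= min_share of their area in the buffer get 1.0
    area_weighted -> each tract is weighted by its tract_share
    """
    weights = {}
    for o in overlaps:
        if o.overlap_area <= 0:
            continue  # tract does not touch the buffer
        if rule == "any_overlap":
            weights[o.tract_geoid] = 1.0
        elif rule == "min_share":
            if o.tract_share >= min_share:
                weights[o.tract_geoid] = 1.0
        elif rule == "area_weighted":
            weights[o.tract_geoid] = o.tract_share
        else:
            raise ValueError(f"unknown rule: {rule}")
    return weights
```

Keeping every rule behind one function makes it cheap to rerun the whole panel under each rule and report how sensitive results are to the proxy choice.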


3. Prepare NaNDA Inputs

Use the tract-level CSVs inside [`data/nanda`](c:/Users/ylaim/OneDrive/6555-urpl-transport-env/parking_research/data/nanda).

Relevant families:

Tasks:

Recommended first-pass fields:

Output:
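A minimal loading sketch using only the standard library. The column names used with it (`tract_fips`, `year`, `total_establishments`, `total_employees`) are assumptions and should be checked against the headers of the actual NaNDA CSVs. The one firm rule it encodes is that census tract GEOIDs are 11-digit strings (2 state + 3 county + 6 tract digits), so leading zeros dropped by spreadsheet tools are padded back.

```python
import csv


def load_tract_panel(lines, geoid_col, year_col, value_cols, years=None):
    """Parse tract-level CSV rows into {(geoid, year): {column: float or None}}.

    `lines` can be any iterable of CSV lines, such as an open file.
    Column names are passed in because the real NaNDA headers must be confirmed.
    """
    panel = {}
    for row in csv.DictReader(lines):
        # Tract GEOIDs are 11 digits; restore leading zeros lost in export.
        geoid = row[geoid_col].strip().zfill(11)
        year = int(row[year_col])
        if years is not None and year not in years:
            continue  # outside the analysis window
        values = {}
        for col in value_cols:
            raw = (row.get(col) or "").strip()
            values[col] = float(raw) if raw else None  # keep blanks as missing, not zero
        panel[(geoid, year)] = values
    return panel
```

In practice it would be called on an open file, for example inside `with open(path, newline="") as f:`, passing `f` as `lines`.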


4. Build Corridor-Tract Crosswalk

For each corridor buffer:

Recommended crosswalk fields:

Output:

This crosswalk becomes the key bridge between corridor geometry and NaNDA panels.
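The sketch below shows what the crosswalk computation produces, using axis-aligned rectangles `(minx, miny, maxx, maxy)` as a stand-in for real buffer and tract polygons so it runs without a GIS library. In the real pipeline the intersection and area steps would come from a geometry library working in a projected CRS. The field names (`overlap_area`, `tract_share`, `buffer_share`) are hypothetical choices.

```python
def box_area(box):
    """Area of an axis-aligned box; zero if the box is empty."""
    minx, miny, maxx, maxy = box
    return max(0.0, maxx - minx) * max(0.0, maxy - miny)


def box_intersection_area(a, b):
    """Area of overlap between two axis-aligned boxes."""
    return box_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def build_crosswalk(buffers, tracts):
    """buffers: {corridor_id: box}; tracts: {tract_geoid: box}.

    Returns one row per corridor-tract pair that actually overlaps.
    """
    rows = []
    for corridor_id, buf in buffers.items():
        buf_area = box_area(buf)
        for geoid, tract in tracts.items():
            overlap = box_intersection_area(buf, tract)
            if overlap <= 0:
                continue
            rows.append({
                "corridor_id": corridor_id,
                "tract_geoid": geoid,
                "overlap_area": overlap,
                "tract_share": overlap / box_area(tract),   # share of tract inside buffer
                "buffer_share": overlap / buf_area,         # share of buffer inside tract
            })
    return rows
```

`tract_share` is the natural weight for allocating tract totals to a corridor, while `buffer_share` summing to about 1.0 across a corridor's rows is a quick check that the buffer is fully covered by tracts.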


5. Build Corridor-Year Aggregated Panel

Join the corridor-tract crosswalk to the cleaned NaNDA tract panels.

Then aggregate tract values to the corridor-year level.

Recommended default:

Example outputs by corridor-year:

This yields a corridor-year panel such as:
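Because the recommended default is not written out above, this sketch assumes area-weighted allocation: each tract's value is multiplied by the share of the tract's area inside the buffer and then summed per corridor-year. That implicitly assumes activity is spread evenly across each tract. The input shapes (crosswalk rows with a `tract_share` weight, a `{(geoid, year): {metric: value}}` tract panel) are hypothetical.

```python
from collections import defaultdict


def aggregate_to_corridor_year(crosswalk, panel, metrics, weight_field="tract_share"):
    """Sum weighted tract values into {(corridor_id, year): {metric: value}}."""
    # Index the tract panel by tract so each crosswalk row finds its years quickly.
    by_tract = defaultdict(dict)
    for (geoid, year), values in panel.items():
        by_tract[geoid][year] = values

    out = defaultdict(lambda: {m: 0.0 for m in metrics})
    for row in crosswalk:
        weight = row[weight_field]
        for year, values in by_tract.get(row["tract_geoid"], {}).items():
            cell = out[(row["corridor_id"], year)]
            for m in metrics:
                value = values.get(m)
                if value is not None:
                    cell[m] += weight * value
    # Note: tract-years absent from the panel contribute nothing, which can
    # understate a corridor total, so coverage should be checked in validation.
    return dict(out)
```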


6. Derive Aggregated Change Measures

From the corridor-year panel, derive:

Good first measures:

This is not business churn yet in the strict entry/exit sense, but it gives an early measure of corridor commercial change.
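One way to compute first-pass change measures from a single corridor's yearly series. Treating the removal year itself as a transition year (excluded from both the pre and post means) is an assumption, as is using percent change in means as the headline measure; year-over-year differences are taken only between adjacent calendar years.

```python
def change_measures(series, removal_year):
    """series: {year: value} for one corridor and one metric."""
    years = sorted(series)
    # Year-over-year change, only where the two years are consecutive.
    yoy = {y: series[y] - series[p] for p, y in zip(years, years[1:]) if y - p == 1}
    # Assumption: the removal year is a transition year, excluded from both windows.
    pre = [series[y] for y in years if y < removal_year]
    post = [series[y] for y in years if y > removal_year]
    pre_mean = sum(pre) / len(pre) if pre else None
    post_mean = sum(post) / len(post) if post else None
    pct_change = None
    if pre_mean not in (None, 0) and post_mean is not None:
        pct_change = 100.0 * (post_mean - pre_mean) / pre_mean
    return {"yoy": yoy, "pre_mean": pre_mean, "post_mean": post_mean, "pct_change": pct_change}
```

Running the same function on comparison corridors (with the same reference year) gives the before/after contrast the study needs.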


7. Validate The Aggregated Proxy

We should explicitly test whether the tract proxy behaves plausibly.

Checks:

Important:

Some corridors will be better candidates for tract-based inference than others.

This validation step should help us identify:


8. Produce Phase I Deliverables

Deliverables:

This phase should be enough to:


Phase II: Business-Level POI History

Only after the aggregated baseline is working should we move to the harder business-level build.

9. Build Multi-Source POI Inventory

Use a combined approach:

For each corridor:


10. Standardize And Merge POIs

Output:
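A deliberately simple cross-source deduplication sketch: names are normalized, then a POI joins an existing cluster when it lies within `max_dist_m` meters of it and its normalized name is similar according to `difflib.SequenceMatcher.ratio()`. The thresholds and the greedy first-match strategy are assumptions; a fuller version would likely also compare addresses or categories.

```python
import math
import re
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, spell out '&', drop punctuation, collapse whitespace."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]+", " ", name)
    return " ".join(name.split())


def merge_pois(pois, max_dist_m=40.0, min_ratio=0.85):
    """pois: list of dicts with source, source_id, name, x, y (projected meters).

    Greedy single pass: each POI joins the first cluster that is both close
    and similarly named, otherwise it starts a new cluster.
    """
    clusters = []
    for poi in pois:
        key = normalize_name(poi["name"])
        for cluster in clusters:
            close = math.hypot(poi["x"] - cluster["x"], poi["y"] - cluster["y"]) <= max_dist_m
            similar = SequenceMatcher(None, key, cluster["key"]).ratio() >= min_ratio
            if close and similar:
                cluster["members"].append((poi["source"], poi["source_id"]))
                break
        else:
            clusters.append({
                "key": key, "name": poi["name"], "x": poi["x"], "y": poi["y"],
                "members": [(poi["source"], poi["source_id"])],
            })
    return clusters
```

Because matching is order-dependent, results should be spot-checked; borderline pairs are good candidates for the manual validation step.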


11. Attach Time Signals

Examples:

This will allow us to infer:
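The time signals are not enumerated above; assuming each one reduces to a dated observation tied to a merged business (a review, a check-in, an OSM edit), a first-seen/last-seen window can bound when a business was open. The tuple layout and the `recent_days` cutoff for calling a business still active are hypothetical.

```python
from collections import defaultdict
from datetime import date


def infer_activity_windows(observations, data_end: date, recent_days: int = 365):
    """observations: iterable of (business_id, date, signal_type) tuples.

    First/last seen only bound opening and closing; they are not exact dates.
    """
    seen = defaultdict(list)
    for business_id, when, _signal in observations:
        seen[business_id].append(when)

    windows = {}
    for business_id, dates in seen.items():
        first, last = min(dates), max(dates)
        windows[business_id] = {
            "first_seen": first,
            "last_seen": last,
            # A recent signal near the end of the data suggests the business is still open.
            "likely_active_at_end": (data_end - last).days <= recent_days,
        }
    return windows
```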


12. Build Business-Level Panel

Unit:

Time:

Fields:

This is the stage that supports true churn metrics.
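Assuming each business's time signals are reduced to a first and last active year (with `None` meaning still open), this sketch expands them into business-year rows and counts active businesses, entries, and exits per year. It counts a business as active in its last year and records the exit in that same year; that convention is an assumption to settle explicitly before computing churn.

```python
from collections import Counter


def business_year_panel(businesses, start_year, end_year):
    """businesses: {business_id: (first_year, last_year or None)} -> [(business_id, year)]."""
    rows = []
    for business_id, (first, last) in businesses.items():
        last = end_year if last is None else last  # still open: active through end_year
        for year in range(max(first, start_year), min(last, end_year) + 1):
            rows.append((business_id, year))
    return rows


def churn_by_year(businesses, start_year, end_year):
    """Active, entry, and exit counts for each year in the study window."""
    entries = Counter(first for first, _ in businesses.values()
                      if start_year <= first <= end_year)
    exits = Counter(last for _, last in businesses.values()
                    if last is not None and start_year <= last <= end_year)
    active = Counter(year for _, year in business_year_panel(businesses, start_year, end_year))
    return {
        year: {"active": active[year], "entries": entries[year], "exits": exits[year]}
        for year in range(start_year, end_year + 1)
    }
```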


13. Validate With Manual Checks

Use:

Focus on:


14. Final Analysis Outputs

Eventually we want two linked outputs:

Corridor-level panel

Business-level panel

Ready for:


Deliverable

A staged business-history workflow that starts with a tract-based aggregated baseline and later extends to business-level longitudinal reconstruction.

That gives us: