Populus × Mesa: Copy‑Paste Integration

Goal: Load Populus synthetic agents and places from Parquet, wire up agent⇄place and place⇄place relationships, and run a Mesa model where agents “know” their household, workplace, school, etc.—with everything provided right here to copy/paste.

Prerequisites

Python ≥ 3.10
Mesa ≥ 3.0 (this code uses the Mesa 3 API: Agent(model) and model.agents.shuffle_do)
Pandas + pyarrow (or fastparquet) for Parquet IO

pip install "mesa>=3.0" pandas pyarrow
# or, with fastparquet instead of pyarrow: pip install "mesa>=3.0" pandas fastparquet

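Mesa version note: This integration uses Mesa 3 conventions: Agent(model), with unique_id assigned automatically, and model.agents.shuffle_do("step") in place of a scheduler. Earlier releases construct agents as Agent(unique_id, model) and normally step them through a scheduler, so the code as written will not run there. If you're on an older Mesa, see the Legacy appendix at the end for the small code edits you'll need.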
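
A script can fail fast when the installed Mesa is too old for these conventions. The sketch below is a hypothetical guard, not part of the integration file: `major_version` and `require_major` are illustrative names, and the version is read with the standard library's `importlib.metadata` rather than any Mesa API.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '3.1.0'."""
    return int(version.split(".", 1)[0])


def require_major(package: str, minimum: int) -> str:
    """Return the installed version of `package`, or raise RuntimeError
    if it is missing or its major version is below `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError as exc:
        raise RuntimeError(f"{package} is not installed") from exc
    if major_version(installed) < minimum:
        raise RuntimeError(f"{package} {installed} found, need >= {minimum}.0")
    return installed


if __name__ == "__main__":
    try:
        print("Mesa", require_major("mesa", 3))
    except RuntimeError as err:
        print("Version check failed:", err)
```

Calling `require_major("mesa", 3)` at the top of a run script turns a confusing constructor error into a clear message.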

Expected Directory Layout for Populus Exports

<POP_DIR>/<State>/<County>/
  agent-person_<County>.parquet
  place-<type>_<County>.parquet
  agent-person_to_place-<type>_<County>.parquet
  place-<child>_to_place-<parent>_<County>.parquet

Agents: agent-person_<County>.parquet must contain agent_id plus any other columns you want copied onto the agent (e.g., age).
Places: place-<type>_<County>.parquet must contain place_id plus any columns you want copied onto the place record.
Agent→Place: agent-person_to_place-<type>_<County>.parquet has agent_id, place_id. The <type> string (e.g., household, workplace) becomes the relationship name.
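Place→Place: place-<child>_to_place-<parent>_<County>.parquet has exactly two columns, child first and parent second, e.g., block_group_place_id, tract_place_id. The parent column name with its _place_id suffix removed (here, tract) becomes the key you pass to get_parent_place().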

County strings: Use the exact on‑disk spelling. Exports commonly use underscores for spaces (e.g., Cook_County).
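
The four naming patterns above can be told apart with a small classifier. The sketch below is illustrative only and is not the loader's own parsing code: `classify_export` and the tuples it returns are hypothetical, and it assumes the county string itself never contains `_to_place-`.

```python
from typing import Optional, Tuple

MARKER = "_to_place-"  # the relationship marker used in export filenames


def classify_export(filename: str, county: str) -> Optional[Tuple[str, ...]]:
    """Return a tuple describing the file kind, or None if no pattern matches."""
    suffix = f"_{county}.parquet"
    if not filename.endswith(suffix):
        return None
    stem = filename[: -len(suffix)]
    agent_rel_prefix = "agent-person" + MARKER
    if stem == "agent-person":
        return ("agents",)
    if stem.startswith(agent_rel_prefix):
        return ("agent_to_place", stem[len(agent_rel_prefix):])
    if stem.startswith("place-"):
        body = stem[len("place-"):]
        if MARKER in body:
            child, parent = body.split(MARKER, 1)
            return ("place_to_place", child, parent)
        return ("place", body)
    return None


names = [
    "agent-person_Cook_County.parquet",
    "place-household_Cook_County.parquet",
    "agent-person_to_place-household_Cook_County.parquet",
    "place-block_group_to_place-tract_Cook_County.parquet",
    "notes.txt",
]
for name in names:
    print(name, "->", classify_export(name, "Cook_County"))
```

Running this over `os.listdir()` of a county folder before loading is a quick way to spot files that the loader would ignore or fail to parse.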

🔧 Copy‑Paste Code (Single‑File Integration)

Create a file named populus_mesa_integration.py in your project and paste the following:

# file: populus_mesa_integration.py
# Populus × Mesa integration: Places, Agents, Model, and a Loader for Parquet exports.

from __future__ import annotations
import os
import glob
import pandas as pd
from typing import Dict, List, Optional, Tuple

from mesa import Model
from mesa import Agent as MesaAgent


# -------------------------------
# Core data: Place
# -------------------------------

class Place:
    """Represents a place in the population with hierarchical relationships."""
    def __init__(self, place_id, **attributes):
        self.id = place_id
        self.parent_places: Dict[str, str] = {}   # {'census_tract': place_id, 'county': place_id}
        self.child_places: List[str] = []         # [block_group_id1, block_group_id2, ...]
        for key, value in attributes.items():
            setattr(self, key, value)

    def add_parent_relationship(self, relationship_type: str, parent_id: str) -> None:
        """Add a parent place relationship."""
        self.parent_places[relationship_type] = parent_id

    def add_child_place(self, child_id: str) -> None:
        """Add a child place."""
        if child_id not in self.child_places:
            self.child_places.append(child_id)

    def get_parent_place(self, relationship_type: str) -> Optional[str]:
        """Get the parent place ID for a specific relationship type."""
        return self.parent_places.get(relationship_type)


# -------------------------------
# Mesa model with place registry
# -------------------------------

class PopulationModel(Model):
    """Mesa Model wrapper with built-in place management for population data."""
    def __init__(self):
        super().__init__()
        self.places: Dict[str, Place] = {}  # place_id -> Place object

    def add_place(self, place: Place) -> None:
        """Add a place to the model's place storage."""
        self.places[place.id] = place

    def get_place(self, place_id: str) -> Optional[Place]:
        """Get a place by its ID."""
        return self.places.get(place_id)

    def step(self) -> None:
        """Step all agents in the model (Mesa ≥ 3)."""
        # In Mesa 3, Model maintains an AgentSet at self.agents
        self.agents.shuffle_do("step")


# -------------------------------
# Mesa agent with Populus fields
# -------------------------------

class PopulationAgent(MesaAgent):
    """
    Mesa agent with population data attributes and place relationships.
    Users should subclass this to define their own step behavior and other methods.
    """
    def __init__(self, unique_id, model: PopulationModel, **attributes):
        # Mesa 3 signature: Agent(model); Mesa assigns unique_id automatically
        super().__init__(model)
        # Overwrite the auto-assigned id with the Populus agent_id
        self.unique_id = unique_id
        self.places: Dict[str, str] = {}  # {'household': place_id, 'workplace': place_id, ...}
        for key, value in attributes.items():
            setattr(self, key, value)

    def add_place_relationship(self, relationship_type: str, place_id: str) -> None:
        """Add a relationship to a place (household, workplace, etc.)."""
        self.places[relationship_type] = place_id

    def get_place(self, relationship_type: str) -> Optional[str]:
        """Get the place ID for a specific relationship type."""
        return self.places.get(relationship_type)

    def step(self) -> None:
        """Default step method - override for specific behavior."""
        pass


# -------------------------------
# Helpers: filename parsing & IO
# -------------------------------

def extract_place_type_from_filename(filename: str, county: str) -> str:
    """Extract place type from a place declaration filename."""
    parts_after_place = filename.split('-', 1)[1]  # remove 'place-' prefix
    county_index = parts_after_place.rfind(f'_{county}.parquet')
    if county_index != -1:
        return parts_after_place[:county_index]
    raise ValueError(
        f"Unable to parse place type from filename: {filename}. "
        f"Expected pattern: place-<type>_{county}.parquet"
    )


def extract_agent_relationship_type(filename: str, county: str, relationship_marker: str) -> str:
    """Extract relationship type from agent-to-place relationship filename."""
    type_part = filename.split(relationship_marker, 1)[1]
    county_suffix = f'_{county}.parquet'
    if type_part.endswith(county_suffix):
        return type_part[:-len(county_suffix)]
    raise ValueError(
        f"Unable to parse relationship type from filename: {filename}. "
        f"Expected pattern: agent-person{relationship_marker}<type>_{county}.parquet"
    )


def parse_place_relationship_filename(filename: str, county: str, relationship_marker: str) -> Tuple[str, str]:
    """Parse place-to-place relationship filename to extract (child_type, parent_type)."""
    after_prefix = filename.split('place-', 1)[1]
    parts = after_prefix.split(relationship_marker, 1)
    if len(parts) != 2:
        raise ValueError(f"Unable to parse place relationship filename: {filename}")
    child_type = parts[0]
    parent_part = parts[1].split('place-', 1)[1] if 'place-' in parts[1] else parts[1]
    county_suffix = f'_{county}.parquet'
    if parent_part.endswith(county_suffix):
        parent_type = parent_part[:-len(county_suffix)]
    else:
        raise ValueError(f"Unable to parse parent type from filename: {filename}")
    return child_type, parent_type


def get_place_declaration_files(county_path: str, county: str, relationship_marker: str) -> List[str]:
    """Get all place declaration files (excluding relationship files)."""
    place_pattern = os.path.join(county_path, f"place-*_{county}.parquet")
    place_files = glob.glob(place_pattern)
    place_declaration_files = [
        fp for fp in place_files if relationship_marker not in os.path.basename(fp)
    ]
    if not place_declaration_files:
        raise FileNotFoundError(
            f"No place declaration files found matching pattern: {place_pattern}"
        )
    return place_declaration_files


def load_places_from_files(county_path: str, county: str, model: PopulationModel, relationship_marker: str) -> None:
    """Load all places from declaration files into the model."""
    place_declaration_files = get_place_declaration_files(county_path, county, relationship_marker)
    for place_file in place_declaration_files:
        filename = os.path.basename(place_file)
        place_type = extract_place_type_from_filename(filename, county)
        df_places = pd.read_parquet(place_file)
        for _, row in df_places.iterrows():
            place_data = row.to_dict()
            place_id = place_data.pop('place_id')
            place_data['place_type'] = place_type
            place = Place(place_id, **place_data)
            model.add_place(place)


def load_agents_from_file(county_path: str, county: str, model: PopulationModel, agent_class=PopulationAgent) -> Dict:
    """Load all agents from the agent file and return a dict[agent_id -> agent]."""
    agent_file = os.path.join(county_path, f"agent-person_{county}.parquet")
    if not os.path.exists(agent_file):
        raise FileNotFoundError(f"Agent file not found: {agent_file}")
    df_agents = pd.read_parquet(agent_file)
    agents_dict = {}
    for idx, row in df_agents.iterrows():
        agent_data = row.to_dict()
        agent_id = agent_data.pop('agent_id', idx)
        agent = agent_class(unique_id=agent_id, model=model, **agent_data)
        agents_dict[agent_id] = agent
    return agents_dict


def process_agent_place_relationships(county_path: str, county: str, agents_dict: Dict, relationship_marker: str) -> None:
    """Process all agent-to-place relationship files and attach them to agents."""
    pattern = os.path.join(county_path, f"agent-person{relationship_marker}*_{county}.parquet")
    agent_relationship_files = glob.glob(pattern)
    for relationship_file in agent_relationship_files:
        filename = os.path.basename(relationship_file)
        relationship_type = extract_agent_relationship_type(filename, county, relationship_marker)
        df_relationships = pd.read_parquet(relationship_file)
        for _, row in df_relationships.iterrows():
            agent_id = row['agent_id']
            place_id = row['place_id']
            if agent_id in agents_dict:
                agents_dict[agent_id].add_place_relationship(relationship_type, place_id)


def process_single_place_relationship(row: pd.Series, columns: List[str], model: PopulationModel) -> None:
    """Process a single place-to-place relationship row."""
    if len(columns) != 2:
        raise ValueError(f"Expected 2 columns in relationship file, got {len(columns)}")
    child_col, parent_col = columns[0], columns[1]
    child_place_id = row[child_col]
    parent_place_id = row[parent_col]
    parent_type = parent_col.replace('_place_id', '') if '_place_id' in parent_col else 'parent'
    child_place = model.get_place(child_place_id)
    parent_place = model.get_place(parent_place_id)
    if child_place:
        child_place.add_parent_relationship(parent_type, parent_place_id)
    if parent_place:
        parent_place.add_child_place(child_place_id)


def process_place_place_relationships(county_path: str, county: str, model: PopulationModel, relationship_marker: str) -> None:
    """Process all place-to-place relationship files."""
    pattern = os.path.join(county_path, f"place-*{relationship_marker}*_{county}.parquet")
    place_relationship_files = glob.glob(pattern)
    for relationship_file in place_relationship_files:
        filename = os.path.basename(relationship_file)
        # Parsing validates our expectations; types are not strictly needed below
        _child_type, _parent_type = parse_place_relationship_filename(filename, county, relationship_marker)
        df_relationships = pd.read_parquet(relationship_file)
        columns = df_relationships.columns.tolist()
        for _, row in df_relationships.iterrows():
            process_single_place_relationship(row, columns, model)


# -------------------------------
# Loader facade
# -------------------------------

class PopulationLoader:
    """Loads population data from structured Parquet exports under a base directory."""
    PLACE_RELATIONSHIP_MARKER = "_to_place-"

    def __init__(self, population_directory: str):
        self.population_directory = population_directory

    def discover_locations(self) -> List[Tuple[str, str]]:
        """Discover all (state, county) pairs that contain a valid agent file."""
        locations: List[Tuple[str, str]] = []
        state_paths = glob.glob(os.path.join(self.population_directory, "*"))
        for state_path in state_paths:
            if os.path.isdir(state_path):
                state_name = os.path.basename(state_path)
                county_paths = glob.glob(os.path.join(state_path, "*"))
                for county_path in county_paths:
                    if os.path.isdir(county_path):
                        county_name = os.path.basename(county_path)
                        agent_file = os.path.join(county_path, f"agent-person_{county_name}.parquet")
                        if os.path.exists(agent_file):
                            locations.append((state_name, county_name))
        return locations

    def load_all(self, model: PopulationModel, agent_class=PopulationAgent) -> None:
        """Load all available states and counties into the model."""
        for state, county in self.discover_locations():
            self.load_county(state, county, model, agent_class=agent_class)

    def load_county(self, state: str, county: str, model: PopulationModel, agent_class=PopulationAgent) -> None:
        """Load agents and places for a specific county."""
        county_path = os.path.join(self.population_directory, state, county)
        # 1) Places first (since agents/relationships may reference them)
        load_places_from_files(county_path, county, model, self.PLACE_RELATIONSHIP_MARKER)
        # 2) Agents
        agents_dict = load_agents_from_file(county_path, county, model, agent_class)
        # 3) Agent→Place
        process_agent_place_relationships(county_path, county, agents_dict, self.PLACE_RELATIONSHIP_MARKER)
        # 4) Place→Place
        process_place_place_relationships(county_path, county, model, self.PLACE_RELATIONSHIP_MARKER)

Minimal Working Example

Create demo_populus_mesa.py next to your populus_mesa_integration.py and run it:

# file: demo_populus_mesa.py
from pathlib import Path

from populus_mesa_integration import (
    PopulationModel,
    PopulationAgent,
    PopulationLoader,
)

# Define behavior by subclassing PopulationAgent
class Person(PopulationAgent):
    def step(self):
        # Example: read a relationship if present
        hh_id = self.get_place("household")
        if hh_id is not None:
            home = self.model.get_place(hh_id)
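            # Example: use place hierarchy if available. The key is the parent column
            # name minus "_place_id" (tract_place_id -> "tract"), so match it to your export.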
            tract_id = home.get_parent_place("census_tract") if home else None
            _ = tract_id  # replace with your model logic
        # Example attribute access (if present in Parquet):
        # age = getattr(self, "age", None)

POP_DIR = Path("/path/to/populus")  # e.g., /data/populus
STATE   = "Illinois"
COUNTY  = "Cook_County"

model = PopulationModel()
loader = PopulationLoader(str(POP_DIR))

print("Discovering available locations...")
for s, c in loader.discover_locations():
    print(f" - {s}/{c}")

print(f"\nLoading {STATE}/{COUNTY}...")
loader.load_county(STATE, COUNTY, model, agent_class=Person)

# Quick sanity checks
print(f"Loaded places: {len(model.places)}")
n_agents = len(model.agents)  # AgentSet supports len()
print(f"Loaded agents: {n_agents}")

# Run a few steps
for t in range(3):
    model.step()
print("Done.")

Run:

python demo_populus_mesa.py

How to Use in Your Model

Access relationships at runtime:

some_agent = next(iter(model.agents))
household_id = some_agent.get_place("household")
household = model.get_place(household_id)

tract_id = household.get_parent_place("census_tract") if household else None
tract = model.get_place(tract_id) if tract_id else None

children_of_tract = tract.child_places if tract else []
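
Repeating the `get_parent_place` hop yields the full chain of ancestors. The sketch below is a hypothetical helper (`ancestor_chain` is not part of the integration file) that works on any mapping from place id to objects exposing a `parent_places` dict. The stand-in places and the `tract` / `county` keys are made up for illustration.

```python
from types import SimpleNamespace
from typing import Dict, List, Tuple


def ancestor_chain(places: Dict[str, object], place_id: str,
                   max_depth: int = 16) -> List[Tuple[str, str]]:
    """Walk parent links upward, level by level.

    Returns (relationship_type, parent_id) pairs in the order they are found.
    The `seen` set and `max_depth` guard against cycles in malformed data.
    """
    chain: List[Tuple[str, str]] = []
    seen = {place_id}
    frontier = [place_id]
    for _ in range(max_depth):
        next_frontier = []
        for pid in frontier:
            place = places.get(pid)
            if place is None:  # id was referenced but never loaded
                continue
            for rel_type, parent_id in place.parent_places.items():
                if parent_id not in seen:
                    seen.add(parent_id)
                    chain.append((rel_type, parent_id))
                    next_frontier.append(parent_id)
        if not next_frontier:
            break
        frontier = next_frontier
    return chain


# Stand-in objects so the snippet runs without Mesa or Parquet files.
demo_places = {
    "bg1": SimpleNamespace(parent_places={"tract": "t1"}),
    "t1": SimpleNamespace(parent_places={"county": "c1"}),
    "c1": SimpleNamespace(parent_places={}),
}
print(ancestor_chain(demo_places, "bg1"))  # [('tract', 't1'), ('county', 'c1')]
```

With a loaded model, the same call would be `ancestor_chain(model.places, household_id)`.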

Collect metrics with Mesa’s DataCollector:

from mesa.datacollection import DataCollector

from populus_mesa_integration import PopulationModel

class PopulusModel(PopulationModel):
    def __init__(self):
        super().__init__()
        self.datacollector = DataCollector(
            model_reporters={
                "num_agents": lambda m: len(m.agents),
                "num_places": lambda m: len(m.places),
            }
        )

    def step(self):
        super().step()
        self.datacollector.collect(self)
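
A reporter can also summarise relationships rather than raw counts. The following is a sketch under stated assumptions: `household_sizes` and `size_distribution` are hypothetical helpers, and they rely only on each agent exposing `get_place("household")`, as `PopulationAgent` does. The stub agent class exists solely so the snippet runs without Mesa.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


class StubAgent:
    """Stand-in for PopulationAgent with just the lookup used here."""

    def __init__(self, household: Optional[str]):
        self._places = {} if household is None else {"household": household}

    def get_place(self, relationship_type: str) -> Optional[str]:
        return self._places.get(relationship_type)


def household_sizes(agents: Iterable) -> Counter:
    """Count agents per household id, skipping agents with no household."""
    counts: Counter = Counter()
    for agent in agents:
        hh_id = agent.get_place("household")
        if hh_id is not None:
            counts[hh_id] += 1
    return counts


def size_distribution(agents: Iterable) -> Dict[int, int]:
    """Map household size -> number of households of that size."""
    return dict(Counter(household_sizes(agents).values()))


demo_agents = [StubAgent("h1"), StubAgent("h1"), StubAgent("h2"), StubAgent(None)]
print(household_sizes(demo_agents))
print(size_distribution(demo_agents))
```

In a DataCollector this could be wired as a model reporter such as `lambda m: size_distribution(m.agents)`, since iterating `model.agents` yields the agents themselves.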

Performance & Correctness Tips

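Parquet engine: With its default engine setting, pandas tries pyarrow first and falls back to fastparquet, so installing pyarrow is the recommended choice.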
Start small: Load one county first to validate everything, then scale up.
Columns: If you later customize IO, pass columns=[...] to pd.read_parquet() to reduce memory.
Filename hygiene: The loader infers types from filenames. Keep the exact patterns:
- Agent file: agent-person_<County>.parquet
- Place files: place-<type>_<County>.parquet
- Agent→Place: agent-person_to_place-<type>_<County>.parquet
- Place→Place: place-<child>_to_place-<parent>_<County>.parquet
No PII: Populus is synthetic. Use attributes responsibly and avoid re‑identification attempts.

Troubleshooting

FileNotFoundError: Agent file not found
Double‑check <POP_DIR>/<State>/<County>/agent-person_<County>.parquet.
Unable to parse ... filename
A filename deviates from the expected pattern. Rename the file, or, if your export uses a different separator, change PopulationLoader.PLACE_RELATIONSHIP_MARKER (default "_to_place-").
Expected 2 columns in relationship file
Ensure place→place Parquet has exactly two columns (child *_place_id, parent *_place_id).
High memory usage
Test a small county first; customize column subsets later if needed.
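
One failure mode raises no error at all. The loader skips agent→place rows whose agent_id was not loaded, and it stores a place_id on the agent without checking that the place exists in model.places, so agents can point at places that were never loaded. A hypothetical post-load check (`find_dangling_places` is not part of the integration file) might look like this; it takes plain dicts and sets so it can be tried without Mesa.

```python
from typing import Dict, Hashable, List, Set, Tuple


def find_dangling_places(
    agent_places: Dict[Hashable, Dict[str, Hashable]],
    known_place_ids: Set[Hashable],
) -> List[Tuple[Hashable, str, Hashable]]:
    """Return (agent_id, relationship_type, place_id) for every reference
    to a place id that is not in known_place_ids."""
    dangling = []
    for agent_id, relationships in agent_places.items():
        for rel_type, place_id in relationships.items():
            if place_id not in known_place_ids:
                dangling.append((agent_id, rel_type, place_id))
    return dangling


# Made-up data: agent 2's workplace was never loaded.
demo_agent_places = {
    1: {"household": "h1", "workplace": "w1"},
    2: {"household": "h1", "workplace": "w9"},
}
demo_known = {"h1", "w1"}
print(find_dangling_places(demo_agent_places, demo_known))  # [(2, 'workplace', 'w9')]
```

After loading, the inputs could be built as `{a.unique_id: a.places for a in model.agents}` and `set(model.places)`.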

Appendix: Legacy (Mesa 1.x and 2.x)

If you use a Mesa release older than 3.0, make the following changes:

Adjust PopulationAgent to call the older super constructor:

# replace in PopulationAgent.__init__:
super().__init__(model)    # Mesa 3
# with:
super().__init__(unique_id, model)  # Mesa 1.x / 2.x

Use a scheduler in PopulationModel:

from mesa.time import RandomActivation

class PopulationModel(Model):
    def __init__(self):
        super().__init__()
        self.places = {}
        self.schedule = RandomActivation(self)

    def add_place(self, place):
        self.places[place.id] = place

    def get_place(self, place_id):
        return self.places.get(place_id)

    def step(self):
        self.schedule.step()

When creating each agent in load_agents_from_file, add it to the schedule:

agent = agent_class(unique_id=agent_id, model=model, **agent_data)
model.schedule.add(agent)

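Replace every self.agents.shuffle_do("step") with self.schedule.step(), and wherever the examples read model.agents, read from the scheduler instead (model.schedule.agents, model.schedule.get_agent_count()).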

📦 LLM Context Blocks (Drop These Into Your Repo)

Create docs/llm/ and add the files below so your coding assistant understands the integration and filenames.

`docs/llm/POPULUS_MESA_CONTEXT.md`

# CONTEXT: Epistemix Populus × Mesa (Copy-Paste Integration)

You are assisting with Python code that uses Mesa (ABM) to simulate a synthetic population from Epistemix Populus.

## Available Classes (imported from populus_mesa_integration.py)
- `Place(place_id, **attributes)`: place record with attributes, `parent_places` (dict), `child_places` (list).
  - `add_parent_relationship(type, parent_id)`
  - `add_child_place(child_id)`
  - `get_parent_place(type) -> parent_place_id | None`
- `PopulationModel(Model)`: holds `places: dict[place_id -> Place]`.
  - `add_place(place)`, `get_place(place_id)`
  - `step()` calls `self.agents.shuffle_do("step")` (Mesa ≥ 3)
- `PopulationAgent(Agent)`: base agent with arbitrary attributes and `places: dict[str, place_id]`.
  - `add_place_relationship(type, place_id)`
  - `get_place(type) -> place_id | None`
  - `step()` is overridden by users for behavior.
- `PopulationLoader(population_directory)`: loads Parquet exports into the model.
  - `discover_locations() -> list[(state, county)]`
  - `load_county(state, county, model, agent_class=PopulationAgent)`
  - `load_all(model, agent_class=PopulationAgent)`
  - Uses `PLACE_RELATIONSHIP_MARKER = "_to_place-"`

## Data Layout on Disk
`<POP_DIR>/<State>/<County>/`
- `agent-person_<County>.parquet`
- `place-<type>_<County>.parquet`
- `agent-person_to_place-<type>_<County>.parquet`
- `place-<child>_to_place-<parent>_<County>.parquet`

## Rules for the Assistant
- Assume Mesa ≥ 3 unless told otherwise. Use `Agent(model)` and `model.agents.shuffle_do("step")`.
- Do not invent Parquet columns; only use existing fields.
- To reference places from agents: `agent.get_place("<relationship>")` then `model.get_place(place_id)`.
- To traverse place hierarchies: `place.get_parent_place("<type>")` and `place.child_places`.
- Do not write to Parquet—read-only.

## Example Tasks
- Make `Person(PopulationAgent)` with `step()` and use `PopulationLoader.load_county(...)` to populate a `PopulationModel`.
- Aggregate (e.g., agents per household): iterate `model.agents`, get `agent.get_place("household")`, and count.
- Explore hierarchies (e.g., household → tract) via `get_parent_place`.

`docs/llm/POPULUS_MESA_QUICKSTART.md`

# QUICKSTART

1. `pip install "mesa>=3.0" pandas pyarrow`
2. Place Populus Parquet under `<POP_DIR>/<State>/<County>/`:
   - `agent-person_<County>.parquet`
   - `place-<type>_<County>.parquet`
   - `agent-person_to_place-<type>_<County>.parquet`
   - `place-<child>_to_place-<parent>_<County>.parquet`
3. Copy `populus_mesa_integration.py` into your project.
4. Subclass `PopulationAgent` as your `Person` and implement `step()`.
5. Create `PopulationModel()` and call
   `PopulationLoader(<POP_DIR>).load_county(<State>, <County>, model, agent_class=Person)`.
6. Run `for _ in range(N): model.step()`.

**Conventions**
- Relationship marker: `_to_place-`
- County names use underscores (e.g., `Cook_County`)
- All extra columns in Parquet become attributes on agents/places.

**Do / Don’t**
- ✅ Use `agent.get_place("<rel>")` → `model.get_place(place_id)`
- ✅ Use `place.get_parent_place("<type>")` to traverse hierarchies
- ❌ Don’t assume a column exists—check with `hasattr(agent, "age")`
- ❌ Don’t mutate Parquet or rely on global state

Summary

You now have a single-file integration (populus_mesa_integration.py) to load Populus Parquet into Mesa.
Subclass PopulationAgent for behavior; all attributes from Parquet are available on agents/places.
Relationships are discovered from filenames and materialized as convenient lookups.
Ship the LLM context files so your tools scaffold features reliably.