Populus × Mesa: Copy‑Paste Integration
Build Agent-Based Models with Populus
Goal: Load Populus synthetic agents and places from Parquet, wire up agent⇄place and place⇄place relationships, and run a Mesa model where agents “know” their household, workplace, school, etc.—with everything provided right here to copy/paste.
Prerequisites
- Python ≥ 3.10
- Mesa ≥ 3.0 (the code relies on Agent(model) and AgentSet.shuffle_do, both of which arrived in Mesa 3.0)
- Pandas + pyarrow (or fastparquet) for Parquet IO
pip install "mesa>=3.0" pandas pyarrow
# or, in place of pyarrow: pip install fastparquet
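Mesa version note: This integration uses the Mesa 3 conventions Agent(model) and model.agents.shuffle_do("step"). Both were introduced in Mesa 3.0; in Mesa 2.x the Agent constructor still requires unique_id, and AgentSet has no shuffle_do method. If you’re on an older release, see the Legacy (Mesa 1.x) appendix at the end for the small code edits you’ll need.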
Expected Directory Layout for Populus Exports
<POP_DIR>/<State>/<County>/
  agent-person_<County>.parquet
  place-<type>_<County>.parquet
  agent-person_to_place-<type>_<County>.parquet
  place-<child>_to_place-<parent>_<County>.parquet
- Agents: agent-person_<County>.parquet must contain agent_id plus any other columns you want copied onto the agent (e.g., age).
- Places: place-<type>_<County>.parquet must contain place_id plus any columns you want copied onto the place record.
- Agent→Place: agent-person_to_place-<type>_<County>.parquet has agent_id, place_id. The <type> string (e.g., household, workplace) becomes the relationship name.
- Place→Place: place-<child>_to_place-<parent>_<County>.parquet has exactly two columns, child first and parent second, e.g., block_group_place_id, tract_place_id. The parent column's name minus the _place_id suffix (here, tract) becomes the key you pass to get_parent_place.
County strings: Use the exact on‑disk spelling. Exports commonly use underscores for spaces (e.g., Cook_County).
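To make the naming contract above concrete, here is a minimal sketch that sorts a filename into one of the four roles. It is an illustration only, not part of the integration file: the function name classify_export and the role labels are hypothetical, and it assumes the `_to_place-` marker described above.

```python
from typing import Optional, Tuple

MARKER = "_to_place-"  # assumed to match the loader's relationship marker


def classify_export(filename: str, county: str) -> Tuple[str, Optional[Tuple[str, ...]]]:
    """Return (role, names) for one export filename; role is 'unknown' if nothing matches."""
    suffix = f"_{county}.parquet"
    if not filename.endswith(suffix):
        return ("unknown", None)
    stem = filename[: -len(suffix)]  # e.g. 'place-household'
    if stem == "agent-person":
        return ("agent", None)
    agent_rel_prefix = "agent-person" + MARKER
    if stem.startswith(agent_rel_prefix):
        # The remainder is the relationship name, e.g. 'workplace'
        return ("agent_to_place", (stem[len(agent_rel_prefix):],))
    if stem.startswith("place-"):
        body = stem[len("place-"):]
        if MARKER in body:
            child, parent = body.split(MARKER, 1)
            return ("place_to_place", (child, parent))
        return ("place", (body,))
    return ("unknown", None)
```

Running it over os.listdir() of a county folder is a quick way to spot misnamed files before loading anything.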
🔧 Copy‑Paste Code (Single‑File Integration)
Create a file named populus_mesa_integration.py in your project and paste the following:
# file: populus_mesa_integration.py
# Populus × Mesa integration: Places, Agents, Model, and a Loader for Parquet exports.
from __future__ import annotations
import os
import glob
import pandas as pd
from typing import Dict, List, Optional, Tuple, Iterable
from mesa import Model
from mesa import Agent as MesaAgent
# -------------------------------
# Core data: Place
# -------------------------------
class Place:
    """Represents a place in the population with hierarchical relationships."""

    def __init__(self, place_id, **attributes):
        self.id = place_id
        self.parent_places: Dict[str, str] = {}  # {'census_tract': place_id, 'county': place_id}
        self.child_places: List[str] = []  # [block_group_id1, block_group_id2, ...]
        for key, value in attributes.items():
            setattr(self, key, value)

    def add_parent_relationship(self, relationship_type: str, parent_id: str) -> None:
        """Add a parent place relationship."""
        self.parent_places[relationship_type] = parent_id

    def add_child_place(self, child_id: str) -> None:
        """Add a child place."""
        if child_id not in self.child_places:
            self.child_places.append(child_id)

    def get_parent_place(self, relationship_type: str) -> Optional[str]:
        """Get the parent place ID for a specific relationship type."""
        return self.parent_places.get(relationship_type)
# -------------------------------
# Mesa model with place registry
# -------------------------------
class PopulationModel(Model):
    """Mesa Model wrapper with built-in place management for population data."""

    def __init__(self):
        super().__init__()
        self.places: Dict[str, Place] = {}  # place_id -> Place object

    def add_place(self, place: Place) -> None:
        """Add a place to the model's place storage."""
        self.places[place.id] = place

    def get_place(self, place_id: str) -> Optional[Place]:
        """Get a place by its ID."""
        return self.places.get(place_id)

    def step(self) -> None:
        """Step all agents in the model in random order."""
        # Model keeps its agents in an AgentSet at self.agents;
        # AgentSet.shuffle_do requires Mesa >= 3.0.
        self.agents.shuffle_do("step")
# -------------------------------
# Mesa agent with Populus fields
# -------------------------------
class PopulationAgent(MesaAgent):
    """
    Mesa agent with population data attributes and place relationships.
    Users should subclass this to define their own step behavior and other methods.
    """

    def __init__(self, unique_id, model: PopulationModel, **attributes):
        # Mesa 3 signature: Agent(model). Mesa assigns its own unique_id,
        # which is then overwritten with the Populus agent_id.
        super().__init__(model)
        self.unique_id = unique_id
        self.places: Dict[str, str] = {}  # {'household': place_id, 'workplace': place_id, ...}
        for key, value in attributes.items():
            setattr(self, key, value)

    def add_place_relationship(self, relationship_type: str, place_id: str) -> None:
        """Add a relationship to a place (household, workplace, etc.)."""
        self.places[relationship_type] = place_id

    def get_place(self, relationship_type: str) -> Optional[str]:
        """Get the place ID for a specific relationship type."""
        return self.places.get(relationship_type)

    def step(self) -> None:
        """Default step method - override for specific behavior."""
        pass
# -------------------------------
# Helpers: filename parsing & IO
# -------------------------------
def extract_place_type_from_filename(filename: str, county: str) -> str:
    """Extract place type from a place declaration filename."""
    parts_after_place = filename.split('-', 1)[1]  # remove 'place-' prefix
    county_index = parts_after_place.rfind(f'_{county}.parquet')
    if county_index != -1:
        return parts_after_place[:county_index]
    raise ValueError(
        f"Unable to parse place type from filename: {filename}. "
        f"Expected pattern: place-<type>_{county}.parquet"
    )


def extract_agent_relationship_type(filename: str, county: str, relationship_marker: str) -> str:
    """Extract relationship type from agent-to-place relationship filename."""
    type_part = filename.split(relationship_marker, 1)[1]
    county_suffix = f'_{county}.parquet'
    if type_part.endswith(county_suffix):
        return type_part[:-len(county_suffix)]
    raise ValueError(
        f"Unable to parse relationship type from filename: {filename}. "
        f"Expected pattern: agent-person{relationship_marker}<type>_{county}.parquet"
    )


def parse_place_relationship_filename(filename: str, county: str, relationship_marker: str) -> Tuple[str, str]:
    """Parse place-to-place relationship filename to extract (child_type, parent_type)."""
    after_prefix = filename.split('place-', 1)[1]
    parts = after_prefix.split(relationship_marker, 1)
    if len(parts) != 2:
        raise ValueError(f"Unable to parse place relationship filename: {filename}")
    child_type = parts[0]
    parent_part = parts[1].split('place-', 1)[1] if 'place-' in parts[1] else parts[1]
    county_suffix = f'_{county}.parquet'
    if parent_part.endswith(county_suffix):
        parent_type = parent_part[:-len(county_suffix)]
    else:
        raise ValueError(f"Unable to parse parent type from filename: {filename}")
    return child_type, parent_type
def get_place_declaration_files(county_path: str, county: str, relationship_marker: str) -> List[str]:
    """Get all place declaration files (excluding relationship files)."""
    place_pattern = os.path.join(county_path, f"place-*_{county}.parquet")
    place_files = glob.glob(place_pattern)
    place_declaration_files = [
        fp for fp in place_files if relationship_marker not in os.path.basename(fp)
    ]
    if not place_declaration_files:
        raise FileNotFoundError(
            f"No place declaration files found matching pattern: {place_pattern}"
        )
    return place_declaration_files


def load_places_from_files(county_path: str, county: str, model: PopulationModel, relationship_marker: str) -> None:
    """Load all places from declaration files into the model."""
    place_declaration_files = get_place_declaration_files(county_path, county, relationship_marker)
    for place_file in place_declaration_files:
        filename = os.path.basename(place_file)
        place_type = extract_place_type_from_filename(filename, county)
        df_places = pd.read_parquet(place_file)
        for _, row in df_places.iterrows():
            place_data = row.to_dict()
            place_id = place_data.pop('place_id')
            place_data['place_type'] = place_type
            place = Place(place_id, **place_data)
            model.add_place(place)


def load_agents_from_file(county_path: str, county: str, model: PopulationModel, agent_class=PopulationAgent) -> Dict:
    """Load all agents from the agent file and return a dict[agent_id -> agent]."""
    agent_file = os.path.join(county_path, f"agent-person_{county}.parquet")
    if not os.path.exists(agent_file):
        raise FileNotFoundError(f"Agent file not found: {agent_file}")
    df_agents = pd.read_parquet(agent_file)
    agents_dict = {}
    for idx, row in df_agents.iterrows():
        agent_data = row.to_dict()
        # Fall back to the row index if the file has no agent_id column
        agent_id = agent_data.pop('agent_id', idx)
        agent = agent_class(unique_id=agent_id, model=model, **agent_data)
        agents_dict[agent_id] = agent
    return agents_dict
def process_agent_place_relationships(county_path: str, county: str, agents_dict: Dict, relationship_marker: str) -> None:
    """Process all agent-to-place relationship files and attach them to agents."""
    pattern = os.path.join(county_path, f"agent-person{relationship_marker}*_{county}.parquet")
    agent_relationship_files = glob.glob(pattern)
    for relationship_file in agent_relationship_files:
        filename = os.path.basename(relationship_file)
        relationship_type = extract_agent_relationship_type(filename, county, relationship_marker)
        df_relationships = pd.read_parquet(relationship_file)
        for _, row in df_relationships.iterrows():
            agent_id = row['agent_id']
            place_id = row['place_id']
            if agent_id in agents_dict:
                agents_dict[agent_id].add_place_relationship(relationship_type, place_id)


def process_single_place_relationship(row: pd.Series, columns: List[str], model: PopulationModel) -> None:
    """Process a single place-to-place relationship row."""
    if len(columns) != 2:
        raise ValueError(f"Expected 2 columns in relationship file, got {len(columns)}")
    child_col, parent_col = columns[0], columns[1]
    child_place_id = row[child_col]
    parent_place_id = row[parent_col]
    # The relationship key comes from the parent column name, e.g. 'tract_place_id' -> 'tract'
    parent_type = parent_col.replace('_place_id', '') if '_place_id' in parent_col else 'parent'
    child_place = model.get_place(child_place_id)
    parent_place = model.get_place(parent_place_id)
    if child_place:
        child_place.add_parent_relationship(parent_type, parent_place_id)
    if parent_place:
        parent_place.add_child_place(child_place_id)


def process_place_place_relationships(county_path: str, county: str, model: PopulationModel, relationship_marker: str) -> None:
    """Process all place-to-place relationship files."""
    pattern = os.path.join(county_path, f"place-*{relationship_marker}*_{county}.parquet")
    place_relationship_files = glob.glob(pattern)
    for relationship_file in place_relationship_files:
        filename = os.path.basename(relationship_file)
        # Parsing validates our expectations; types are not strictly needed below
        _child_type, _parent_type = parse_place_relationship_filename(filename, county, relationship_marker)
        df_relationships = pd.read_parquet(relationship_file)
        columns = df_relationships.columns.tolist()
        for _, row in df_relationships.iterrows():
            process_single_place_relationship(row, columns, model)
# -------------------------------
# Loader facade
# -------------------------------
class PopulationLoader:
    """Loads population data from structured Parquet exports under a base directory."""

    PLACE_RELATIONSHIP_MARKER = "_to_place-"

    def __init__(self, population_directory: str):
        self.population_directory = population_directory

    def discover_locations(self) -> List[Tuple[str, str]]:
        """Discover all (state, county) pairs that contain a valid agent file."""
        locations: List[Tuple[str, str]] = []
        state_paths = glob.glob(os.path.join(self.population_directory, "*"))
        for state_path in state_paths:
            if os.path.isdir(state_path):
                state_name = os.path.basename(state_path)
                county_paths = glob.glob(os.path.join(state_path, "*"))
                for county_path in county_paths:
                    if os.path.isdir(county_path):
                        county_name = os.path.basename(county_path)
                        agent_file = os.path.join(county_path, f"agent-person_{county_name}.parquet")
                        if os.path.exists(agent_file):
                            locations.append((state_name, county_name))
        return locations

    def load_all(self, model: PopulationModel, agent_class=PopulationAgent) -> None:
        """Load all available states and counties into the model."""
        for state, county in self.discover_locations():
            self.load_county(state, county, model, agent_class=agent_class)

    def load_county(self, state: str, county: str, model: PopulationModel, agent_class=PopulationAgent) -> None:
        """Load agents and places for a specific county."""
        county_path = os.path.join(self.population_directory, state, county)
        # 1) Places first (since agents/relationships may reference them)
        load_places_from_files(county_path, county, model, self.PLACE_RELATIONSHIP_MARKER)
        # 2) Agents
        agents_dict = load_agents_from_file(county_path, county, model, agent_class)
        # 3) Agent→Place
        process_agent_place_relationships(county_path, county, agents_dict, self.PLACE_RELATIONSHIP_MARKER)
        # 4) Place→Place
        process_place_place_relationships(county_path, county, model, self.PLACE_RELATIONSHIP_MARKER)
Minimal Working Example
Create demo_populus_mesa.py next to your populus_mesa_integration.py and run it:
# file: demo_populus_mesa.py
from pathlib import Path
from populus_mesa_integration import (
    PopulationModel,
    PopulationAgent,
    PopulationLoader,
)


# Define behavior by subclassing PopulationAgent
class Person(PopulationAgent):
    def step(self):
        # Example: read a relationship if present
        hh_id = self.get_place("household")
        if hh_id is not None:
            home = self.model.get_place(hh_id)
            # Example: use place hierarchy if available
            tract_id = home.get_parent_place("census_tract") if home else None
            _ = tract_id  # replace with your model logic
        # Example attribute access (if present in Parquet):
        # age = getattr(self, "age", None)


POP_DIR = Path("/path/to/populus") # e.g., /data/populus
STATE = "Illinois"
COUNTY = "Cook_County"
model = PopulationModel()
loader = PopulationLoader(str(POP_DIR))
print("Discovering available locations...")
for s, c in loader.discover_locations():
    print(f" - {s}/{c}")
print(f"\nLoading {STATE}/{COUNTY}...")
loader.load_county(STATE, COUNTY, model, agent_class=Person)
# Quick sanity checks
print(f"Loaded places: {len(model.places)}")
try:
    n_agents = len(model.agents)  # AgentSet should support len()
except Exception:
    # fallback if API differs
    n_agents = sum(1 for _ in model.agents)
print(f"Loaded agents: {n_agents}")
# Run a few steps
for t in range(3):
    model.step()
print("Done.")
Run:
python demo_populus_mesa.py
How to Use in Your Model
Access relationships at runtime:
some_agent = next(iter(model.agents))
household_id = some_agent.get_place("household")
household = model.get_place(household_id)
tract_id = household.get_parent_place("census_tract") if household else None
tract = model.get_place(tract_id) if tract_id else None
children_of_tract = tract.child_places if tract else []
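Building on these lookups, a common aggregation is counting agents per place (for example, household sizes). The following is a minimal sketch, not part of the integration file: the helper name agents_per_place is hypothetical, and it assumes only that each agent exposes get_place(relationship) as PopulationAgent does.

```python
from collections import Counter
from typing import Iterable


def agents_per_place(agents: Iterable, relationship: str) -> Counter:
    """Count agents by the place ID they hold for `relationship`.

    Agents with no place recorded for that relationship are skipped.
    """
    counts: Counter = Counter()
    for agent in agents:
        place_id = agent.get_place(relationship)
        if place_id is not None:
            counts[place_id] += 1
    return counts
```

With a loaded model, agents_per_place(model.agents, "household") would give a mapping from household ID to the number of residents; most_common(5) on the result lists the largest households.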
Collect metrics with Mesa’s DataCollector:
from mesa.datacollection import DataCollector
class PopulusModel(PopulationModel):
    def __init__(self):
        super().__init__()
        self.datacollector = DataCollector(
            model_reporters={
                "num_agents": lambda m: len(m.agents),
                "num_places": lambda m: len(m.places),
            }
        )

    def step(self):
        super().step()
        self.datacollector.collect(self)
Performance & Correctness Tips
- Parquet engine: Use pyarrow for best performance.
- Start small: Load one county first to validate everything, then scale up.
- Columns: If you later customize IO, pass columns=[...] to pd.read_parquet() to reduce memory.
- Filename hygiene: The loader infers types from filenames. Keep the exact patterns:
  - Agent file: agent-person_<County>.parquet
  - Place files: place-<type>_<County>.parquet
  - Agent→Place: agent-person_to_place-<type>_<County>.parquet
  - Place→Place: place-<child>_to_place-<parent>_<County>.parquet
- No PII: Populus is synthetic. Use attributes responsibly and avoid re‑identification attempts.
Troubleshooting
- FileNotFoundError: Agent file not found
  Double‑check <POP_DIR>/<State>/<County>/agent-person_<County>.parquet.
- Unable to parse ... filename
  A filename deviates from the expected pattern. Fix the name or adjust PLACE_RELATIONSHIP_MARKER = "_to_place-".
- Expected 2 columns in relationship file
  Ensure each place→place Parquet file has exactly two columns (<child>_place_id, <parent>_place_id).
- High memory usage
  Test a small county first; customize column subsets later if needed.
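Several of these errors can be caught before any Parquet file is opened. The sketch below groups a county folder's files by role using names only; it is an assumption-laden illustration (the function name preflight_county and the report keys are hypothetical, and the marker is assumed to be the default `_to_place-`).

```python
import glob
import os
from typing import Dict, List

MARKER = "_to_place-"  # assumed to match PopulationLoader.PLACE_RELATIONSHIP_MARKER


def preflight_county(pop_dir: str, state: str, county: str) -> Dict[str, List[str]]:
    """Group a county folder's Parquet filenames by role, without reading them."""
    county_path = os.path.join(pop_dir, state, county)
    suffix = f"_{county}.parquet"
    report: Dict[str, List[str]] = {
        "agent": [], "place": [], "agent_to_place": [],
        "place_to_place": [], "unexpected": [],
    }
    for path in sorted(glob.glob(os.path.join(county_path, "*.parquet"))):
        name = os.path.basename(path)
        # Strip the county suffix; names without it cannot match any pattern
        stem = name[: -len(suffix)] if name.endswith(suffix) else None
        if stem == "agent-person":
            report["agent"].append(name)
        elif stem and stem.startswith("agent-person" + MARKER):
            report["agent_to_place"].append(name)
        elif stem and stem.startswith("place-") and MARKER in stem:
            report["place_to_place"].append(name)
        elif stem and stem.startswith("place-"):
            report["place"].append(name)
        else:
            report["unexpected"].append(name)
    return report
```

An empty report["agent"] or report["place"] list points at the FileNotFoundError cases above, and anything under report["unexpected"] is a likely cause of a filename parsing error or of a file being silently ignored.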
Appendix: Legacy (Mesa 1.x)
If you use Mesa 1.x, make the four changes below. (Mesa 2.x also uses the Agent(unique_id, model) constructor and ships mesa.time, so the same edits apply there.)
- Adjust PopulationAgent to call the older super constructor:
  # replace in PopulationAgent.__init__:
  super().__init__(model)  # current code
  # with:
  super().__init__(unique_id, model)  # Mesa 1.x
- Use a scheduler in PopulationModel:
  from mesa.time import RandomActivation

  class PopulationModel(Model):
      def __init__(self):
          super().__init__()
          self.places = {}
          self.schedule = RandomActivation(self)

      def add_place(self, place):
          self.places[place.id] = place

      def get_place(self, place_id):
          return self.places.get(place_id)

      def step(self):
          self.schedule.step()
- When creating each agent in load_agents_from_file, add it to the schedule:
  agent = agent_class(unique_id=agent_id, model=model, **agent_data)
  model.schedule.add(agent)
- Replace every remaining self.agents.shuffle_do("step") (for example, in a model subclass that overrides step) with self.schedule.step().
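If the same model code has to run under both API styles, one option is to detect the available activation mechanism at runtime instead of editing the source. This is a hedged sketch rather than anything Mesa or Populus provides: the helper name step_all_agents is hypothetical, and it assumes the model exposes either agents.shuffle_do or a schedule with a step method, as in the two variants shown in this guide.

```python
def step_all_agents(model) -> str:
    """Step every agent once using whichever activation API `model` exposes.

    Returns 'agentset' or 'scheduler' to indicate which path was taken.
    """
    agents = getattr(model, "agents", None)
    shuffle_do = getattr(agents, "shuffle_do", None)
    if callable(shuffle_do):
        # Newer style: the model's AgentSet shuffles and calls step on each agent
        shuffle_do("step")
        return "agentset"
    schedule = getattr(model, "schedule", None)
    if schedule is not None and hasattr(schedule, "step"):
        # Legacy style: a scheduler such as RandomActivation drives the agents
        schedule.step()
        return "scheduler"
    raise RuntimeError("model exposes neither agents.shuffle_do nor schedule.step")
```

Calling step_all_agents(self) from PopulationModel.step would replace the direct shuffle_do call; the constructor and scheduler registration edits listed above are still needed on older releases.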
📦 LLM Context Blocks (Drop These Into Your Repo)
Create docs/llm/ and add the files below so your coding assistant understands the integration and filenames.
docs/llm/POPULUS_MESA_CONTEXT.md
# CONTEXT: Epistemix Populus × Mesa (Copy-Paste Integration)
You are assisting with Python code that uses Mesa (ABM) to simulate a synthetic population from Epistemix Populus.
## Available Classes (imported from populus_mesa_integration.py)
- `Place(place_id, **attributes)`: place record with attributes, `parent_places` (dict), `child_places` (list).
  - `add_parent_relationship(type, parent_id)`
  - `add_child_place(child_id)`
  - `get_parent_place(type) -> parent_place_id | None`
- `PopulationModel(Model)`: holds `places: dict[place_id -> Place]`.
  - `add_place(place)`, `get_place(place_id)`
  - `step()` calls `self.agents.shuffle_do("step")` (Mesa ≥ 3)
- `PopulationAgent(Agent)`: base agent with arbitrary attributes and `places: dict[str, place_id]`.
  - `add_place_relationship(type, place_id)`
  - `get_place(type) -> place_id | None`
  - `step()` is overridden by users for behavior.
- `PopulationLoader(population_directory)`: loads Parquet exports into the model.
  - `discover_locations() -> list[(state, county)]`
  - `load_county(state, county, model, agent_class=PopulationAgent)`
  - `load_all(model, agent_class=PopulationAgent)`
  - Uses `PLACE_RELATIONSHIP_MARKER = "_to_place-"`
## Data Layout on Disk
`<POP_DIR>/<State>/<County>/`
- `agent-person_<County>.parquet`
- `place-<type>_<County>.parquet`
- `agent-person_to_place-<type>_<County>.parquet`
- `place-<child>_to_place-<parent>_<County>.parquet`
## Rules for the Assistant
- Assume Mesa ≥ 3 unless told otherwise. Use `Agent(model)` and `model.agents.shuffle_do("step")`.
- Do not invent Parquet columns; only use existing fields.
- To reference places from agents: `agent.get_place("<relationship>")` then `model.get_place(place_id)`.
- To traverse place hierarchies: `place.get_parent_place("<type>")` and `place.child_places`.
- Do not write to Parquet—read-only.
## Example Tasks
- Make `Person(PopulationAgent)` with `step()` and use `PopulationLoader.load_county(...)` to populate a `PopulationModel`.
- Aggregate (e.g., agents per household): iterate `model.agents`, get `agent.get_place("household")`, and count.
- Explore hierarchies (e.g., household → tract) via `get_parent_place`.
docs/llm/POPULUS_MESA_QUICKSTART.md
# QUICKSTART
1. `pip install "mesa>=3.0" pandas pyarrow`
2. Place Populus Parquet under `<POP_DIR>/<State>/<County>/`:
   - `agent-person_<County>.parquet`
   - `place-<type>_<County>.parquet`
   - `agent-person_to_place-<type>_<County>.parquet`
   - `place-<child>_to_place-<parent>_<County>.parquet`
3. Copy `populus_mesa_integration.py` into your project.
4. Subclass `PopulationAgent` as your `Person` and implement `step()`.
5. Create `PopulationModel()` and call
`PopulationLoader(<POP_DIR>).load_county(<State>, <County>, model, agent_class=Person)`.
6. Run `for _ in range(N): model.step()`.
**Conventions**
- Relationship marker: `_to_place-`
- County names use underscores (e.g., `Cook_County`)
- All extra columns in Parquet become attributes on agents/places.
**Do / Don’t**
- ✅ Use `agent.get_place("<rel>")` → `model.get_place(place_id)`
- ✅ Use `place.get_parent_place("<type>")` to traverse hierarchies
- ❌ Don’t assume a column exists—check with `hasattr(agent, "age")`
- ❌ Don’t mutate Parquet or rely on global state
Summary
- You now have a single-file integration (populus_mesa_integration.py) to load Populus Parquet into Mesa.
- Subclass PopulationAgent for behavior; all attributes from Parquet are available on agents/places.
- Relationships are discovered from filenames and materialized as convenient lookups.
- Ship the LLM context files so your tools scaffold features reliably.