Frequently Asked Questions
Answers to the most common questions about Populus.
1) What is Populus (today)?
Answer: Populus provides a high‑fidelity synthetic population built from public Census data. Each record represents a person and their household and carries core attributes (gender, ethnicity, income, household location, workplace location, etc.) that mirror real‑world distributions without using PII.
2) What can I use the synthetic population for?
Answer: Teams use Populus to:
- Run agent‑based simulations (ABM): Treat each record as an agent with demographics, locations, and relationships.
- Build ML models: Train or fine‑tune models (e.g., to predict missing attributes).
- Mitigate bias in training data: Reweight or augment datasets to align with census‑grounded distributions.
- Project to new regions: Extend data from one region to another by applying your models over the synthetic population.
3) Which attributes are included out of the box?
Answer: Populus ships with core census‑derived attributes, including gender, ethnicity, income, household location, and workplace location (plus related fields to support joins and grouping). Additional fields can be attached during enrichment (see below).
4) How does enrichment work?
Answer: You can attach your own data or third‑party attributes to the synthetic population using the core keys (e.g., gender, ethnicity, income, household location, workplace location):
- Choose the segment or geography you want to enrich (e.g., tract, county, metro).
- Join your attributes on matching keys (or map them with a crosswalk).
- Optionally compute derived features or probabilities for use in ABM/ML.
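The enrichment steps above can be sketched in plain Python. This is a minimal illustration, not the Populus schema: the field names (`home_tract`, `income_band`), the tract-to-county crosswalk, the `broadband_rate` attribute, and the 0.75 threshold are all hypothetical.

```python
# Hypothetical sample of synthetic records; all field names are assumptions.
population = [
    {"person_id": 1, "gender": "F", "income_band": "low", "home_tract": "T001"},
    {"person_id": 2, "gender": "M", "income_band": "high", "home_tract": "T001"},
    {"person_id": 3, "gender": "F", "income_band": "high", "home_tract": "T002"},
]

# Hypothetical crosswalk from tract to county, and a third-party
# attribute that is only available at county level.
tract_to_county = {"T001": "C01", "T002": "C02"}
broadband_by_county = {"C01": 0.82, "C02": 0.64}


def enrich(records, crosswalk, attribute_by_key, attribute_name):
    """Attach an external attribute to each record via a crosswalk.

    Records whose tract or county has no match get None for the
    attribute, so unmatched rows stay visible instead of being dropped.
    """
    enriched = []
    for record in records:
        county = crosswalk.get(record["home_tract"])
        value = attribute_by_key.get(county)
        enriched.append({**record, "home_county": county, attribute_name: value})
    return enriched


enriched = enrich(population, tract_to_county, broadband_by_county, "broadband_rate")

# Optional derived feature; the threshold is an arbitrary illustration.
HIGH_BROADBAND_THRESHOLD = 0.75
for record in enriched:
    rate = record["broadband_rate"]
    record["high_broadband"] = rate is not None and rate >= HIGH_BROADBAND_THRESHOLD
```

With a dataframe library the same join would typically be a left merge on the shared key, which likewise keeps unmatched population rows.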
5) What formats can I download, and how do I use them?
Answer: Export the population as Parquet (columnar, fast, analytics‑friendly) or CSV (universally compatible). Load directly into Python (pandas/Polars), Spark, R, SQL warehouses, or ABM frameworks—no special tooling required.
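As a rough sketch of loading a CSV export with only the Python standard library: the column names below are assumptions rather than the actual export schema, and an in-memory string stands in for the downloaded file.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported CSV file; column names are assumptions.
csv_text = """person_id,gender,income,home_tract,work_tract
1,F,42000,T001,T003
2,M,88000,T001,T001
3,F,61000,T002,T003
"""


def load_population(handle):
    """Read records from a CSV file handle, converting numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["person_id"] = int(row["person_id"])
        row["income"] = float(row["income"])
        rows.append(row)
    return rows


population = load_population(io.StringIO(csv_text))

# Simple sanity check: number of residents per home tract.
residents_per_tract = Counter(row["home_tract"] for row in population)
print(residents_per_tract)
```

For a real export, pass `open(path, newline="")` instead of the `StringIO` object. In pandas, `read_csv` and `read_parquet` cover the same ground; reading Parquet requires a Parquet engine such as pyarrow to be installed.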
6) Does Populus contain PII? Is it privacy‑safe?
Answer: Populus contains no PII and is privacy‑safe. No PII is required or included. Records are synthetic, constructed to match published distributions, so they do not correspond to real individuals.
7) How realistic/accurate is the population, and how is it validated?
Answer: Populus is calibrated to public Census distributions at appropriate geographic levels. We validate by comparing generated marginals (and joint distributions, where available) to published totals and by checking location patterns (e.g., home/workplace distributions) for internal consistency. These checks keep the synthetic population aligned with the ground‑truth population structure.
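A marginal comparison of this kind can be sketched as follows. This illustrates the general idea, not the validation pipeline Populus actually runs; the published shares are made-up numbers and the field name is an assumption.

```python
from collections import Counter


def marginal_shares(records, field):
    """Share of records falling in each category of one attribute."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def max_abs_gap(synthetic_shares, published_shares):
    """Largest absolute difference in share across all categories."""
    categories = set(synthetic_shares) | set(published_shares)
    return max(
        abs(synthetic_shares.get(c, 0.0) - published_shares.get(c, 0.0))
        for c in categories
    )


# Illustrative data only; these are not real Census figures.
synthetic = [{"gender": "F"}] * 52 + [{"gender": "M"}] * 48
published = {"F": 0.51, "M": 0.49}

shares = marginal_shares(synthetic, "gender")
gap = max_abs_gap(shares, published)
print(f"largest marginal gap: {gap:.3f}")
```

The same function applies to joint distributions if `field` is replaced by a tuple of attributes, at the cost of sparser cells.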
8) How do I use Populus to remove bias in a training dataset?
Answer: Use the synthetic population as a reference distribution:
- Reweight your training rows to match census‑grounded segment shares.
- Resample/augment under‑represented groups to reduce representation bias.
- Audit model outputs across segments defined by gender, ethnicity, income, and location to detect residual disparities.
Populus supplies the segments and targets; you apply the weighting/augmentation in your ML pipeline.
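The reweighting step can be sketched as a simple ratio of shares: each training row in a segment gets the weight (reference share) / (training share). The segment labels and shares below are hypothetical, and in practice the reference shares would be computed from the synthetic population.

```python
from collections import Counter


def segment_weights(training_segments, reference_shares):
    """Per-segment weight = reference share / share in the training data."""
    counts = Counter(training_segments)
    n_rows = len(training_segments)
    weights = {}
    for segment, count in counts.items():
        training_share = count / n_rows
        weights[segment] = reference_shares.get(segment, 0.0) / training_share
    return weights


# Hypothetical training data: segment "A" is over-represented.
training_segments = ["A"] * 80 + ["B"] * 20
# Hypothetical reference shares derived from the synthetic population.
reference_shares = {"A": 0.5, "B": 0.5}

weights = segment_weights(training_segments, reference_shares)
row_weights = [weights[segment] for segment in training_segments]

# After weighting, segment "B" carries half of the total weight.
weighted_share_b = sum(
    w for w, s in zip(row_weights, training_segments) if s == "B"
) / sum(row_weights)
```

Many ML libraries accept such per-row weights at fit time. Reweighting cannot help a segment that is entirely absent from the training data; that case calls for augmentation or additional collection.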
9) How do I project data or a model to a new region?
Answer: Train or calibrate your model where you have data, then apply it over the synthetic population of the target region. Because schemas are consistent, you can generate region‑specific predictions, counts, and what‑ifs—even when you lack direct measurement in that region.
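A deliberately simple sketch of this workflow uses per-segment outcome rates as the "model": the rates are estimated where observations exist and then summed over the target region's synthetic records to get an expected count. The segment definition, field names, and data are hypothetical, and a real project would likely use a richer model.

```python
from collections import defaultdict


def fit_segment_rates(observations):
    """Estimate the outcome rate per segment from (segment, outcome) pairs.

    Outcomes are expected to be 0 or 1.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for segment, outcome in observations:
        totals[segment] += 1
        positives[segment] += outcome
    return {segment: positives[segment] / totals[segment] for segment in totals}


def project_expected_count(rates, target_population, segment_of):
    """Sum per-record rates over the target region's synthetic records.

    Raises KeyError for a segment never seen in the source region, so a
    coverage gap is noticed rather than silently counted as zero.
    """
    return sum(rates[segment_of(record)] for record in target_population)


# Hypothetical observations from the region where data exists.
source_observations = [
    ("low", 1), ("low", 0), ("low", 0), ("low", 0),
    ("high", 1), ("high", 1), ("high", 1), ("high", 0),
]
rates = fit_segment_rates(source_observations)

# Hypothetical synthetic records for the target region.
target_population = [
    {"income_band": "low"},
    {"income_band": "low"},
    {"income_band": "low"},
    {"income_band": "high"},
]
expected = project_expected_count(
    rates, target_population, lambda record: record["income_band"]
)
```

This works only because both regions share the same segment definitions, which is the role the consistent schema plays.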
10) What regions/geographies are supported, and how often is it updated?
Answer: Populus is built from public Census data and organizes records by standard geographic units. Supported coverage and update cadence depend on the underlying public releases.
Customize here with your specifics (e.g., country/coverage levels, most recent Census/ACS vintage, and refresh frequency).