Examples
A collection of practical recipes for common use-cases.
Example 1 — ENAHO, single year, selective modules
from perustats.inei import INEIFetcher
fetcher = INEIFetcher(
    survey="enaho",
    years=[2023],
    master_directory="./data",
)
fetcher.fetch_modules()
# Peek at what is available
print(fetcher.modules_df[["module_code", "stata", "csv"]].dropna(subset=["stata"]))
# Download modules 1, 2, and 5 (housing, household members, employment and income)
fetcher.download(module_codes=[1, 2, 5])
# Organize by module for easy longitudinal stacking later
fetcher.organize(organize_by="module")
Example 2 — ENDES, multiple years, multiple formats
from perustats.inei import INEIFetcher
(
    INEIFetcher(
        survey="endes",
        years=range(2010, 2024),
        master_directory="./datos_inei",
        parallel_jobs=6,
    )
    .fetch_modules()
    .download(
        module_codes=[64, 65, 73, 74],
        force=False,
        remove_zip_after_extract=False,
    )
    .organize(organize_by="year", operation="copy")
    .organize(organize_by="module", operation="copy")
)
Tip
Calling organize() twice with different organize_by values is a valid pattern — the source files in 1_unzipped/ are not modified as long as operation="copy".
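The difference between the two operations can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: `organize_sketch`, the folder names other than 1_unzipped/, and the placeholder file are all hypothetical, and the real organize() may lay out files differently.

```python
import shutil
import tempfile
from pathlib import Path


def organize_sketch(src_dir: Path, dest_dir: Path, operation: str = "copy") -> list[Path]:
    """Hypothetical stand-in for organize(): place every file of src_dir in dest_dir."""
    if operation not in {"copy", "move"}:
        raise ValueError(f"unsupported operation: {operation!r}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    placed = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        target = dest_dir / src.name
        if operation == "copy":
            shutil.copy2(src, target)  # the source file stays where it is
        else:
            shutil.move(str(src), str(target))  # the source file is gone afterwards
        placed.append(target)
    return placed


def demo() -> dict[str, int]:
    """Count the files left in the source folder after two copies, then after a move."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        unzipped = root / "1_unzipped"
        unzipped.mkdir()
        (unzipped / "modulo_64.dta").write_bytes(b"placeholder")

        # Two copy passes: both layouts are built from the same untouched source.
        organize_sketch(unzipped, root / "by_year", "copy")
        organize_sketch(unzipped, root / "by_module", "copy")
        after_copy = len(list(unzipped.iterdir()))

        # A move pass empties the source, so a second layout could no longer be built.
        organize_sketch(unzipped, root / "moved", "move")
        after_move = len(list(unzipped.iterdir()))
        return {"after_copy": after_copy, "after_move": after_move}


if __name__ == "__main__":
    print(demo())
```

Under these assumptions, the source folder still holds its file after both copy passes and is empty after the move, which is why a second organize() call only makes sense with operation="copy".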
Example 3 — ENAPRES, save disk space
When storage is constrained, use operation="move" and remove ZIPs after extraction:
from perustats.inei import INEIFetcher
(
    INEIFetcher("enapres", years=range(2015, 2024), parallel_jobs=4)
    .fetch_modules()
    .download(remove_zip_after_extract=True)
    .organize(organize_by="module", operation="move")
)
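As a rough illustration of what removing ZIPs after extraction amounts to, here is a standard-library sketch. The helper `extract_and_maybe_remove` and the file names are hypothetical and are not part of perustats; the point is only that the archive is deleted after extraction has finished without raising.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_and_maybe_remove(zip_path: Path, dest_dir: Path, remove_zip: bool = False) -> list[str]:
    """Extract zip_path into dest_dir and optionally delete the archive afterwards."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)  # raises if the archive is corrupt
        names = archive.namelist()
    if remove_zip:
        # Only reached when extraction completed without an exception.
        zip_path.unlink()
    return names


def demo():
    """Build a tiny archive, extract it, and report what is left on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        zip_path = root / "modulo.zip"
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("datos.csv", "a,b\n1,2\n")
        names = extract_and_maybe_remove(zip_path, root / "out", remove_zip=True)
        return names, zip_path.exists(), (root / "out" / "datos.csv").exists()


if __name__ == "__main__":
    print(demo())
```

The trade-off is that a deleted archive has to be downloaded again if the extracted files are later lost, for example after operation="move" followed by a cleanup.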
Example 4 — Inspect modules without downloading
from perustats.inei import INEIFetcher
fetcher = INEIFetcher("enaho", years=[2021, 2022, 2023])
fetcher.fetch_modules()
df = fetcher.modules_df
print(df.shape) # (rows, cols)
print(df["module_code"].unique())
print(df[df["stata"].notna()][["year_ref", "module_code", "stata"]])
Example 5 — Registering a custom survey
from perustats.inei import INEIFetcher, registry, Survey
# Register once at the top of your script/notebook
registry.register(Survey(
    code="enniv",
    name="Encuesta Nacional de Niveles de Vida",
    period="anual",
))
# Then use it like any built-in survey
INEIFetcher("enniv", years=range(2000, 2010)).fetch_modules().download().organize()
Example 6 — RENAMU, full download
from perustats.inei import INEIFetcher
(
    INEIFetcher("renamu", years=range(2012, 2024), parallel_jobs=4)
    .fetch_modules()
    .download()  # no module_codes = download all
    .organize(organize_by="year")
)
Tips & Best Practices
- Start small: test with one or two years before requesting a full decade.
- Use force=False: skip already-downloaded files on re-runs to avoid redundant network traffic.
- Tune parallel_jobs: 4 to 6 workers usually give the best throughput without overwhelming the INEI server.
- Keep ZIPs initially: set remove_zip_after_extract=False while experimenting; delete the ZIPs later, once you confirm the extraction is complete.
- Hash deduplication: leave deduplicate_docs_by_hash=True (default) to avoid accumulating hundreds of identical PDF copies in the documentation folder.
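To make the idea behind hash deduplication concrete, the sketch below keeps one file per distinct content hash. It is an assumption about the general technique, not a description of how deduplicate_docs_by_hash is implemented; `file_digest`, `unique_by_hash`, the choice of SHA-256, and the file names are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    sha = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def unique_by_hash(paths):
    """Keep the first path seen for each distinct file content."""
    seen = {}
    for path in paths:
        seen.setdefault(file_digest(path), path)
    return list(seen.values())


def demo():
    """Two PDFs with identical bytes collapse to one entry; the third one is kept."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        files = [
            ("2021_ficha.pdf", b"same"),
            ("2022_ficha.pdf", b"same"),
            ("2023_dicc.pdf", b"other"),
        ]
        for name, content in files:
            (root / name).write_bytes(content)
        kept = unique_by_hash(sorted(root.iterdir()))
        return [p.name for p in kept]


if __name__ == "__main__":
    print(demo())
```

Comparing content hashes rather than file names matters here because the same documentation PDF can plausibly ship under a different name in each year's archive.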