Installation & Quickstart¶
Installation¶
PeruStats is published on PyPI and can be installed with pip:
Python 3.9 or later is required.
Minimal Example¶
from perustats.inei import INEIFetcher
fetcher = INEIFetcher(
survey="enaho",
years=range(2020, 2024),
)
fetcher.fetch_modules().download(module_codes=[1, 2, 5]).organize(organize_by="year")
That's it. After this runs you will find the organized files under ./data/microdatos_inei/enaho/2_organized/by_year/.
Step-by-Step Breakdown¶
1 — Create the fetcher¶
from perustats.inei import INEIFetcher
fetcher = INEIFetcher(
survey="enaho", # survey code
years=range(2018, 2024), # years to cover
master_directory="./data",
parallel_jobs=4,
)
2 — Fetch the module listing¶
This call scrapes the INEI portal for each requested year and stores the result in a local SQLite database (referrer.db). Subsequent runs load from the cache instead of hitting the network.
After the call completes, the module catalogue is available as a DataFrame:
3 — Download ZIP files¶
fetcher.download(
module_codes=[1, 2, 5], # None = download everything
force=False, # skip existing valid ZIPs
remove_zip_after_extract=False,
)
ZIPs are saved to 0_zips/ and extracted to 1_unzipped/.
4 — Organize extracted files¶
fetcher.organize(
organize_by="year", # or "module"
operation="copy", # or "move" to save disk space
deduplicate_docs_by_hash=True,
)
Files are placed in 2_organized/by_year/ (or by_module/). Documentation PDFs are deduplicated by content hash when deduplicate_docs_by_hash=True.
Method Chaining¶
All three methods return self, so you can chain them:
(INEIFetcher("endes", years=range(2015, 2024))
.fetch_modules()
.download(module_codes=[64, 65, 73])
.organize(organize_by="module", operation="move"))