# TODO: MEF Module Roadmap

Planned features and improvements for the `perustats.MEF` module.

## In Progress / Next Up

### Progress Bars

Add visual feedback during long-running scrapes using the progress utilities from the `rich` library.

- [ ] Step-level progress bar showing `Step N / Total` as each navigation level is entered
- [ ] Row-level progress bar within each step showing `Row N / M (current_value)`
- [ ] Display the current metadata context (year, filters applied) in the progress description
- [ ] Configurable verbosity: `verbose=False` to silence all output, `verbose=True` (default) for progress bars
- [ ] Nested progress bars for multi-level workflows using `rich.progress.Progress` with `TaskID`

```python
# Planned API (subject to change)
MEFScraper(steps, verbose=True).run(2023)
# Output:
# Scraping MEF gasto 2023 ━━━━━━━━━━━━━━━━━━━━━━ 100% Step 3/4
# TipoGobierno → Locales ━━━━━━━━━━━━━━━━━━━━━ 100% Row 2/2
# Departamento ━━━━━━━━━━━━━━━━━━╸━━━ 72% Row 18/25
```
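
A minimal sketch of the nested step/row bars, assuming `rich` is installed and that a workflow can be flattened into `(step_name, row_values)` pairs. `step_label`, `row_label`, and `run_with_progress` are hypothetical names, not part of `perustats`. Each `add_task` call returns a `TaskID`; `verbose=False` maps onto `rich`'s `disable` flag, and `rich` is imported inside the function so the label helpers work without it.

```python
from typing import Sequence, Tuple


def step_label(step_idx: int, total_steps: int, name: str) -> str:
    # Step-level description, e.g. "Departamento  Step 3/4"
    return f"{name}  Step {step_idx}/{total_steps}"


def row_label(row_idx: int, total_rows: int, current_value: str) -> str:
    # Row-level description, e.g. "Row 18/25 (LIMA)"
    return f"Row {row_idx}/{total_rows} ({current_value})"


def run_with_progress(
    steps: Sequence[Tuple[str, Sequence[str]]], verbose: bool = True
) -> list:
    # Imported lazily so the label helpers above do not require rich.
    from rich.progress import Progress

    visited = []
    # disable=True suppresses all rendering, matching verbose=False.
    with Progress(disable=not verbose) as progress:
        step_task = progress.add_task("Scraping", total=len(steps))
        for i, (name, rows) in enumerate(steps, start=1):
            progress.update(step_task, description=step_label(i, len(steps), name))
            # One child task per step; it is removed once the step finishes.
            row_task = progress.add_task(name, total=len(rows))
            for j, value in enumerate(rows, start=1):
                progress.update(row_task, description=row_label(j, len(rows), value))
                visited.append((name, value))  # placeholder for the real per-row request
                progress.advance(row_task)
            progress.remove_task(row_task)
            progress.advance(step_task)
    return visited
```

For example, `run_with_progress([("TipoGobierno", ["Locales"]), ("Departamento", ["AMAZONAS", "LIMA"])])` renders one parent bar plus a short-lived child bar per step. A real multi-level workflow would keep one child task per navigation level open at the same time instead of flattening the levels.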

### Partial Saves (`SavePartial`)

`SavePartial` is already parsed and accepted in the step list, but it is not active yet. Once implemented, it will:

- [ ] Write a Parquet or CSV checkpoint after completing all row iterations at the marked step
- [ ] Detect existing checkpoints on startup and skip already-scraped rows
- [ ] Allow resuming an interrupted scrape without restarting from zero
- [ ] Expose a `resume=True` flag on `MEFScraper.run()` to opt into checkpoint recovery

```python
# Planned behavior
steps = [
    ...
    Rows(),
    ClickBtn(BTN.DEPARTAMENTO),
    SavePartial(filename_prefix="departamento"),  # will checkpoint here
    ...
]
MEFScraper(steps, master_dir_save="./data/mef/").run(2023)
# If interrupted and re-run, already-saved departments are skipped
```
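
A standard-library sketch of the checkpoint/resume items, assuming the rows at the marked step can be keyed by a label (such as a department name) and that everything scraped beneath one key comes back as a list of dicts. It writes CSV rather than Parquet to avoid extra dependencies. `run_with_checkpoints`, `scrape_rows`, and the `<prefix>_<year>_<key>.csv` naming are hypothetical, not the planned `SavePartial` internals.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def checkpoint_path(master_dir_save: str, filename_prefix: str, year: int, key: str) -> Path:
    # Hypothetical naming scheme: one CSV per completed row at the marked step.
    return Path(master_dir_save) / f"{filename_prefix}_{year}_{key}.csv"


def run_with_checkpoints(
    keys: Iterable[str],
    scrape_rows: Callable[[str], List[Row]],
    master_dir_save: str,
    filename_prefix: str,
    year: int,
    resume: bool = False,
) -> List[Row]:
    results: List[Row] = []
    for key in keys:
        path = checkpoint_path(master_dir_save, filename_prefix, year, key)
        if resume and path.exists():
            # Saved by an earlier run: reload instead of re-scraping.
            with path.open(newline="", encoding="utf-8") as f:
                results.extend(csv.DictReader(f))
            continue
        rows = scrape_rows(key)  # every sub-row reached under this key
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".tmp")
        with tmp.open("w", newline="", encoding="utf-8") as f:
            if rows:
                writer = csv.DictWriter(f, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
        # Rename only after the write finishes, so a crash mid-write
        # never leaves a file that looks like a complete checkpoint.
        tmp.replace(path)
        results.extend(rows)
    return results
```

Writing to a temporary file and renaming it with `Path.replace` keeps each checkpoint all-or-nothing, which is what makes "skip the key if its file exists" a safe resume test.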

## Planned Features

### Async / Concurrent Requests

- [ ] `asyncio`-based HTTP requests to parallelize row iterations at each step
- [ ] Configurable concurrency limit (`max_workers`) to avoid overwhelming the MEF server
- [ ] Estimated time remaining based on rows completed
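
A sketch of the concurrency limit using `asyncio.Semaphore`, assuming each row request can be expressed as an awaitable `fetch(item)` (the async HTTP client itself is not shown). `gather_limited` is a hypothetical name; `max_workers` mirrors the parameter proposed above.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def gather_limited(
    items: Sequence[str],
    fetch: Callable[[str], Awaitable[T]],
    max_workers: int = 4,
) -> List[T]:
    # The semaphore caps how many requests are in flight at once.
    sem = asyncio.Semaphore(max_workers)

    async def bounded(item: str) -> T:
        async with sem:
            return await fetch(item)

    # gather returns results in input order, not completion order.
    return await asyncio.gather(*(bounded(i) for i in items))
```

Because results keep the input order, rows still line up with the table even when requests finish out of order. For the ETA item, `rich`'s `TimeRemainingColumn` already estimates remaining time from task progress.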

### `OnMissing` Full Implementation

- [ ] `OnMissing.SKIP`: silently skip steps where no row matches the filter
- [ ] `OnMissing.RAISE`: raise `ValueError` on missing rows (useful for strict pipelines)
- [ ] `OnMissing.RECORD`: log misses to a `scraper.missing` attribute for post-run inspection
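
A sketch of how the three policies could branch, assuming a step resolves to a list of row labels and a single target label. Only the member names come from the list above; the enum values and the `resolve_row` helper are hypothetical, and the `missing` list stands in for `scraper.missing`.

```python
from enum import Enum
from typing import List, Optional, Sequence


class OnMissing(Enum):
    SKIP = "skip"
    RAISE = "raise"
    RECORD = "record"


def resolve_row(
    rows: Sequence[str],
    target: str,
    policy: OnMissing,
    missing: List[str],
    context: str = "step",
) -> Optional[str]:
    # Return the matching row label, or None when the step should be skipped.
    if target in rows:
        return target
    if policy is OnMissing.RAISE:
        raise ValueError(f"{context}: no row matches {target!r}")
    if policy is OnMissing.RECORD:
        # Keep a note of the miss for inspection after the run.
        missing.append(f"{context}: {target}")
    # SKIP and RECORD both continue without this row.
    return None
```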

### Retry Logic

- [ ] Automatic retry with exponential backoff for network errors and HTTP 500 responses from the MEF portal
- [ ] Configurable `max_retries` and `retry_delay` parameters on `MEFScraper`
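
A minimal retry wrapper with exponential backoff, assuming network failures surface as `ConnectionError` or `TimeoutError`. `with_retry` is hypothetical; `max_retries` and `retry_delay` mirror the proposed `MEFScraper` parameters, and `sleep` is injectable so the backoff can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    # Attempt 0 is the first call; up to max_retries further calls follow.
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: re-raise the last error
            # Exponential backoff: retry_delay, 2 * retry_delay, 4 * retry_delay, ...
            sleep(retry_delay * 2 ** attempt)
    raise ValueError("max_retries must be >= 0")
```

An HTTP 500 would first need to become an exception. With `requests`, for example, its own classes (`requests.ConnectionError`, `requests.Timeout`, and the `requests.HTTPError` raised by `Response.raise_for_status()`) would be passed in `retry_on`, ideally limited to 5xx status codes.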

### Output Formats

- [ ] `.save()` method to write results directly to CSV, Parquet, or Excel
- [ ] Optional column renaming / localization to descriptive Spanish names
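
A sketch of suffix-based dispatch for `.save()`, assuming results are held in a pandas `DataFrame` (this roadmap does not say what `run()` returns). `save`, `writer_for`, and the suffix table are hypothetical and written as plain functions rather than a method. Parquet output needs `pyarrow` or `fastparquet`, and `.xlsx` needs `openpyxl`.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Map file suffixes to the pandas DataFrame writer method that handles them.
_WRITERS = {".csv": "to_csv", ".parquet": "to_parquet", ".xlsx": "to_excel"}


def writer_for(path: str) -> str:
    suffix = Path(path).suffix.lower()
    try:
        return _WRITERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported output format: {suffix or path!r}") from None


def save(df: Any, path: str, rename: Optional[Dict[str, str]] = None) -> Path:
    # Optional renaming to descriptive (e.g. Spanish) column names before writing.
    out = df.rename(columns=rename) if rename else df
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Resolve e.g. "to_parquet" and call it on the (renamed) DataFrame.
    getattr(out, writer_for(path))(target, index=False)
    return target
```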

### Extended Year Support

- [ ] Monitor the MEF portal for a v8 migration and update the URL/column configs accordingly

## Known Limitations

- The MEF portal is occasionally down or slow, and there is no retry logic yet.
- `SavePartial` steps are parsed but do nothing at runtime.
- The `OnMissing` enum is defined but not yet enforced in `_proces_step`.
- Very large result sets (all municipalities, all years) can be slow due to sequential HTTP requests.

## Ideas Under Consideration

- A `MEFScraper.preview()` method that simulates the first step offline and returns the initial table without making further requests.
- A CLI entrypoint (`perustats mef run ...`) for one-off queries without writing a script.
- Integration with the `perustats` caching layer (shared with the BCRP module) to avoid redundant requests.
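
A sketch of the proposed CLI using `argparse` subcommands. Only `perustats mef run` comes from the idea above; the `year` positional and the `--quiet`/`--out` options are hypothetical placeholders chosen to mirror `run(2023)`, `verbose`, and `master_dir_save`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="perustats")
    modules = parser.add_subparsers(dest="module", required=True)

    # perustats mef ...
    mef = modules.add_parser("mef", help="MEF spending scraper")
    mef_cmds = mef.add_subparsers(dest="command", required=True)

    # perustats mef run <year> [--quiet] [--out DIR]  (hypothetical arguments)
    run = mef_cmds.add_parser("run", help="run a one-off scrape")
    run.add_argument("year", type=int)
    run.add_argument("--quiet", action="store_true", help="disable progress bars (verbose=False)")
    run.add_argument("--out", default="./data/mef/", help="output directory")
    return parser
```

A `main()` would then pass `verbose=not args.quiet` and `master_dir_save=args.out` when constructing `MEFScraper`; which `steps` a one-off query should use is left open here.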