The raw CIA World Factbook changed format at least 10 times between 1990 and 2025. Every script in etl/ exists because a previous version of the parser broke on a new year's data. The pipeline handles ...
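One way to picture a pipeline that survives repeated format changes is a registry that maps year ranges to format-specific parsers. The following is a minimal sketch under stated assumptions, not the project's actual code: the function names, the two layouts, and the year boundaries are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Parser = Callable[[str], Dict[str, str]]


def parse_colon_fields(text: str) -> Dict[str, str]:
    """Hypothetical older layout: one 'Field: value' pair per line."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def parse_pipe_fields(text: str) -> Dict[str, str]:
    """Hypothetical newer layout: one 'Field | value' pair per line."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if "|" in line:
            key, value = line.split("|", 1)
            fields[key.strip()] = value.strip()
    return fields


# (first_year, last_year, parser). The boundaries are illustrative
# assumptions, not the real dates of any Factbook format change.
PARSERS: List[Tuple[int, int, Parser]] = [
    (1990, 2000, parse_colon_fields),
    (2001, 2025, parse_pipe_fields),
]


def parser_for_year(year: int) -> Parser:
    """Return the parser registered for the given edition year."""
    for first, last, parser in PARSERS:
        if first <= year <= last:
            return parser
    raise ValueError(f"no parser registered for year {year}")


def parse_factbook(year: int, text: str) -> Dict[str, str]:
    """Parse one edition's raw text with the parser for its year."""
    return parser_for_year(year)(text)
```

With this shape, a new format means adding one parser and one registry entry rather than patching a single parser that has to cope with every year. An unregistered year fails loudly instead of silently producing bad rows.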
Need to extract data from PDF files into a spreadsheet so you can analyze it? Find out how seven PDF to Excel conversion tools fared in head-to-head tests with increasingly complex data sources. In an ...
Pantable is a Python library that losslessly maps the pandoc Table AST to an internal structure. This makes it possible to write pandoc filters that specifically manipulate tables. pantable is the main ...