Historical Census and Colonial Data Archive - HCCDA

The Historical Census and Colonial Data Archive (HCCDA) is a searchable archive of Australian colonial census publications and reports covering the period from 1833 to 1901, the year of Australia's federation. The corpus includes 18,638 pages of text, 208 maps, and approximately 15,000 tables, all with full digital images, text conversion, and individually identified pages and tables.

Please note that the archive contains colonial census reports, but not individual census returns.

Background to the HCCDA

The HCCDA had its origins in work by the Australian Bureau of Statistics (ABS) in 1988 to copy the colonial census reports from the existing paper volumes onto fiche as part of the Australian Bicentenary. Since then, fiche has largely become outmoded: few fiche readers are still maintained, and most new resources of this type are developed in digital formats. The aim of the HCCDA project was therefore to transform these fiche census publications into a digital resource, covering digital conversion, searching and presentation.

In 2005, researchers at the Australian National University (ANU) joined with the ABS to investigate how to digitize these scanned documents. The project team (led by Tim Rowse, Len Smith and Stuart Hungerford) provided complementary skills in data archiving, information technology, historical and demographic research, and liaison with the ABS. With funds drawn from a wider ARC LIEF grant to the Australian Social Science Data Archive (LE775510), the team began a pilot project to test the feasibility of such a conversion process.

For two reasons, the project chose nineteenth century census publications produced by the six colonial governments. First, the published presentation of population data in nineteenth century Australian colonies offers useful challenges, as it was heterogeneous in style and conventions. By choosing to capture these complex documents in a semantically rich form – even if the initial cost of doing so is relatively high – the project has maximized its exposure to technical problems and thus to learning opportunities. Second, because the publications from the six colonial governments tended to be short, compared with the ever-fatter volumes of twentieth century censuses, it was possible to encompass a longer period within a ‘mere’ 20,000 pages; were our pilot project to have no successor, it would at least have produced a longitudinal sequence of searchable data that historians of the nineteenth century would find useful. For these two reasons the pilot project has been based on census reports in New South Wales from 1833 to 1901, in Queensland from 1861 to 1901, in South Australia from 1844 to 1901, in Tasmania from 1842 to 1901, in Victoria from 1854 to 1901 and in Western Australia from 1848 to 1901.
The resulting corpus comes to 18,638 pages of text, 208 maps, and approximately 15,000 tables (most tables are embedded in a text page).

The structure of the HCCDA archive

The HCCDA archive contains a large number of highly structured document pages, tables and page images. The digital assets in the archive are currently organized hierarchically into regions, which contain publication years, which in turn contain documents. Document assets contain both page and table assets.

Every digital asset in the HCCDA archive has a unique ID. These IDs are designed to be descriptive and human-readable, without being too verbose to work with. Currently, every region (colony), publication year, document, document page and document table has an HCCDA ID.
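
The hierarchy described above can be pictured with a small sketch. This is an illustration only, not the archive's implementation: the class names, the `iter_ids` helper and the ID strings (such as "NSW-1891-census-page-1") are all hypothetical, since the real HCCDA ID format is not reproduced on this page.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Asset:
    """Any archive asset; asset_id stands in for a hypothetical HCCDA ID."""
    asset_id: str


@dataclass
class Page(Asset):
    pass


@dataclass
class Table(Asset):
    pass


@dataclass
class Document(Asset):
    # A document holds both page assets and table assets.
    pages: List[Page] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)


@dataclass
class PublicationYear(Asset):
    documents: List[Document] = field(default_factory=list)


@dataclass
class Region(Asset):
    # A region (colony) contains publication years.
    years: List[PublicationYear] = field(default_factory=list)


def iter_ids(region: Region) -> Iterator[str]:
    """Walk region -> year -> document -> pages and tables, yielding each ID."""
    yield region.asset_id
    for year in region.years:
        yield year.asset_id
        for doc in year.documents:
            yield doc.asset_id
            for page in doc.pages:
                yield page.asset_id
            for table in doc.tables:
                yield table.asset_id


def build_example() -> Region:
    """Build a tiny made-up region; the ID strings are illustrative only."""
    doc = Document(
        "NSW-1891-census",
        pages=[Page("NSW-1891-census-page-1"), Page("NSW-1891-census-page-2")],
        tables=[Table("NSW-1891-census-table-1")],
    )
    return Region("NSW", years=[PublicationYear("NSW-1891", documents=[doc])])
```

Calling `list(iter_ids(build_example()))` walks the tree from the region down to individual pages and tables, which makes it easy to check that no two assets share an ID.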

Page Numbering

The page numbering schemes used across the HCCDA archive documents vary widely. The scheme we've adopted needs to cope with decimal and Roman page numbers, missing page numbers, page numbers restarting within a document, multiple page ranges bound into one publication, and pages inserted after publication, among other issues. To cope with these issues, HCCDA page numbers are not simple indexes but compound numbers. Detailed examples are provided in the HCCDA help information. The URLs that you can use to directly access or reference HCCDA documents, pages and tables via this website make use of the standard IDs described above.
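
To show why a compound number helps, here is a rough sketch of one possible representation. It is an assumption made for illustration, not the actual HCCDA scheme (which is documented in the HCCDA help information): the `CompoundPageNumber` fields, `roman_to_int` and `parse_label` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(text: str) -> int:
    """Convert a Roman numeral such as 'xiv' to an integer."""
    total = 0
    prev = 0
    for ch in reversed(text.lower()):
        value = ROMAN[ch]
        if value < prev:
            total -= value  # subtractive notation, e.g. the 'i' in 'iv'
        else:
            total += value
            prev = value
    return total


def parse_label(label: Optional[str]) -> Optional[int]:
    """Return the numeric value of a printed page label, or None if absent."""
    if label is None or not label.strip():
        return None  # the page carries no printed number
    text = label.strip()
    if text.isdigit():
        return int(text)
    if all(ch in ROMAN for ch in text.lower()):
        return roman_to_int(text)
    return None


@dataclass(frozen=True, order=True)
class CompoundPageNumber:
    """Hypothetical compound page number; instances sort in reading order."""
    page_range: int  # which numbering run within the publication (1, 2, ...)
    position: int    # position of the page within that run
    insert: int = 0  # 0 for original pages; 1, 2, ... for later insertions
    # The printed label is kept for display but ignored when comparing.
    label: Optional[str] = field(default=None, compare=False)


pages = [
    CompoundPageNumber(2, 1, label="1"),       # numbering restarts in run 2
    CompoundPageNumber(1, 2, label="ii"),
    CompoundPageNumber(1, 1, label="i"),
    CompoundPageNumber(2, 1, insert=1),        # unnumbered page inserted later
]
reading_order = sorted(pages)
```

Because the sort key is the tuple (run, position, insert) and not the printed label, restarted, missing and inserted page numbers no longer collide or sort out of place.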


