Williams, Morgan¹ and Schoneveld, Louise¹
¹CSIRO Mineral Resources, Perth
The availability and accessibility of instrumentation to rapidly acquire large volumes of geochemical data have steadily increased over recent decades. However, the geochemical community as a whole is only beginning to embrace the flexibility and relative power of adopting high-level programming languages (e.g. Python) to process, summarise and visualise these data. This emerging shift provides opportunities to address scientific reproducibility concerns, increase the comparability of geochemical datasets, and develop transferable digital skills within the research community. Taking a declarative programmatic approach to data processing and analysis makes these tasks more repeatable, particularly when compared to multi-stage workflows based on extensive user interface interaction, where steps are often only descriptively documented. When combined with versioning and environment management, this approach enables reproducible workflows which can be shared such that others can effectively examine, compare and re-use existing code as needed. To address the relative scarcity of geochemistry-focused tools and support adoption of a programmatic approach to geochemical data analysis, we’ve developed pyrolite, an open-source Python package for working with geochemical data. In this presentation we’ll present a series of short examples that demonstrate some of the advantages of adopting a programmatic approach to geochemical data analysis workflows and highlight some of pyrolite’s key features.
pyrolite aims to allow new users to get off the ground quickly by providing ‘batteries included’ functionality for transforming, analysing and visualising geochemical and compositional data. The package contains a suite of functions commonly used in geochemical data workflows, including log-transforms for working with compositional data in a robust manner, scaling between units and simple element-oxide conversion. pyrolite also implements several common geochemical visualisations and plot templates, and provides easy access to data-density based visualisation methods better adapted to larger multivariate datasets with hundreds to hundreds of thousands of samples. Beyond this foundational functionality, pyrolite provides a framework to encode and document relevant algorithms recently introduced to the geochemistry community (e.g. lambdas for parameterising rare earth element profiles, bootstrap resampling methods) and to link geochemical data to common machine learning frameworks (e.g. scikit-learn). The package is built upon and exposes the API of commonly used scientific Python packages, including matplotlib and pandas, allowing a greater degree of interoperability and familiarity for new users.
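As an illustration of the kind of log-transform mentioned above, the following is a minimal sketch of a centred log-ratio (CLR) transform written directly with NumPy. It is not pyrolite's own API: the function names `closure`, `clr` and `inverse_clr` are hypothetical, and the two four-component compositions are made-up illustrative values rather than real analyses.

```python
import numpy as np


def closure(X):
    """Rescale each row (sample) so that its components sum to 1."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)


def clr(X):
    """Centred log-ratio transform: log of each component relative to
    the geometric mean of its row. Assumes strictly positive values."""
    logX = np.log(closure(X))
    return logX - logX.mean(axis=1, keepdims=True)


def inverse_clr(Y):
    """Map CLR coordinates back to closed compositions (rows sum to 1)."""
    expY = np.exp(np.asarray(Y, dtype=float))
    return expY / expY.sum(axis=1, keepdims=True)


# Illustrative (made-up) compositions; one row per sample.
compositions = np.array(
    [
        [50.0, 15.0, 10.0, 25.0],
        [70.0, 14.0, 3.0, 13.0],
    ]
)

transformed = clr(compositions)       # each row of CLR coordinates sums to zero
recovered = inverse_clr(transformed)  # round trip back to closed compositions
```

Working in log-ratio space removes the constant-sum constraint before statistics are computed, and the inverse transform returns results to compositional units. Zeros and missing values need separate handling that this sketch does not attempt.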
The package and related tools continue to be actively developed, and the package has recently been peer reviewed and published. Future development is planned to include support for interactive visualisation and to expand the set of examples and tutorials within the documentation suite. The project is being developed for and by the geochemistry community, and we encourage new users to get involved. We hope to foster a growing community of users and contributors to ensure the long-term sustainability and usefulness of the project; all forms of contribution to the project are welcome.
Morgan uses data-driven approaches to interrogate geochemical datasets and develops open software for geoscientists. His current research focuses on lithogeochemical classification, tectonic discrimination, and spatial geochemical prediction. Louise shoots rocks with lasers, electrons and X-rays to expose their oft-subtly hidden secrets. They’re both keen on supporting better geochemical data practices.