Preprint Information eXtraction for Life Sciences
Preprints are a relatively new way of making research findings available to other researchers before peer review and publication in an academic journal. Posting a paper as a preprint is the fastest way to share and disseminate results from which other scientists may benefit. Preprints are hosted on a number of different servers, many of which are dedicated to specific areas of research. These servers tend to specialise in certain types of content or offer particular technical features. For example, Preprints.org is a multidisciplinary preprint platform, while bioRxiv hosts only preprints from the life sciences. Yet, at present, the vast majority of preprints do not appear in reference databases or search engines.
Run in collaboration with TH Köln – University of Applied Sciences, the PIXLS project aims to develop a system that systematically accesses and indexes the information on preprint servers, which has previously been neglected. The system will also offer value-added services that make this information more easily accessible, improving the discoverability and reusability of preprint full texts and metadata. To achieve this goal, the PIXLS project team has created an ‘information extraction pipeline’: a dedicated application that extracts structured information from the unstructured content of preprints, such as the body text and numerical data. Because databases can process structured information more easily, researchers can find it more readily and reuse it in a greater variety of ways.
The team is currently compiling and consolidating the extracted data in the ZB MED Knowledge Environment (ZB MED KE), a database developed by ZB MED in a previous project. Value-added services will then be developed to make this structured preprint data accessible to the wider research community; these might include, for example, Linked Open Data interfaces or innovative reputation and trend indicators. The data will also be made available through LIVIVO, ZB MED's search portal for the life sciences. In line with the principles of open science, the PIXLS project team will also make the data and technology available to the library and research communities.
ZB MED’s role in the project
- Implementation of the software solutions in ZB MED KE and in LIVIVO.
Duration
- 1 January 2023 – 31 December 2025
Funding bodies
- German Research Foundation – Scientific Library Services and Information Systems (DFG-LIS): e-Research Technologies programme
Partners
- TH Köln – University of Applied Sciences