For the consulting project requested by the European Food Safety Authority (EFSA), we explored how state-of-the-art NLP techniques can be used to support frequently occurring tasks within EFSA.
For the consulting project requested by the European Food Safety Authority (EFSA), we explored how state-of-the-art NLP techniques can be used to support frequently occurring tasks within EFSA. The project covered four different case studies, including tasks such as predicting categories for scientific publications and discovering latent topics related to food adulteration in a large corpus of collected news articles. For these case studies, we implemented and compared different text classification models and topic models. The former models predict predefined labels for a given text document, whereas the latter models try to create structure in a large corpus of text documents by grouping related documents and providing an adequate description for each such group. For each of these models, we compared baseline approaches with state-of-the-art approaches leveraging text embeddings (e.g. Doc2Vec and various BERT models). We implemented hyperparameter tuning to further optimize the trained models. These models are included in an NLP pipeline implemented in the cloud to facilitate training new models, and infrastructure-as-code templates are provided to automate deployment of these pipelines in new cloud environments. For each case study, we provided an interactive proof-of-concept web application visualizing the results to facilitate model exploration, targeted at subject-matter specialists without a technical background. The outcome of the project showed promising results, enabling further acceleration of the work for subject-matter specialists.
Overall, this project provided multiple exciting challenges. In particular, the larger scope of the project allowed us to combine different challenging aspects (i.e., state-of-the-art NLP machine learning techniques, cloud infrastructure and interactive web-based dashboard applications) into a single product, enabling audiences without a technical background to apply models and explore the results more directly. Furthermore, this project gave us the opportunity to apply, study and compare different NLP techniques on real world case studies and datasets, thereby providing useful new insights and potential ideas for future research directions.