Workshop: "Hands-On Transformers: From Model Fine-Tuning to Automated Label Generation"
News from Nov 26, 2025
Dear colleagues,
We are happy to invite you to the next instalment of the DiMES (Digital Methods and Empirical Social Science Center) workshop series.
On December 4th and 5th (Thursday and Friday), we will host Christopher Klamm (University of Mannheim / Cologne Center for Comparative Politics) for a two-day in-person workshop, “Hands-On Transformers: From Model Fine-Tuning to Automated Label Generation”, which introduces techniques for automated text classification using LLMs and other state-of-the-art methods. The full announcement text can be found below.
The workshop is primarily open to PhD students, postdocs, and faculty of FB PolSoz. In addition, please consider inviting advanced students with a specific interest in the topic.
The workshop is funded by DiMES and offered free of charge to participants. However, we do require reliable registration to plan the workshop. To register, please fill out this form by November 30th. We plan to offer up to 20 seats. If registrations exceed this limit, PolSoz PhD students and postdocs will be given preference, and the remaining seats will be distributed on a first-come, first-served basis.
If you have any questions about this workshop, please reach out to Christoph Nguyen at christoph.nguyen@fu-berlin.de.
Hands-On Transformers: From Model Fine-Tuning to Automated Label Generation
Christopher Klamm
This two-day workshop (10 hours of instruction) provides PhD students and postdoctoral researchers in the social sciences with practical training in fine-tuning transformer models for text classification tasks. Participants should have prior programming experience and foundational knowledge of machine learning concepts.

The workshop dedicates approximately 75% of instructional time to encoder-based transformer architectures, with a primary focus on RoBERTa models and their application to social science research questions. Participants will gain hands-on experience using the HuggingFace Transformers library to fine-tune pre-trained models on their own text datasets, with model training conducted in Python and inference examples provided in both Python and R. Through guided exercises, attendees will learn to prepare their datasets, configure training parameters, and evaluate model performance.

The remaining 25% of the workshop introduces synthetic labeling techniques, demonstrating how decoder-based large language models (e.g., via the OpenAI or Google APIs) can be leveraged to generate training labels for unlabeled datasets. This component addresses a common challenge in social science research: obtaining manually labeled data is resource-intensive. Participants will receive a practical introduction to this emerging field.

By the end of the workshop, attendees will have both the theoretical understanding and the practical skills to independently apply transformer-based text classification to their own research projects, with the flexibility to integrate these models into either Python- or R-based analysis pipelines.
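For orientation, here is a minimal sketch of the kind of encoder fine-tuning workflow the workshop covers, using the HuggingFace Transformers Trainer API. The dataset ("imdb"), hyperparameters, and two-label setup are illustrative placeholders, not workshop material:

```python
# Illustrative only: fine-tuning a RoBERTa classifier with HuggingFace Transformers.
# "imdb" stands in for a participant's own labeled dataset; hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

def tokenize(batch):
    # Convert raw text into fixed-length token IDs the model expects.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = load_dataset("imdb").map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```

And a similarly hedged sketch of the synthetic-labeling idea, here via the OpenAI Python SDK (one of the API providers named above); the model name, prompt, and label set are assumptions for illustration:

```python
# Illustrative only: generating synthetic training labels with a decoder LLM.
# Requires OPENAI_API_KEY in the environment; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def synthetic_label(text: str) -> str:
    """Ask the LLM to classify one document; returns the label string."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Label the text as 'positive' or 'negative'. Reply with the label only."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

# Labels produced this way can then serve as training data for an encoder model.
labels = [synthetic_label(t) for t in ["Great session!", "Not useful at all."]]
```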
Requirements:
- Programming proficiency: Solid working knowledge of R and/or Python, including experience with data manipulation. Prior familiarity with Python specifically is beneficial but not required, since inference examples are provided in both Python and R.
- Machine learning foundations: Understanding of basic machine learning concepts, including supervised learning, training/validation/test splits, and evaluation metrics (accuracy, precision, recall, F1-score).
- Text processing basics: Prior experience with text data preprocessing and basic natural language processing concepts.
- Accounts: Active Google account (for Google Colab) and HuggingFace account (free registration at huggingface.co).