Working with Datasets
Conveners
- Nicholas Kluge
- Shiza Fatimah
Description
Participants are introduced to the fundamentals of working with text datasets: downloading, documenting, deduplicating, filtering, and creating synthetic data. As an extra, participants also receive an introduction to the basics of tokenization and text encoding.
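
To give a flavor of the deduplication, filtering, and text-encoding steps named above, here is a minimal sketch using only the Python standard library. It is an illustration, not the session's actual material: the function names (`normalize`, `dedup_and_filter`, `encode_bytes`), the `min_words` threshold, and the sample documents are all hypothetical, and real pipelines typically use near-duplicate detection and learned subword tokenizers rather than exact hashing and raw bytes.

```python
import hashlib


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash the same.
    return " ".join(text.lower().split())


def dedup_and_filter(docs, min_words=3):
    # Hypothetical helper: drops very short documents, then exact duplicates
    # (after normalization), keeping the first occurrence of each.
    seen = set()
    kept = []
    for doc in docs:
        if len(doc.split()) < min_words:
            continue  # filtering step: too short to be useful
        key = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # deduplication step: already seen this content
        seen.add(key)
        kept.append(doc)
    return kept


def encode_bytes(text: str):
    # Simplest possible text encoding: one integer ID per UTF-8 byte.
    return list(text.encode("utf-8"))


# Illustrative sample data (made up for this sketch).
docs = [
    "The quick brown fox jumps.",
    "the quick  brown fox jumps.",
    "Too short",
    "A different sentence about datasets.",
]
cleaned = dedup_and_filter(docs)
token_ids = encode_bytes("Hi")
```

With the sample input, `cleaned` keeps the first and last documents, and `token_ids` holds the UTF-8 byte values of "Hi". Exact hashing only catches copies that are identical after normalization, which is why it is presented here as a starting point rather than a complete approach.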
