May 12, 2020
Hi Sajjad Dehqani, the preprocessing is done in the tokenizer step: encoding = tokenizer.encode_plus(question, context).
If you want to see the preprocessed input, you can print encoding["input_ids"] for the token ids, or tokenizer.convert_ids_to_tokens(encoding["input_ids"]) for the corresponding tokens.
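
To give a rough idea of what that step produces for a question/context pair, here is a dependency-free sketch. It assumes a BERT-style tokenizer, which joins the pair as [CLS] question [SEP] context [SEP] and marks the two segments with token type ids. The helper name encode_pair and the whitespace splitting are hypothetical stand-ins for illustration only; the real tokenizer splits text into subword pieces and maps them to vocabulary ids.

```python
def encode_pair(question, context):
    """Toy stand-in for a BERT-style pair encoding (hypothetical helper)."""
    # Assumption: lowercased whitespace splitting replaces the real
    # subword tokenization so the sketch needs no external library.
    q_tokens = question.lower().split()
    c_tokens = context.lower().split()

    # Pair layout: [CLS] question [SEP] context [SEP]
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + c_tokens + ["[SEP]"]

    # Segment ids: 0 covers [CLS], the question and the first [SEP];
    # 1 covers the context and the final [SEP].
    token_type_ids = [0] * (len(q_tokens) + 2) + [1] * (len(c_tokens) + 1)

    # No padding in this sketch, so every position is attended to.
    attention_mask = [1] * len(tokens)

    return {
        "tokens": tokens,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


encoding = encode_pair("Who wrote it?", "Jane wrote it.")
print(encoding["tokens"])
print(encoding["token_type_ids"])
```

With the real tokenizer, the list returned by convert_ids_to_tokens plays the role of the "tokens" entry here, so printing it should show the same kind of layout with the special tokens in place.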