Thanks for the stunning post, I have a question in my mind.
Sajjad Dehghani
Ramsri Goutham
May 12, 2020
Hi Sajjad Dehghani, the preprocessing is done in the tokenizer step: encoding = tokenizer.encode_plus(question, context).
To see the preprocessed input tokens, you can print encoding["input_ids"] or tokenizer.convert_ids_to_tokens(encoding["input_ids"]).
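To make that step concrete, here is a rough, self-contained sketch of what encoding a (question, context) pair typically produces. It is a toy stand-in, not the library's code: the vocabulary, the whitespace/punctuation splitting, and the helper names toy_tokenize, toy_encode_pair and toy_convert_ids_to_tokens are all made up for illustration, and the [CLS]/[SEP] layout assumes a BERT-style tokenizer. A real tokenizer uses a learned subword vocabulary, so the actual tokens and ids will differ.

```python
import re

# Hypothetical toy vocabulary; a real tokenizer loads a learned subword vocabulary.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "who": 4, "wrote": 5, "it": 6, "?": 7,
    "ada": 8, "the": 9, "book": 10, ".": 11,
}
ID_TO_TOKEN = {idx: tok for tok, idx in TOY_VOCAB.items()}


def toy_tokenize(text):
    """Lowercase, then split into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def toy_encode_pair(question, context):
    """Mimic the shape of a pair encoding, assuming a BERT-style layout."""
    q_tokens = toy_tokenize(question)
    c_tokens = toy_tokenize(context)
    # Assumed layout: [CLS] question [SEP] context [SEP]
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + c_tokens + ["[SEP]"]
    input_ids = [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in tokens]
    # Segment 0 covers [CLS] + question + [SEP]; segment 1 covers context + [SEP].
    token_type_ids = [0] * (len(q_tokens) + 2) + [1] * (len(c_tokens) + 1)
    # No padding in this sketch, so every position is attended to.
    attention_mask = [1] * len(input_ids)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


def toy_convert_ids_to_tokens(ids):
    """Map ids back to token strings, like inspecting the preprocessed input."""
    return [ID_TO_TOKEN[i] for i in ids]


encoding = toy_encode_pair("Who wrote it?", "Ada wrote the book.")
tokens = toy_convert_ids_to_tokens(encoding["input_ids"])
print(encoding["input_ids"])
print(tokens)
```

Printing the ids and then the tokens side by side is the same inspection suggested above: it shows where the special tokens were inserted and how the question and the context were split.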