Parameters:
vocab_size (int, optional, defaults to 30522) — Vocabulary size of the RoBERTa model. Defines the number of different tokens that can be represented by the input_ids passed when calling RobertaModel or TFRobertaModel.
hidden_size (int, optional, defaults to 768) — Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (int, optional, defaults to 12 ...
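To make the first two parameters concrete, the short calculation below works out the size of a token embedding table for the default values quoted above. It is a rough sketch that assumes the table has one row of hidden_size values per vocabulary entry; the helper name token_embedding_params is hypothetical and not part of the library.

```python
# Default values quoted in the parameter list above.
VOCAB_SIZE = 30522
HIDDEN_SIZE = 768


def token_embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Hypothetical helper: weights in a (vocab_size x hidden_size) embedding table."""
    return vocab_size * hidden_size


# One hidden_size-dimensional vector is stored for every token id.
embedding_params = token_embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
print(f"{embedding_params:,} weights in the token embedding table")
```

Any token id passed in input_ids must be smaller than vocab_size, otherwise the lookup would fall outside this table.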
The Trainer class is optimized for 🤗 Transformers models and can behave in surprising ways when you use it on other models. When using it on your own model, make sure that: (1) your model always returns tuples or subclasses of ModelOutput; (2) your model can compute the loss if a labels argument is provided, and that loss is returned as the first element of the tuple (if your model returns tuples).
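The two requirements above can be illustrated with a small custom PyTorch module. This is only a sketch of the return convention, not code taken from the library: the class name TinyClassifier, the argument name inputs, and the layer sizes are all assumptions chosen for illustration.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical custom model that follows the return convention described above."""

    def __init__(self, num_features: int = 4, num_labels: int = 2):
        super().__init__()
        self.linear = nn.Linear(num_features, num_labels)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, inputs, labels=None):
        logits = self.linear(inputs)
        if labels is not None:
            # When labels are given, compute the loss and return it first.
            loss = self.loss_fn(logits, labels)
            return (loss, logits)
        # Without labels, still return a tuple.
        return (logits,)


model = TinyClassifier()
batch = torch.randn(3, 4)           # 3 examples, 4 features each
labels = torch.tensor([0, 1, 0])    # one class index per example

with_labels = model(batch, labels=labels)
without_labels = model(batch)
```

Here with_labels is a (loss, logits) tuple whose first element is a scalar loss, and without_labels is a one-element tuple containing only the logits.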