| accumFreq | int | 1 | Update the model every specified number of steps. | 
| batchSize | int | 256 | Batch size per GPU. | 
| beta1 | float | 0.9 | Adam optimizer's beta1 parameter. | 
| beta2 | float | 0.98 or 0.999 | Adam optimizer's beta2 parameter. If the model name contains `vit`, then `beta2` defaults to `0.98`; otherwise it defaults to `0.999`. | 
| contextLength | int | 77 | Maximum number of tokens in the input text to train with. | 
| dynamicBatchSize | bool | True | Whether to use dynamic batch size. If True, the batch size will be adjusted to fit the GPU memory. | 
| epochs | int | 5 | Number of epochs for which to train. | 
| epochsCooldown | int | "" | Number of cooldown epochs to run at the end of training; cooldown begins at the total number of epochs minus this value. | 
| eps | float | 1.0e-6 or 1.0e-8 | Adam optimizer's epsilon value. If the model name contains `vit`, then `eps` defaults to `1.0e-6`; otherwise it defaults to `1.0e-8`. | 
| forceCustomText | bool | False | Force use of a custom text model. | 
| forcePatchDropout | float | "" | Override the patch dropout during training. | 
| forceQuickGELU | bool | False | Force the use of QuickGELU activation. | 
| frozenRight | bool | False | Whether to use sampling with replacement for right-side web dataset shard selection. | 
| gradCheckpointing | bool | True | Enable gradient checkpointing. | 
| gradClipNorm | float | "" | Gradient clipping norm. | 
| imageMean | list[float] | "" | Override default image mean values. | 
| imageStd | list[float] | "" | Override default image standard deviations. | 
| localLoss | bool | False | Calculate loss with local features. | 
| lockImage | bool | False | Lock the image tower by disabling gradients. | 
| lockImageFreezeBnStats | bool | False | Freeze BatchNorm running stats in locked image tower layers. | 
| lockImageUnlockedGroups | int | 0 | Leave the last n image tower groups unlocked. | 
| lockText | bool | False | Lock the text tower by disabling gradients. | 
| lockTextFreezeLayerNorm | bool | False | Freeze layer norm running stats in locked text tower layers. | 
| lockTextUnlockedLayers | int | 0 | Leave the last n text tower layers unlocked. | 
| logitBias | float | "" | Initialization of the logit bias. | 
| logitScale | float | "" | Initialization of the logit scale. | 
| lr | float | 5.0e-4 | Learning rate. | 
| lrCooldownPower | float | 1.0 | Power for the polynomial cooldown schedule. | 
| lrScheduler | str | cosine | Learning rate scheduler. One of `cosine`, `const`, or `const-cooldown`. | 
| precision | str | amp | Floating-point precision to use. One of `amp`, `amp_bf16`, `amp_bfloat16`, `bf16`, `fp16`, or `fp32`. | 
| poolingMethod | str | None | Pooling method to use; the default is model-dependent. One of `mean`, `cls`, `max`, or `cls_last_hidden_state`. | 
| saveFrequency | int | 1 | How often to save checkpoints. | 
| seed | int | 0 | Random seed for training consistency. | 
| skipScheduler | bool | False | Skip the learning rate decay. | 
| warmup | int | 10000 | Number of steps for warmup. | 
| wd | float | 0.2 | Weight decay. | 
| weightedLoss | str | ce | Type of loss function used for weighted training. One of `ce` or `siglip`. | 
| workers | int | 1 | Number of data loader workers per GPU. | 
| weightKey | str | marqtune__score | The column name in the dataset that contains the score. Defaults to `marqtune__score`, which is a constant score value of `1` for all rows. |
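As a rough illustration, the parameters above could be collected into a training configuration like the following Python dictionary. This is a minimal sketch: the dictionary name and the idea of submitting it as a single mapping are assumptions for illustration, not part of the table itself.

```python
# Hypothetical training configuration built from the defaults documented above.
# How this dict is actually submitted to the training service is an assumption.
training_params = {
    "lr": 5.0e-4,             # learning rate
    "epochs": 5,              # number of training epochs
    "batchSize": 256,         # batch size per GPU
    "warmup": 10000,          # warmup steps
    "wd": 0.2,                # weight decay
    "lrScheduler": "cosine",  # one of: cosine, const, const-cooldown
    "precision": "amp",       # one of: amp, amp_bf16, amp_bfloat16, bf16, fp16, fp32
    "weightedLoss": "ce",     # one of: ce, siglip
    "workers": 1,             # data loader workers per GPU
    "seed": 0,                # random seed
}

# Basic sanity checks mirroring the allowed values listed in the table.
assert training_params["lrScheduler"] in {"cosine", "const", "const-cooldown"}
assert training_params["weightedLoss"] in {"ce", "siglip"}
```

Parameters left out here (for example `beta2` or `eps`) fall back to their model-dependent defaults described in the table.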