This guide details various read hooks you can use to convert different types of raw input data into HDF5 format for machine learning tasks on Cerebras Systems.
The workflow has three stages:

- Read: the input passes through the `read_hook` before converting the data into a `SemanticDataArray`.
- Tokenize
- Write: TokenFlow.

This structured approach ensures efficient handling and tokenization of input data for downstream tasks.

The `read_hook_kwargs`
property must have data keys with the suffix `_key` to separate them from the other parameters in `read_hook_kwargs`. These keys are used exclusively to read data from the input. Parameters that are not data keys are used to create the semantic data array in the read hook.
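The `_key` suffix convention can be illustrated with a short sketch. The helper `split_read_hook_kwargs`, the key names `prompt_key` and `completion_key`, the field names `question` and `answer`, and the `max_len` parameter are all hypothetical and chosen only for illustration. They are not taken from the library itself.

```python
def split_read_hook_kwargs(read_hook_kwargs):
    """Separate data keys (names ending in `_key`) from other parameters.

    Hypothetical helper: it only illustrates the naming convention.
    """
    data_keys = {}
    other_params = {}
    for name, value in read_hook_kwargs.items():
        if name.endswith("_key"):
            # Data keys name the input fields to read from.
            data_keys[name] = value
        else:
            # Everything else configures how the semantic data array is built.
            other_params[name] = value
    return data_keys, other_params


# Illustrative configuration (all names are assumptions).
kwargs = {"prompt_key": "question", "completion_key": "answer", "max_len": 128}
data_keys, other_params = split_read_hook_kwargs(kwargs)
print(data_keys)
print(other_params)
```

Under this convention, a parameter that names an input field but lacks the `_key` suffix would be treated as an ordinary parameter, so the suffix matters.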
The read hook produces `prompt` and `completion` semantic regions. The tokenizer’s `sep_token` attribute is used as the separator token if present; otherwise `<|sep|>` is used.
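The separator fallback and the two semantic regions can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation. The function names `resolve_sep_token` and `prompt_completion_hook`, the dictionary layout of each region, and the input field names are hypothetical. The tokenizer is a stand-in object that only exposes a `sep_token` attribute.

```python
from types import SimpleNamespace

DEFAULT_SEP_TOKEN = "<|sep|>"


def resolve_sep_token(tokenizer):
    """Return the tokenizer's sep_token if it is set, else the default."""
    sep_token = getattr(tokenizer, "sep_token", None)
    return sep_token if sep_token else DEFAULT_SEP_TOKEN


def prompt_completion_hook(example, prompt_key, completion_key):
    """Build prompt and completion regions from one raw example.

    The region layout (a list of dicts with "type" and "content")
    is an assumption made for this sketch.
    """
    return [
        {"type": "prompt", "content": example[prompt_key]},
        {"type": "completion", "content": example[completion_key]},
    ]


# Stand-in tokenizers: one defines a separator, one does not.
with_sep = SimpleNamespace(sep_token="[SEP]")
without_sep = SimpleNamespace(sep_token=None)

example = {"question": "What is 2 + 2?", "answer": "4"}
regions = prompt_completion_hook(example, "question", "answer")

print(resolve_sep_token(with_sep))
print(resolve_sep_token(without_sep))
print(regions)
```

The sketch deliberately does not show where the separator is placed relative to the regions, since the text above does not specify it.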