6 days ago · In this paper, we investigate the dynamical properties of tokens in a pre-trained Mamba model. In particular, we derive the dynamical system governing the ...
Sep 27, 2024 · We propose the task of grammar error explanation, where a system needs to provide one-sentence explanations for each grammatical error in a pair of erroneous ...
Sep 10, 2024 · We propose a novel dynamic router mechanism for Mixture-of-Experts models, which can effectively leverage the Transformer attention mechanism to evaluate token ...
Sep 26, 2024 · At Toyota Technological Institute at Chicago (TTIC) and the University of Chicago, I worked on text generation and structured prediction with Prof. Kevin Gimpel ...
Sep 19, 2024 · This token representation after the transformer is fed into an output layer ... The first layer employs a Gaussian error linear unit (GELU; Hendrycks & Gimpel ...
Sep 18, 2024 · In order to encourage diversity of paraphrases, we display the Jaccard similarity between tokens in the original sentence and the paraphrase as workers typed.
Oct 1, 2024 · The decoding-based strategies focus on modifying the output distribution at the token level to mitigate toxicity. Techniques such as Generative ...
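The snippet above reports showing workers the Jaccard similarity between the token sets of the original sentence and the paraphrase. A minimal sketch of that metric (the function name and example sentences are illustrative, not from the paper):

```python
def jaccard_similarity(a_tokens, b_tokens):
    """Jaccard similarity between two token sets: |A & B| / |A | B|."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 1.0  # two empty sets are conventionally treated as identical
    return len(a & b) / len(a | b)

orig = "the cat sat on the mat".split()
para = "the cat rested on a rug".split()
print(round(jaccard_similarity(orig, para), 3))  # -> 0.375
```

A lower score signals a more diverse paraphrase, which is presumably why it was displayed live as workers typed.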
Sep 20, 2024 · They compress sequences by incorporating input elements one at a time, using only constant-time operations with respect to the sequence length to process each input token and ...
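The idea in the snippet above, folding a sequence into a fixed-size state one element at a time with per-token cost independent of sequence length, can be sketched as a generic recurrent update (the scalar state and tanh rule are illustrative assumptions, not the paper's model):

```python
import math

def compress(inputs, w_h=0.5, w_x=1.0):
    """Fold a sequence of numbers into a single fixed-size state.

    Each input is incorporated with one O(1) update, so total cost is
    linear in sequence length and memory is constant.
    """
    h = 0.0
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)  # constant-cost update per token
    return h

state = compress([0.1, -0.4, 0.9, 0.2])
print(type(state).__name__)  # -> float: fixed-size state, any sequence length
```

In practice the state is a vector and the update a learned transition, but the constant-per-token structure is the same.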
8 days ago · We denote the i-th token as T_i and all preceding tokens as T_<i. Model parameters are denoted by W. The generated token sequence is compared to the ground truth.
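The notation in the snippet above (token T_i, prefix T_<i, parameters W) underlies standard next-token evaluation. A minimal sketch of one way a generated token sequence can be compared to the ground truth, position-wise exact match; the specific metric is an assumption, not necessarily the paper's:

```python
def token_match_rate(generated, ground_truth):
    """Fraction of aligned positions where the generated token equals the
    reference token; length mismatches count against the score."""
    if not generated and not ground_truth:
        return 1.0
    matches = sum(g == t for g, t in zip(generated, ground_truth))
    return matches / max(len(generated), len(ground_truth))

gen = ["the", "cat", "sat"]
ref = ["the", "cat", "slept"]
print(round(token_match_rate(gen, ref), 3))  # -> 0.667
```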