gemma_knowledg_tree
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
Measuring Massive Multitask Language Understanding
Paper
• 2009.03300
HellaSwag: Can a Machine Really Finish Your Sentence?
Paper
• 1905.07830
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper
• 1911.11641
SocialIQA: Commonsense Reasoning about Social Interactions
Paper
• 1904.09728
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Paper
• 1905.10044
On the Measure of Intelligence
Paper
• 1911.01547
Evaluating Large Language Models Trained on Code
Paper
• 2107.03374
Program Synthesis with Large Language Models
Paper
• 2108.07732
Training Verifiers to Solve Math Word Problems
Paper
• 2110.14168
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Paper
• 2304.06364
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Paper
• 2206.04615
BBQ: A Hand-Built Bias Benchmark for Question Answering
Paper
• 2110.08193
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Paper
• 2009.11462
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Paper
• 2109.07958
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Paper
• 2203.09509