ARCULAE
Public protocol record for the published HotPotQA benchmark disclosure.

HotPotQA Methodology Artifact Record

Methodology Record

Published setting (explicit)

  • Dataset: hotpot_qa / distractor / validation
  • Scenario: kg_exploration_hotpotqa_distractor_v1
  • Granularity: sentence (supporting-facts setting)
  • Corpus scope: example (per-example closed-world context)
  • Slice: n_queries=50, seed=0
  • Run slice: See Evaluation Sheet (JSON)
  • Metrics file: hotpotqa_official_metrics.json (version v1)
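The record fixes the slice with n_queries=50 and seed=0 but does not state how the 50 queries were drawn. As a minimal sketch of one reproducible way such a slice could be selected, the snippet below samples indices uniformly without replacement using a seeded generator. The helper name `select_slice`, the sampling scheme, and the split size of 7405 examples are assumptions for illustration, not the recorded procedure.

```python
import random


def select_slice(n_total: int, n_queries: int = 50, seed: int = 0) -> list[int]:
    """Hypothetical helper: pick n_queries distinct example indices, reproducibly."""
    if n_queries > n_total:
        raise ValueError("n_queries exceeds the number of available examples")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return sorted(rng.sample(range(n_total), n_queries))


# 7405 is assumed here as the size of the distractor validation split.
indices = select_slice(n_total=7405, n_queries=50, seed=0)
print(len(indices), indices[:5])
```

Calling the helper again with the same arguments returns the same indices, which is the property a seeded slice needs; the actual indices used in the published run are those listed in the Evaluation Sheet.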

Reported values

Metric group                Metric                  Value
Supporting facts (micro)    F1                      0.9130434783
Supporting facts (micro)    Precision               0.9292035398
Supporting facts (micro)    Recall                  0.8974358974
Supporting facts (micro)    EM                      0.7200000000
Supporting facts (macro)    F1                      0.9229523810
Supporting docs (micro)     F1                      1.0000000000
Answer (macro)              EM                      0.7200000000
Answer (macro)              F1                      0.8666269841
Joint (macro)               EM                      0.5200000000
Joint (macro)               F1                      0.8086674049 (80.87)
Joint (macro)               F1 95% bootstrap CI     [0.7293608588, 0.8755161213]
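The reported micro values can be cross-checked: F1 is the harmonic mean of precision and recall, so the supporting-facts F1 should follow from the two reported inputs. The sketch below performs that check and also shows one common way a 95% bootstrap interval over per-example scores can be computed. The record does not specify the bootstrap variant or resample count, so the percentile method, `n_resamples=2000`, and the function names are assumptions, and `toy_scores` is placeholder input rather than data from the run.

```python
import random
from statistics import mean


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-example scores.

    Assumed method: resample examples with replacement, take the mean of
    each resample, then read off the alpha/2 and 1 - alpha/2 percentiles.
    """
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Cross-check of the reported supporting-facts micro F1.
micro_f1 = f1_from_pr(0.9292035398, 0.8974358974)
print(f"micro F1 = {micro_f1:.10f}")

# Placeholder per-example joint F1 scores (NOT from the published run).
toy_scores = [1.0, 0.5, 0.8, 0.0, 1.0, 0.67, 1.0, 0.9]
ci_low, ci_high = bootstrap_ci(toy_scores)
print(f"toy mean = {mean(toy_scores):.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

The recomputed micro F1 agrees with the reported 0.9130434783 up to rounding of the inputs. Reproducing the published interval would require the 50 per-example joint F1 scores and the exact bootstrap settings used.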

External benchmark context

Official HotPotQA benchmark homepage and leaderboard:

https://hotpotqa.github.io/
