{

ARCULAE

SETTINGS · METRICS · CHECKSUM

Public protocol record for the published HotPotQA benchmark disclosure.

HotPotQA Methodology Artifact Record

Methodology Record

001 — Protocol

Published setting (explicit)

Dataset: hotpot_qa / distractor / validation
Scenario: kg_exploration_hotpotqa_distractor_v1
Granularity: sentence (supporting-facts setting)
Corpus scope: example (per-example closed-world context)
Slice: n_queries=50, seed=0
Run slice: See Evaluation Sheet (JSON)
Metrics file: hotpotqa_official_metrics.json (version v1)
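The published setting can be captured as a small, immutable config. The Python sketch below is illustrative only. `PublishedSetting`, `select_slice`, and `sentence_units` are hypothetical names. The slice rule in `select_slice` (seeded sampling without replacement) is an assumption, because the record does not state how the 50 queries were drawn. `sentence_units` assumes the Hugging Face `hotpot_qa` record layout. It shows what sentence granularity with a per-example corpus means in practice: the candidate units are the `(title, sentence index)` pairs of that example's own context paragraphs, which are the same keys HotPotQA uses to label supporting facts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedSetting:
    # Field values copied from the published setting; the class itself is hypothetical.
    dataset: str = "hotpot_qa"
    config: str = "distractor"
    split: str = "validation"
    scenario: str = "kg_exploration_hotpotqa_distractor_v1"
    granularity: str = "sentence"
    corpus_scope: str = "example"
    n_queries: int = 50
    seed: int = 0


def select_slice(num_examples: int, setting: PublishedSetting) -> list[int]:
    """ASSUMED slice rule: seeded sampling without replacement, sorted.

    The record does not say how the 50 queries were drawn; this only
    illustrates how a fixed seed makes a slice reproducible.
    """
    rng = random.Random(setting.seed)
    return sorted(rng.sample(range(num_examples), setting.n_queries))


def sentence_units(example: dict) -> list[tuple[str, int, str]]:
    """Per-example, sentence-level candidate units.

    Assumes the Hugging Face `hotpot_qa` layout:
    example["context"] == {"title": [...], "sentences": [[...], ...]}.
    Each unit is (paragraph title, sentence index, sentence text).
    """
    ctx = example["context"]
    return [
        (title, idx, sent)
        for title, sents in zip(ctx["title"], ctx["sentences"])
        for idx, sent in enumerate(sents)
    ]


setting = PublishedSetting()
# 7405 = number of examples in the HotpotQA distractor validation split.
query_ids = select_slice(7405, setting)
print(len(query_ids))  # 50

toy = {"context": {"title": ["A", "B"], "sentences": [["a0", "a1"], ["b0"]]}}
print(sentence_units(toy))  # [('A', 0, 'a0'), ('A', 1, 'a1'), ('B', 0, 'b0')]
```

Under this assumed rule, rerunning with `seed=0` returns the same 50 indices. A different sampler would also be reproducible, but it would select a different slice. This is why the exact rule matters for anyone re-deriving the reported numbers.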

003 — Metrics

Reported values

Metric group	Metric	Value
Supporting facts (micro)	F1	0.9130434783
Supporting facts (micro)	Precision	0.9292035398
Supporting facts (micro)	Recall	0.8974358974
Supporting facts (micro)	EM	0.7200000000
Supporting facts (macro)	F1	0.9229523810
Supporting docs (micro)	F1	1.0000000000
Answer (macro)	EM	0.7200000000
Answer (macro)	F1	0.8666269841
Joint (macro)	EM	0.5200000000
Joint (macro)	F1	0.8086674049 (80.87%)
Joint (macro)	F1 95% bootstrap CI	[0.7293608588, 0.8755161213]
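The table uses two aggregation styles. Micro rows pool true positives, predictions, and gold items over all 50 queries before computing precision, recall, and F1. Macro rows average per-example scores.

The Python sketch below illustrates both styles, along with the joint metric and the confidence interval:

* `sp_metrics` and `joint_metrics` mirror the per-example definitions in the official HotPotQA evaluation script. Joint precision and recall are the products of answer and supporting-fact precision and recall, and joint EM requires both EMs.
* `bootstrap_ci` is an assumed percentile bootstrap with 2,000 resamples and seed 0. The record does not state its resampling method, resample count, or seed, so this function will not reproduce the published interval exactly.
* The counts 105, 113, and 117 are inferred, not stated in the record. They are the smallest integers consistent with the reported micro precision (105/113) and recall (105/117).

```python
import random
import statistics


def micro_prf(tp: int, n_pred: int, n_gold: int) -> tuple[float, float, float]:
    """Micro P/R/F1 from counts pooled over all examples."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def sp_metrics(pred: set, gold: set) -> tuple[float, float, float, float]:
    """Per-example supporting-fact P/R/F1/EM over (title, sent_id) keys,
    following the conventions of the official HotpotQA evaluation script."""
    tp = len(pred & gold)
    fp = len(pred - gold)
    fn = len(gold - pred)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    em = 1.0 if fp + fn == 0 else 0.0
    return p, r, f1, em


def joint_metrics(ans_p, ans_r, ans_em, sp_p, sp_r, sp_em) -> tuple[float, float]:
    """Per-example joint F1/EM: multiply answer and SP precision and recall,
    take their harmonic mean; joint EM is 1 only if both EMs are 1."""
    jp, jr = ans_p * sp_p, ans_r * sp_r
    jf1 = 2 * jp * jr / (jp + jr) if jp + jr else 0.0
    return jf1, ans_em * sp_em


def bootstrap_ci(values, n_resamples=2000, seed=0) -> tuple[float, float]:
    """ASSUMED method: percentile bootstrap 95% CI for the mean."""
    rng = random.Random(seed)
    n = len(values)
    means = [sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)]
    cuts = statistics.quantiles(means, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Inferred counts (smallest integers matching the reported micro ratios).
p, r, f1 = micro_prf(tp=105, n_pred=113, n_gold=117)
print(f"{p:.10f} {r:.10f} {f1:.10f}")  # 0.9292035398 0.8974358974 0.9130434783
```

Because joint precision and recall are products, each example's joint F1 is at most the smaller of its answer F1 and supporting-fact F1. This bound explains why the joint macro F1 (0.8087) sits below both the answer macro F1 (0.8666) and the supporting-fact macro F1 (0.9230).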

005 — References

External benchmark context

Official HotPotQA benchmark homepage and leaderboard:

https://hotpotqa.github.io/
