AI & ML interests

Ontology + Concordance: The meeting of meaning

Recent Activity

dsinghvi updated a dataset 2 days ago: ontocord/moral_education_permissive
dsinghvi published a dataset 2 days ago: ontocord/moral_education_permissive
huu-ontocord updated a dataset 2 days ago: ontocord/finephrase_permissive

ajibawa-2023 
posted an update about 1 month ago
Stitched-Reasoning-Trajectories-7M

Dataset: ajibawa-2023/Stitched-Reasoning-Trajectories-7M
Stitched-Reasoning-Trajectories-7M is a large-scale, synthetic multi-hop reasoning dataset. It was built by algorithmically "stitching" together discrete reasoning traces from the original glaiveai/reasoning-v1-20m dataset into continuous, coherent, logically structured multi-agent trajectories.

The pipeline extracts internal sub-questions from <think> blocks and links traces whose high-information keywords overlap, turning single-turn Q&A pairs into deep, multi-step research plans. To maintain quality and eliminate "topic drift," every trajectory has been verified using a dense semantic embedding model (BAAI/bge-large-en-v1.5).
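The steps described above can be illustrated with a minimal sketch. This is not the author's actual pipeline: the function names, the stopword list, the Jaccard overlap measure, and the 0.5 similarity threshold are all assumptions chosen for illustration. The cosine check stands in for the embedding verification; in practice the vectors would come from a model such as BAAI/bge-large-en-v1.5, which is not loaded here.

```python
import math
import re

# Matches the contents of <think>...</think> blocks, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

# Hypothetical stopword list; a real pipeline would likely use a larger one.
STOPWORDS = {"the", "and", "what", "how", "why", "does", "are", "for", "that", "this"}


def extract_sub_questions(text):
    """Return sentences ending in '?' found inside <think> blocks."""
    questions = []
    for block in THINK_RE.findall(text):
        for sentence in re.split(r"(?<=[.?!])\s+", block.strip()):
            sentence = sentence.strip()
            if sentence.endswith("?"):
                questions.append(sentence)
    return questions


def keywords(text):
    """Lowercased tokens longer than two characters, minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if len(t) > 2 and t not in STOPWORDS}


def keyword_overlap(a, b):
    """Jaccard overlap of keyword sets (an assumed stand-in for the real metric)."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_on_topic(u, v, threshold=0.5):
    """Accept a stitched pair only if its embeddings are similar enough.

    The threshold is a hypothetical value, not one stated by the dataset author.
    """
    return cosine_similarity(u, v) >= threshold
```

A trace whose sub-question shares enough keywords with another trace's question becomes a candidate for stitching, and the embedding check then rejects pairs that drift off topic.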

The resulting dataset consists of 709 .jsonl files containing more than 7.2 million fully deduplicated, coherent reasoning chains.
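For readers who download the shards locally, the following is a minimal sketch of iterating over a directory of .jsonl files while skipping exact duplicate records. The function name and the hash-based duplicate check are assumptions for illustration; the post does not describe the record schema or the deduplication method actually used, so the sketch treats each record as an opaque JSON object.

```python
import hashlib
import json
from pathlib import Path


def iter_unique_records(directory):
    """Yield each distinct JSON record found in the *.jsonl files of a directory.

    Records are compared by a SHA-256 hash of their key-sorted JSON form,
    so two lines with the same content in a different key order count as one.
    """
    seen = set()
    for path in sorted(Path(directory).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                record = json.loads(line)
                canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
                digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
                if digest in seen:
                    continue
                seen.add(digest)
                yield record
```

Because the function is a generator, it streams one record at a time, though the set of seen hashes still grows with the number of unique records.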