AI & ML interests

Ontology + Concordance: The meeting of meaning

Papers

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

dsinghvi

updated a dataset 2 days ago

ontocord/moral_education_permissive

Viewer • Updated 2 days ago • 72.9k • 46

dsinghvi

published a dataset 2 days ago

ontocord/moral_education_permissive

Viewer • Updated 2 days ago • 72.9k • 46

huu-ontocord

updated a dataset 2 days ago

ontocord/finephrase_permissive

Viewer • Updated 2 days ago • 58.8M • 24

huu-ontocord

updated a dataset 5 days ago

ontocord/research-plans-analysis

Viewer • Updated 5 days ago • 27k • 78 • 1

huu-ontocord

published a dataset 8 days ago

ontocord/research-plans-analysis

Viewer • Updated 5 days ago • 27k • 78 • 1

huu-ontocord

updated a dataset 11 days ago

ontocord/ORB-Peer-Review

Viewer • Updated 11 days ago • 37.9k • 32

huu-ontocord

published a dataset 11 days ago

ontocord/ORB-Peer-Review

Viewer • Updated 11 days ago • 37.9k • 32

Harsh1729

updated 2 datasets 14 days ago

ontocord/MixtureVitae-v1.5

Viewer • Updated 14 days ago • 28.3M • 4.1k

ontocord/MixtureVitae-v1-upsampled

Viewer • Updated 14 days ago • 484M • 4.31k • 1

huu-ontocord

updated a dataset 16 days ago

ontocord/synthetic-research-plans-prompts

Preview • Updated 16 days ago • 41

huu-ontocord

published a dataset 16 days ago

ontocord/synthetic-research-plans-prompts

Preview • Updated 16 days ago • 41

huu-ontocord

updated a dataset 16 days ago

ontocord/MixtureVitae-v1.5

Viewer • Updated 14 days ago • 28.3M • 4.1k

huu-ontocord

published a dataset 16 days ago

ontocord/MixtureVitae-v1.5

Viewer • Updated 14 days ago • 28.3M • 4.1k

Harsh1729

updated a dataset 18 days ago

ontocord/MixtureVitae-v2

Viewer • Updated 18 days ago • 1k • 97

Harsh1729

published a dataset 18 days ago

ontocord/MixtureVitae-v2

Viewer • Updated 18 days ago • 1k • 97

huu-ontocord

updated a dataset 18 days ago

ontocord/MixtureVitae-v1-upsampled

Viewer • Updated 14 days ago • 484M • 4.31k • 1

Harsh1729

published a dataset 28 days ago

ontocord/MixtureVitae-v1-upsampled

Viewer • Updated 14 days ago • 484M • 4.31k • 1

ajibawa-2023

posted an update about 1 month ago

Stitched-Reasoning-Trajectories-7M

Dataset: ajibawa-2023/Stitched-Reasoning-Trajectories-7M
Stitched-Reasoning-Trajectories-7M is a massive-scale, synthetic multi-hop reasoning dataset. It was built by algorithmically "stitching" together discrete reasoning traces from the original glaiveai/reasoning-v1-20m dataset into continuous, coherent, and logically structured multi-agent trajectories.

By extracting internal sub-questions from <think> blocks and mapping high-information keyword overlaps, this dataset transforms single-turn Q&A pairs into deep, multi-step research plans. To ensure high quality and eliminate "topic drift," every trajectory has been verified using a dense semantic embedding model (BAAI/bge-large-en-v1.5).
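As a rough illustration of the stitching idea described above, the sketch below pulls question sentences out of <think> blocks and greedily links each trace to the unused trace whose question shares the most keywords. It is not the dataset author's pipeline: the record fields `question` and `response`, the stopword list, the Jaccard overlap measure, and the `min_overlap` threshold are all assumptions made for this example, and the embedding-based verification step is not shown.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Tiny illustrative stopword list (an assumption, not taken from the dataset card).
STOPWORDS = {"the", "and", "what", "how", "why", "does", "for", "that", "this", "are"}


def extract_sub_questions(response):
    """Return the sentences ending in '?' found inside <think> blocks."""
    questions = []
    for block in THINK_RE.findall(response):
        for sentence in re.findall(r"[^.?!\n]*\?", block):
            if sentence.strip():
                questions.append(sentence.strip())
    return questions


def keywords(text):
    """Lowercased tokens longer than two characters, minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if len(t) > 2 and t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap of two keyword sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def stitch(traces, min_overlap=0.3):
    """Greedily chain trace indices, starting from traces[0].

    At each step, the sub-questions of the current trace are compared with
    the question of every unused trace; the best match at or above
    min_overlap is appended. The chain ends when nothing qualifies.
    """
    chain, used, current = [0], {0}, 0
    while True:
        sub_kw = [keywords(q) for q in extract_sub_questions(traces[current]["response"])]
        best, best_score = None, min_overlap
        for i, trace in enumerate(traces):
            if i in used:
                continue
            q_kw = keywords(trace["question"])
            score = max((jaccard(s, q_kw) for s in sub_kw), default=0.0)
            if score >= best_score:
                best, best_score = i, score
        if best is None:
            return chain
        chain.append(best)
        used.add(best)
        current = best


# Hypothetical records, invented for this example.
traces = [
    {"question": "Why is the sky blue?",
     "response": "<think>How does Rayleigh scattering depend on wavelength? "
                 "Let me recall.</think> Because of scattering."},
    {"question": "How does Rayleigh scattering depend on wavelength?",
     "response": "<think>Short wavelengths scatter more.</think> "
                 "Inversely with the fourth power."},
    {"question": "What is the capital of France?",
     "response": "<think>Easy.</think> Paris."},
]
print(stitch(traces))
```

In this toy input the first trace raises a sub-question that matches the second trace's question, so the two are chained, while the unrelated third trace is left out.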

The resulting dataset consists of 709 .jsonl files containing over 7.2 million entirely deduplicated, highly coherent reasoning chains.
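For readers who want to walk the .jsonl shards locally, a minimal sketch is shown below. It assumes the files have already been downloaded into one directory and that each line is a standalone JSON object; the directory layout and the hash-based duplicate check are assumptions for illustration, not a description of how the dataset was built.

```python
import hashlib
import json
from pathlib import Path


def iter_unique_records(directory):
    """Yield each distinct JSON record found across *.jsonl files in directory.

    Records are compared by a SHA-256 digest of their key-sorted JSON form,
    so two lines with the same content in a different key order count as one.
    """
    seen = set()
    for path in sorted(Path(directory).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines between records
                record = json.loads(line)
                canonical = json.dumps(record, sort_keys=True)
                digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
                if digest in seen:
                    continue
                seen.add(digest)
                yield record
```

Counting the yielded records, for example with `sum(1 for _ in iter_unique_records("path/to/shards"))`, gives one way to check a local copy against the stated size; the path here is a placeholder.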

huu-ontocord

updated a dataset about 1 month ago

ontocord/synthetic-prompt-common-pile-annotated

Viewer • Updated May 5 • 202k • 20

huu-ontocord

published a dataset about 1 month ago

ontocord/synthetic-prompt-common-pile-annotated

Viewer • Updated May 5 • 202k • 20

Team members 23

ontocord's activity