{"id":116411,"date":"2025-07-05T21:00:00","date_gmt":"2025-07-06T01:00:00","guid":{"rendered":"https:\/\/www.freethink.com\/?post_type=ftm_article&#038;p=116411"},"modified":"2025-07-03T13:27:33","modified_gmt":"2025-07-03T17:27:33","slug":"ai-datasets","status":"publish","type":"ftm_article","link":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets","title":{"rendered":"There are no new ideas in AI \u2014 only new datasets"},"content":{"rendered":"\n<p>Most people know that AI has made unbelievable progress over the last 15 years \u2014 especially in the last five. It might feel like that progress is\u00a0<em>inevitable<\/em>\u00a0\u2014 although large paradigm-shift-level breakthroughs are uncommon, we march on anyway through a stream of slow and steady progress. In fact, some researchers have recently declared a\u00a0&#8220;<a href=\"https:\/\/metr.org\/blog\/2025-03-19-measuring-ai-ability-to-complete-long-tasks\/\">Moore\u2019s Law for AI<\/a>,&#8221; where the computer\u2019s ability to do certain things (in this case, certain types of coding tasks) increases exponentially with time:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"900\" height=\"538\" src=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?quality=75&amp;w=900\" alt=\"Line graph showing AI task length increasing over time, doubling every 7 months, with model release dates from GPT-2 to Sonnet 3.7 between 2020 and 2026.\" class=\"wp-image-116412\" title=\"Length of tasks AIs can do is doubling every 7 months\" srcset=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg 900w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=768,459 768w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=320,191 320w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=600,359 600w, 
https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=330,197 330w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=540,323 540w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=850,508 850w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=175,105 175w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=275,164 275w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=400,239 400w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=360,215 360w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-1.jpg?resize=500,299 500w\" sizes=\"(max-width: 900px) 100vw, 900px\" \/><div class=\"img-caption\"><figcaption class=\"wp-element-caption\">METR<\/figcaption><div class=\"img-caption__description\">The proposed \u201cMoore\u2019s Law for AI.&#8221; (By the way, anyone who thinks they can run an autonomous agent for an hour with no intervention as of April 2025 is fooling themselves.)\n<\/div><\/div><\/figure>\n\n\n\n<p>Although I don\u2019t really agree with this specific framing for a number of reasons, I can\u2019t deny the trend of progress. Every year, our AIs get a little bit smarter, a little bit faster, and a little bit cheaper, with no end in sight.<\/p>\n\n\n\n<p>Most people think that this continuous improvement comes from a steady supply of ideas from the research community across academia \u2014 mostly MIT, Stanford, CMU \u2014 and industry \u2014 mostly Meta, Google, and a handful of Chinese labs, with lots of research done at other places that we\u2019ll never get to learn about.<\/p>\n\n\n\n<p>And we certainly have made a lot of progress due to research, especially on the systems side of things. This is how we\u2019ve made models cheaper, in particular. 
Let me cherry-pick a few notable examples from the last couple of years:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In 2022, Stanford researchers gave us&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2205.14135\">FlashAttention<\/a>, a better way to utilize memory in language models that\u2019s used literally everywhere.<\/li>\n\n\n\n<li>In 2023, Google researchers developed&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2211.17192\">speculative decoding<\/a>, which all model providers use to speed up inference (also developed at&nbsp;<a href=\"https:\/\/arxiv.org\/pdf\/2302.01318\">DeepMind<\/a>, I believe concurrently?).<\/li>\n\n\n\n<li>In 2024, a ragtag group of internet fanatics developed&nbsp;<a href=\"https:\/\/kellerjordan.github.io\/posts\/muon\/\">Muon<\/a>, which seems to be a better optimizer than SGD or Adam and may end up as the way we train language models in the future.<\/li>\n\n\n\n<li>In 2025, DeepSeek released&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2501.12948\">DeepSeek-R1<\/a>, an open-source model that has equivalent reasoning power to similar closed-source models from AI labs (specifically Google and OpenAI).<\/li>\n<\/ul>\n\n\n\n<p>So, we\u2019re definitely figuring stuff out. And the reality is actually cooler than that: We\u2019re engaged in a decentralized, globalized exercise of Science \u2014 where findings are shared openly on\u00a0<a href=\"https:\/\/arxiv.org\/\">arXiv<\/a>\u00a0and at conferences and on social media \u2014 and every month we\u2019re getting incrementally smarter.<\/p>\n\n\n\n<p>If we\u2019re doing so much important research, why do some argue that progress is slowing down?&nbsp;<a href=\"https:\/\/www.lesswrong.com\/posts\/4mvphwx5pdsZLMmpY\/recent-ai-model-progress-feels-mostly-like-bullshit\">People are still complaining<\/a>. 
The two most recent huge models,&nbsp;<a href=\"https:\/\/x.ai\/news\/grok-3\">Grok 3<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/openai.com\/index\/introducing-gpt-4-5\/\">GPT-4.5<\/a>, only obtained a marginal improvement over the capabilities of their predecessors. In one particularly salient example, when&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2503.21934v1\">language models were evaluated on the latest math olympiad exam<\/a>, they scored only 5%, indicating that&nbsp;<a href=\"https:\/\/cdn.openai.com\/o1-system-card-20241205.pdf\">recent announcements may have been overblown<\/a>&nbsp;when reporting system ability.<\/p>\n\n\n\n<p>And if we try to chronicle the&nbsp;<em>big<\/em>&nbsp;breakthroughs, the real paradigm shifts, they seem to be happening at a different rate. Let me go through a few that come to mind.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-llms-in-four-breakthroughs\">LLMs in four breakthroughs<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Deep neural networks:<\/strong> Deep neural networks first took off after the&nbsp;<a href=\"https:\/\/www.notion.so\/There-Are-No-New-Ideas-in-AI-Only-New-Data-1cf5109a45d880e6b0d5d6e3a4ba2fdc?pvs=21\">AlexNet model<\/a>&nbsp;won an image recognition competition in 2012.<\/li>\n\n\n\n<li><strong>Transformers + LLMs:<\/strong> In 2017, Google proposed transformers in&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/1706.03762\">Attention Is All You Need<\/a>, which led to&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/1810.04805\">BERT<\/a>&nbsp;(Google, 2018) and the original&nbsp;<a href=\"https:\/\/cdn.openai.com\/research-covers\/language-unsupervised\/language_understanding_paper.pdf\">GPT<\/a>&nbsp;(OpenAI, 2018).<\/li>\n\n\n\n<li><strong>RLHF:<\/strong> This was first proposed (to my knowledge) in the&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2203.02155\">InstructGPT paper<\/a>&nbsp;from OpenAI in 2022.<\/li>\n\n\n\n<li><strong>Reasoning:<\/strong> In 2024, OpenAI released o1, which led to DeepSeek 
R1.<\/li>\n<\/ol>\n\n\n\n<p>If you squint just a little, these four things (DNNs \u2192 Transformer LMs \u2192 RLHF \u2192 Reasoning) summarize everything that\u2019s happened in AI. We had DNNs (mostly image recognition systems), then we had text classifiers, then we had chatbots, and now we have reasoning models (whatever those are).<\/p>\n\n\n\n<p>Say we want to make a fifth such breakthrough; it could help to study the four cases we have here. What new research ideas led to these groundbreaking events?<\/p>\n\n\n\n<p>It\u2019s not crazy to argue that&nbsp;<strong>all the underlying mechanisms of these breakthroughs existed in the 1990s,<\/strong>&nbsp;if not before. We\u2019re applying relatively simple neural network architectures and doing either supervised learning (1 and 2) or reinforcement learning (3 and 4).<\/p>\n\n\n\n<p>Supervised learning via cross-entropy, the main way we pre-train language models, emerged from Claude Shannon\u2019s work in the 1940s.<\/p>\n\n\n\n<p>Reinforcement learning, the main way we post-train language models via RLHF and reasoning training, is slightly newer. It can be traced to the&nbsp;<a href=\"https:\/\/people.cs.umass.edu\/~barto\/courses\/cs687\/williams92simple.pdf\">introduction of policy-gradient methods in 1992<\/a>&nbsp;(and these ideas were certainly around for the first edition of the Sutton &amp; Barto \u201cReinforcement Learning\u201d textbook in 1998).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-if-our-ideas-aren-t-new-then-what-is\">If our ideas aren\u2019t new, then what is?<\/h3>\n\n\n\n<p>OK, let\u2019s agree for now that these &#8220;major breakthroughs&#8221; were arguably fresh applications of things that we\u2019d known for a while. 
First of all, this tells us something about the\u00a0<em>next<\/em>\u00a0major breakthrough (that &#8220;secret fifth thing&#8221; I mentioned above): Our breakthrough is probably not going to come from a completely new idea \u2014 rather, it\u2019ll be the resurfacing of something we\u2019ve known for a while.<\/p>\n\n\n\n<p>But there\u2019s a missing piece here: Each of these four breakthroughs&nbsp;<strong>enabled us to learn from a new data source:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>AlexNet and its follow-ups unlocked&nbsp;<a href=\"https:\/\/www.image-net.org\/\">ImageNet<\/a>, a large database of class-labeled images that drove 15 years of progress in computer vision.<\/li>\n\n\n\n<li>Transformers unlocked training on \u201cThe Internet\u201d and a race to download, categorize, and parse all the text on&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2101.00027\">the web<\/a>&nbsp;(which&nbsp;<a href=\"https:\/\/www.lesswrong.com\/posts\/6Fpvch8RR29qLEWNH\/chinchilla-s-wild-implications\">it seems<\/a>&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2305.16264\">we\u2019ve mostly done<\/a>&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2305.13230\">by now<\/a>).<\/li>\n\n\n\n<li>RLHF allowed us to learn from human labels indicating what \u201cgood text\u201d is (mostly a vibes thing).<\/li>\n\n\n\n<li>Reasoning seems to let us learn from\u00a0&#8220;<a href=\"http:\/\/incompleteideas.net\/IncIdeas\/KeytoAI.html\">verifiers<\/a>,&#8221; things like calculators and compilers that can evaluate the outputs of language models.<\/li>\n<\/ol>\n\n\n\n<p>Remind yourself that each of these milestones marks the first time the respective data source (ImageNet, the web, humans, verifiers) was used at scale. 
Each milestone was followed by a frenzy of activity: Researchers compete to (a) siphon up the remaining useful data from any and all available sources and (b) make better use of the data we have through new tricks to make our systems more efficient and less data-hungry. (I expect we\u2019ll see this trend in reasoning models throughout 2025 and 2026 as researchers compete to find, categorize, and verify everything that might be verified.)<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"900\" height=\"360\" src=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?quality=75&amp;w=900\" alt=\"Collage of many small photos forming the word &quot;IMAGENET&quot; in large block letters, representing a large image dataset.\" class=\"wp-image-116413\" title=\"How to train and validate on Imagenet\" srcset=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg 900w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=768,307 768w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=320,128 320w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=600,240 600w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=330,132 330w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=540,216 540w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=850,340 850w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=175,70 175w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=275,110 275w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=400,160 400w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=360,144 360w, 
https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/AI-datasets-2.jpg?resize=500,200 500w\" sizes=\"(max-width: 900px) 100vw, 900px\" \/><div class=\"img-caption\"><figcaption class=\"wp-element-caption\">Stanford Vision Lab<\/figcaption><div class=\"img-caption__description\">Progress in AI may have been inevitable once we gathered ImageNet, at the time the largest public collection of images from the web.\n<\/div><\/div><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-much-do-new-ideas-matter\">How much do new ideas matter?<\/h3>\n\n\n\n<p>There\u2019s something to be said for the fact that our actual technical innovations may not make a huge difference in these cases. Examine the counterfactual. If we hadn\u2019t invented AlexNet, maybe another architecture would have come along that could handle ImageNet. If we had never discovered transformers, perhaps we would\u2019ve settled for LSTMs or SSMs or found something else entirely to learn from the mass of useful training data we have available on the web.<\/p>\n\n\n\n<p>This jibes with the theory some people have that nothing matters but data. Some researchers have observed that, for all the training techniques, modeling tricks, and hyperparameter tweaks we make, the thing that makes the biggest difference by and large is changing the data.<\/p>\n\n\n\n<p>As one salient example, some researchers worked on&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2212.10544\">developing a new BERT-like model using an architecture other than transformers<\/a>. They spent a year or so tweaking the architecture in hundreds of different ways and managed to produce a different type of model (this is a state-space model or \u201cSSM\u201d) that performed about equivalently to the original transformer when trained on the same data.<\/p>\n\n\n\n<p>This discovered equivalence is really profound because it hints that&nbsp;<em>there is an upper bound to what we might learn from a given dataset<\/em>. 
All the training tricks and model upgrades in the world won\u2019t get around the cold hard fact that there is only so much you can learn from a given dataset.<\/p>\n\n\n\n<p>And maybe this apathy to new ideas is what we were supposed to take away from&nbsp;<a href=\"http:\/\/www.incompleteideas.net\/IncIdeas\/BitterLesson.html\">The Bitter Lesson<\/a>. If data is the only thing that matters, why are 95% of people working on new methods?<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-where-will-our-next-paradigm-shift-come-from-nbsp-youtube-maybe\">Where will our next paradigm shift come from?&nbsp;(YouTube&#8230;maybe?)<\/h3>\n\n\n\n<p>The obvious takeaway is that our next paradigm shift isn\u2019t going to come from an improvement to RL or a fancy new type of neural net. It\u2019s going to come when we unlock a source of data that we haven\u2019t accessed before or haven\u2019t properly harnessed yet.<\/p>\n\n\n\n<p>One obvious source of information that a lot of people are working towards harnessing is video. According to&nbsp;<a href=\"https:\/\/www.dexerto.com\/entertainment\/how-many-videos-are-there-on-youtube-2197264\/\">a random site on the web<\/a>, about 500 hours of video footage are uploaded to YouTube <em>per minute<\/em>. This is a ridiculous amount of data, much more than is available as text on the entire internet. It\u2019s potentially a much richer source of information too, as videos contain not just words, but the inflection behind them, as well as rich information about physics and culture that just can\u2019t be gleaned from text.<\/p>\n\n\n\n<p>It\u2019s safe to say that as soon as our models get efficient enough or our computers grow beefy enough, Google is going to start training models on YouTube. 
They own the thing, after all; it would be silly not to use the data to their advantage.<\/p>\n\n\n\n<p>A final contender for the next \u201cbig paradigm\u201d in AI is a data-gathering system that is in some way&nbsp;<em>embodied<\/em> \u2014 or, in the words of a regular person, robots. We\u2019re currently not able to gather and process information from cameras and sensors in a way that\u2019s amenable to training large models on GPUs. If we could build smarter sensors or scale our computers up until they can handle the massive influx of data from a robot with ease, we might be able to use this data in a beneficial way.<\/p>\n\n\n\n<p>It\u2019s hard to say whether YouTube or robots or something else will be the Next Big Thing for AI. We seem pretty deeply entrenched in the camp of language models right now, but we also seem to be running out of language data pretty quickly. But if we want to make progress in AI, maybe we should stop looking for new ideas and start looking for new data.<\/p>\n\n\n\n<p><em>This&nbsp;<a href=\"https:\/\/blog.jxmo.io\/p\/there-are-no-new-ideas-in-ai-only\" target=\"_blank\" rel=\"noreferrer noopener\">article<\/a>&nbsp;was originally published on <a href=\"https:\/\/blog.jxmo.io\/\">Substack<\/a>. 
It is reprinted with permission of the author.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Our next AI breakthrough will come when we unlock a source of data we\u2019ve either overlooked or never fully harnessed.<\/p>\n","protected":false},"author":25,"featured_media":116430,"template":"","ftm_taxonomy_fields":[46,2202],"ftm_taxonomy_challenges":[],"ftm_taxonomy_statuses":[],"ftm_taxonomy_hidden_tags":[],"class_list":["post-116411","ftm_article","type-ftm_article","status-publish","has-post-thumbnail","hentry","ftm_taxonomy_fields-ai","ftm_taxonomy_fields-opinion"],"acf":[],"apple_news_notices":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>There are no new ideas in AI \u2014 only new datasets<\/title>\n<meta name=\"description\" content=\"Our next AI breakthrough will come when we unlock a dataset we\u2019ve either overlooked or never fully harnessed.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"There are no new ideas in AI \u2014 only new datasets\" \/>\n<meta property=\"og:description\" content=\"Our next AI breakthrough will come when we unlock a dataset we\u2019ve either overlooked or never fully harnessed.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets\" \/>\n<meta property=\"og:site_name\" content=\"Freethink\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/new-datasets-thumb.jpg?resize=1200,630\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta 
property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:description\" content=\"All four big\u00a0breakthroughs in LLMs happened because we unlocked a new source of data. What will be the next one?\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"8 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"There are no new ideas in AI \u2014 only new datasets","description":"Our next AI breakthrough will come when we unlock a dataset we\u2019ve either overlooked or never fully harnessed.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets","og_locale":"en_US","og_type":"article","og_title":"There are no new ideas in AI \u2014 only new datasets","og_description":"Our next AI breakthrough will come when we unlock a dataset we\u2019ve either overlooked or never fully harnessed.","og_url":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets","og_site_name":"Freethink","og_image":[{"width":1200,"height":630,"url":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/new-datasets-thumb.jpg?resize=1200,630","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_description":"All four big\u00a0breakthroughs in LLMs happened because we unlocked a new source of data. What will be the next one?","twitter_misc":{"Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets#article","isPartOf":{"@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets"},"author":{"name":"kristinhouser","@id":"https:\/\/www.freethink.com\/#\/schema\/person\/e45bf79276f6c14454ee4e1dfa7aca8c"},"headline":"There are no new ideas in AI \u2014 only new datasets","datePublished":"2025-07-06T01:00:00+00:00","mainEntityOfPage":{"@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets"},"wordCount":1706,"publisher":{"@id":"https:\/\/www.freethink.com\/#organization"},"image":{"@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets#primaryimage"},"thumbnailUrl":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/new-datasets-thumb.jpg?quality=75","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets","url":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets","name":"There are no new ideas in AI \u2014 only new datasets","isPartOf":{"@id":"https:\/\/www.freethink.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets#primaryimage"},"image":{"@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets#primaryimage"},"thumbnailUrl":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/new-datasets-thumb.jpg?quality=75","datePublished":"2025-07-06T01:00:00+00:00","description":"Our next AI breakthrough will come when we unlock a dataset we\u2019ve either overlooked or never fully 
harnessed.","breadcrumb":{"@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets#primaryimage","url":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/new-datasets-thumb.jpg?quality=75","contentUrl":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/07\/new-datasets-thumb.jpg?quality=75","width":1600,"height":900,"caption":"Akarat Phasura \/ Adobe Stock"},{"@type":"BreadcrumbList","@id":"https:\/\/www.freethink.com\/artificial-intelligence\/ai-datasets#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Articles","item":"https:\/\/www.freethink.com\/articles"},{"@type":"ListItem","position":2,"name":"There are no new ideas in AI \u2014 only new datasets"}]},{"@type":"WebSite","@id":"https:\/\/www.freethink.com\/#website","url":"https:\/\/www.freethink.com\/","name":"Freethink","description":"Move the world","publisher":{"@id":"https:\/\/www.freethink.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.freethink.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.freethink.com\/#organization","name":"Freethink Media","url":"https:\/\/www.freethink.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.freethink.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.freethink.com\/wp-content\/uploads\/2021\/06\/logo.svg","contentUrl":"https:\/\/www.freethink.com\/wp-content\/uploads\/2021\/06\/logo.svg","width":651,"height":124,"caption":"Freethink 
Media"},"image":{"@id":"https:\/\/www.freethink.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.freethink.com\/#\/schema\/person\/e45bf79276f6c14454ee4e1dfa7aca8c","name":"kristinhouser","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.freethink.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/ff88759e0ed195de655c7703310050f17b921ae4fc276d7eb5930cddafa694f9?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ff88759e0ed195de655c7703310050f17b921ae4fc276d7eb5930cddafa694f9?s=96&d=mm&r=g","caption":"kristinhouser"},"url":"https:\/\/www.freethink.com\/author\/kristinhouser"}]}},"jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_article\/116411","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_article"}],"about":[{"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/types\/ftm_article"}],"author":[{"embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/users\/25"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/media\/116430"}],"wp:attachment":[{"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/media?parent=116411"}],"wp:term":[{"taxonomy":"ftm_taxonomy_fields","embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_taxonomy_fields?post=116411"},{"taxonomy":"ftm_taxonomy_challenges","embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_taxonomy_challenges?post=116411"},{"taxonomy":"ftm_taxonomy_statuses","embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_taxonomy_statuses?post=116411"},{"taxonomy":"ftm_taxonomy_hidden_tags","embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_taxonomy_hidden_tags?post=116411"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}