Decentralized OORT AI data ranks among the top on Google Kaggle

Source: Cointelegraph. Original text: "Decentralized OORT AI data ranks among the top on Google Kaggle"

An artificial intelligence training image dataset developed by decentralized AI solution provider OORT has seen considerable success on Google's Kaggle platform.

OORT's "Diverse Tools" dataset listing on Kaggle was released in early April and has since climbed to the front page in multiple categories. Kaggle is a Google-owned online platform for data science and machine learning competitions, learning, and collaboration.

Ramkumar Subramaniam, a core contributor of the crypto AI project OpenLedger, told Cointelegraph, "The Kaggle homepage ranking is a strong social signal indicating that this dataset is attracting active participation from key communities such as data scientists, machine learning engineers, and practitioners."

Max Li, the founder and CEO of OORT, told Cointelegraph that the company "has observed encouraging participation metrics, which validate that the training data collected through a decentralized model indeed has early market demand and relevance." He added:

"Spontaneous interest from the community, including active usage and contributions, clearly demonstrates how decentralized, community-driven data pipelines like OORT can achieve rapid distribution and widespread participation without relying on centralized intermediaries."

Li also said that OORT plans to release multiple datasets in the coming months. These include in-car voice command datasets, smart home voice command datasets, and deepfake video datasets aimed at improving AI-driven media authenticity verification.

Cointelegraph has independently verified that the dataset reached the front page of Kaggle's General AI, Retail and Shopping, Manufacturing, and Engineering categories earlier this month. At the time of publication, the dataset had lost those positions following a possibly unrelated dataset update on May 6 and another on May 14.

While acknowledging the achievement, Subramaniam told Cointelegraph that "this is not a definitive indicator of practical application or enterprise-level quality." He said that what sets the OORT dataset apart "lies not only in the ranking but also in the source channels and incentive mechanisms behind the dataset." He further explained:

"Unlike centralized suppliers that may rely on opaque processes, a transparent, token-incentivized system can provide traceability, community co-management, and the potential for continuous optimization, provided that a suitable governance structure is established."

Lex Sokolin, a partner at the artificial intelligence venture capital firm Generative Ventures, stated that while he believes these results are not difficult to replicate, "this indeed proves that crypto projects can leverage decentralized incentive mechanisms to organize activities of economic value."

Data released by the artificial intelligence research organization Epoch AI indicates that human-generated text data for AI training is expected to be exhausted by 2028. The pressure has become so great that investors are now brokering deals that secure rights to copyrighted materials for AI companies.

Reports on the increasing scarcity of AI training data, and on how it may constrain growth in the field, have been circulating for years. Although synthetic (AI-generated) data is being used more widely and with some success, human-generated data is still generally regarded as the better choice, since higher-quality data produces better-performing AI models.

For AI training images, the situation is becoming increasingly complicated, as artists are deliberately sabotaging training efforts. Nightshade, a tool meant to protect works from being used for AI training without authorization, lets creators "poison" their images, severely degrading the performance of models trained on them.

Subramaniam pointed out: "We are entering an era where high-quality image data is becoming increasingly scarce." He also emphasized that the widespread application of image poisoning techniques makes this challenge even more severe:

"With the rise of AI training poisoning methods such as image concealment techniques and adversarial watermarks, open-source datasets are facing dual challenges of quantity and credibility."

Against this backdrop, Subramaniam said that verifiable, community-sourced, incentivized datasets "are more valuable than ever before." He believes that such projects "not only serve as alternatives but will also become important pillars of AI alignment and data provenance in the data economy."

