DatologyAI is building technology to automatically curate AI training datasets

By 5gantennas.org, February 22, 2024

Abstract glowing grid and particles

Image credits: Piranka/Getty Images

Large training data sets are the gateway to powerful AI models, but they are often also those models’ downfall.

Biases emerge from prejudicial patterns hidden in large data sets, such as photos of mostly white CEOs in an image classification set. Large data sets can also be messy, arriving in formats that are incomprehensible to a model and that contain a lot of noise and extraneous information.

A recent Deloitte survey of companies implementing AI found that 40% cited data-related challenges, including thoroughly preparing and cleaning data, as among the top concerns hindering their AI efforts. A separate survey of data scientists found that about 45% of their time is spent on data preparation tasks such as “loading” and cleaning data.

Ari Morcos, who has worked in the AI industry for nearly a decade, wants to abstract away much of the data preparation involved in training AI models, and he has founded a startup to do just that.

Morcos’ company, DatologyAI, builds tools to automatically curate data sets like those used to train OpenAI’s ChatGPT, Google’s Gemini, and other GenAI models. According to Morcos, the platform can identify which data is most important depending on the model’s application (such as composing an email), as well as ways to augment the data set with additional data and how it should be batched, or split into more manageable chunks, during model training.

“Models are what they eat. Models are a reflection of the data on which they’re trained,” Morcos told TechCrunch in an email interview. “However, not all data is created equal, and some training data is much more useful than others. Training models on the right data in the right way can have a dramatic impact on the resulting model.”

Morcos holds a Ph.D. in neuroscience from Harvard University. He spent two years at DeepMind applying neurology-inspired techniques to understand and improve AI models, and five years at Meta’s AI lab uncovering some of the fundamental mechanisms underlying how models function. Along with co-founders Matthew Leavitt and Bogdan Gaza, a former engineering lead at Amazon and then Twitter, Morcos launched DatologyAI with the aim of streamlining AI data set curation in all its forms.

As Morcos points out, the composition of a training data set affects nearly every characteristic of a model trained on it, from the model’s performance on tasks to its size and the depth of its domain knowledge. More efficient data sets can cut training time and yield smaller models, saving on compute costs, while models trained on data sets that include a particularly diverse range of samples can, generally speaking, handle difficult requests more adeptly.

With interest in GenAI, which is notoriously expensive, at an all-time high, AI deployment costs are at the forefront of executives’ minds.

Many companies are either fine-tuning existing models (including open source models) to suit their purposes or opting for managed vendor services via APIs. Some, however, for governance and compliance reasons or otherwise, choose to build models from scratch on custom data, spending tens of thousands to millions of dollars in compute to train and run them.

“Companies have collected treasure troves of data and want to train efficient, high-performance, specialized AI models that can maximize the benefit to their business,” Morcos said. “However, making effective use of these huge data sets is extremely difficult, and if done incorrectly, it leads to worse-performing models that take longer to train and [are larger] than necessary.”

DatologyAI can scale up to “petabytes” of data in any format, including text, images, video, audio, tabular, or more “exotic” modalities such as genomic or geospatial, and can be deployed to a customer’s infrastructure, either on-premises or via a virtual private cloud. This, Morcos claims, sets it apart from other data preparation and curation tools such as CleanLab, Lilac, Labelbox, YData, and Galileo, which tend to be more limited in the scope and types of data they can process.

DatologyAI can also determine which “concepts” in a data set (for example, concepts related to U.S. history in an educational chatbot training set) are more complex and therefore require higher-quality samples, as well as which data might cause a model to behave in unintended ways.

“Solving [these problems] requires automatically identifying concepts, their complexity, and how much redundancy is actually necessary,” Morcos said. “Data augmentation, often using other models or synthetic data, is incredibly powerful, but it must be done in a careful and targeted manner.”
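
To make the redundancy idea concrete, here is a minimal Python sketch that caps how many copies of near-identical text survive in a data set. It is only an illustration: `normalize`, `limit_redundancy`, and `max_copies` are hypothetical names, and matching on lowercased, whitespace-collapsed text is an assumption standing in for whatever concept-level analysis DatologyAI actually performs, which the article does not describe.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def limit_redundancy(texts, max_copies=1):
    """Keep at most `max_copies` samples per normalized text, in original order."""
    seen = Counter()
    kept = []
    for text in texts:
        key = normalize(text)
        if seen[key] < max_copies:
            kept.append(text)
        seen[key] += 1
    return kept


# Hypothetical toy data: three variants of the same sentence plus one unique one.
docs = ["Hello  world", "hello world", "Goodbye", "HELLO WORLD "]
deduped = limit_redundancy(docs, max_copies=1)
print(deduped)
```

Raising `max_copies` keeps some repetition on purpose, which reflects the point in the quote that the useful amount of redundancy is not necessarily zero.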

The question is, how effective is DatologyAI’s technology? There are reasons to be skeptical. History has shown that automated data curation does not always work as intended, no matter how sophisticated the method or how diverse the data.

LAION, a German nonprofit spearheading a number of GenAI projects, was forced to take down an algorithmically curated AI training data set after it was found to contain images of child sexual abuse. Elsewhere, models such as ChatGPT, which are trained on a mix of data sets filtered for toxicity both manually and automatically, have been shown to produce toxic content when given certain prompts.

Some experts would argue that there is no escaping manual curation, at least if you want to achieve good results with AI models. From AWS to Google to OpenAI, today’s largest vendors rely on teams of human experts and (sometimes underpaid) annotators to shape and refine their training data sets.

Morcos insists that DatologyAI’s tools are not meant to replace manual curation altogether, but rather to offer suggestions that might not occur to data scientists, particularly suggestions tangential to the problem of trimming the size of training data sets. He is something of an authority: trimming data sets while preserving model performance was the focus of an academic paper that Morcos co-authored in 2022 with researchers from Stanford University and the University of Tübingen, and which won a best paper award at that year’s NeurIPS machine learning conference.
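
As a rough illustration of what trimming a data set can look like, the Python sketch below keeps only the top-scoring fraction of samples. The per-sample scores are assumed to come from some external usefulness or difficulty metric; `prune_by_score` and `keep_fraction` are hypothetical names, and this is neither DatologyAI’s method nor the algorithm from the paper.

```python
def prune_by_score(samples, scores, keep_fraction):
    """Keep the highest-scoring fraction of samples, preserving original order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    if not samples:
        return []
    # Always keep at least one sample so the pruned set is never empty.
    keep_count = max(1, round(len(samples) * keep_fraction))
    # Rank indices from highest to lowest score, then restore original order.
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    kept_indices = sorted(ranked[:keep_count])
    return [samples[i] for i in kept_indices]


# Hypothetical toy data: the scores are made up for illustration.
samples = ["a", "b", "c", "d", "e"]
scores = [0.1, 0.9, 0.4, 0.8, 0.2]
pruned = prune_by_score(samples, scores, keep_fraction=0.4)
print(pruned)
```

In practice the hard part is the scoring metric itself; the selection step shown here is the easy half of the problem.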

“Identifying the right data at scale is extremely difficult and is a frontier research problem,” Morcos said. “[Our approach] leads to models that train dramatically faster while simultaneously improving performance on downstream tasks.”

DatologyAI’s seed round drew investment from prominent figures in tech and AI, including Google chief scientist Jeff Dean, Meta chief AI scientist Yann LeCun, and Quora founder and OpenAI board member Adam D’Angelo, a group that includes people credited with developing some of the most important technologies at the heart of modern AI.

The other angel investors in DatologyAI’s $11.65 million seed round, which was led by Amplify Partners with participation from Radical Ventures, Conviction Capital, Outset Capital, and Quiet Capital, were Cohere co-founders Aidan Gomez and Ivan Zhang; Douwe Kiela, founder of Contextual AI; Naveen Rao, formerly Intel’s vice president of AI; and Jascha Sohl-Dickstein, one of the inventors of generative diffusion models. It is an impressive list of AI luminaries, to say the least, and suggests there may be something to Morcos’ claims.

“A model is only as good as the data it is trained on, but identifying the right training data among billions or even trillions of samples is an incredibly difficult problem,” LeCun told TechCrunch in an emailed statement. “Ari and the team at DatologyAI are among the world’s experts on this problem, and I believe the product they’re building to make high-quality data curation available to anyone who wants to train a model is critical to helping make AI work for everyone.”

San Francisco-based DatologyAI currently has 10 employees, including its co-founders, and plans to expand to about 25 by the end of the year if it reaches certain growth milestones.

I asked Morcos whether these milestones were related to customer acquisition, but he declined to say. Rather strangely, he also would not reveal the size of DatologyAI’s current customer base.




