5gantennas.org

Meta quietly releases new web scraper for collecting AI data

By 5gantennas.org | August 20, 2024

Meta has quietly released a new web crawler that scours the internet and collects data in bulk to feed its AI models.

The crawler, called Meta External Agent, was released last month, according to three companies that track web scrapers and bots on the web. The automated bot essentially copies, or “scrapes,” all the publicly available data from websites, such as the text of news articles or conversations in online discussion groups.

A representative from Dark Visitors, which provides tools to website owners to automatically block known scraper bots, said Meta External Agent is similar to OpenAI’s GPTBot, which scrapes the web for AI training data. Two other organizations involved in tracking web scrapers confirmed the existence of the bot and that it is being used to collect AI training data.

Meta, the parent company of Facebook, Instagram, and WhatsApp, updated its corporate website for developers in late July to include a tab disclosing the existence of the new scraper, according to a version history found using the Internet Archive. Aside from updating the page, Meta has not publicly announced the new crawler.

A Meta spokesperson said the company has operated a crawler under a different name for “years,” although that crawler, called Facebook External Hit, “has been used for a variety of purposes over time, including sharing link previews.”

“Like other companies, we train our generative AI models on content publicly available online,” the spokesperson said. “We recently updated our guidance on how publishers can best exclude their domains from being crawled by Meta’s AI-related crawlers.”

Scraping web data to train AI models is a controversial practice that has led to numerous lawsuits by artists, writers, and others who claim that AI companies have used their content and intellectual property without their consent. Some AI companies, such as OpenAI and Perplexity, have struck deals in recent months to pay content providers for access to their data. Fortune was one of several news providers to announce revenue-sharing deals with Perplexity in July.

Flying Under the Radar

According to data from Dark Visitors, roughly 25% of the world’s most popular websites currently block GPTBot, while only 2% block Meta’s new bot.

Websites that want to block web scrapers must deploy robots.txt, a plain-text file of directives served from the site's root, to tell scraper bots to ignore information on their site. However, for robots.txt to be respected, the specific name of the scraper bot usually has to be added as well, which is difficult to do if the name is not publicly available. Scraper bot operators are also free to ignore robots.txt; it is not enforceable or legally binding.
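
To illustrate why the bot's name matters, here is a minimal Python sketch that uses the standard library's robots.txt parser to check what a rule-abiding crawler would be allowed to fetch. The user-agent token `meta-externalagent`, the `example.com` URL, and the helper name `is_allowed` are assumptions made for illustration; the exact token to block should be taken from Meta's own crawler documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that names only OpenAI's crawler.
ROBOTS_GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

# The same file extended with an entry for Meta's crawler.
# The token "meta-externalagent" is an assumption for this sketch.
ROBOTS_BOTH = ROBOTS_GPTBOT_ONLY + """
User-agent: meta-externalagent
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/article"
    for label, rules in (("GPTBot only", ROBOTS_GPTBOT_ONLY), ("both bots", ROBOTS_BOTH)):
        for agent in ("GPTBot", "meta-externalagent"):
            print(label, agent, is_allowed(rules, agent, url))
```

A file that names only GPTBot leaves the other crawler unrestricted, which is why a site cannot opt out of a bot whose name it does not know. The check only models a crawler that chooses to honor the file; nothing in robots.txt stops a bot that ignores it.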

Such scrapers are used to extract large amounts of data and text from the web to use as training data for generative AI models (also known as large language models, or LLMs) and related tools. Meta’s Llama is one of the largest LLMs available, and it powers features such as Meta AI, the AI chatbot that now appears on various Meta platforms. The company hasn’t released the training data used for the latest version of the model, Llama 3, but earlier versions of the model used large data sets compiled by other sources, such as Common Crawl.

Earlier this year, Meta co-founder and longtime CEO Mark Zuckerberg boasted during an earnings call that his social platform was amassing a dataset for AI training that was “larger than Common Crawl, which has been collecting about 3 billion web pages every month since 2011.”

But the new crawler suggests that Meta’s vast data pool may no longer be enough. The company is working on updating Llama and expanding Meta AI, which typically requires new, better training data to keep improving. Meta plans to spend up to $40 billion this year, mostly on AI infrastructure and related costs.

Are you a Meta employee or someone with insights and tips you’d like to share? Contact Kali Hays securely on Signal at +1-949-280-0267, or by email at kali.hays@fortune.com.
