5gantennas.org
Data

Putting Data in Tables

By 5gantennas.org, February 27, 2024 (6 min read)


(Tee11/Shutterstock)

One of the big advances in data engineering over the past seven to eight years has been the advent of table formats. Typically layered on top of columnar Parquet files, table formats such as Apache Iceberg, Delta, and Apache Hudi offer important benefits for big data operations, including the introduction of transactions. However, table formats also introduce additional costs that customers should be aware of.

Each of the three major table formats was developed by a different group, so each has its own origin story. However, all three were developed largely in response to similar technical limitations in the big data landscape, limitations that affect all types of business operations.

For example, Apache Hudi was originally created in 2016 by Uber's data engineering team, which was a large-scale user (and large-scale developer) of big data technologies. Hudi, short for Hadoop Upserts, Deletes, and Incrementals, was born out of a desire to improve file processing on the company's massive Hadoop data lake.

Apache Iceberg, on the other hand, emerged in 2017 from Netflix, another big user of big data technology. The company's engineers were frustrated by the limitations of the Apache Hive metastore. One of those limitations could lead to data corruption and incorrect answers when the same files were accessed by different query engines.
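
To picture the kind of incorrect answer described here, consider a table defined simply as whatever files currently sit in a directory, roughly the Hive-style convention. A reader that lists the directory while a writer is partway through a multi-file write can see a half-finished result, whereas a reader that works from a committed snapshot (a fixed list of files, which is roughly what table formats track) cannot. The following is a simplified, hypothetical in-memory illustration of that one failure mode; `Storage`, `read_by_listing`, and `read_by_snapshot` are illustrative names, not part of Hive, Iceberg, or any other real API.

```python
class Storage:
    """Hypothetical stand-in for a file system or object store: path -> rows."""

    def __init__(self):
        self.files = {}

    def list_dir(self, prefix):
        return sorted(p for p in self.files if p.startswith(prefix))


def read_by_listing(storage, prefix):
    # Directory-style read: the table is whatever files exist right now.
    rows = []
    for path in storage.list_dir(prefix):
        rows.extend(storage.files[path])
    return rows


def read_by_snapshot(storage, snapshot):
    # Snapshot-style read: the table is exactly the files named in one commit.
    rows = []
    for path in snapshot:
        rows.extend(storage.files[path])
    return rows


storage = Storage()
storage.files["t/part-0"] = [1, 2]
committed = ["t/part-0"]  # the last committed snapshot lists only part-0

# A writer is midway through adding two new files; only the first has landed.
storage.files["t/part-1"] = [3, 4]

print(read_by_listing(storage, "t/"))        # [1, 2, 3, 4]  (a partial write leaks in)
print(read_by_snapshot(storage, committed))  # [1, 2]        (last committed state)
```

The snapshot reader only sees the new file once a writer publishes a new snapshot that names it, which is the core idea the table formats build on.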

Image source: Apache Software Foundation

Similarly, the folks at Databricks developed Delta in 2017, when too many data lakes were turning into data swamps. A key component of Databricks' Delta Lake, the Delta table format gives users data warehouse-like quality and precision for data stored in their S3 or HDFS data lakes, or lakehouses.

As a provider of data engineering automation, Nexla supports all three table formats. As its customers' big data repositories grew, those customers realized they needed better data management for analytics use cases.

One big advantage of all the table formats is the ability to see how records have changed over time. This capability has been common in transactional use cases for decades but is fairly new for analytical use cases, says Avinash Shahdadpuri, the CTO and co-founder of Nexla.

"The Parquet format didn't really have any history," he told Datanami in an interview. "If you had a record and you wanted to see how it changed over a period of time across two versions of a Parquet file, it was very difficult to do that."

The new metadata layer that a table format adds gives users ACID transaction capabilities over data stored in Parquet files, which have become the dominant format for storing columnar data in S3 and HDFS data lakes (other big data formats include ORC and Avro).
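
To make the metadata-layer idea more concrete, here is a minimal sketch of how a transaction log can make a multi-file change atomic: data files are written first, and the change only becomes visible when a writer successfully claims the next numbered log entry. This is a simplified illustration of optimistic concurrency on a local file system that supports hard links, not the on-disk protocol of Iceberg, Delta, or Hudi; the log layout and the `commit`, `latest_version`, and `table_files` helpers are hypothetical.

```python
import json
import os
import tempfile


def commit(log_dir, version, added_files):
    """Try to publish `version` of the table; return False if another writer won.

    The entry is written to a temp file first, then hard-linked to its final
    name. os.link raises FileExistsError if that name is already taken, so at
    most one writer can claim a given version, and readers never see a
    partially written entry.
    """
    final_path = os.path.join(log_dir, f"{version:020d}.json")
    fd, tmp_path = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"add": added_files}, f)
    try:
        os.link(tmp_path, final_path)
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp_path)  # drop the temp name; the final link (if any) remains


def latest_version(log_dir):
    names = [n for n in os.listdir(log_dir) if n.endswith(".json")]
    return max((int(n.split(".")[0]) for n in names), default=-1)


def table_files(log_dir):
    """Replay committed log entries in order to list the table's data files."""
    files = []
    for v in range(latest_version(log_dir) + 1):
        with open(os.path.join(log_dir, f"{v:020d}.json")) as f:
            files.extend(json.load(f)["add"])
    return files


log_dir = tempfile.mkdtemp()
# The Parquet data files are assumed to exist already; only the commit is shown.
commit(log_dir, 0, ["part-000.parquet"])
# Two writers both saw version 0 and now race to publish version 1.
won_a = commit(log_dir, 1, ["part-001.parquet"])
won_b = commit(log_dir, 1, ["part-002.parquet"])
print(won_a, won_b)          # True False
print(table_files(log_dir))  # ['part-000.parquet', 'part-001.parquet']
```

In this sketch the losing writer would re-read the log, check whether its change conflicts with version 1, and retry as version 2. The exclusive-link trick depends on local file system semantics; real table formats use their own mechanisms, such as a catalog that swaps a metadata pointer atomically, to get a comparable guarantee on object stores.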

“That’s where ACID comes in a little bit. It gives you a history of how this record has changed over time, so you can roll back with more certainty,” says Shahdadpuri. “You can now essentially version control your data.”
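
As a rough illustration of what "version control your data" means in practice, the sketch below keeps every committed snapshot of a tiny table. That makes it possible to trace a single record's history, read the table as of an older version (time travel), and roll back by recommitting an old state. It is a hypothetical, in-memory model: `VersionedTable` and its methods are illustrative, and real table formats track data files and deltas rather than full copies of every snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy table that keeps every committed snapshot (hypothetical, in-memory).

    Each snapshot maps record key -> record value. Version 0 is the empty table.
    """
    snapshots: list = field(default_factory=lambda: [{}])

    def commit(self, upserts=None, deletes=()):
        new = dict(self.snapshots[-1])  # start from the latest snapshot
        new.update(upserts or {})
        for key in deletes:
            new.pop(key, None)
        self.snapshots.append(new)      # older snapshots are never modified
        return len(self.snapshots) - 1  # the new version number

    def as_of(self, version):
        """Time travel: read the table as it was at `version`."""
        return self.snapshots[version]

    def history(self, key):
        """How one record changed across versions (None means absent)."""
        return [snap.get(key) for snap in self.snapshots]

    def rollback(self, version):
        """Restore an old state by committing a copy of it as a new version."""
        self.snapshots.append(dict(self.snapshots[version]))
        return len(self.snapshots) - 1


t = VersionedTable()
v1 = t.commit({"order-7": "placed"})
v2 = t.commit({"order-7": "shipped"})
v3 = t.commit({"order-7": "refunded"})
print(t.history("order-7"))  # [None, 'placed', 'shipped', 'refunded']
v4 = t.rollback(v2)
print(t.as_of(v4))           # {'order-7': 'shipped'}
```

Rolling back by committing the old state as a new version, rather than deleting later versions, is one design choice among several; it keeps the history itself intact, so the rollback can be audited or undone.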

Image source: Snowflake

This ability to roll data back to a previous version is useful in certain situations, such as data sets that are continually updated. It is less useful when new data is simply appended to the end of a file.

"When data is not just appended (appending alone is the case in 95% of these traditional Parquet file use cases), when it can also be deleted, merged, and updated, this method tends to be better, and much more efficient than what could be done using classic Parquet files," says Shahdadpuri.

Table formats let users perform more data operations directly in the data lake, much as they would in a database. That saves customers the time and money they would otherwise spend taking data out of the lake, manipulating it, and putting it back, Shahdadpuri said.
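
One reason deletes and updates get cheaper is that the metadata layer knows exactly which files make up the table, so a change can rewrite only the files it touches and then publish a new file list. The sketch below models such a copy-on-write delete over an in-memory dictionary of files; the data model and `delete_where` are hypothetical. Real engines also consult per-file statistics to skip files without scanning them, and some formats instead record deletes separately and merge them in at read time.

```python
def delete_where(files, predicate):
    """Copy-on-write delete over a toy table (hypothetical model).

    `files` maps file name -> list of row dicts and is treated as immutable.
    Only files containing a matching row are rewritten (under a new name);
    every other file is carried into the new file list unchanged.
    Returns (new_files, rewritten_names).
    """
    new_files, rewritten = {}, []
    for name, rows in files.items():
        if any(predicate(r) for r in rows):
            kept = [r for r in rows if not predicate(r)]
            if kept:  # a file whose rows are all deleted simply drops out
                new_files[name + ".v2"] = kept  # new copy, never an in-place edit
            rewritten.append(name)
        else:
            new_files[name] = rows  # untouched file is reused as-is
    return new_files, rewritten


table = {
    "part-0": [{"id": 1, "region": "EU"}, {"id": 2, "region": "EU"}],
    "part-1": [{"id": 3, "region": "US"}, {"id": 4, "region": "EU"}],
    "part-2": [{"id": 5, "region": "US"}],
}
new_table, rewritten = delete_where(table, lambda r: r["id"] == 4)
print(rewritten)          # ['part-1']
print(sorted(new_table))  # ['part-0', 'part-1.v2', 'part-2']
```

Because the old file list is left untouched, the previous version of the table stays readable until its files are cleaned up, which is also where the extra storage cost of table formats comes from.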

Of course, users could simply keep their data in a database, but traditional databases don't scale to petabytes, while distributed file systems like HDFS and object stores like S3 scale to petabytes easily. And with the addition of a table format, users no longer have to compromise on transactionality and accuracy.

Table formats are not without their disadvantages. There are always trade-offs in computer architecture, and table formats come with their own costs. According to Shahdadpuri, those costs come in the form of increased storage and complexity.

Image source: Databricks

On the storage front, the metadata kept by a table format can add as little as 10% to storage overhead, and continuously changing data can incur as much as a 2x penalty, Shahdadpuri says.

"It can increase your storage costs quite a bit, because before you were just storing Parquet. Now you're storing versions of Parquet," he says. "Now you're keeping meta files on top of what you already had in Parquet, which increases costs and ultimately forces you to make trade-offs."
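
A back-of-the-envelope model can show why the overhead varies so widely with how often data changes. The formula below is an assumption made purely for illustration, not something Shahdadpuri or any table format specifies: each retained old version keeps some fraction of the data alive as superseded files, and the metadata itself adds a small fixed fraction. The parameter values in the examples are likewise made up.

```python
def storage_ratio(rewrite_fraction, retained_versions, metadata_fraction):
    """Rough storage multiplier of a table format vs. one copy of plain Parquet.

    Toy model (an assumption, not a published formula):
      ratio = 1 + metadata_fraction + retained_versions * rewrite_fraction
    where each retained old version keeps `rewrite_fraction` of the data alive
    as superseded files that have not yet been cleaned up.
    """
    return 1 + metadata_fraction + retained_versions * rewrite_fraction


# Mostly stable data: small rewrites per version, four old versions kept.
print(round(storage_ratio(0.02, 4, 0.02), 2))  # 1.1
# Continuously changing data: a quarter of the table rewritten per version.
print(round(storage_ratio(0.25, 4, 0.02), 2))  # 2.02
```

Under this model, the main levers are how much of the table each change rewrites and how many old versions are kept before cleanup, which is the trade-off the quote describes.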

Customers should ask themselves whether they really need the additional functionality that table formats bring. If you don't need the transactional or time-travel capabilities that ACID brings, for example because most of your data is append-only, you might be better off sticking with plain Parquet, he says.

"Having this additional layer definitely adds complexity, and it adds complexity in a number of ways," Shahdadpuri says. "So Delta can be a little more performance-intensive than Parquet. All of these formats cost a little more in performance. But you're paying for that somewhere, right?"

There is no single best table format, he says. Instead, the most suitable format emerges from an analysis of each client's specific needs. "It's up to the customer. It depends on the use case," says Shahdadpuri. "We want to be independent. As a solution, we support each of these."

That said, Nexla officials have observed certain trends in table format adoption. A big factor is how customers align around the big data giants Databricks and Snowflake.

As the creator of Delta, Databricks is firmly in that camp, while Snowflake is backing Iceberg. Hudi is backed not by a major big data player but by Onehouse, a startup founded by Hudi creator Vinoth Chandar. Iceberg also has the backing of Tabular, co-founded by Ryan Blue, who helped create Iceberg at Netflix.

Larger companies will likely end up with a mix of table formats, Shahdadpuri said. That leaves room for companies like Nexla to step in with tools that automate integration across these formats, or for consulting firms to piece them together manually.

