Apache Software Foundation on CabinetM

Apache Software Foundation

Headquarters

Forest Hill, Maryland, United States

Founded

1999

Company Overview
The mission of the Apache Software Foundation (ASF) is to provide software for the public good. Through a collaborative and meritocratic development process known as The Apache Way, Apache projects deliver enterprise-grade, freely available software products that attract large communities of users. In 2002, ASF launched The Apache Incubator, an entry path for projects and codebases intended to become official Apache Software Foundation projects.

Products and Services

Apache Airflow

Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Easily define your own operators and extend libraries to fit the level of abstraction that suits your environment.
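As a rough illustration of what "programmatically authoring" a workflow can mean, the sketch below models tasks and their dependencies in plain Python using only the standard library. It does not use Airflow's API; the task names and the run_workflow helper are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run every task after all of its upstream tasks; return the run order.

    tasks:        {task_name: zero-argument callable}
    dependencies: {task_name: set of task names that must finish first}
    """
    order = []
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical three-step pipeline: extract -> transform -> load.
log = []
tasks = {
    "extract": lambda: log.append("extracted"),
    "transform": lambda: log.append("transformed"),
    "load": lambda: log.append("loaded"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order = run_workflow(tasks, dependencies)
print(order)
```

A real scheduler adds retries, scheduling intervals, and distribution across workers; the sketch covers only the dependency ordering.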

Apache Arrow

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

Apache Atlas

Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem. Apache Atlas provides open metadata management and governance capabilities that let organizations build a catalog of their data assets, classify and govern these assets, and provide collaboration capabilities around them for data scientists, analysts and the data governance team.

Apache Beam

Apache Beam is an advanced unified programming model. Implement batch and streaming data processing jobs that run on any execution engine.

Apache Druid

Apache Druid is a high-performance real-time analytics database. Druid is designed for workflows where fast ad-hoc analytics, instant data visibility, or support for high concurrency is important.

Apache Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
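To make "stateful computations over data streams" more concrete, here is a minimal plain-Python sketch of a per-key running sum. It is not Flink code and uses none of Flink's APIs; the keyed_running_sum function and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


def keyed_running_sum(events):
    """Yield (key, running_total) for each incoming (key, value) event.

    The per-key totals are the "state": they persist across events.
    Because this is a generator, it works the same way on a finite list
    (bounded stream) and on an endless iterator (unbounded stream).
    """
    state = defaultdict(int)
    for key, value in events:
        state[key] += value
        yield key, state[key]


# Hypothetical bounded stream of (key, value) events.
events = [("a", 1), ("b", 5), ("a", 2), ("b", 1)]
results = list(keyed_running_sum(events))
print(results)
```

A distributed engine would additionally partition the keys across machines and checkpoint the state for recovery; neither is modeled here.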

Apache Hadoop

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

Apache Hadoop HDFS

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject.

Apache Hive

The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

Apache HTTP Server

The Apache HTTP Server Project is an effort to develop and maintain an open-source HTTP server for modern operating systems including UNIX and Windows. The goal of this project is to provide a secure, efficient and extensible server that provides HTTP services in sync with the current HTTP standards.

Apache Hudi

Apache Hudi ingests and manages storage of large analytical datasets over DFS (HDFS or cloud stores). Hudi brings stream processing to big data, providing fresh data while being an order of magnitude more efficient than traditional batch processing.

Apache Iceberg

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table.

Apache Impala

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala raises the bar for SQL query performance on Apache Hadoop while retaining a familiar user experience. With Impala, you can query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time. Furthermore, Impala uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive, providing a familiar and unified platform for batch-oriented or real-time queries.

Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Apache Oozie

Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

Apache ORC

Apache ORC is a columnar storage format for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query. Because ORC files are type-aware, the writer chooses the most appropriate encoding for the type and builds an internal index as the file is written.
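The idea of reading only the required columns, and of using an index built at write time to skip data, can be sketched in plain Python. This is a conceptual toy, not the ORC file format or any ORC library API; write_stripes, select_where_equals, and the min/max statistics layout are hypothetical names chosen for this example.

```python
def write_stripes(rows, stripe_size):
    """Split rows into stripes, storing each column separately.

    While writing, record per-column (min, max) statistics for each stripe.
    These statistics act as a simple index.
    """
    stripes = []
    for start in range(0, len(rows), stripe_size):
        chunk = rows[start:start + stripe_size]
        columns = {key: [row[key] for row in chunk] for key in chunk[0]}
        stats = {key: (min(vals), max(vals)) for key, vals in columns.items()}
        stripes.append({"columns": columns, "stats": stats})
    return stripes


def select_where_equals(stripes, filter_col, value, project_col):
    """Return project_col values where filter_col == value.

    Only the two named columns are touched, and any stripe whose min/max
    range cannot contain the value is skipped without being scanned.
    """
    result, stripes_read = [], 0
    for stripe in stripes:
        low, high = stripe["stats"][filter_col]
        if not (low <= value <= high):
            continue  # the index rules this stripe out
        stripes_read += 1
        filter_values = stripe["columns"][filter_col]
        project_values = stripe["columns"][project_col]
        result.extend(
            p for f, p in zip(filter_values, project_values) if f == value
        )
    return result, stripes_read


# Hypothetical data: nine rows written as three stripes of three rows each.
rows = [{"id": i, "name": f"user{i}", "score": i * 10} for i in range(9)]
stripes = write_stripes(rows, stripe_size=3)
names, stripes_read = select_where_equals(stripes, "id", 4, "name")
print(names, stripes_read)
```

In this toy run the "score" column is never read, and only one of the three stripes is scanned, which is the behavior the description above attributes to columnar storage with a built-in index.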

Apache Parquet

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

Apache Pinot

Apache Pinot is a real-time distributed OLAP datastore, purpose-built to provide ultra low-latency analytics, even at extremely high throughput. It can ingest directly from streaming data sources - such as Apache Kafka and Amazon Kinesis - and make the events available for querying instantly. It can also ingest from batch data sources - such as Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage.

Apache SINGA

Apache SINGA is an open source deep learning library that provides a flexible architecture for scalable distributed training. The extensible platform supports machine learning models, and can run over a wide range of hardware. SINGA is released under Apache License Version 2.0.

Marketing Function Productivity

Product Category

Analytics: Business Intelligence

Data Visualization & Presentation

Deep Learning

Deep Learning

Enterprise Data Management

Big Data Solutions, Data Lake, Data Management & Intelligence, Data Virtualization, Database Creation and Development, Enterprise Data Warehouses, Master Data Management

Integrations

Data Integration

Productivity & Workflow

Desktop Publishing Software

Web Infrastructure

Web Infrastructure

Web and App Optimization

Usability: User Experience (UX) Testing