
Apache Software Foundation

 
Forest Hills, Maryland, United States
Founded: 1999
Company Overview
The mission of the Apache Software Foundation (ASF) is to provide software for the public good. Through a collaborative and meritocratic development process known as The Apache Way, Apache projects deliver enterprise-grade, freely available software products that attract large communities of users. In 2002, the ASF launched the Apache Incubator, the entry path for projects and codebases intended to become official Apache Software Foundation projects.

Products and Services

Apache Airflow

 
Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Easily define your own operators and extend libraries to fit the level of abstraction that suits your environment.
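Airflow's core abstraction is the workflow expressed as a directed acyclic graph (DAG) of tasks. A minimal pure-Python sketch of that idea (not the Airflow API — task names and dependencies here are made up for illustration):

```python
# Sketch of the DAG idea behind Airflow (not the Airflow API): tasks
# declare upstream dependencies, and the scheduler runs each task only
# after all of its upstream tasks have completed.
from collections import deque

def run_order(deps):
    """deps maps task -> set of upstream tasks; returns a topological order."""
    pending = {task: set(upstream) for task, upstream in deps.items()}
    ready = deque(sorted(t for t, u in pending.items() if not u))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for t, u in pending.items():
            if task in u:
                u.remove(task)
                if not u:
                    ready.append(t)   # all upstreams done; task is runnable
    if len(order) != len(pending):
        raise ValueError("cycle detected: a DAG must be acyclic")
    return order

# extract must run before transform, which must run before load
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
print(run_order(deps))  # ['extract', 'transform', 'load']
```

Real Airflow layers scheduling, retries, and monitoring on top of this dependency graph, with workers pulling runnable tasks from a message queue.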

Apache Arrow

 
Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
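The benefit of a columnar layout can be seen even without Arrow itself. A pure-Python sketch (not the actual Arrow memory format) of row-wise versus column-wise storage:

```python
# Sketch of why columnar layout helps analytics (pure Python, not the
# actual Arrow format): values of one field are stored contiguously,
# so a scan over a single column touches only that column's buffer.

# Row-oriented: each record is stored together.
rows = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": 12.5},
    {"id": 3, "price": 7.5},
]

# Column-oriented: each field is one contiguous array.
columns = {"id": [1, 2, 3], "price": [10.0, 12.5, 7.5]}

# Aggregating one field reads a single contiguous buffer...
col_total = sum(columns["price"])
# ...instead of hopping across every record.
row_total = sum(r["price"] for r in rows)
assert col_total == row_total == 30.0
```

Arrow standardizes exactly this column-per-buffer layout across languages, which is what makes zero-copy sharing between systems possible.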

Apache Atlas

 
Atlas is a scalable and extensible set of core foundational governance services, enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop while allowing integration with the whole enterprise data ecosystem. Apache Atlas provides open metadata management and governance capabilities for organizations to build a catalog of their data assets, classify and govern these assets, and provide collaboration capabilities around them for data scientists, analysts, and the data governance team.

Apache Beam

 
Apache Beam is an advanced, unified programming model for implementing batch and streaming data processing jobs that run on any supported execution engine.
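The unifying idea — one pipeline definition for both bounded and unbounded data — can be sketched in pure Python (this is not the Beam SDK; the transforms are invented for illustration):

```python
# Sketch of Beam's unifying idea (pure Python, not the Beam SDK): the
# same pipeline of transforms applies to a bounded (batch) source or
# an unbounded (streaming) source.
import itertools

def pipeline(events):
    """Parse, filter, and scale events; works on any iterable."""
    parsed = (int(e) for e in events)
    kept = (x for x in parsed if x % 2 == 0)
    return (x * 10 for x in kept)

# Batch: a bounded, in-memory source.
batch = list(pipeline(["1", "2", "3", "4"]))

# Streaming: an unbounded source, consumed incrementally.
unbounded = (str(i) for i in itertools.count(1))
stream = list(itertools.islice(pipeline(unbounded), 2))

assert batch == [20, 40]
assert stream == [20, 40]
```

In Beam proper, the same portability extends across runners: the one pipeline can execute on engines such as Flink or Spark.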

Apache Druid

 
Apache Druid is a high-performance, real-time analytics database, designed for workflows where fast ad-hoc analytics, instant data visibility, or high concurrency is important.
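One technique behind fast ad-hoc analytics in stores like Druid is rollup: pre-aggregating events at ingest by time bucket and dimension. A pure-Python sketch under invented event fields (not Druid's actual ingestion spec):

```python
# Sketch of the rollup idea (pure Python, not Druid itself): events
# are pre-aggregated at ingest by time bucket and dimension, so
# queries scan far fewer rows than the raw event count.
from collections import defaultdict

def ingest(events, bucket_seconds=60):
    rollup = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ts, page, nbytes in events:
        key = (ts - ts % bucket_seconds, page)  # minute bucket + dimension
        rollup[key]["count"] += 1
        rollup[key]["bytes"] += nbytes
    return dict(rollup)

events = [
    (100, "/home", 500),
    (110, "/home", 300),   # same minute, same page -> merged
    (100, "/docs", 200),
    (170, "/home", 100),   # next minute -> new row
]
table = ingest(events)
print(table[(60, "/home")])  # {'count': 2, 'bytes': 800}
```

Four raw events collapse into three stored rows here; at real event volumes the reduction is what keeps query latency low.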

Apache Flink

 
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
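"Stateful computation over a stream" means keeping per-key state that is updated as each event arrives. A pure-Python sketch of a keyed running aggregate (not the Flink API; event names are invented):

```python
# Sketch of a stateful streaming computation in the Flink sense (pure
# Python, not the Flink API): state is kept per key and updated as
# each event arrives, emitting a running aggregate.
def keyed_running_sum(stream):
    state = {}                       # per-key state, as in a keyed stream
    for key, value in stream:
        state[key] = state.get(key, 0) + value
        yield key, state[key]        # emit the updated aggregate

events = [("a", 1), ("b", 5), ("a", 2), ("a", 3)]
print(list(keyed_running_sum(events)))
# [('a', 1), ('b', 5), ('a', 3), ('a', 6)]
```

Flink's contribution is making this state fault-tolerant and distributed: it is checkpointed and partitioned by key across the cluster, so the computation survives failures at any scale.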

Apache Hadoop

 
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
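The "simple programming model" is MapReduce, which the classic word-count example illustrates. A single-process pure-Python sketch (Hadoop runs the same three phases distributed across a cluster):

```python
# Sketch of the MapReduce programming model that Hadoop distributes
# across a cluster (pure Python, single process): map emits key/value
# pairs, shuffle groups them by key, reduce aggregates each group.
from collections import defaultdict

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word, 1            # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)    # group all values by key
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"], counts["fox"])  # 3 2
```

Because map and reduce are pure per-key functions, the framework can rerun any failed task on another machine, which is how application-layer fault handling works in practice.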

Apache Hadoop HDFS

 
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
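Two HDFS ideas — splitting files into fixed-size blocks and replicating each block across datanodes — can be sketched in pure Python (node names and the round-robin placement are invented for illustration; real HDFS placement is rack-aware):

```python
# Sketch of two HDFS ideas (pure Python, not HDFS itself): files are
# split into fixed-size blocks, and each block is replicated on
# several datanodes so the loss of one node loses no data.
def split_into_blocks(data, block_size):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, datanodes, replication=3):
    # Round-robin placement for illustration; real HDFS is rack-aware.
    placement = {}
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + r) % len(datanodes)]
                        for r in range(replication)]
    return placement

blocks = split_into_blocks(b"x" * 300, block_size=128)
print([len(b) for b in blocks])               # [128, 128, 44]
nodes = ["dn1", "dn2", "dn3", "dn4"]
print(place_replicas(len(blocks), nodes)[0])  # ['dn1', 'dn2', 'dn3']
```

Production HDFS uses much larger blocks (128 MB by default) so that streaming reads dominate seek time.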

Apache Hive

 
The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
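"Structure projected onto data already in storage" is schema-on-read: the raw files are untouched, and a table schema is applied only at query time. A pure-Python sketch (not HiveQL; file contents and schema are invented):

```python
# Sketch of Hive's schema-on-read idea (pure Python, not Hive): raw
# files stay as they are in storage, and a table structure is
# projected onto them only when a query runs.
import csv, io

RAW_FILE = "id,name,score\n1,ada,91\n2,lin,84\n"   # data already "in storage"
SCHEMA = {"id": int, "name": str, "score": int}    # projected at query time

def scan(raw, schema):
    for row in csv.DictReader(io.StringIO(raw)):
        yield {col: cast(row[col]) for col, cast in schema.items()}

# Rough equivalent of: SELECT name FROM t WHERE score > 85
result = [r["name"] for r in scan(RAW_FILE, SCHEMA) if r["score"] > 85]
print(result)  # ['ada']
```

Hive compiles the SQL form of such queries into distributed jobs over the files, so the same data can be queried without ever being loaded into a separate database.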

Apache HTTP Server

 
The Apache HTTP Server Project is an effort to develop and maintain an open-source HTTP server for modern operating systems including UNIX and Windows. The goal of this project is to provide a secure, efficient and extensible server that provides HTTP services in sync with the current HTTP standards.
Read More →

Apache Hudi

 
Apache Hudi ingests and manages the storage of large analytical datasets over DFS (HDFS or cloud stores). Hudi brings stream processing to big data, providing fresh data while being an order of magnitude more efficient than traditional batch processing.
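The operation that makes this efficiency possible is the upsert: merging incoming records into the stored dataset by record key instead of rewriting everything. A pure-Python sketch (not Hudi's file-level implementation; the records are invented):

```python
# Sketch of the upsert Hudi performs on data files (pure Python, not
# Hudi itself): incoming records update existing rows by key or
# insert new ones, instead of rewriting the whole dataset in batch.
def upsert(snapshot, incoming, key="id"):
    merged = {row[key]: row for row in snapshot}
    for row in incoming:
        merged[row[key]] = row       # update-or-insert by record key
    return sorted(merged.values(), key=lambda r: r[key])

snapshot = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
incoming = [{"id": 1, "v": "new"}, {"id": 3, "v": "add"}]
print(upsert(snapshot, incoming))
# [{'id': 1, 'v': 'new'}, {'id': 2, 'v': 'keep'}, {'id': 3, 'v': 'add'}]
```

Hudi applies this merge at the level of files on DFS, tracking commits so readers always see a consistent snapshot.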

Apache Iceberg

 
Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table.

Apache Impala

 
Apache Impala is the open source, native analytic database for Apache Hadoop. Impala raises the bar for SQL query performance on Apache Hadoop while retaining a familiar user experience. With Impala, you can query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time. Furthermore, Impala uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive, providing a familiar and unified platform for batch-oriented or real-time queries.

Apache Kafka

 
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
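Kafka's core abstraction is an append-only log per partition, with each consumer tracking its own offset. A pure-Python sketch of that model (not the Kafka client API; event names are invented):

```python
# Sketch of the core Kafka abstraction (pure Python, not the Kafka
# client API): an append-only log per partition, with each consumer
# tracking its own offset so many readers can share one stream.
class Partition:
    def __init__(self):
        self.log = []                      # append-only event log

    def produce(self, event):
        self.log.append(event)
        return len(self.log) - 1           # offset of the new event

    def consume(self, offset, max_events=10):
        batch = self.log[offset:offset + max_events]
        return batch, offset + len(batch)  # events + next offset to read

p = Partition()
for e in ["signup", "click", "purchase"]:
    p.produce(e)

batch, next_off = p.consume(offset=0, max_events=2)
print(batch, next_off)        # ['signup', 'click'] 2
batch, next_off = p.consume(next_off)
print(batch, next_off)        # ['purchase'] 3
```

Because consuming only advances a reader-owned offset, any number of independent consumers can replay the same stream, which is what makes the log suitable for both pipelines and analytics.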

Apache Oozie

 
Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

Apache ORC

 
Apache ORC is a columnar storage format for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query. Because ORC files are type-aware, the writer chooses the most appropriate encoding for the type and builds an internal index as the file is written.
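One encoding a type-aware writer can choose is run-length encoding, which collapses repeated values in a column. A pure-Python sketch (ORC's actual RLE variants are more elaborate; the column values are invented):

```python
# Sketch of one encoding trick used by type-aware columnar formats
# like ORC (pure Python, not the ORC format): run-length encoding
# collapses repeated values, and decoding restores them losslessly.
def rle_encode(column):
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1             # extend the current run
        else:
            runs.append([value, 1])      # start a new run
    return runs

def rle_decode(runs):
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

col = ["US", "US", "US", "DE", "DE", "US"]
runs = rle_encode(col)
print(runs)                          # [['US', 3], ['DE', 2], ['US', 1]]
assert rle_decode(runs) == col       # lossless round trip
```

Low-cardinality columns, common in analytics, compress dramatically under such encodings, which is a large part of why columnar formats scan fast.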

Apache Parquet

 
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

Apache Pinot

 
Apache Pinot is a real-time distributed OLAP datastore, purpose-built to provide ultra low-latency analytics, even at extremely high throughput. It can ingest directly from streaming data sources - such as Apache Kafka and Amazon Kinesis - and make the events available for querying instantly. It can also ingest from batch data sources - such as Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage.

Apache SINGA

 
Apache SINGA is an open source deep learning library that provides a flexible architecture for scalable distributed training. The extensible platform supports machine learning models, and can run over a wide range of hardware. SINGA is released under Apache License Version 2.0.

Apache Solr

 
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via JSON, XML, CSV or binary over HTTP. You query it via HTTP GET and receive JSON, XML, CSV or binary results.
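The data structure that makes indexing-then-querying fast is the inverted index: each term maps to the documents containing it. A pure-Python sketch (not Solr's Lucene internals; documents and terms are invented):

```python
# Sketch of the inverted index behind search servers like Solr (pure
# Python, not Lucene/Solr): indexing maps each term to the documents
# containing it, so a query is a lookup rather than a full scan.
from collections import defaultdict

class Index:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of doc ids

    def add(self, doc_id, text):           # "indexing" a document
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, *terms):              # docs matching all terms
        sets = [self.postings[t.lower()] for t in terms]
        return sorted(set.intersection(*sets)) if sets else []

idx = Index()
idx.add(1, "Apache Solr search server")
idx.add(2, "Apache Kafka event streaming")
print(idx.search("apache"))          # [1, 2]
print(idx.search("apache", "solr"))  # [1]
```

Solr wraps this idea in an HTTP service: documents go in over POST, and queries come back as JSON, XML, or CSV over GET.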

Apache Spark

 
Apache Spark is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.

Apache Zeppelin

 
Apache Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

OpenOffice

 
Apache OpenOffice is an open source desktop suite that features six business productivity applications: a word processor with web-authoring component, spreadsheet, presentation graphics, drawing, equation editor, and database.