Apache Flink

Overview

Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.

Learn more about Flink at https://flink.apache.org/

Features

  • A streaming-first runtime that supports both batch processing and data streaming programs

  • Elegant and fluent APIs in Java and Scala

  • A runtime that supports very high throughput and low event latency at the same time

  • Support for event time and out-of-order processing in the DataStream API, based on the Dataflow Model (see the sketch after this list)

  • Flexible windowing (time, count, sessions, custom triggers) across different time semantics (event time, processing time)

  • Fault-tolerance with exactly-once processing guarantees

  • Natural back-pressure in streaming programs

  • Libraries for Graph processing (batch), Machine Learning (batch), and Complex Event Processing (streaming)

  • Built-in support for iterative programs (BSP) in the DataSet (batch) API

  • Custom memory management for efficient and robust switching between in-memory and out-of-core data processing algorithms

  • Compatibility layers for Apache Hadoop MapReduce

  • Integration with YARN, HDFS, HBase, and other components of the Apache Hadoop ecosystem
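
As a small illustration of the event-time and windowing support above, here is a hedged sketch (the Event case class, the events stream, and all field names are illustrative assumptions, not part of this README) that counts events per key in one-minute event-time windows while tolerating five seconds of out-of-order arrival:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class Event(key: String, timestamp: Long, value: Long)

// events: DataStream[Event], obtained from any source (assumed to exist)
val counts = events
  // extract event-time timestamps; tolerate events up to 5 seconds out of order
  .assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(5)) {
      override def extractTimestamp(e: Event): Long = e.timestamp
    })
  .keyBy(_.key)
  // windows are evaluated on event time, not on arrival time
  .window(TumblingEventTimeWindows.of(Time.minutes(1)))
  .sum("value")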

Streaming Example

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class WordWithCount(word: String, count: Long)

val env = StreamExecutionEnvironment.getExecutionEnvironment

// host and port of the line-oriented text server (placeholders)
val text = env.socketTextStream(host, port, '\n')

val windowCounts = text.flatMap { w => w.split("\\s") }
  .map { w => WordWithCount(w, 1) }
  .keyBy("word")
  .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
  .sum("count")

windowCounts.print()

env.execute("Socket Window WordCount")
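To try this out locally, one option (an assumption, not part of the original snippet) is to feed the socket with netcat: run nc -lk 9000 in a terminal, point host and port at it, and type lines; the per-window counts are printed by the job.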

Batch Example

import org.apache.flink.api.scala._

case class WordWithCount(word: String, count: Long)

val env = ExecutionEnvironment.getExecutionEnvironment

// path and outputPath are placeholders for the input and output locations
val text = env.readTextFile(path)

val counts = text.flatMap { w => w.split("\\s") }
  .map { w => WordWithCount(w, 1) }
  .groupBy("word")
  .sum("count")

counts.writeAsCsv(outputPath)

// DataSet sinks are lazy; execute() triggers the job
env.execute("Batch WordCount")

Building Apache Flink from Source

Prerequisites for building Flink:

  • Unix-like environment (we use Linux, Mac OS X, Cygwin, WSL)
  • Git
  • Maven (we recommend version 3.2.5 and require at least 3.1.1)
  • Java 8 or 11 (Java 9 or 10 may work)

git clone https://github.com/apache/flink.git
cd flink
mvn clean package -DskipTests # this will take up to 10 minutes

Flink is now installed in build-target.

NOTE: Maven 3.3.x can build Flink, but will not properly shade away certain dependencies. Maven 3.1.1 creates the libraries properly. To build unit tests with Java 8, use Java 8u51 or above to prevent failures in unit tests that use the PowerMock runner.

Developing Flink

The Flink committers use IntelliJ IDEA to develop the Flink codebase. We recommend IntelliJ IDEA for developing projects that involve Scala code.

Minimal requirements for an IDE are:

  • Support for Java and Scala (also mixed projects)
  • Support for Maven with Java and Scala

IntelliJ IDEA

The IntelliJ IDE supports Maven out of the box and offers a plugin for Scala development.

Check out our Setting up IntelliJ guide for details.

Eclipse Scala IDE

NOTE: From our experience, this setup does not work with Flink due to deficiencies of the old Eclipse version bundled with Scala IDE 3.0.3 or due to version incompatibilities with the bundled Scala version in Scala IDE 4.4.1.

We recommend using IntelliJ IDEA instead (see above).

Support

Don’t hesitate to ask!

Contact the developers and community on the mailing lists if you need any help.

Open an issue if you find a bug in Flink.

Documentation

The documentation of Apache Flink is located on the website: https://flink.apache.org or in the docs/ directory of the source code.

Fork and Contribute

This is an active open-source project. We are always open to people who want to use the system or contribute to it. Contact us if you are looking for implementation tasks that fit your skills. This article describes how to contribute to Apache Flink.

About

Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.

Issues
  • [FLINK-2030][FLINK-2274][core][utils]Histograms for Discrete and Continuous valued data

    This implements the online histograms for both categorical and continuous data. For continuous data, we emulate a continuous probability distribution which supports finding the cumulative sum up to a particular value and finding the value at a specific cumulative probability [quantiles]. For categorical fields, we emulate a probability mass function which supports finding the probability associated with every class. The continuous histogram follows this paper: http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf (a short sketch of the idea follows below).

    Note: This is a sub-task of https://issues.apache.org/jira/browse/FLINK-1727 which already has a PR pending review at https://github.com/apache/flink/pull/710.

    Edit: This adds methods to evaluate statistics on data sets of vectors, such as column-wise statistics. These include minimum, maximum, mean, variance, entropy, Gini impurity, etc. [FLINK-2379]

    Edit: Splitting the PR to move the Statistics part to another PR. #1032
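
    The continuous histogram from that paper can be sketched in a few lines (a hedged illustration of the technique, not code from this PR; all names are made up):

    import scala.collection.mutable.ArrayBuffer

    // Online histogram in the style of Ben-Haim & Tom-Tov (JMLR 2010):
    // keep at most maxBins (centroid, count) bins sorted by centroid,
    // and merge the two closest bins whenever the budget is exceeded.
    class StreamingHistogram(maxBins: Int) {
      private val bins = ArrayBuffer.empty[(Double, Long)]

      def add(p: Double): Unit = {
        val i = bins.indexWhere(_._1 >= p)
        if (i >= 0 && bins(i)._1 == p) bins(i) = (p, bins(i)._2 + 1)
        else if (i >= 0) bins.insert(i, (p, 1L))
        else bins.append((p, 1L))
        if (bins.size > maxBins) mergeClosest()
      }

      private def mergeClosest(): Unit = {
        val i = (0 until bins.size - 1).minBy(j => bins(j + 1)._1 - bins(j)._1)
        val (v1, c1) = bins(i)
        val (v2, c2) = bins(i + 1)
        // the weighted average keeps the combined mass of both bins
        bins(i) = ((v1 * c1 + v2 * c2) / (c1 + c2), c1 + c2)
        bins.remove(i + 1)
      }
    }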

    component=API/DataSet 
    opened by sachingoel0101 111
  • [FLINK-377] [FLINK-671] Generic Interface / PAPI

    This PR contains the new Generic Language Interface and the Python API built on top of it.

    This version hasn't been tested on a cluster yet; that will be done over the weekend. I'm opening the PR already so that the review can start earlier (only minor changes should be necessary to make it work).

    I will mark several parts where I would specifically like some input.

    Relevant issues: Ideally, [FLINK-1040] will be merged before this one is, as it removes roughly 600 lines of very much hated code in the PlanBinder.

    A while ago the distributed cache was acting up, not maintaining files across subsequent operations. I will verify whether this issue still exists while testing. It would not strictly be a blocking issue; as it stands I could work around it (with the caveat that a few files would remain in the tmp folder).

    component=API/Python 
    opened by zentol 100
  • [FLINK-2111] Add "stop" signal to cleanly shutdown streaming jobs

    Added a TERMINATE signal to:

    • JobManager, TaskManager, ExecutionGraph
    • CliFrontend, JobManagerFrontend
    • StreamTask, StreamSource
    component=Runtime/Coordination component=Runtime/Operators component=Runtime/WebFrontend 
    opened by mjsax 95
  • [FLINK-21675][table-planner-blink] Allow Predicate Pushdown with Watermark Assigner Between Filter and Scan

    What is the purpose of the change

    The purpose of this change is to fix https://issues.apache.org/jira/browse/FLINK-21675, which allows predicates to be pushed down even though a LogicalWatermarkAssigner is present between the LogicalFilter and the LogicalTableScan.

    This means that, for example, the following table plan:

    LogicalProject(name=[$0], event_time=[$1])
    +- LogicalFilter(condition=[AND(=(LOWER($0), _UTF-16LE'foo'), IS NOT NULL($0))])
       +- LogicalWatermarkAssigner(rowtime=[event_time], watermark=[-($1, 5000:INTERVAL SECOND)])
          +- LogicalTableScan(table=[[default_catalog, default_database, WithWatermark]])
    

    can, when predicate pushdown is enabled and the source supports LOWER pushdown, be rewritten to:

    Calc(select=[name, event_time], where=[IS NOT NULL(name)])
    +- WatermarkAssigner(rowtime=[event_time], watermark=[-(event_time, 5000:INTERVAL SECOND)])
       +- TableSourceScan(table=[[default_catalog, default_database, WithWatermark, filter=[equals(lower(name), 'foo')]]], fields=[name, event_time])
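
    For reference, a query of roughly this shape would produce the plan above (a hedged sketch; the table definition and connector choice are illustrative assumptions, not taken from the PR):

    // tableEnv: a TableEnvironment (assumed to exist)
    tableEnv.executeSql(
      """CREATE TABLE WithWatermark (
        |  name STRING,
        |  event_time TIMESTAMP(3),
        |  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
        |) WITH ('connector' = 'datagen')""".stripMargin)

    // LOWER(name) = 'foo' can now be pushed into the scan even though
    // the watermark assigner sits between the filter and the source
    val result = tableEnv.sqlQuery(
      "SELECT name, event_time FROM WithWatermark WHERE LOWER(name) = 'foo' AND name IS NOT NULL")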
    

    Brief change log

    • Implement PushFilterIntoTableSourceScanAcrossWatermarkRule
    • Refactor PushFilterIntoTableSourceScanRule to inherit a common base class PushFilterIntoSourceScanRuleBase
    • Add new rule to FlinkStreamRuleSets under FILTER_TABLESCAN_PUSHDOWN_RULES and LOGICAL_RULES
    • Implement tests

    Verifying this change

    This change added tests and can be verified as follows:

    • Added testPushdownAcrossWatermarkFullPredicateMatch for testing full match of predicates (all predicates in query) and with SupportsWatermarkPushdown support
    • Added testPushdownAcrossWatermarkPartialPredicateMatch for testing that a partial match generates the correct plan (LogicalFilter with the remaining predicates to be filtered).

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): (yes / no)
    • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
    • The serializers: (yes / no / don't know)
    • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
    • The S3 file system connector: (yes / no / don't know)

    Documentation

    • Does this pull request introduce a new feature? (yes / no)
    • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
    component=TableSQL/Planner review=description? 
    opened by YuvalItzchakov 92
  • [FLINK-9697] Provide connector for modern Kafka

    What is the purpose of the change

    This pull request provides a connector for Kafka 2.0.0.

    Brief change log

    • Provide connector for Kafka 2.0.0

    Verifying this change

    This change is already covered by existing tests.
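
    For context, the universal connector introduced here is used like the version-specific ones (a hedged sketch; the topic name, addresses, and group id are placeholders):

    import java.util.Properties
    import org.apache.flink.api.common.serialization.SimpleStringSchema
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // placeholder
    props.setProperty("group.id", "example-group")           // placeholder

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // FlinkKafkaConsumer (no version suffix) tracks the modern Kafka client
    val stream = env.addSource(
      new FlinkKafkaConsumer[String]("example-topic", new SimpleStringSchema(), props))
    stream.print()
    env.execute("Kafka read example")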

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): (yes / no)
    • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
    • The serializers: (yes / no / don't know)
    • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
    • The S3 file system connector: (yes / no / don't know)

    Documentation

    • Does this pull request introduce a new feature? (yes / no)
    • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
    component= 
    opened by yanghua 77
  • [FLINK-1745] Add exact k-nearest-neighbours algorithm to machine learning library

    I added a quadtree data structure for the kNN algorithm. @chiwanpark originally made a pull request for a kNN algorithm, and we coordinated so that I could incorporate a tree structure. The quadtree scales very well with the number of training + test points, but scales poorly with the dimension (even the R-tree scales poorly with the dimension). I added a flag that automatically determines whether or not to use the quadtree. My implementation needed to use the Euclidean or SquaredEuclidean distance since I needed a specific notion of the distance between a test point and a box in the quadtree. I added another test, KNNQuadTreeSuite, in addition to Chiwan Park's KNNITSuite, since C. Park's parameters will automatically choose the brute-force non-quadtree method.

    For more details on the quadtree + how I used it for the KNN query, please see another branch I created that has a README.md: https://github.com/danielblazevski/flink/tree/FLINK-1745-devel/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn
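
    The point-to-box distance mentioned above is the standard pruning test for tree-based kNN; a hedged sketch (names are illustrative, not from this PR):

    // Minimum squared Euclidean distance from query point p to the
    // axis-aligned box [lo, hi]: clamp p to the box, then measure.
    def minSqDistToBox(p: Array[Double], lo: Array[Double], hi: Array[Double]): Double =
      p.indices.map { i =>
        val d = if (p(i) < lo(i)) lo(i) - p(i)
                else if (p(i) > hi(i)) p(i) - hi(i)
                else 0.0
        d * d
      }.sum

    If this distance already exceeds the k-th best distance found so far, the whole box (quadtree cell) can be skipped.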

    component=Library/MachineLearning 
    opened by danielblazevski 76
  • [FLINK-2797][cli] Add support for running jobs in detached mode from CLI

    Adds an option -d to run jobs in detached mode.
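    With this option, a submission such as ./bin/flink run -d path/to/job.jar (the path is illustrative) returns as soon as the job is submitted, instead of blocking until the job finishes.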

    component=CommandLineClient 
    opened by sachingoel0101 74
  • FLINK-2168 Add HBaseTableSource

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide. In addition to going through the list, please provide a meaningful description of your changes.

    • [ ] General

      • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      • The pull request addresses only one issue
      • Each commit in the PR has a meaningful commit message (including the JIRA id)
    • [ ] Documentation

      • Documentation has been added for new functionality
      • Old documentation affected by the pull request has been updated
      • JavaDoc for public methods has been added
    • [ ] Tests & Build

      • Functionality added by the pull request is covered by tests
      • mvn clean verify has been executed successfully locally or a Travis build has passed

    @fhueske Trying to create the first version of this PR. I have made the necessary changes to support HBaseTableSource by creating a HBaseTableInputFormat, but a lot of code is duplicated with TableInputFormat. I have not unified them for now. I tried compiling this code on my Linux box, and the @Override that I added in HBaseTableSource, overriding the BatchTableSource API, shows up as a compilation issue, but my IntelliJ IDE is fine and does not complain. Please provide your feedback so that I can rebase the next PR with suitable fixes. Thanks.

    component=API/TableSQL 
    opened by ramkrish86 74
  • [FLINK-9311] [pubsub] Added PubSub source connector with support for checkpointing (ATLEAST_ONCE)

    What is the purpose of the change

    Adding a PubSub connector with support for checkpointing.

    Verifying this change

    This change added tests and can be verified as follows:

    • Added unit tests.

    • Added integration tests to flink-end-to-end which runs against docker.

    • An example has been added in flink-examples which runs against the actual Google PubSub service. This has been manually verified.

    • Is there a need for integration tests? We feel like there is and have added them.

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): Yes, the Google Cloud SDK for PubSub, but because it is a connector this does not add any dependencies to Flink itself.
    • The public API, i.e., is any changed class annotated with @Public(Evolving): No
    • The serializers: No
    • The runtime per-record code paths (performance sensitive): Nothing has been changed; only a connector has been added.
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: No
    • The S3 file system connector: No

    Documentation

    • Does this pull request introduce a new feature? Yes
    • If yes, how is the feature documented? JavaDocs, an example in flink-examples, and updated website docs.

    component=Connectors/GoogleCloudPubSub 
    opened by Xeli 69
  • [FLINK-19667] Add AWS Glue Schema Registry integration

    What is the purpose of the change

    The AWS Glue Schema Registry is a new feature of AWS Glue that allows you to centrally discover, control, and evolve data stream schemas. This PR adds a new format that integrates Apache Flink with the AWS Glue Schema Registry.

    Brief change log

    • Added flink-avro-glue-schema-registry module under flink-formats
    • Added end-to-end test named flink-glue-schema-registry-test for the new module

    Verifying this change

    This change added tests and can be verified as follows:

    • Added integration tests for end-to-end deployment

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): (yes)
    • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
    • The serializers: (yes)
    • The runtime per-record code paths (performance sensitive): (don't know)
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
    • The S3 file system connector: (no)

    Documentation

    • Does this pull request introduce a new feature? (yes)
    • If yes, how is the feature documented? (JavaDocs)
    component=Formats review=description? 
    opened by mohitpali 57
  • [hotfix][python][docs] Fix python.archives docs

    What is the purpose of the change

    Fix the docs on the python.archives option. This is a follow-up to http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Confusing-docs-on-python-archives-td43240.html

    Brief change log

    Fix docs.

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): no
    • The public API, i.e., is any changed class annotated with @Public(Evolving): no
    • The serializers: no
    • The runtime per-record code paths (performance sensitive): no
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
    • The S3 file system connector: no

    Documentation

    • Does this pull request introduce a new feature? no
    • If yes, how is the feature documented? not applicable
    review=description? 
    opened by YikSanChan 2
  • [hotfix][table-planner-blink] Fix unstable itcase in OverAggregateITCase#testRowNumberOnOver.

    What is the purpose of the change

    This pull request aims to fix an unstable ITCase in OverAggregateITCase#testRowNumberOnOver.

    Brief change log

    • Fix the unstable ITCase in OverAggregateITCase#testRowNumberOnOver.

    Verifying this change

    This change is a trivial rework without any test coverage.

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): (no)
    • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
    • The serializers: (no)
    • The runtime per-record code paths (performance sensitive): (no)
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
    • The S3 file system connector: (no)

    Documentation

    • Does this pull request introduce a new feature? (no)
    • If yes, how is the feature documented? (not applicable)
    review=description? 
    opened by beyond1920 2
  • [FLINK-22487][python] Support `print` to print logs in PyFlink

    What is the purpose of the change

    This pull request adds support for using print to print logs in PyFlink.

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): (no)
    • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
    • The serializers: (no)
    • The runtime per-record code paths (performance sensitive): (no)
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
    • The S3 file system connector: (no)

    Documentation

    • Does this pull request introduce a new feature? (no)
    • If yes, how is the feature documented? (not applicable)
    component=API/Python review=description? 
    opened by HuangXingBo 2
  • 1.11 backport [FLINK-22424][network] Prevent releasing PipelinedSubpartition while Task can still write to it

    This is a backport of https://github.com/apache/flink/pull/15734.

    component=Runtime/Network review=description? 
    opened by pnowojski 2
  • [FLINK-22488][hotfix] Update SubtaskGatewayImpl to specify the cause of sendEvent() failure when triggering task failover

    Contribution Checklist

    This PR updates SubtaskGatewayImpl to specify the cause of a sendEvent() failure when triggering task failover. This change is useful for understanding why the event could not be sent.

    Brief change log

    This PR updated SubtaskGatewayImpl to specify the cause of sendEvent() failure when triggering task failover.

    Verifying this change

    I manually updated SubtaskGatewayImpl::sendEvent(...) to set failure = new FlinkException("some error") and then ran the test KafkaSourceLegacyITCase#testOneToOneSources. The test fails with the following stack trace in the log:

    ...
    Caused by: org.apache.flink.util.FlinkException: An OperatorEvent from an OperatorCoordinator to a task was lost. Triggering task failover to ensure consistency. Event: 'AddSplitEvents[[[[email protected]]]', targetTask: Source: KafkaSource -> Map -> Map (1/5) - execution #1
    	... 26 more
    Caused by: org.apache.flink.util.FlinkException: some error
    	at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:75)
    	... 25 more
    

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): (no)
    • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
    • The serializers: (no)
    • The runtime per-record code paths (performance sensitive): (no)
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
    • The S3 file system connector: (no)

    Documentation

    • Does this pull request introduce a new feature? (no)
    • If yes, how is the feature documented? (not applicable)
    component= review=description? 
    opened by lindong28 4
  • [FLINK-22456][runtime] Support InitializeOnMaster and FinalizeOnMaster to be used in InputFormat

    What is the purpose of the change

    Support InitializeOnMaster and FinalizeOnMaster to be used in InputFormat

    Brief change log

    • InputOutputFormatVertex calls initializeGlobal() when the inputFormat is an instance of InitializeOnMaster
    • InputOutputFormatVertex calls finalizeGlobal() when the inputFormat is an instance of FinalizeOnMaster (see the sketch below)
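
    A hedged sketch of what a user-defined input format can then do (the class and log messages are made-up examples, not code from this PR):

    import org.apache.flink.api.common.io.{FinalizeOnMaster, InitializeOnMaster}
    import org.apache.flink.api.java.io.TextInputFormat
    import org.apache.flink.core.fs.Path

    // The JobManager invokes the two *Global hooks exactly once:
    // before input splits are created and after all subtasks finish.
    class AuditedTextInput(path: Path) extends TextInputFormat(path)
        with InitializeOnMaster with FinalizeOnMaster {

      override def initializeGlobal(parallelism: Int): Unit =
        println(s"initializing on master for parallelism $parallelism")

      override def finalizeGlobal(parallelism: Int): Unit =
        println(s"finalizing on master after $parallelism subtasks")
    }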

    Verifying this change

    This change added tests and can be verified as follows:

    • Extended test that validates the logic of InputOutputFormatVertex.

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): (no)
    • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
    • The serializers: (no)
    • The runtime per-record code paths (performance sensitive): (no)
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
    • The S3 file system connector: (no)

    Documentation

    • Does this pull request introduce a new feature? (no)
    • If yes, how is the feature documented? (not applicable)
    component=Runtime/Task review=description? 
    opened by kanata163 2
  • [FLINK-22479][Kinesis][Consumer] Potential lock-up under error condition (Flink 1.14)

    What is the purpose of the change

    Pull in the following fixes from AWS fork:

    • Fix issue where KinesisDataFetcher.shutdownFetcher() hangs (issue, pull request)

    • Log error when shutting down Kinesis Data Fetcher (issue, pull request)

    • Treating TimeoutException as Recoverable Exception (issue, pull request)

    • Add time-out for acquiring subscription and passing events from network to source thread to prevent deadlock (pull request)

    Brief change log

    • Added Timeout when blocking on queue between source and network thread
    • Catching exceptions during source teardown, to allow source to interrupt threads of executor

    Verifying this change

    • Added unit tests for changes
    • Verified on AWS KDA with test applications

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): no
    • The public API, i.e., is any changed class annotated with @Public(Evolving): no
    • The serializers: no
    • The runtime per-record code paths (performance sensitive): yes
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
    • The S3 file system connector: no

    Documentation

    • Does this pull request introduce a new feature? no
    • If yes, how is the feature documented? not applicable
    component=Connectors/Kinesis review=description? 
    opened by dannycranmer 2
  • [FLINK-22479][Kinesis][Consumer] Potential lock-up under error condition (Flink 1.12)

    What is the purpose of the change

    Pull in the following fixes from AWS fork:

    • Fix issue where KinesisDataFetcher.shutdownFetcher() hangs (issue, pull request)

    • Log error when shutting down Kinesis Data Fetcher (issue, pull request)

    • Treating TimeoutException as Recoverable Exception (issue, pull request)

    • Add time-out for acquiring subscription and passing events from network to source thread to prevent deadlock (pull request)

    Brief change log

    • Added Timeout when blocking on queue between source and network thread
    • Catching exceptions during source teardown, to allow source to interrupt threads of executor

    Verifying this change

    • Added unit tests for changes
    • Verified on AWS KDA with test applications

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): no
    • The public API, i.e., is any changed class annotated with @Public(Evolving): no
    • The serializers: no
    • The runtime per-record code paths (performance sensitive): yes
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
    • The S3 file system connector: no

    Documentation

    • Does this pull request introduce a new feature? no
    • If yes, how is the feature documented? not applicable
    component=Connectors/Kinesis review=description? 
    opened by dannycranmer 3
  • [FLINK-21131][webui] Alignment timeout displayed in checkpoint configuration (WebUI)

    What is the purpose of the change

    Show alignment timeout in checkpoint configuration (web UI)

    Brief change log

    • Alignment timeout is added in checkpoint configuration panel

    Verifying this change

    This change added tests and can be verified as follows:

    • Manual testing
    • RestAPI tests

    Does this pull request potentially affect one of the following parts:

    • Dependencies (does it add or upgrade a dependency): (yes / no)
    • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
    • The serializers: (yes / no / don't know)
    • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
    • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
    • The S3 file system connector: (yes / no / don't know)

    Documentation

    • Does this pull request introduce a new feature? (yes / no)
    • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
    component=Runtime/Checkpointing component=Runtime/WebFrontend review=description? 
    opened by akalash 3
  • [FLINK-18934] Idle stream does not advance watermark in connected stream

    component=API/DataStream component=Runtime/Task review=description? 
    opened by dawidwys 2
Owner
The Apache Software Foundation
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

Apache Gobblin Apache Gobblin is a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems. Ca

The Apache Software Foundation 1.9k Mar 13, 2021
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Apache Zeppelin Documentation: User Guide Mailing Lists: User and Dev mailing list Continuous Integration: Contributing: Contribution Guide Issue Trac

The Apache Software Foundation 5.2k Mar 13, 2021
Mirror of Apache Storm

Master Branch: Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processi

The Apache Software Foundation 6.2k Mar 12, 2021
Apache Druid: a high performance real-time analytics database.

Website | Documentation | Developer Mailing List | User Mailing List | Slack | Twitter | Download Apache Druid Druid is a high performance real-time a

The Apache Software Foundation 10.6k Mar 13, 2021
Apache Hive

Apache Hive (TM) The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storag

The Apache Software Foundation 3.6k Mar 13, 2021
:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop

Elasticsearch Hadoop Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Apache Hive, Apache Pig, Apach

elastic 1.8k Mar 12, 2021
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

IMPORTANT NOTE!!! Storm has Moved to Apache. The official Storm git repository is now hosted by Apache, and is mirrored on github here: https://github

Nathan Marz 9k Mar 11, 2021
Hadoop library for large-scale data processing, now an Apache Incubator project

Apache DataFu Follow @apachedatafu Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by

LinkedIn's Attic 588 Feb 16, 2021
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Heron is a realtime analytics platform developed by Twitter. It has a wide array of architectural improvements over its predecessor. Heron in Apache

The Apache Software Foundation 3.5k Mar 10, 2021
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

Elephant Bird About Elephant Bird is Twitter's open source library of LZO, Thrift, and/or Protocol Buffer-related Hadoop InputFormats, OutputFormats,

Twitter 1.1k Jan 25, 2021
SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams.

SAMOA: Scalable Advanced Massive Online Analysis. This repository is discontinued. The development of SAMOA has moved over to the Apache Software Foun

Yahoo Archive 426 Feb 4, 2021
This code base is retained for historical interest only, please visit Apache Incubator Repo for latest one

Apache Kylin Apache Kylin is an open source Distributed Analytics Engine to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supp

Kylin OLAP Engine 557 Mar 10, 2021
Real-time Query for Hadoop; mirror of Apache Impala

Welcome to Impala Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters. Impala is a modern, massively-distri

Cloudera 5 May 6, 2021
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l

Oryx Project 1.7k Mar 12, 2021