Stream API


This training course provides you with a deep dive into the Java Stream API. The Java Stream API is a functional stream processing API and is used to define the logic of a task in a declarative way.

Learning Objectives

What you'll learn:

  • What the Stream API provides and when and why you should use it
  • How to use the Stream API to process elements within a collection
  • How to filter and find elements within a Stream
  • How to group and gather statistics on elements within a Stream


Prerequisites

  • A basic understanding of the Java programming language
  • A basic understanding of software development
  • A basic understanding of the software development life cycle

Intended Audience

  • Software Engineers interested in advancing their Java skills
  • Software Architects interested in using advanced features of Java to design and build both applications and frameworks
  • Anyone interested in advanced Java application development and associated tooling
  • Anyone interested in understanding the advanced areas and features of the Java SDK

Okay, welcome back. In this lecture, we'll explore the concept of Streams and how to work with them within Java. In particular, we'll review the following topics: understanding the problem with collections in Java; thinking of program solutions in a declarative way, as in what the program should do, not how it should be done; using the Stream API to process elements within collections; understanding the difference between intermediate and terminal stream operations; filtering elements from a Stream; finding elements within a Stream; and collecting the elements from a Stream into a List.

So to begin with, when developing Java applications, sooner or later you'll be using collections of data on which operations need to be performed, or to which data needs to be added.

When we think of collections of data, we often think of databases and the SQL query language, which allows us to perform operations on a set of data. Within other tiers of an application, developers work with collections of data most of the time using one of the standard collection implementations that come with the JDK, for example java.util.List.

Whether you look at collections of data within the context of a database, or as instances of Java objects, we are often interested in performing operations on this data. When using SQL, you can select the rows of a table that meet certain criteria.

For example, only flights to New York that leave after 2 PM, or you can select the row that has the earliest departure time. Even things like grouping rows of data by airline, destination, or time of day are pretty straightforward when using SQL. Best of all, only the rows you select will be placed in memory and the remaining data will remain on the file system.

When using the Collections API, a declarative query language like SQL does not exist. To get the earliest flight to New York, you as a developer have to obtain an iterator from the collection, iterate through the list one item at a time, and inspect the destination of each flight. When the destination is New York, you have to check the departure time.

You then compare this to the departure time of the flight you found earlier in the collection, and when the current flight leaves earlier, you store its object reference instead of the previous one. To make matters even worse, when browsing through collections in Java, developers often rely on extra variables and temporary collections to store information while iterating over the initial set of data.
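The external iteration just described might look like the following sketch. The Flight record, its fields, and the earliestTo method are assumptions made for illustration; the lecture only says that a flight has a destination and a departure time.

```java
import java.time.LocalTime;
import java.util.Iterator;
import java.util.List;

class ExternalIteration {
    // Hypothetical Flight type for illustration (record syntax needs Java 16+).
    record Flight(String number, String destination, LocalTime departure) {}

    // Returns the earliest flight to the destination, or null when there is none.
    static Flight earliestTo(List<Flight> flights, String destination) {
        Flight earliest = null;                     // extra variable holding the best match so far
        Iterator<Flight> it = flights.iterator();   // the developer drives the iteration
        while (it.hasNext()) {
            Flight current = it.next();
            boolean sameDestination = current.destination().equals(destination);
            if (sameDestination
                    && (earliest == null || current.departure().isBefore(earliest.departure()))) {
                earliest = current;                 // keep the flight that leaves earlier
            }
        }
        return earliest;
    }

    public static void main(String[] args) {
        List<Flight> flights = List.of(
                new Flight("AA100", "New York", LocalTime.of(15, 30)),
                new Flight("DL200", "Los Angeles", LocalTime.of(9, 0)),
                new Flight("B6300", "New York", LocalTime.of(7, 45)));
        System.out.println(earliestTo(flights, "New York").number());
    }
}
```

Note how the control flow (iterator, loop, tracking variable) takes up more space than the actual question being asked.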

Just take a moment and think of all the times you've had to write boilerplate code similar to this. When you are developing enterprise applications, especially when developing web applications, do you really want the customer to wait for the application to go through the entire collection of data before they are finally presented with a result?

You could write a multi-threaded implementation and make sure that every entry in the collection that meets the criteria is immediately handled by a separate thread to be displayed to the client, while the remaining part of the collection is iterated over. But that would mean that you as a developer would have to deal with the concurrency issues that are involved.

Declarative programming is a programming paradigm in which the focus lies on describing the structure and elements of an application in terms of the logic that needs to be implemented for the problem domain, without describing the control flow. In other words, describe what the application should do, instead of describing how it should be done.

We just described the logic needed to find the earliest flight to New York. Here we are looking for a flight to Los Angeles. In both cases, the task that is to be performed is to find the earliest flight to a certain destination. The task remains the same, however, the steps described here differ.

Now look at the SQL statement, which is a declarative way of defining what you want. The database will figure out how to come up with the answer, and will probably do so a lot faster compared to the external iteration approach previously described. The Stream API, introduced in Java 8, provides developers with the building blocks to define the logic of a task in a declarative way.

Utilizing lambda expressions and functional interfaces, these building-blocks can be configured in a chained manner to describe the operation that needs to take place on the collection of data. The building blocks and expressions used to describe the operations, will soon be explained in the following slides. But you should notice here that each block describes what needs to be done.
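As a rough sketch of such a chain, the search for the earliest flight to a destination could be written as below. The Flight record and the earliestTo method are hypothetical; filter and min are standard Stream methods, and min returns an Optional because there might be no matching flight.

```java
import java.time.LocalTime;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class DeclarativeQuery {
    // Hypothetical Flight type for illustration (record syntax needs Java 16+).
    record Flight(String number, String destination, LocalTime departure) {}

    static Optional<Flight> earliestTo(List<Flight> flights, String destination) {
        return flights.stream()
                .filter(f -> f.destination().equals(destination))   // what: only this destination
                .min(Comparator.comparing(Flight::departure));      // what: the earliest departure
    }

    public static void main(String[] args) {
        List<Flight> flights = List.of(
                new Flight("AA100", "New York", LocalTime.of(15, 30)),
                new Flight("DL200", "Los Angeles", LocalTime.of(9, 0)),
                new Flight("B6300", "New York", LocalTime.of(7, 45)));
        // The same method answers both questions; only the destination changes.
        System.out.println(earliestTo(flights, "New York"));
        System.out.println(earliestTo(flights, "Los Angeles"));
    }
}
```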

How the collection of data is processed is defined within the building block. Not only does this mean that the iteration logic is now done internally and no longer done by the developer, it also means that the implementations can take full advantage of the multi-core architectures on which applications are running these days.

As we will see throughout this lesson, the Stream API allows us to manipulate collections of data using pre-defined building blocks and lambda expressions. Even though a Stream can be created using a variety of sources, the examples used in this lesson will use streams that use an implementation of the Collection interface as the source of the data.

In order to provide support for streams, the stream method was added to the Collection interface. To introduce the various possibilities of using streams on collections of data, we will be using a Collection of Flight objects, each one representing a flight departing from Boston's Logan International Airport. The collection contains all flights that leave this airport.

Each instance contains information about the destination, the airline, the flight number, and of course the departure time. Over 400,000 flights leave this airport each year. Even though this might be a small set of data compared to the amount of data you might be working with on a daily basis, you would not want to iterate over this collection over and over again to find the flight that you are interested in.

When looking at the abstract methods defined by the Stream interface, you can categorize the methods into two groups. Those methods that return an instance of Stream, and those methods that return non-stream instances. The methods that return instances of Stream are referred to as intermediate operations. These methods can perform operations on a Stream and express the outcome in the form of another stream.

The other methods are called terminal operations. As we will see later, they start the operations on a Stream and collect the outcome of the stream into, for example, a List. Intermediate operations return a new instance of Stream.

As a result, these types of operations can be chained together to define multiple operations that need to be performed on the input stream. By doing so, you are creating a pipeline through which the elements in the initial Stream will travel.

You should remember that invoking intermediate operations on a Stream does not produce an immediate result. Intermediate operations are processed once a terminal operation is invoked on the Stream. As a result, the implementation is capable of merging operations on the Stream into a single iteration of the elements within the Stream.
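The lazy behaviour can be made visible with a small experiment. In this sketch (the class name, the route strings, and the log messages are made up for illustration), each stage records when it runs; the side effects inside the lambdas are there only to make the order observable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipeline {
    // Returns a log of the order in which the pipeline stages saw each element.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream<String> pipeline = Stream.of("BOS-JFK", "BOS-LAX", "BOS-LHR")
                .filter(s -> { log.add("filter " + s); return !s.endsWith("LAX"); })
                .map(s -> { log.add("map " + s); return s.substring(4); });
        log.add("pipeline defined");   // nothing has been filtered or mapped at this point
        List<String> result = pipeline.collect(Collectors.toList());   // terminal operation starts the work
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The first log entry is "pipeline defined", and for this sequential stream each element then travels through filter and map before the next element is looked at, which is the single-iteration merging mentioned above.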

Every pipeline that is defined on a Stream should contain at least one terminal operation. Operations defined by intermediate operations are not executed until the terminal operation is invoked. Besides invoking the intermediate operations and collecting the result of these operations, the terminal operation also consumes the Stream once the operations have been completed.

As a result, a Stream instance can only be used once. An attempt to invoke another terminal operation on a Stream that has already been consumed will result in an IllegalStateException. As explained before, once the intermediate operations on a Stream have been defined, a terminal operation will execute the operations in the pipeline and assemble a result.
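A minimal sketch of the single-use rule, using a hypothetical helper method that reports whether the second terminal operation fails:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SingleUse {
    // Returns true when a second terminal operation on the same Stream is rejected.
    static boolean secondTerminalOperationFails() {
        Stream<String> destinations = List.of("New York", "Los Angeles").stream();
        destinations.collect(Collectors.toList());       // first terminal operation uses up the stream
        try {
            destinations.collect(Collectors.toList());   // second terminal operation on the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOperationFails());
    }
}
```

To process the same data again, obtain a fresh Stream from the source collection.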

One terminal operation that will often be seen is the Stream's collect method. This method takes a Collector instance as a parameter. An instance of the Collector interface is responsible for accumulating the elements in a stream and placing these in a non-stream result type.

The Collectors class contains several factory methods for obtaining instances of the Collector interface. For the moment, we will use the Collectors.toList method to obtain a collector which accumulates the elements within the Stream into a List object. Since the factory methods in the Collectors class are defined as static, examples in this lesson will make use of a static import.
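A small sketch of collect with a static import of Collectors.toList; the class and method names are placeholders chosen for this example.

```java
import static java.util.stream.Collectors.toList;

import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class CollectToList {
    // Accumulates the elements of any Collection into a new List.
    static List<String> asList(Collection<String> source) {
        return source.stream()
                .collect(toList());   // Collectors.toList(), available unqualified via the static import
    }

    public static void main(String[] args) {
        Collection<String> airlines = new TreeSet<>(List.of("United", "Delta", "JetBlue"));
        System.out.println(asList(airlines));
    }
}
```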

Filtering. The intermediate filter method takes a Predicate as a parameter. As a result, all elements that do not comply with the criteria defined by the predicate will be skipped. In the example shown above, the predicate uses a method reference to the isInternational method of the Flight object. As a result, only those flight instances for which this method returns true will pass through the filter.
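Since the slide is not part of this transcript, here is a sketch of what such a filter could look like. The Flight record is hypothetical; only the isInternational method name comes from the lecture.

```java
import static java.util.stream.Collectors.toList;

import java.util.List;

class FilterExample {
    // Hypothetical Flight type (record syntax needs Java 16+).
    record Flight(String number, boolean international) {
        boolean isInternational() {
            return international;
        }
    }

    static List<Flight> internationalOnly(List<Flight> flights) {
        return flights.stream()
                .filter(Flight::isInternational)   // Predicate<Flight> written as a method reference
                .collect(toList());
    }

    public static void main(String[] args) {
        List<Flight> flights = List.of(
                new Flight("BA212", true),
                new Flight("DL200", false),
                new Flight("LH423", true));
        System.out.println(internationalOnly(flights));
    }
}
```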

Truncating or skipping over elements. When you are only interested in a limited number of elements from the stream, the intermediate limit operation can be used to restrict the number of items in the pipeline. As the name already implies, the skip operation can be used to skip a certain number of elements within the stream.
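Combining the two gives simple paging. The following is a sketch with made-up method names, not code from the lecture:

```java
import static java.util.stream.Collectors.toList;

import java.util.List;

class LimitAndSkip {
    // Keeps at most the first n elements.
    static List<String> firstN(List<String> items, int n) {
        return items.stream().limit(n).collect(toList());
    }

    // Skips the elements of earlier pages, then keeps one page worth of elements.
    static List<String> page(List<String> items, int pageIndex, int pageSize) {
        return items.stream()
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(toList());
    }

    public static void main(String[] args) {
        List<String> gates = List.of("A1", "A2", "B1", "B2", "C1");
        System.out.println(firstN(gates, 2));
        System.out.println(page(gates, 1, 2));
    }
}
```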

Mapping. Often a Stream will contain elements that carry a large set of information, while we're probably only interested in a subset of the properties within these elements. This is similar to selecting only a few columns of a database table. By using the map method, the elements in the input Stream are mapped to a different type. In the example shown above, we are only interested in the destination property of the Flight object.

So the map method will map the value, which is of type String, of the property to a new Stream. So the result of the map method will now no longer be a Stream of Flight objects, but instead a Stream of String objects.
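A sketch of that mapping step, assuming a hypothetical Flight record with a destination accessor:

```java
import static java.util.stream.Collectors.toList;

import java.util.List;

class MapExample {
    // Hypothetical Flight type (record syntax needs Java 16+).
    record Flight(String number, String airline, String destination) {}

    static List<String> destinations(List<Flight> flights) {
        return flights.stream()          // Stream<Flight>
                .map(Flight::destination) // Stream<String>
                .collect(toList());
    }

    public static void main(String[] args) {
        List<Flight> flights = List.of(
                new Flight("AA100", "American", "New York"),
                new Flight("DL200", "Delta", "Los Angeles"));
        System.out.println(destinations(flights));
    }
}
```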

Filtering Using Distinct. We have seen that the filter method can be used to filter the contents of a Stream. A different kind of filter is the distinct method. By adding this intermediate operation to a pipeline, duplicates will be filtered out from the resulting Stream.

The Stream takeWhile Method. The takeWhile method, introduced in Java 9, processes elements on the Stream until one fails the Predicate that was defined by its parameter. The filter method, on the other hand, only rejects elements that do not comply with the Predicate, but does not terminate the Stream.

The Stream dropWhile Method. The opposite of the takeWhile method is the dropWhile method. When the elements are processed, the Stream will reject elements for as long as they match the Predicate. The first element that fails the Predicate, and every element after it, passes through.
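The difference between filter, takeWhile, and dropWhile is easiest to see on the same input. This sketch uses a made-up list of departure hours and the predicate "before noon"; takeWhile and dropWhile need Java 9 or later.

```java
import static java.util.stream.Collectors.toList;

import java.util.List;

class DistinctTakeDrop {
    static List<String> distinctDestinations(List<String> destinations) {
        return destinations.stream().distinct().collect(toList());
    }

    // Keeps every hour before noon, wherever it appears in the stream.
    static List<Integer> filterMorning(List<Integer> hours) {
        return hours.stream().filter(h -> h < 12).collect(toList());
    }

    // Keeps leading hours before noon and stops at the first one that is not.
    static List<Integer> takeMorning(List<Integer> hours) {
        return hours.stream().takeWhile(h -> h < 12).collect(toList());
    }

    // Drops leading hours before noon and keeps everything from the first one that is not.
    static List<Integer> dropMorning(List<Integer> hours) {
        return hours.stream().dropWhile(h -> h < 12).collect(toList());
    }

    public static void main(String[] args) {
        List<Integer> hours = List.of(6, 8, 11, 14, 9, 17);
        System.out.println(filterMorning(hours));   // [6, 8, 11, 9]
        System.out.println(takeMorning(hours));     // [6, 8, 11]
        System.out.println(dropMorning(hours));     // [14, 9, 17]
        System.out.println(distinctDestinations(List.of("New York", "Paris", "New York")));
    }
}
```

Note that 9 is kept by filter and by dropWhile, but not by takeWhile, because takeWhile stopped at 14.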

The peek method is a non-terminal operation that takes a java.util.function.Consumer as a parameter. The Consumer will get called for each element in the stream. The peek method returns a new Stream which contains all the elements in the original stream. The purpose of the peek method is, as its name suggests, to peek at the elements in the stream, not to transform them. Keep in mind that the peek method does not start the internal iteration of the elements in the stream. You need to call a terminal operation for that. The peek method is extremely useful for debugging and examining streams to see their current state, since it allows you to observe without interfering with the stream.
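A sketch of peek used for debugging, placed before and after a filter so both states of the pipeline can be observed. The airline names and messages are illustrative only.

```java
import static java.util.stream.Collectors.toList;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class PeekExample {
    // Returns what the two peek stages observed while the pipeline ran.
    static List<String> observe() {
        List<String> seen = new ArrayList<>();
        Stream.of("Delta", "United", "JetBlue")
                .peek(a -> seen.add("before filter: " + a))
                .filter(a -> a.length() > 5)
                .peek(a -> seen.add("after filter: " + a))
                .collect(toList());   // without a terminal operation, neither peek would run
        return seen;
    }

    public static void main(String[] args) {
        observe().forEach(System.out::println);
    }
}
```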

The iterate method allows the creation of a sequential ordered Stream. While the first parameter defines the first element, all other elements are created by applying the UnaryOperator to the previous element in the Stream. To limit the number of elements that are represented by this Stream, the limit method can be used. Java 9 introduced an overloaded version of the iterate method, which takes a Predicate to define when the Stream must terminate.
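Both forms of iterate can be sketched side by side. In the three-argument overload (Java 9 or later) the Predicate is the second argument and the UnaryOperator the third; the doubling sequence is just an example.

```java
import static java.util.stream.Collectors.toList;

import java.util.List;
import java.util.stream.Stream;

class IterateExample {
    // Unbounded sequence 1, 2, 4, ... cut off by limit.
    static List<Integer> withLimit() {
        return Stream.iterate(1, n -> n * 2)
                .limit(5)
                .collect(toList());
    }

    // Same sequence, terminated by a Predicate instead of limit.
    static List<Integer> withPredicate() {
        return Stream.iterate(1, n -> n <= 20, n -> n * 2)
                .collect(toList());
    }

    public static void main(String[] args) {
        System.out.println(withLimit());       // [1, 2, 4, 8, 16]
        System.out.println(withPredicate());   // [1, 2, 4, 8, 16]
    }
}
```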

The UnaryOperator will be applied to create the next element, followed by the Predicate. When the new element does not match the Predicate, it is not added to the Stream and the Stream is terminated.

Matching Criteria. Until now we have been manipulating the contents of a stream. The Stream API also contains methods for checking the contents of a stream to see whether elements in the stream meet certain criteria. The allMatch, anyMatch, and noneMatch methods return a boolean. But more importantly, they are also short-circuiting operations. This means that the stream does not have to be processed completely before these methods can return.
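The three matching methods, plus a small experiment that counts how many elements anyMatch actually inspects, might look like this. The method names and the counting helper are assumptions made for this sketch.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class MatchExample {
    static boolean anyAfterNoon(List<Integer> hours) {
        return hours.stream().anyMatch(h -> h > 12);
    }

    static boolean allValidHours(List<Integer> hours) {
        return hours.stream().allMatch(h -> h >= 0 && h < 24);
    }

    static boolean noneAfterTen(List<Integer> hours) {
        return hours.stream().noneMatch(h -> h > 22);
    }

    // Counts the elements pulled through the pipeline before anyMatch returns.
    static int inspectedByAnyMatch(List<String> destinations, String target) {
        AtomicInteger inspected = new AtomicInteger();
        destinations.stream()
                .peek(d -> inspected.incrementAndGet())
                .anyMatch(target::equals);
        return inspected.get();
    }

    public static void main(String[] args) {
        List<Integer> hours = List.of(6, 9, 14, 21);
        System.out.println(anyAfterNoon(hours));
        System.out.println(allValidHours(hours));
        System.out.println(noneAfterTen(hours));
        System.out.println(inspectedByAnyMatch(List.of("New York", "Paris", "London", "Tokyo"), "Paris"));
    }
}
```

For this sequential stream, the last call inspects only the first two destinations, since the answer is known as soon as "Paris" is seen.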

Just like the logical and and or operators, once enough of the expression has been evaluated to determine the outcome, the remaining part of the expression is not evaluated.

Finding Elements. Instead of matching to check if the stream contains the element you are looking for, the findFirst and findAny methods return a single element from the stream. However, these methods return an instance of Optional.

When the stream does not contain an element, an empty instance of Optional is returned. When the stream does contain an element, this element will be wrapped within the Optional object. The Optional class will be covered in more detail later, but for the moment you should remember that Optional is used as an alternative to returning null.

Optional acts as a container for objects and might, or might not, contain a value. The findFirst method of the Stream API returns an instance of Optional. So, instead of checking for null values to be returned when the stream does not contain elements, you use the methods on the Optional class to check if it contains a value.
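A brief sketch of findFirst and of reading the result through Optional rather than a null check. The firstStartingWith method is a placeholder name.

```java
import java.util.List;
import java.util.Optional;

class FindExample {
    static Optional<String> firstStartingWith(List<String> destinations, String prefix) {
        return destinations.stream()
                .filter(d -> d.startsWith(prefix))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> destinations = List.of("New York", "Los Angeles", "London");

        Optional<String> match = firstStartingWith(destinations, "Lo");
        if (match.isPresent()) {
            System.out.println("Found " + match.get());
        }

        // No element matches here, so the fallback value is used instead of null.
        System.out.println(firstStartingWith(destinations, "Tok").orElse("no match"));
    }
}
```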

When implementing a method which returns a Stream, you might decide to return an empty Stream when no elements can be found for the given criteria. In the examples shown here, we want to retrieve a list of passengers for a given flight number. When the flight cannot be found, an empty Stream is returned. The second code example shows the use of the Optional class. When the Optional holds a reference to a flight, the map method is executed, returning the Stream of Passengers. Otherwise, the orElse method returns an empty Stream.

Java 9 introduced the ofNullable method. When the reference provided to this method is null, it will return an empty Stream, otherwise, it will create a Stream containing just one reference. In the example shown above, each reference in the Stream, one at most, will be supplied to the Function of the flatMap method.
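The slides with these examples are not included in the transcript, so the following is only a sketch of the two approaches. The Flight record, the in-memory FLIGHTS map, and both method names are hypothetical; Stream.ofNullable needs Java 9 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

class PassengerLookup {
    // Hypothetical Flight type (record syntax needs Java 16+).
    record Flight(String number, List<String> passengers) {}

    // Hypothetical lookup table; get returns null for an unknown flight number.
    static final Map<String, Flight> FLIGHTS =
            Map.of("AA100", new Flight("AA100", List.of("Ann", "Bob")));

    // Optional approach: map when a flight is present, otherwise an empty Stream.
    static Stream<String> passengersViaOptional(String flightNumber) {
        return Optional.ofNullable(FLIGHTS.get(flightNumber))
                .map(f -> f.passengers().stream())
                .orElse(Stream.empty());
    }

    // ofNullable approach: a Stream of zero or one Flight, flattened to its passengers.
    static Stream<String> passengersViaOfNullable(String flightNumber) {
        return Stream.ofNullable(FLIGHTS.get(flightNumber))
                .flatMap(f -> f.passengers().stream());
    }

    public static void main(String[] args) {
        System.out.println(passengersViaOptional("AA100").count());
        System.out.println(passengersViaOfNullable("ZZ999").count());
    }
}
```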

Stream Sources. As mentioned at the beginning of the lecture, streams can be obtained from a variety of sources. Until now we only looked at Collection implementations as the source of the Stream, but you can also create a stream based upon an array, or zero or more object references. Another option is obtaining a stream from a file.

In Java 8, among other changes, the lines method was added to the Files class. With it, streams can be created where each line in a text file becomes an element in the Stream.
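The three sources can be sketched as follows. The class and method names are placeholders, and demoFile writes a small temporary file only so that the example is self-contained. Files.lines keeps the file open while the Stream is in use, so it is wrapped in try-with-resources here.

```java
import static java.util.stream.Collectors.toList;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamSources {
    static List<String> fromArray(String[] codes) {
        return Arrays.stream(codes).collect(toList());
    }

    static List<String> fromValues() {
        return Stream.of("BOS", "JFK", "LAX").collect(toList());
    }

    // Each line of the text file becomes one element of the Stream.
    static List<String> fromFile(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.collect(toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary two-line file, reads it back, and removes it again.
    static List<String> demoFile() {
        try {
            Path tmp = Files.createTempFile("flights", ".txt");
            try {
                Files.write(tmp, List.of("AA100", "DL200"));
                return fromFile(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromArray(new String[] {"BOS", "LHR"}));
        System.out.println(fromValues());
        System.out.println(demoFile());
    }
}
```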

Numeric Streams. Until now, all streams that we have used deal with objects, so when the information in the stream is made up of numeric values, these values are represented by their wrapper types: Integer, Long, and Double. As a result, each primitive value needs to be boxed before it can be put into the Stream, and unboxed when obtained from the stream.

The Stream API defines specialized streams for dealing with these primitive types. Besides the fact that these implementations do not require the values to be boxed, they also provide some additional methods specialized for working with numeric primitive types. Methods like sum and average only make sense when working with numeric types.
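A sketch of the primitive streams in use. The delay values and method names are invented for illustration; mapToInt, sum, average, DoubleStream.of, IntStream.range, and IntStream.rangeClosed are standard API.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

class NumericStreams {
    // mapToInt converts a Stream<Integer> into an unboxed IntStream.
    static int totalDelay(List<Integer> delays) {
        return delays.stream().mapToInt(Integer::intValue).sum();
    }

    // average returns an OptionalDouble because an empty stream has no average.
    static OptionalDouble averageDelay(List<Integer> delays) {
        return delays.stream().mapToInt(Integer::intValue).average();
    }

    static double averageOf(double... values) {
        return DoubleStream.of(values).average().orElse(0.0);
    }

    static int sumRange(int start, int end) {
        return IntStream.range(start, end).sum();         // end is excluded
    }

    static int sumRangeClosed(int start, int end) {
        return IntStream.rangeClosed(start, end).sum();   // end is included
    }

    public static void main(String[] args) {
        System.out.println(totalDelay(List.of(5, 10, 15)));     // 30
        System.out.println(averageDelay(List.of(5, 10, 15)));
        System.out.println(sumRange(1, 5));                     // 10
        System.out.println(sumRangeClosed(1, 5));               // 15
    }
}
```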

Just like with streams that deal with objects, numeric streams can be created using static methods from the interface, so a stream of double values can be constructed by using the of method. In addition, instances of IntStream and LongStream can also be constructed using the static range and rangeClosed methods. By supplying the start and end values of the range, an instance of the stream is created containing all the numeric values within that range. The range method excludes the end value, while rangeClosed includes it.

The forEach Operation. When working with Streams, you might want to display the content of a Stream. You might, as an example, be tempted to create a List, and write a for-loop to print out every element using the System.out.println method. But this is exactly what we are trying to avoid by using streams.

With streams, we are able to avoid implementing external iteration over collections. So for this requirement, we can let the stream building blocks take care of it internally by using the forEach method, which is used to consume every element in a stream.
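A short sketch of forEach as the terminal operation. The printAll method and its PrintStream parameter are choices made for this example so the output target can be swapped; passing System.out gives the behaviour described above.

```java
import java.io.PrintStream;
import java.util.List;

class ForEachExample {
    // Consumes every element of the stream by printing it on its own line.
    static void printAll(List<String> items, PrintStream out) {
        items.stream().forEach(out::println);
    }

    public static void main(String[] args) {
        printAll(List.of("BOS", "JFK", "LAX"), System.out);
    }
}
```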

Okay, before we complete this lecture, pause this video, and consider the following questions to test yourself on the content that we have just reviewed. Write down your answers and then resume the video to compare.

Okay, the answers to the above questions are: One. No, the standard collection classes are not thread-safe; however, thread-safe versions can be created by using methods from the Collections class. Two. Thinking of the program logic in terms of what needs to be done, instead of how it needs to be implemented. Three. No, collections of data are a source of data that can be processed using the Stream API. Four. Intermediate operations return a stream and are not executed immediately. Terminal operations return non-stream values and cause the intermediate operations in the pipeline to be executed. Five. stream, map, distinct, and collect. Six. Collections, files, and simple values. Seven. findFirst or findAny.

About the Author

Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.

Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, Kubernetes (CKA, CKAD, CKS).
