Java Collections and Streams
The Java Stream API provides a more functional approach to iterating over and processing the elements of, for example, a collection. It was added in Java 8. This tutorial only explains how to use the Java Stream API in the context of the Java Collection API. For a more in-depth explanation of the Java Stream API, see my Java Stream API Tutorial.
Streams are designed to work with Java lambda expressions. Many of the examples in this text will use lambda expressions, so if you don’t already know them, you should read up on them before reading this text.
Obtaining a Stream From a Collection
You obtain a stream from a collection by calling the stream() method of the given collection. Here is an example of obtaining a stream from a collection:
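    import java.util.ArrayList;
    import java.util.List;
    import java.util.stream.Stream;

    List<String> items = new ArrayList<>();
    items.add("one");
    items.add("two");
    items.add("three");

    Stream<String> stream = items.stream();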
First a List of strings is created and three strings are added to it. Then a Stream of strings is obtained by calling the items.stream() method. This is similar to how you obtain an Iterator by calling the items.iterator() method, but a Stream is a different animal than an Iterator.
Stream Processing Phases
Once you have obtained a Stream instance from a Collection instance, you use that stream to process the elements in the collection.
Processing the elements in the stream happens in two phases:
First, the stream is configured. The configuration can consist of filters and mappings. These are also referred to as non-terminal (intermediate) operations.
Second, the stream is processed. The processing consists of doing something to the filtered and mapped objects. No processing takes place during the configuring calls; it only starts when a processing method is called on the stream. The stream processing methods are also referred to as terminal operations.
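The two phases can be observed with a small sketch. The class name LazyPhasesDemo and the call counter are made up for illustration, and the side effect inside the predicate is there only to make the timing visible; it is not something to do in real code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyPhasesDemo {

    // Returns {predicate calls before the terminal operation, calls after it}.
    static int[] countPredicateCalls(List<String> items) {
        AtomicInteger calls = new AtomicInteger();

        // Phase 1: configuration. The predicate is only stored here.
        Stream<String> configured = items.stream()
                .filter(item -> {
                    calls.incrementAndGet(); // side effect used for demonstration only
                    return item.startsWith("t");
                });
        int before = calls.get();

        // Phase 2: the terminal operation runs the configured pipeline.
        List<String> kept = configured.collect(Collectors.toList());
        int after = calls.get();

        System.out.println("kept=" + kept + " before=" + before + " after=" + after);
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        countPredicateCalls(Arrays.asList("one", "two", "three"));
        // prints: kept=[two, three] before=0 after=3
    }
}
```

The predicate is never called while the stream is being configured; all three calls happen inside collect().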
Stream.filter()
You filter a stream using the filter() method. Here is a stream filtering example:
stream.filter( item -> item.startsWith("o") );
The filter() method takes a Predicate as parameter. The Predicate interface contains a function called test() which the lambda expression passed as parameter above is matched against. In other words, the lambda expression implements the Predicate.test() method.
The test() method is defined like this:
    boolean test(T t);
It takes a single parameter and returns a boolean. If you look at the lambda expression above, you can see that it takes a single parameter, item, and returns a boolean: the result of the item.startsWith("o") method call.
When you call the filter() method on a Stream, the filter passed as parameter to the filter() method is stored internally. No filtering takes place yet.
The parameter passed to the filter() method determines which items in the stream should be processed and which should be excluded from the processing. If the Predicate.test() method of the parameter passed to filter() returns true for an item, that item is processed. If it returns false, the item is not processed.
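To make the link between the lambda expression and Predicate.test() concrete, here is a minimal sketch that writes the same filter as an anonymous Predicate implementation. The class and method names (FilterPredicateDemo, keepStartingWithO) are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class FilterPredicateDemo {

    static List<String> keepStartingWithO(List<String> items) {
        // Equivalent to the lambda item -> item.startsWith("o"),
        // spelled out as an explicit implementation of Predicate.test().
        Predicate<String> startsWithO = new Predicate<String>() {
            @Override
            public boolean test(String item) {
                return item.startsWith("o");
            }
        };

        return items.stream()
                .filter(startsWithO)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keepStartingWithO(Arrays.asList("one", "two", "three")));
        // prints: [one]
    }
}
```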
Stream.map()
It is possible to map the items in a collection to other objects. In other words, for each item in the collection you create a new object based on that item. How the mapping is done is up to you. Here is a simple Java stream mapping example:
    items.stream()
        .map( item -> item.toUpperCase() );
This example maps all strings in the items collection to their uppercase equivalents.
Again, this example doesn’t actually perform the mapping. It only configures the stream for mapping. Once one of the stream processing methods is invoked, the mapping (and filtering) will be performed.
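The mapping only becomes visible once a terminal operation is added. The following sketch, with the made-up class name MapDemo, collects the mapped values into a list and also shows that map() may produce objects of a different type than the source elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class MapDemo {

    // Maps each string to its uppercase equivalent.
    static List<String> toUpperCase(List<String> items) {
        return items.stream()
                .map(item -> item.toUpperCase())
                .collect(Collectors.toList());
    }

    // Maps each string to an object of another type: its length.
    static List<Integer> lengths(List<String> items) {
        return items.stream()
                .map(item -> item.length())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("one", "two", "three");
        System.out.println(toUpperCase(items)); // prints: [ONE, TWO, THREE]
        System.out.println(lengths(items));     // prints: [3, 3, 5]
    }
}
```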
Stream.collect()
The collect() method is one of the stream processing methods on the Stream interface. When this method is invoked, the filtering and mapping take place and the objects resulting from those actions are collected. Here is a stream.collect() example:
    import java.util.stream.Collectors;

    List<String> filtered = items.stream()
        .filter( item -> item.startsWith("o") )
        .collect(Collectors.toList());
This example creates a stream, adds a filter, and collects all objects accepted by the filter in a List. The filter only accepts items (strings) which start with the character o. The resulting List thus contains all strings from the items collection which start with the character o.
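Collectors.toList() is only one of the collectors in the Collectors class. As a hedged sketch of two others, the example below uses Collectors.toSet() and Collectors.joining(); the class name CollectDemo and its method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class CollectDemo {

    // Collects the accepted strings into a Set, which drops duplicates.
    static Set<String> startingWithT(List<String> items) {
        return items.stream()
                .filter(item -> item.startsWith("t"))
                .collect(Collectors.toSet());
    }

    // Collects all strings into one String, separated by ", ".
    static String joined(List<String> items) {
        return items.stream()
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("one", "two", "three", "two");
        System.out.println(startingWithT(items).size()); // prints: 2
        System.out.println(joined(items));               // prints: one, two, three, two
    }
}
```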
Stream.min() and Stream.max()
The min() and max() methods are stream processing methods. Once these are called, the stream will be iterated, filtering and mapping applied, and the minimum or maximum value in the stream will be returned.
Here is a Java Stream.min() example:
    import java.util.Comparator;

    String shortest = items.stream()
        .min(Comparator.comparing(item -> item.length()))
        .get();
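The min() and max() methods return an Optional instance, which has a get() method you use to obtain the value. If the stream has no elements, the Optional is empty and get() throws a NoSuchElementException rather than returning null, so check isPresent() first or use orElse() to supply a default.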
The min() and max() methods take a Comparator as parameter. The Comparator.comparing() method creates a Comparator based on the lambda expression passed to it. In fact, the comparing() method takes a Function which is a functional interface suited for lambda expressions. It takes one parameter and returns a value.
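Because an empty stream yields an empty Optional, it is usually safer not to call get() blindly. The sketch below, using the invented class name MinMaxDemo, returns the Optional itself and lets the caller decide how to handle the empty case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class MinMaxDemo {

    static Optional<String> shortest(List<String> items) {
        return items.stream()
                .min(Comparator.comparing(item -> item.length()));
    }

    static Optional<String> longest(List<String> items) {
        return items.stream()
                .max(Comparator.comparing(item -> item.length()));
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("one", "two", "three");
        System.out.println(longest(items).get()); // prints: three

        // On an empty list the Optional is empty; orElse() supplies a fallback
        // instead of letting get() throw NoSuchElementException.
        List<String> empty = new ArrayList<>();
        System.out.println(shortest(empty).orElse("(none)")); // prints: (none)
    }
}
```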
Stream.count()
The count() method simply returns the number of elements in the stream after filtering has been applied. Here is an example:
    long count = items.stream()
        .filter( item -> item.startsWith("t"))
        .count();
This example iterates the stream, keeps all elements that start with the character t, and then counts these elements.
The count() method returns the result as a long.
Stream.reduce()
The reduce() method can reduce the elements of a stream to a single value. Here is an example:
    String reduced2 = items.stream()
        .reduce((acc, item) -> acc + " " + item)
        .get();
The reduce() method takes a BinaryOperator as parameter, which can easily be implemented using a lambda expression. The BinaryOperator.apply() method is the method implemented by the lambda expression above. This method takes two parameters: acc, which is the accumulated value, and item, which is an element from the stream. Thus, the value created by the reduce() function is the accumulated value after processing the last element in the stream. In the example above, each item is appended to the accumulated value, separated by a space.
The reduce() method taking a BinaryOperator as parameter returns an Optional. If the stream contains no elements, the Optional is empty and Optional.get() throws a NoSuchElementException. Otherwise get() returns the reduced value.
There is another reduce() method which takes two parameters. It takes an initial value for the accumulated value, and then a BinaryOperator . Here is an example:
    String reduced = items.stream()
        .reduce("", (acc, item) -> acc + " " + item);
This example takes an empty string as initial value, and then the same lambda expression as the previous example. This version of the reduce() method returns the accumulated value directly, and not an Optional. If the stream contains no elements, the initial value will be returned. Note that, because the first element is appended to the empty initial value, the result here starts with a space.
The reduce() method can be combined with the filter() method too. Here is an example:
    String reduced = items.stream()
        .filter( item -> item.startsWith("o"))
        .reduce("", (acc, item) -> acc + " " + item);
This example keeps all elements that start with the character o, and then reduces these elements into a single value.
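The difference between the two reduce() variants is easiest to see side by side. This is a minimal sketch with invented names (ReduceDemo, withInitialValue, withoutInitialValue); the Collectors.joining() variant is not part of the text above and is included only as a common alternative for concatenating strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class ReduceDemo {

    // Two-parameter variant: returns the value directly.
    static String withInitialValue(List<String> items) {
        return items.stream()
                .reduce("", (acc, item) -> acc + " " + item);
    }

    // One-parameter variant: returns an Optional, empty for an empty stream.
    static Optional<String> withoutInitialValue(List<String> items) {
        return items.stream()
                .reduce((acc, item) -> acc + " " + item);
    }

    // Alternative (an addition, not from the text): a joining collector.
    static String joined(List<String> items) {
        return items.stream().collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("one", "two", "three");
        System.out.println("[" + withInitialValue(items) + "]");          // prints: [ one two three]
        System.out.println("[" + withoutInitialValue(items).get() + "]"); // prints: [one two three]
        System.out.println("[" + joined(items) + "]");                    // prints: [one two three]
        System.out.println(withoutInitialValue(new ArrayList<>()).isPresent()); // prints: false
    }
}
```

Strictly speaking, the empty string is not a true identity for this particular lambda, since combining "" with an item yields the item prefixed by a space. On a sequential stream this only causes the leading space, but on a parallel stream the result could contain extra spaces.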
Interface Stream<T>
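A sequence of elements supporting sequential and parallel aggregate operations. The following example illustrates an aggregate operation using Stream and IntStream:
    int sum = widgets.stream()
        .filter(w -> w.getColor() == RED)
        .mapToInt(w -> w.getWeight())
        .sum();
In this example, widgets is a Collection<Widget>. The pipeline creates a stream of Widget objects via Collection.stream(), filters it down to the red widgets, maps each remaining widget to its int weight, and sums those weights to produce a total.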
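The widgets example relies on types that are not shown. A self-contained sketch might look like the following, where the Widget class, the Color enum, and the sample weights are all assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;

final class WidgetSumDemo {

    // Hypothetical types standing in for the ones the example assumes.
    enum Color { RED, BLUE }

    static final class Widget {
        private final Color color;
        private final int weight;

        Widget(Color color, int weight) {
            this.color = color;
            this.weight = weight;
        }

        Color getColor() { return color; }
        int getWeight() { return weight; }
    }

    // Sums the weights of the red widgets only.
    static int totalRedWeight(Collection<Widget> widgets) {
        return widgets.stream()
                .filter(w -> w.getColor() == Color.RED)
                .mapToInt(w -> w.getWeight()) // Stream<Widget> becomes IntStream
                .sum();
    }

    public static void main(String[] args) {
        Collection<Widget> widgets = Arrays.asList(
                new Widget(Color.RED, 10),
                new Widget(Color.BLUE, 5),
                new Widget(Color.RED, 7));
        System.out.println(totalRedWeight(widgets)); // prints: 17
    }
}
```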
In addition to Stream, which is a stream of object references, there are primitive specializations for IntStream, LongStream, and DoubleStream, all of which are referred to as "streams" and conform to the characteristics and restrictions described here.
To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc.), zero or more intermediate operations (which transform a stream into another stream, such as filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as count() or forEach(Consumer)). Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.
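That source elements are consumed only as needed can be illustrated with an infinite source and a short-circuiting terminal operation. In this sketch the class name ShortCircuitDemo and the visited list are invented, and peek() is used purely to observe which elements get pulled from the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class ShortCircuitDemo {

    // Records which source elements are pulled before findFirst() is satisfied.
    // Assumes k > 0, otherwise the pipeline would never find a match.
    static List<Integer> visitedUntilFirstMultipleOf(int k) {
        List<Integer> visited = new ArrayList<>();
        Stream.iterate(1, n -> n + 1)       // infinite source: 1, 2, 3, ...
                .peek(visited::add)         // demonstration-only side effect
                .filter(n -> n % k == 0)
                .findFirst();               // short-circuiting terminal operation
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedUntilFirstMultipleOf(7));
        // prints: [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Although the source is infinite, the pipeline terminates: once findFirst() has its answer, no further elements are requested.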
A stream implementation is permitted significant latitude in optimizing the computation of the result. For example, a stream implementation is free to elide operations (or entire stages) from a stream pipeline, and therefore elide invocation of behavioral parameters, if it can prove that it would not affect the result of the computation. This means that side-effects of behavioral parameters may not always be executed and should not be relied upon, unless otherwise specified (such as by the terminal operations forEach and forEachOrdered). (For a specific example of such an optimization, see the API note documented on the count() operation. For more detail, see the side-effects section of the stream package documentation.)
Collections and streams, while bearing some superficial similarities, have different goals. Collections are primarily concerned with the efficient management of, and access to, their elements. By contrast, streams do not provide a means to directly access or manipulate their elements, and are instead concerned with declaratively describing their source and the computational operations which will be performed in aggregate on that source. However, if the provided stream operations do not offer the desired functionality, the BaseStream.iterator() and BaseStream.spliterator() operations can be used to perform a controlled traversal.
A stream pipeline, like the "widgets" example above, can be viewed as a query on the stream source. Unless the source was explicitly designed for concurrent modification (such as a ConcurrentHashMap), unpredictable or erroneous behavior may result from modifying the stream source while it is being queried.
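Most stream operations accept parameters that describe user-specified behavior, such as the lambda expression w -> w.getWeight() passed to mapToInt in the example above. To preserve correct behavior, these behavioral parameters:
- must be non-interfering (they do not modify the stream source); and
- in most cases must be stateless (their result should not depend on any state that might change during execution of the stream pipeline).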
Such parameters are always instances of a functional interface such as Function, and are often lambda expressions or method references. Unless otherwise specified these parameters must be non-null.
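As a rough illustration of non-interference, the sketch below modifies an ArrayList before the terminal operation starts, which is permitted, and derives new elements by building a separate list rather than adding to the source during traversal. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NonInterferenceDemo {

    // Modifying the source BEFORE the terminal operation begins is allowed;
    // the change is visible because traversal has not started yet.
    static String modifyBeforeTerminal() {
        List<String> source = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> stream = source.stream();
        source.add("three");
        return stream.collect(Collectors.joining(" "));
    }

    // Non-interfering alternative to adding elements to the source from
    // inside a lambda: compute the additions first, then combine afterwards.
    static List<String> withUppercaseCopies(List<String> source) {
        List<String> additions = source.stream()
                .map(item -> item.toUpperCase())
                .collect(Collectors.toList());
        List<String> result = new ArrayList<>(source);
        result.addAll(additions);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(modifyBeforeTerminal());                        // prints: one two three
        System.out.println(withUppercaseCopies(Arrays.asList("a", "b"))); // prints: [a, b, A, B]
    }
}
```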
A stream should be operated on (invoking an intermediate or terminal stream operation) only once. This rules out, for example, "forked" streams, where the same source feeds two or more pipelines, or multiple traversals of the same stream. A stream implementation may throw IllegalStateException if it detects that the stream is being reused. However, since some stream operations may return their receiver rather than a new stream object, it may not be possible to detect reuse in all cases.
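A short sketch of what reuse looks like in practice, and one common way around it: obtaining a fresh stream from the source for each pipeline. The class name StreamReuseDemo and its methods are invented. The IllegalStateException shown here is what the OpenJDK implementation does for collection-backed streams; as noted above, detection is not guaranteed in all cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

final class StreamReuseDemo {

    // Returns true if the second use of the same stream was rejected.
    static boolean secondUseRejected(List<String> items) {
        Stream<String> stream = items.stream();
        stream.count();          // first and only permitted terminal operation
        try {
            stream.count();      // reuse of an already consumed stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Each pipeline gets its own stream from the supplier.
    static long[] countTwice(List<String> items) {
        Supplier<Stream<String>> streams = () -> items.stream();
        long all = streams.get().count();
        long startingWithT = streams.get()
                .filter(item -> item.startsWith("t"))
                .count();
        return new long[] {all, startingWithT};
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("one", "two", "three");
        System.out.println(secondUseRejected(items));          // prints: true
        System.out.println(Arrays.toString(countTwice(items))); // prints: [3, 2]
    }
}
```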
Streams have a BaseStream.close() method and implement AutoCloseable. Operating on a stream after it has been closed will throw IllegalStateException. Most stream instances do not actually need to be closed after use, as they are backed by collections, arrays, or generating functions, which require no special resource management. Generally, only streams whose source is an IO channel, such as those returned by Files.lines(Path), will require closing. If a stream does require closing, it must be opened as a resource within a try-with-resources statement or similar control structure to ensure that it is closed promptly after its operations have completed.
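For a stream backed by a file, the try-with-resources pattern could look like the sketch below. The class name, the temporary file, and its contents are all made up for the example; only Files.lines(Path) comes from the text above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

final class CloseableStreamDemo {

    // Counts non-empty lines; the stream (and the file behind it) is closed
    // when the try block exits, even if an exception is thrown.
    static long countNonEmptyLines(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a small temporary file, counts its lines, and cleans up.
    static long runOnTempFile() {
        try {
            Path tmp = Files.createTempFile("stream-demo", ".txt");
            try {
                Files.write(tmp, Arrays.asList("one", "", "two", "three"));
                return countNonEmptyLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnTempFile()); // prints: 3
    }
}
```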
Stream pipelines may execute either sequentially or in parallel. This execution mode is a property of the stream. Streams are created with an initial choice of sequential or parallel execution. (For example, Collection.stream() creates a sequential stream, and Collection.parallelStream() creates a parallel one.) This choice of execution mode may be modified by the BaseStream.sequential() or BaseStream.parallel() methods, and may be queried with the BaseStream.isParallel() method.
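The execution mode can be inspected and switched as sketched below. The class name ExecutionModeDemo and the sample numbers are assumptions for illustration; the summing method simply shows that the result of this particular reduction does not depend on the mode.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

final class ExecutionModeDemo {

    // Reports isParallel() for three differently configured streams.
    static boolean[] modes(List<Integer> numbers) {
        return new boolean[] {
            numbers.stream().isParallel(),                      // sequential by default
            numbers.stream().parallel().isParallel(),           // switched to parallel
            numbers.parallelStream().sequential().isParallel()  // switched back
        };
    }

    // Sums the numbers in either execution mode.
    static int sum(List<Integer> numbers, boolean parallel) {
        Stream<Integer> stream = parallel ? numbers.parallelStream() : numbers.stream();
        return stream.mapToInt(n -> n.intValue()).sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(Arrays.toString(modes(numbers))); // prints: [false, true, false]
        System.out.println(sum(numbers, false));             // prints: 15
        System.out.println(sum(numbers, true));              // prints: 15
    }
}
```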