The Scanner API, initially introduced in JDK 5, and having received updates in subsequent JDK releases provides an easy and accessible way to parse and tokenize an input source.
Scanner is able to parse many different input sources including:
* String
* File
* InputStream
* Readable
* ReadableByteChannel
* Path
A regex delimiter can be provided to Scanner as a String
or as a Pattern
.
The parsed data can be processed as a Stream<String>
with tokens()
like in the example below:
String advocates = """
Billy,Korando,NA,2600
David,Delabassée,EMEA,8522
Denys,Makogon,EMEA,233
José,Paumard,EMEA,6131
Nicolai,Parlog,EMEA,12400
""";
try (Scanner scanner = new Scanner(advocates)
.useDelimiter("[,\n]+")) {
Stream<String> stream = scanner.tokens();
stream.forEach(System.out::println);
}
try (Scanner scanner = new Scanner(advocates)
.useDelimiter(Pattern.compile("[,\n]+"))) {
Stream<String> stream = scanner.tokens();
stream.forEach(System.out::println);
}
The data can also be processed in a loop using hasNext()
and next()
:
String advocates = """
Billy,Korando,NA,2600
David,Delabassée,EMEA,8522
Denys,Makogon,EMEA,233
José,Paumard,EMEA,6131
Nicolai,Parlog,EMEA,12400
""";
try (Scanner scanner = new Scanner(advocates)
.useDelimiter("[,\n]+")) {
while (scanner.hasNext()) {
System.out.println(scanner.next());
}
}
Scanner has several different versions of hasNext()
and next()
implemented which can detect and convert the following types:
* BigDecimal
* BigInterger
* Boolean
* Byte
* Double
* Float
* Int
* Long
* Short
So in this example the final value in the line is converted to an Int type and can by doubled:
String advocates = """
Billy,Korando,NA,2600
David,Delabassée,EMEA,8522
Denys,Makogon,EMEA,233
José,Paumard,EMEA,6131
Nicolai,Parlog,EMEA,12400
""";
try (Scanner scanner = new Scanner(advocates)
.useDelimiter("[,\n]+")) {
while (scanner.hasNext()) {
if (scanner.hasNextInt()) {
System.out.println(
scanner.nextInt() * 2);
} else {
System.out.println(scanner.next());
}
}
}
Billy Korando NA 5200
David Delabassée EMEA 17044
Denys Makogon EMEA 466
José Paumard EMEA 12262
Nicolai Parlog EMEA 24800
Scanner also allows for easy localization of data by passing in a Locale
to useLocale()
. In the below example the final value in the line is treated as a String with the Locale.US
, as “.” is only used as a decimal point in the US, but treated as a Integer with the Locale.Germany
because “.” is used to denote thousands/between every three integers in a number:
String advocates = """
Billy,Korando,NA,2.600
David,Delabassée,EMEA,8.522
Denys,Makogon,EMEA,233
José,Paumard,EMEA,6.131
Nicolai,Parlog,EMEA,12.400
""";
System.out.println("US Locale\n");
try (Scanner scanner = new Scanner(advocates)
.useDelimiter("[,\n]+").useLocale(Locale.US)) {
int i = 0;
while (scanner.hasNext()) {
i++;
if (scanner.hasNextInt()) {
// Will always be false because "."
// is only used as a decimal in US
System.out.print(scanner.nextInt() * 2);
} else {
System.out.print(scanner.next() + " ");
}
if (i % 4 == 0) {
System.out.print("\n");
}
}
}
System.out.println("\nGermany Locale\n");
try (Scanner scanner = new Scanner(advocates)
.useDelimiter("[,\n]+").useLocale(Locale.GERMANY)) {
int i = 0;
while (scanner.hasNext()) {
i++;
if (scanner.hasNextInt()) {
// Will return true for last item,
// as "." is used like "," in Germany
System.out.print(scanner.nextInt() * 2);
} else {
System.out.print(scanner.next() + " ");
}
if (i % 4 == 0) {
System.out.print("\n");
}
}
}
US Locale
Billy Korando NA 2.600
David Delabassée EMEA 8.522
Denys Makogon EMEA 466
José Paumard EMEA 6.131
Nicolai Parlog EMEA 12.400
Germany Locale
Billy Korando NA 5200
David Delabassée EMEA 17044
Denys Makogon EMEA 466
José Paumard EMEA 12262
Nicolai Parlog EMEA 24800
Happy Coding!