Java 8 Regular Expressions

Regular expressions are used to search and edit text.

In Java, the classes related to regular expressions live in the java.util.regex package. The three core classes are:

java.util.regex.Pattern: a compiled representation of a regular expression. It has no public constructors; to create an instance, call one of its static compile factory methods.
java.util.regex.Matcher: the engine that interprets the pattern and performs match operations against an input string. It has no public constructors either; to create an instance, call the matcher method on a java.util.regex.Pattern object.
java.util.regex.PatternSyntaxException: an unchecked exception that signals a syntax error in a regular expression.
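
As a rough illustration of how the three classes fit together, here is a minimal sketch. The class name RegexClassesDemo and the helper methods isValidRegex and containsMatch are hypothetical names chosen for this example; only Pattern, Matcher, and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexClassesDemo {

    // Hypothetical helper: true if the regex compiles, false if its syntax is invalid.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked exception, caught here only to turn it into a boolean.
            return false;
        }
    }

    // Hypothetical helper: true if the regex matches somewhere in the input.
    static boolean containsMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // factory method, no public constructor
        Matcher matcher = pattern.matcher(input);  // a Matcher is obtained from a Pattern
        return matcher.find();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("foo", "a foo b"));
        System.out.println(isValidRegex("[abc"));  // unclosed character class
    }
}
```

Because PatternSyntaxException is unchecked, the compiler does not force you to catch it; catching it is mainly useful when the pattern comes from user input.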

Below is the code of the RegexTestHarness test program, which is intended for exploring the regular expression constructs supported by Java. Run it with the command java RegexTestHarness, without command-line arguments. The application loops, prompting for a regular expression and an input string:

import java.io.Console;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexTestHarness {

    public static void main(String[] args){
        Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }
        while (true) {

            Pattern pattern = 
            Pattern.compile(console.readLine("%nEnter your regex: "));

            Matcher matcher = 
            pattern.matcher(console.readLine("Enter input string to search: "));

            boolean found = false;
            while (matcher.find()) {
                console.format("I found the text" +
                    " \"%s\" starting at " +
                    "index %d and ending at index %d.%n",
                    matcher.group(),
                    matcher.start(),
                    matcher.end());
                found = true;
            }
            if(!found){
                console.format("No match found.%n");
            }
        }
    }
}
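
System.console() returns null when the JVM has no interactive console attached, which is often the case inside an IDE. For such environments, a non-interactive variant can be sketched as below. The class RegexMatchLister and its method findAll are hypothetical names introduced for this example; they reproduce the harness's output format but are not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatchLister {

    // Hypothetical helper: returns one line per match, formatted like the harness output.
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            results.add(String.format(
                "I found the text \"%s\" starting at index %d and ending at index %d.",
                matcher.group(), matcher.start(), matcher.end()));
        }
        if (results.isEmpty()) {
            results.add("No match found.");
        }
        return results;
    }

    public static void main(String[] args) {
        // Regex and input are hard-coded here instead of being read from the console.
        findAll("foo", "foofoo").forEach(System.out::println);
    }
}
```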

String Literals

The most basic form of pattern matching supported by this API is matching a string literal. For example, if the regular expression is foo and the input string is foo, the match succeeds because the strings are identical. Try this with RegexTestHarness:

Enter your regex: foo
Enter input string to search: foo
I found the text "foo" starting at index 0 and ending at index 3.

This match succeeded. Note that the input string is three characters long, yet the start index is 0 and the end index is 3. By convention, a range includes its start index and excludes its end index. This was explained in more detail in the article "Java 8 Strings".

This API also supports special characters that affect how a pattern is matched. If you change the regular expression to cat. and the input string to cats, the output is:

Enter your regex: cat.
Enter input string to search: cats
I found the text "cats" starting at index 0 and ending at index 4.

The match still succeeded, even though the input string contains no dot. This is because the dot is a metacharacter, that is, a character with a special meaning. The metacharacter . means "any character", so the match succeeded.

The metacharacters supported by java.util.regex are: <([{\^-=$!|]})?*+.>

Note: In some situations the special characters listed above are not treated as metacharacters. You will encounter this as you learn more about regular expressions. For now, you can use this list to check whether a given character can ever act as a metacharacter. For example, the characters "@" and "#" never carry a special meaning.

There are two ways to force a metacharacter to be treated as an ordinary character:

Precede the metacharacter with a backslash "\".
Enclose one or more metacharacters between "\Q" (start of the quoted text) and "\E" (end of the quoted text).
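
Both escaping techniques can be tried out with the sketch below. The class EscapeDemo and the helper found are hypothetical names for this example. Note that inside Java source code each backslash of the regex has to be written twice; Pattern.quote is a standard library method that wraps its argument in \Q and \E for you.

```java
import java.util.regex.Pattern;

public class EscapeDemo {

    // Hypothetical helper: true if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped dot is a metacharacter meaning "any character".
        System.out.println(found("cat.", "cats"));
        // Escaped dot (regex cat\.) matches only a literal dot.
        System.out.println(found("cat\\.", "cats"));
        System.out.println(found("cat\\.", "cat."));
        // Without quoting, + is a quantifier, so 1+1=2 does not match itself.
        System.out.println(found("1+1=2", "1+1=2"));
        // \Q...\E turns everything in between into literal text.
        System.out.println(found("\\Q1+1=2\\E", "1+1=2"));
        // Pattern.quote produces an equivalent quoted pattern.
        System.out.println(found(Pattern.quote("1+1=2"), "1+1=2"));
    }
}
```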

Character Classes

The supported character class constructs:

Construct	Description
`[abc]`	a, b, or c (simple class)
`[^abc]`	Any character except a, b, or c (negation)
`[a-zA-Z]`	a through z or A through Z, inclusive (range)
`[a-d[m-p]]`	a through d, or m through p: [a-dm-p] (union)
`[a-z&&[def]]`	d, e, or f (intersection)
`[a-z&&[^bc]]`	a through z, except for b and c: [ad-z] (subtraction)
`[a-z&&[^m-p]]`	a through z, and not m through p: [a-lq-z] (subtraction)

The word "class" here has nothing to do with a Java class. In the context of regular expressions, a character class is a set of characters enclosed in square brackets; it matches a single character of the input.

The simplest form of a character class places the characters side by side within square brackets. For example, the regular expression [bcr]at matches the words "bat", "cat", and "rat", because it defines a character class that accepts "b", "c", or "r" as the first character.

Enter your regex: [bcr]at
Enter input string to search: bat
I found the text "bat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: cat
I found the text "cat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: rat
I found the text "rat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: hat
No match found.

To match every character except the ones listed, put the metacharacter "^" at the beginning of the character class. This technique is known as negation:

Enter your regex: [^bcr]at
Enter input string to search: bat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: cat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: rat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: hat
I found the text "hat" starting at index 0 and ending at index 3.

Here the match succeeds only if the first character of the input string is NOT one of the characters listed in the character class.

Sometimes you need a character class that covers a range of values, such as the letters "a through h" or the digits "1 through 5". To specify a range, insert "-" between the first and the last character, for example [1-5] or [a-h]. You can also place several ranges side by side within one class to widen the set of accepted characters. For example, [a-zA-Z] matches any letter of the English alphabet: a through z and A through Z.

Enter your regex: [a-c]
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: b
I found the text "b" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: c
I found the text "c" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: d
No match found.

Enter your regex: foo[1-5]
Enter input string to search: foo1
I found the text "foo1" starting at index 0 and ending at index 4.

Enter your regex: foo[1-5]
Enter input string to search: foo5
I found the text "foo5" starting at index 0 and ending at index 4.

Enter your regex: foo[1-5]
Enter input string to search: foo6
No match found.

Enter your regex: foo[^1-5]
Enter input string to search: foo1
No match found.

Enter your regex: foo[^1-5]
Enter input string to search: foo6
I found the text "foo6" starting at index 0 and ending at index 4.

You can also use unions to build a single character class out of two or more separate character classes. To create a union, nest one class inside another, for example [0-4[6-8]]. This union creates a single character class that matches the digits 0, 1, 2, 3, 4, 6, 7, and 8.

Enter your regex: [0-4[6-8]]
Enter input string to search: 0
I found the text "0" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 5
No match found.

Enter your regex: [0-4[6-8]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 8
I found the text "8" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 9
No match found.

To create a character class that matches only the characters common to all of its nested classes, use &&, for example [0-9&&[345]]. This intersection creates a character class that matches only the digits 3, 4, and 5.

Enter your regex: [0-9&&[345]]
Enter input string to search: 3
I found the text "3" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 2
No match found.

Enter your regex: [0-9&&[345]]
Enter input string to search: 6
No match found.

An example of an intersection of two ranges:

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 3
No match found.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 7
No match found.

You can use subtraction to exclude characters from a character class. For example, [0-9&&[^345]] matches every digit from 0 to 9 except 3, 4, and 5.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 2
I found the text "2" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 3
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 4
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 5
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 9
I found the text "9" starting at index 0 and ending at index 1.
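
The union, intersection, and subtraction classes shown above can also be compared side by side in code. The following is a minimal sketch; the class CharClassDemo and the helper accepted are hypothetical names for this example. The helper tests every candidate character on its own and keeps the ones the class accepts.

```java
import java.util.regex.Pattern;

public class CharClassDemo {

    // Hypothetical helper: returns the candidate characters that the
    // given single-character class accepts, in their original order.
    static String accepted(String charClass, String candidates) {
        Pattern pattern = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            // matches() requires the whole one-character input to match.
            if (pattern.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        System.out.println(accepted("[0-4[6-8]]", digits));    // union: 01234678
        System.out.println(accepted("[0-9&&[345]]", digits));  // intersection: 345
        System.out.println(accepted("[2-8&&[4-6]]", digits));  // intersection: 456
        System.out.println(accepted("[0-9&&[^345]]", digits)); // subtraction: 0126789
    }
}
```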

Predefined Character Classes

java.util.regex.Pattern provides several predefined character classes:

Construct	Description
`.`	Any character (matches line terminators only when the DOTALL flag is set)
`\d`	A digit: `[0-9]`
`\D`	A non-digit: `[^0-9]`
`\s`	A whitespace character: `[ \t\n\x0B\f\r]`
`\S`	A non-whitespace character: `[^\s]`
`\w`	A word character (English letter, underscore, or digit): `[a-zA-Z_0-9]`
`\W`	A non-word character: `[^\w]`

In the table above, each construct in the left column is shorthand for the character class in the right column. For example, \d means a digit (0-9), and \w means a word character (any uppercase or lowercase English letter, the underscore, or any digit). Use the predefined classes whenever possible: they make your code easier to read and eliminate errors caused by malformed character classes.

Constructs that begin with a backslash are called escaped constructs. We already saw them in the "String Literals" section, which mentioned \Q and \E. If you use an escaped construct inside a Java string literal, you must precede the backslash with another backslash for the string to compile. For example:

private final String REGEX = "\\d"; // a single digit

In this example the regular expression is \d; the extra backslash is required so that the resulting string contains a single backslash. The character "\" has a special meaning inside string literals, so to produce an actual backslash you must write it twice: "\\".

In our RegexTestHarness, however, regular expressions are read from the console, where "\" has no special meaning, so it does not need to be doubled:

Enter your regex: .
Enter input string to search: @
I found the text "@" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: a
No match found.

Enter your regex: \D
Enter input string to search: 1
No match found.

Enter your regex: \D
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search:  
I found the text " " starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search: a
No match found.

Enter your regex: \S
Enter input string to search:  
No match found.

Enter your regex: \S
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: !
No match found.

Enter your regex: \W
Enter input string to search: a
No match found.

Enter your regex: \W
Enter input string to search: !
I found the text "!" starting at index 0 and ending at index 1.
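
When the same patterns are written in Java source code rather than typed at the console, the backslashes have to be doubled. The sketch below illustrates this; the class PredefinedClassDemo, the constant DIGIT, and the helper found are hypothetical names for this example.

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {

    // In Java source the backslash is doubled; the regex itself is \d.
    static final String DIGIT = "\\d";

    // Hypothetical helper: true if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The string holds two characters: one backslash and the letter d.
        System.out.println(DIGIT.length());
        System.out.println(found(DIGIT, "1"));   // a digit
        System.out.println(found("\\D", "1"));   // a non-digit
        System.out.println(found("\\s", " "));   // a whitespace character
        System.out.println(found("\\w", "!"));   // a word character
        System.out.println(found("\\W", "!"));   // a non-word character
    }
}
```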

Quantifiers

Quantifiers let you specify the number of occurrences to match against. The table below lists the greedy, reluctant, and possessive quantifiers. At first glance it may seem that X?, X??, and X?+ do exactly the same thing, since they all mean "X, once or not at all". There are subtle differences, however, which are explained below.

Greedy	Reluctant	Possessive	Meaning
`X?`	`X??`	`X?+`	`X`, once or not at all
`X*`	`X*?`	`X*+`	`X`, zero or more times
`X+`	`X+?`	`X++`	`X`, one or more times
`X{n}`	`X{n}?`	`X{n}+`	`X`, exactly `n` times
`X{n,}`	`X{n,}?`	`X{n,}+`	`X`, at least `n` times
`X{n,m}`	`X{n,m}?`	`X{n,m}+`	`X`, at least `n` but not more than `m` times

Let's look at the greedy quantifiers first. We create three regular expressions: the letter "a" followed by ?, *, or +. Here is what happens when each of them is tested against the empty input string "":

Enter your regex: a?
Enter input string to search: 
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a*
Enter input string to search: 
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a+
Enter input string to search: 
No match found.

In the example above the match succeeds in the first two cases, because the expressions a? and a* both allow zero occurrences of the letter a. Notice also that the start and end indices are both zero, unlike in any example we have seen so far. The empty string "" has no length, so the match starts and ends at index 0. Matches of this kind are called zero-length matches. A zero-length match can occur in several cases: on an empty input string, at the beginning of an input string, after the last character of an input string, or between any two characters of an input string. Zero-length matches are easy to spot because they start and end at the same index.

Let's explore zero-length matches with a few more examples. Change the input string to "a", and you get the following:

Enter your regex: a?
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a*
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a+
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

All three quantifiers found the letter "a", but the first two also found a zero-length match at index 1, that is, after the last character of the input string. Remember that the matcher sees the letter "a" as sitting in the cell between index 0 and index 1, and our program keeps searching in a loop until it can find no more matches. Depending on the quantifier used, the "nothing" after the last character may or may not produce a match.

Now change the input string to five letters "a" in a row, and you get the following:

Enter your regex: a?
Enter input string to search: aaaaa
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 1 and ending at index 2.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "a" starting at index 3 and ending at index 4.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "" starting at index 5 and ending at index 5.

Enter your regex: a*
Enter input string to search: aaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.
I found the text "" starting at index 5 and ending at index 5.

Enter your regex: a+
Enter input string to search: aaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.

The expression a? finds a separate match for each character, because it matches the letter "a" zero or one time, and then a zero-length match at index 5. The expression a* finds two separate matches: all of the letters "a" at once, then a zero-length match at index 5 (after the last character). The expression a+ matches the whole run of letters "a" and ignores the "nothing" at the end.

Now let's see what happens with the input string "ababaaaab":

Enter your regex: a?
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "a" starting at index 5 and ending at index 6.
I found the text "a" starting at index 6 and ending at index 7.
I found the text "a" starting at index 7 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a*
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a+
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.

Even though the letter "b" occupies cells 1, 3, and 8, the output reports zero-length matches at those positions. The regular expression a? is not looking for the letter "b"; it only checks for the presence or absence of the letter "a". If the quantifier allows "a" to match zero times, then any position in the input string that does not hold an "a" shows up as a zero-length match.
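
Zero-length matches can also be detected programmatically by comparing the start and end index of each match. Here is a minimal sketch; the class ZeroLengthMatchDemo and the helper spans are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthMatchDemo {

    // Hypothetical helper: returns every match as "start-end".
    // A zero-length match is one whose start equals its end.
    static List<String> spans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            spans.add(matcher.start() + "-" + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String input = "ababaaaab";
        System.out.println(spans("a?", input)); // [0-1, 1-1, 2-3, 3-3, 4-5, 5-6, 6-7, 7-8, 8-8, 9-9]
        System.out.println(spans("a*", input)); // [0-1, 1-1, 2-3, 3-3, 4-8, 8-8, 9-9]
        System.out.println(spans("a+", input)); // [0-1, 2-3, 4-8]
    }
}
```

The printed spans correspond to the index pairs in the harness output above: a+ never yields a pair with equal indices, while a? and a* do at every position that holds no "a".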

To match a pattern exactly n times, specify the number inside curly braces:

Enter your regex: a{3}
Enter input string to search: aa
No match found.

Enter your regex: a{3}
Enter input string to search: aaa
I found the text "aaa" starting at index 0 and ending at index 3.

Enter your regex: a{3}
Enter input string to search: aaaa
I found the text "aaa" starting at index 0 and ending at index 3.

Here the regular expression a{3} looks for three consecutive occurrences of the letter "a". The first test finds no match because the input string does not contain enough letters "a". The second test contains exactly three letters "a", so the match covers the whole string. The third test also finds a match, because the input string begins with three letters "a"; the remaining fourth letter is not enough for another match. If the pattern occurs in the input string again, subsequent matches follow:

Enter your regex: a{3}
Enter input string to search: aaaaaaaaa
I found the text "aaa" starting at index 0 and ending at index 3.
I found the text "aaa" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.

To require at least n occurrences, add a comma after the number:

Enter your regex: a{3,}
Enter input string to search: aaaaaaaaa
I found the text "aaaaaaaaa" starting at index 0 and ending at index 9.

With the same input string, this test finds only one match, because the nine letters "a" in a row satisfy the requirement of "at least 3 letters a".

The maximum number of occurrences is specified after the comma inside the braces:

Enter your regex: a{3,6} // find at least 3 (but no more than 6) a's in a row
Enter input string to search: aaaaaaaaa
I found the text "aaaaaa" starting at index 0 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.
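
The three counted forms can also be compared side by side in plain Java. The following is only a minimal sketch, separate from RegexTestHarness; the class name CountedQuantifiers and the helper findAll are illustrative names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountedQuantifiers {

    // Collects every match of regex in input, in the order find() reports them.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "aaaaaaaaa"; // nine letters "a"
        System.out.println(findAll("a{3}", input));   // [aaa, aaa, aaa]
        System.out.println(findAll("a{3,}", input));  // [aaaaaaaaa]
        System.out.println(findAll("a{3,6}", input)); // [aaaaaa, aaa]
    }
}
```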

So far we have tested quantifiers only on input strings made of a single repeated character. In fact, a quantifier attaches to only one character at a time, so the regular expression abc+ means “the letter a, followed by the letter b, followed by the letter c one or more times”. It does NOT mean “abc” one or more times. However, quantifiers can also be attached to character classes and capturing groups, such as [abc]+ (“a” or “b” or “c”, one or more times) or (abc)+ (the group “abc”, one or more times).

Enter your regex: (dog){3}
Enter input string to search: dogdogdogdogdogdog
I found the text "dogdogdog" starting at index 0 and ending at index 9.
I found the text "dogdogdog" starting at index 9 and ending at index 18.

Enter your regex: dog{3}
Enter input string to search: dogdogdogdogdogdog
No match found.

Here the first example finds two matches, each made of three consecutive copies of “dog”, because the quantifier applies to the whole group. If the parentheses are removed, no match is found, because the quantifier {3} then applies only to the letter “g”.

A quantifier can likewise be applied to an entire character class:

Enter your regex: [abc]{3}
Enter input string to search: abccabaaaccbbbc
I found the text "abc" starting at index 0 and ending at index 3.
I found the text "cab" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.
I found the text "ccb" starting at index 9 and ending at index 12.
I found the text "bbc" starting at index 12 and ending at index 15.

Enter your regex: abc{3}
Enter input string to search: abccabaaaccbbbc
No match found.

In the first example the quantifier {3} applies to the entire character class, while in the second example it applies only to the letter “c”.
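
The difference in scope can be checked by counting matches. This is a small sketch under the assumption that counting find() results is enough to show the effect; QuantifierScope and countMatches are hypothetical names, not part of the original harness.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierScope {

    // Counts how many times find() succeeds for regex on input.
    public static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String dogs = "dogdogdogdogdogdog";
        System.out.println(countMatches("(dog){3}", dogs));  // 2: the group "dog" three times, found twice
        System.out.println(countMatches("dog{3}", dogs));    // 0: would need "do" followed by "ggg"
        System.out.println(countMatches("dog{3}", "doggg")); // 1

        String letters = "abccabaaaccbbbc";
        System.out.println(countMatches("[abc]{3}", letters)); // 5: any three letters from the class
        System.out.println(countMatches("abc{3}", letters));   // 0: would need "ab" followed by "ccc"
    }
}
```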

There are subtle differences between greedy, reluctant, and possessive quantifiers.

Greedy quantifiers are called “greedy” because they first try to read (or eat) the entire input string for the first match attempt. If that attempt (against the whole input string) fails, the matcher backs off by one character and tries again, repeating the process until a match is found or there are no characters left to back off from. Depending on the quantifier used, the last attempt is made against 1 or 0 characters.

Reluctant quantifiers take the opposite approach: they start at the beginning of the input string and eat one character at a time while looking for a match. Only as a last resort do they try the entire input string.

Possessive quantifiers always eat the entire input string and make only one match attempt. Unlike greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

Example:

Enter your regex: .*foo  // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

The first example uses the greedy quantifier .* to find “anything”, zero or more times, followed by the letters “f”, “o”, “o”. Because the quantifier is greedy, .* first eats the entire input string. At that point the overall expression cannot match, because the last three letters (“f”, “o”, “o”) have already been eaten. So the matcher backs off one letter at a time, retrying after each step, until the match succeeds.

In the second example the quantifier is reluctant. It starts by eating nothing. Because “foo” does not appear at the beginning of the string, the quantifier eats the next character, “x”, which produces the first match at indices 0 to 4. The search then continues in the same way and yields the second match at indices 4 to 13.

The third example finds nothing, because the possessive quantifier .*+ eats the entire string at once, including the final letters “f”, “o”, “o”, and never backs off, so nothing is left for the “foo” at the end of the expression.
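
The same three quantifiers can be compared from code. This is a sketch only; QuantifierKinds and firstMatch are illustrative names, and returning null for “no match” is a choice made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierKinds {

    // Returns the first match of regex in input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(firstMatch(".*foo", input));  // xfooxxxxxxfoo (greedy: backs off from the end)
        System.out.println(firstMatch(".*?foo", input)); // xfoo (reluctant: grows from the start)
        System.out.println(firstMatch(".*+foo", input)); // null (possessive: never gives characters back)
    }
}
```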

Capturing groups

Capturing groups are a way to treat several characters as a single unit. They are created by placing the characters inside parentheses. For example, the regular expression (dog) creates a single group containing the letters “d”, “o”, and “g”. The portion of the input string that matches the capturing group is saved in memory for later reference.

Capturing groups are numbered by counting their opening parentheses from left to right. The expression ((A)(B(C))) contains the following groups:

((A)(B(C)))
(A)
(B(C))
(C)

To find out how many groups an expression contains, call the groupCount method on a java.util.regex.Matcher object. The groupCount method returns an int giving the number of capturing groups in the matcher's pattern. In this example groupCount returns 4.

There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount. Groups beginning with (? are pure, non-capturing groups: they do not capture text and do not count towards groupCount. The exception is the named group (?<name>X), which does capture.
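
The numbering can be inspected with group(int). The sketch below is illustrative; GroupNumbering and its helper groups are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {

    // Returns groups 0..groupCount() for a full match, or an empty list if input does not match.
    public static List<String> groups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.matches()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("((A)(B(C)))").matcher("ABC");
        System.out.println(matcher.groupCount());         // 4, group 0 is not counted
        System.out.println(groups("((A)(B(C)))", "ABC")); // [ABC, ABC, A, BC, C]

        // (?:A) groups without capturing, so only (B) is counted.
        System.out.println(Pattern.compile("(?:A)(B)").matcher("AB").groupCount()); // 1
    }
}
```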

The portion of the input string that matched a capturing group is saved in memory for later reference. Inside the regular expression such a group can be referred to with a backslash followed by a digit giving the group number; this is called a backreference. For example, the expression (\d\d) defines one capturing group that matches two digits in a row, which can be recalled later in the expression with \1.

To match any two digits followed by exactly the same two digits, use (\d\d)\1:

Enter your regex: (\d\d)\1
Enter input string to search: 1212
I found the text "1212" starting at index 0 and ending at index 4.

If the last two digits are changed, the regular expression no longer matches the string:

Enter your regex: (\d\d)\1
Enter input string to search: 1234
No match found.

For nested groups, backreferences work in exactly the same way.
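
In Java source code every backslash of the regular expression has to be doubled inside a string literal. The sketch below shows this for a plain and a nested backreference; BackreferenceDemo, the two constants, and the nested pattern ((\d)\d)\2 are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {

    // The regex (\d\d)\1 written as a Java string literal.
    private static final Pattern REPEATED_PAIR = Pattern.compile("(\\d\\d)\\1");

    // Here \2 refers to the inner group (\d), that is, to the first digit only.
    private static final Pattern NESTED = Pattern.compile("((\\d)\\d)\\2");

    public static boolean isRepeatedPair(String input) {
        return REPEATED_PAIR.matcher(input).matches();
    }

    public static boolean matchesNested(String input) {
        return NESTED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedPair("1212")); // true
        System.out.println(isRepeatedPair("1234")); // false
        System.out.println(matchesNested("121"));   // true: group 2 captured "1"
        System.out.println(matchesNested("122"));   // false
    }
}
```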

Boundary matchers

You can require that a match of your pattern occur only at the beginning or only at the end of a line, only at a word boundary, and so on:

Construct	Description
`^`	The beginning of a line
`$`	The end of a line
`\b`	A word boundary
`\B`	A non-word boundary
`\A`	The beginning of the input
`\G`	The end of the previous match
`\Z`	The end of the input, except for the final line terminator, if any
`\z`	The end of the input

Enter your regex: ^dog$
Enter input string to search: dog
I found the text "dog" starting at index 0 and ending at index 3.

Enter your regex: ^dog$
Enter input string to search:       dog
No match found.

Enter your regex: \s*dog$
Enter input string to search:             dog
I found the text "            dog" starting at index 0 and ending at index 15.

Enter your regex: ^dog\w*
Enter input string to search: dogblahblah
I found the text "dogblahblah" starting at index 0 and ending at index 11.

The first example succeeds because the pattern covers the entire input string. The second example finds no match because the input string contains extra whitespace at the beginning. The third example specifies an expression that allows any amount of whitespace followed by “dog” at the end of the line. The fourth example requires “dog” at the beginning of the line, followed by any number of word characters.

To find matches of a pattern that begin and end on word boundaries, use \b:

Enter your regex: \bdog\b
Enter input string to search: The dog plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.

To match at a position that is not a word boundary, use \B:

Enter your regex: \bdog\B
Enter input string to search: The dog plays in the yard.
No match found.

Enter your regex: \bdog\B
Enter input string to search: The doggie plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

To require that a match start at the end of the previous match, use \G:

Enter your regex: dog 
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \Gdog 
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.

Here the second example finds only one match, because the second occurrence of “dog” does not start at the end of the previous match.
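
The effect of \G is easier to see when some matches are adjacent and others are not. This is a sketch with an input string chosen for illustration; PreviousMatchBoundary and matchStarts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreviousMatchBoundary {

    // Returns the start index of every match of regex in input.
    public static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "dogdog dog";
        System.out.println(matchStarts("dog", input));    // [0, 3, 7]
        // With \G each match must begin where the previous one ended,
        // so the space at index 6 stops the chain.
        System.out.println(matchStarts("\\Gdog", input)); // [0, 3]
    }
}
```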

java.util.regex.Pattern

In addition to the method:

public static Pattern compile(String regex)

the Pattern class has another factory method:

public static Pattern compile(String regex,
              int flags)

whose flags parameter accepts any of the following flags, or a combination of them joined with the bitwise OR operator:

java.util.regex.Pattern.CANON_EQ: enables canonical equivalence. When this flag is specified, two characters are considered to match if their full canonical decompositions match. The expression "a\u030A", for example, matches the string "\u00E5". By default, matching does not take canonical equivalence into account.
java.util.regex.Pattern.CASE_INSENSITIVE: enables case-insensitive matching. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. To get case-insensitive matching for other languages, specify the UNICODE_CASE flag together with this one. Case-insensitive matching can also be enabled with the embedded flag expression (?i).
java.util.regex.Pattern.COMMENTS: permits whitespace and comments in the pattern. In this mode whitespace is ignored, and embedded comments starting with # are ignored until the end of the line. This mode can also be enabled with the embedded flag expression (?x).
java.util.regex.Pattern.DOTALL: enables dotall mode. In this mode the expression . matches any character, including a line terminator. By default this expression does not match line terminators. This mode can also be enabled with the embedded flag expression (?s).
java.util.regex.Pattern.LITERAL: enables literal parsing of the pattern. When this flag is specified, the pattern is treated as a sequence of literal characters. Metacharacters and escape sequences have no special meaning. The CASE_INSENSITIVE and UNICODE_CASE flags keep their effect when used together with this flag. The other flags become superfluous.
java.util.regex.Pattern.MULTILINE: enables multiline mode. In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions match only at the beginning and the end of the entire input sequence. Multiline mode can also be enabled with the embedded flag expression (?m).
java.util.regex.Pattern.UNICODE_CASE: enables Unicode-aware case folding. When this flag is specified, case-insensitive matching (enabled by CASE_INSENSITIVE) is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. This mode can also be enabled with the embedded flag expression (?u).
java.util.regex.Pattern.UNIX_LINES: enables Unix lines mode. In this mode only the '\n' line terminator is recognized in the behavior of ., ^, and $. This mode can also be enabled with the embedded flag expression (?d).
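
A few of these flags can be tried out with a short program. This is a sketch under assumptions: the class PatternFlagsDemo, the helper countMatches, and the sample text are made up for illustration, and only MULTILINE, DOTALL, and COMMENTS are shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternFlagsDemo {

    // Counts how many times find() succeeds for an already compiled pattern.
    public static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "dog\ncat\ndog";

        // Without MULTILINE, ^ and $ do not match at the line breaks inside the input.
        System.out.println(countMatches(Pattern.compile("^dog$"), text));                    // 0
        System.out.println(countMatches(Pattern.compile("^dog$", Pattern.MULTILINE), text)); // 2

        // Without DOTALL, . does not match the line terminator between "dog" and "cat".
        System.out.println(countMatches(Pattern.compile("dog.cat"), text));                 // 0
        System.out.println(countMatches(Pattern.compile("dog.cat", Pattern.DOTALL), text)); // 1

        // With COMMENTS, the whitespace and the trailing # comment are ignored.
        Pattern commented = Pattern.compile("d o g  # three letters", Pattern.COMMENTS);
        System.out.println(commented.matcher("dog").matches()); // true
    }
}
```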

In the following steps we modify RegexTestHarness so that it compiles its pattern with case-insensitive matching.

First, we change the code so that it calls the alternative version of compile:

Pattern pattern = 
Pattern.compile(console.readLine("%nEnter your regex: "),
Pattern.CASE_INSENSITIVE);

Then we compile and run it to get the following results:

Enter your regex: dog
Enter input string to search: DoGDOg
I found the text "DoG" starting at index 0 and ending at index 3.
I found the text "DOg" starting at index 3 and ending at index 6.

As you can see, the pattern "dog" matched both occurrences regardless of case. To compile a pattern with several flags, separate the flags with the bitwise OR operator, as here:

pattern = Pattern.compile("[az]$", Pattern.MULTILINE | Pattern.UNIX_LINES);

You can also pass the flags in an int variable:

int flags = Pattern.MULTILINE | Pattern.UNIX_LINES;
pattern = Pattern.compile("[az]$", flags);

Flags can also be enabled with flag expressions embedded in the regular expression itself:

Enter your regex: (?i)foo
Enter input string to search: FOOfooFoOfoO
I found the text "FOO" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "FoO" starting at index 6 and ending at index 9.
I found the text "foO" starting at index 9 and ending at index 12.

The following table maps the constants to their embedded flag expressions:

Constant	Equivalent embedded flag expression
`Pattern.CANON_EQ`	None
`Pattern.CASE_INSENSITIVE`	`(?i)`
`Pattern.COMMENTS`	`(?x)`
`Pattern.MULTILINE`	`(?m)`
`Pattern.DOTALL`	`(?s)`
`Pattern.LITERAL`	None
`Pattern.UNICODE_CASE`	`(?u)`
`Pattern.UNIX_LINES`	`(?d)`
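
The correspondence between constants and embedded flags can be checked in code. The sketch below compares CASE_INSENSITIVE with and without UNICODE_CASE on a non-ASCII word; the class EmbeddedFlagsDemo, its constants, and the sample word are assumptions made for this example. The word is written with Unicode escapes so that the source file stays pure ASCII.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagsDemo {

    // The French word for "summer" in lower and upper case, written with Unicode escapes.
    public static final String LOWER = "\u00e9t\u00e9";
    public static final String UPPER = "\u00c9T\u00c9";

    // Checks whether the given pattern matches the upper-case word.
    public static boolean matchesUpper(Pattern pattern) {
        return pattern.matcher(UPPER).matches();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE alone only folds US-ASCII letters, so the accented letters differ.
        System.out.println(matchesUpper(Pattern.compile(LOWER, Pattern.CASE_INSENSITIVE))); // false
        System.out.println(matchesUpper(Pattern.compile("(?i)" + LOWER)));                  // false

        // Adding UNICODE_CASE, as a constant or as the embedded flag u, folds them as well.
        System.out.println(matchesUpper(
                Pattern.compile(LOWER, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE))); // true
        System.out.println(matchesUpper(Pattern.compile("(?iu)" + LOWER)));                // true
    }
}
```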

The Pattern class also has a static matches method:

public static boolean matches(String regex,
                              CharSequence input)

which provides a quick way to check whether a regular expression matches an entire string.
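
Note that Pattern.matches succeeds only when the whole input matches. The following sketch contrasts it with a search via find(); MatchesDemo and its two helper methods are illustrative names.

```java
import java.util.regex.Pattern;

public class MatchesDemo {

    // Compiled once and reused; Pattern.matches compiles the regex on every call.
    private static final Pattern DOG = Pattern.compile("dog");

    // True only if the entire input is "dog".
    public static boolean isExactlyDog(String input) {
        return Pattern.matches("dog", input);
    }

    // True if "dog" occurs anywhere in the input.
    public static boolean containsDog(String input) {
        return DOG.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isExactlyDog("dog"));    // true
        System.out.println(isExactlyDog("doggie")); // false: the whole input must match
        System.out.println(containsDog("doggie"));  // true
    }
}
```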

The Pattern class also has the method:

public String[] split(CharSequence input)

which splits a string into an array of strings around matches of the regular expression:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SplitDemo {

    private static final String REGEX = ":";
    private static final String INPUT =
        "one:two:three:four:five";
    
    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for(String s : items) {
            System.out.println(s);
        }
    }
}

The output is:

one
two
three
four
five

Or:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SplitDemo2 {

    private static final String REGEX = "\\d";
    private static final String INPUT =
        "one9two4three7four1five";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for(String s : items) {
            System.out.println(s);
        }
    }
}

which produces:

one
two
three
four
five

Some methods of the java.lang.String class also work with regular expressions:

public boolean matches(String regex)

Equivalent to calling Pattern.matches(regex, str).

public String[] split(String regex,
                      int limit)

Equivalent to calling Pattern.compile(regex).split(str, limit).

public String[] split(String regex)

Equivalent to calling Pattern.compile(regex).split(str, 0).
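
The limit argument is not described above, so here is a short sketch of its documented behavior in String.split: a positive limit caps the number of elements, zero drops trailing empty strings, and a negative limit keeps them. SplitLimitDemo, describe, and the input string are names and data invented for this example.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Splits input around regex with the given limit and renders the array as text.
    public static String describe(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        String input = "one:two:three::";
        System.out.println(describe(input, ":", 0));  // [one, two, three]     trailing empty strings dropped
        System.out.println(describe(input, ":", 2));  // [one, two:three::]    at most two elements
        System.out.println(describe(input, ":", -1)); // [one, two, three, , ] trailing empty strings kept
    }
}
```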

java.util.regex.Matcher

The Matcher class matches a character sequence against the pattern held in a Pattern.

A Matcher does not work on the whole input string but only on its region (which by default covers the entire string). The region can be changed with the method:

public Matcher region(int start,
                      int end)
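
As a rough illustration of how a region narrows the search, the sketch below lists match positions for different regions. RegionDemo and matchStartsInRegion are hypothetical names; note that the reported indices are still relative to the whole input string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionDemo {

    // Returns the start index of every match found between start (inclusive) and end (exclusive).
    public static List<Integer> matchStartsInRegion(String regex, String input, int start, int end) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        matcher.region(start, end);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "dog dog dog";
        System.out.println(matchStartsInRegion("dog", input, 0, input.length())); // [0, 4, 8]
        System.out.println(matchStartsInRegion("dog", input, 4, input.length())); // [4, 8]
        System.out.println(matchStartsInRegion("dog", input, 4, 7));              // [4]
    }
}
```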

Useful methods:

public int start()

Returns the start index of the previous match.

public int start(int group)

Returns the start index of the subsequence captured by the group number group during the previous match.

public int end()

Returns the index just after the last character matched.

public int end(int group)

Returns the index just after the last character of the subsequence captured by the group number group during the previous match.
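
The group variants of start and end can be sketched as follows. GroupOffsetsDemo, the helper offsets, and the sample pattern and input are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOffsetsDemo {

    // Returns {start, end} of the given group for the first match, or null if nothing matches.
    public static int[] offsets(String regex, String input, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new int[] { matcher.start(group), matcher.end(group) };
    }

    public static void main(String[] args) {
        String regex = "(\\w+)@(\\w+)";
        String input = "mail: user@example";
        System.out.println(Arrays.toString(offsets(regex, input, 0))); // [6, 18] the whole match
        System.out.println(Arrays.toString(offsets(regex, input, 1))); // [6, 10] "user"
        System.out.println(Arrays.toString(offsets(regex, input, 2))); // [11, 18] "example"
    }
}
```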

public boolean lookingAt()

Attempts to match the input against the pattern, starting at the beginning of the region. Returns true if a prefix of the region matches the pattern.

public boolean find()

Attempts to find the next subsequence of the input region that matches the pattern.

public boolean find(int start)

Resets the Matcher and then attempts to find the next match, starting at index start.

public boolean matches()

Attempts to match the entire region against the pattern.
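
The difference between lookingAt, find, and matches can be seen in a small sketch. MatchMethodsDemo and its helper methods are illustrative names; findFrom returns -1 for “not found” purely as a convention of this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchMethodsDemo {

    // True if the input starts with a match of regex.
    public static boolean startsWithMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // True if the entire input is a match of regex.
    public static boolean isFullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Returns the start of the first match at or after index from, or -1 if there is none.
    public static int findFrom(String regex, String input, int from) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find(from) ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(startsWithMatch("foo", "fooooo"));   // true: the input begins with "foo"
        System.out.println(isFullMatch("foo", "fooooo"));       // false: extra letters remain
        System.out.println(startsWithMatch("foo", "xfoo foo")); // false: the input begins with "x"
        System.out.println(findFrom("foo", "xfoo foo", 0));     // 1
        System.out.println(findFrom("foo", "xfoo foo", 2));     // 5
    }
}
```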

public String replaceAll(String replacement)

Replaces every match of the pattern in the input string with replacement. The String class has a counterpart: public String replaceAll(String regex, String replacement).

public String replaceFirst(String replacement)

Replaces the first match of the pattern in the input string with replacement. The String class has a counterpart: public String replaceFirst(String regex, String replacement).
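
A short sketch of both replacement methods follows. ReplaceDemo, its helpers, and the sample sentences are invented for illustration. The last helper also shows that $1 and $2 in the replacement string refer to capturing groups of the match.

```java
import java.util.regex.Pattern;

public class ReplaceDemo {

    private static final Pattern DOG = Pattern.compile("dog");
    private static final Pattern RANGE = Pattern.compile("(\\d+)-(\\d+)");

    public static String replaceAllDogs(String input) {
        return DOG.matcher(input).replaceAll("cat");
    }

    public static String replaceFirstDog(String input) {
        return DOG.matcher(input).replaceFirst("cat");
    }

    // Swaps the two numbers of every range by referring to groups 1 and 2.
    public static String swapRange(String input) {
        return RANGE.matcher(input).replaceAll("$2-$1");
    }

    public static void main(String[] args) {
        String input = "The dog says woof. All dogs say woof.";
        System.out.println(replaceAllDogs(input));          // The cat says woof. All cats say woof.
        System.out.println(replaceFirstDog(input));         // The cat says woof. All dogs say woof.
        System.out.println(input.replaceAll("dog", "cat")); // same result as replaceAllDogs
        System.out.println(swapRange("10-20"));             // 20-10
    }
}
```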

java.util.regex.PatternSyntaxException

An unchecked exception, thrown when a regular expression contains a syntax error. Useful methods:

public String getDescription()

Returns the description of the error.

public int getIndex()

Returns the approximate index in the pattern at which the error occurs, or -1 if the index is not known.

public String getPattern()

Returns the erroneous regular expression.

public String getMessage()

Returns a multi-line error message containing the description and the index.
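
These accessors can be exercised by compiling a deliberately broken pattern. In the sketch below, SyntaxErrorDemo and syntaxError are hypothetical names, and the pattern "(dog" with a missing closing parenthesis is just one possible way to trigger the exception. The exact wording of the description is left to the JDK, so no output is shown for it.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxErrorDemo {

    // Returns null if regex compiles, otherwise the description of the syntax error.
    public static String syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(syntaxError("(dog)")); // null: the pattern is valid

        try {
            Pattern.compile("(dog"); // missing closing parenthesis
        } catch (PatternSyntaxException e) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
        }
    }
}
```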

This article is part of the “Java 8 Tutorial” series.
