
Notes on Learning Regular Expressions (Page 2 of 2)

Updated: 2008-05-30 19:57:03

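Regular expressions specify string patterns. Use them whenever you need to locate strings that match a particular pattern. For example, one of the sample programs below locates all hyperlinks in an HTML file by searching for strings of the pattern <a href="...">.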

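Example 12-9 prompts for a pattern and then for strings to match. It reports whether each input string matches the pattern. If the pattern contains groups and the input matches, the program prints the group boundaries as parentheses, such as ((11):(59))am.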

Example 12-9. RegexTest.java
import java.util.*;
import java.util.regex.*;

/**
  This program tests regular expression matching.
  Enter a pattern and strings to match, or enter an
  empty line to exit. If the pattern contains groups,
  the group boundaries are displayed in the match.
*/
public class RegexTest
{
  public static void main(String[] args)
  {
    Scanner in = new Scanner(System.in);
    System.out.println("Enter pattern: ");
    String patternString = in.nextLine();

    Pattern pattern = null;
    try
    {
      pattern = Pattern.compile(patternString);
    }
    catch (PatternSyntaxException e)
    {
      System.out.println("Pattern syntax error");
      System.exit(1);
    }

    while (true)
    {
      System.out.println("Enter string to match: ");
      // stop at end of input or on an empty line
      if (!in.hasNextLine()) return;
      String input = in.nextLine();
      if (input.equals("")) return;
      Matcher matcher = pattern.matcher(input);
      if (matcher.matches())
      {
        System.out.println("Match");
        int g = matcher.groupCount();
        if (g > 0)
        {
          for (int i = 0; i < input.length(); i++)
          {
            // open a parenthesis for every group that starts at position i
            for (int j = 1; j <= g; j++)
              if (i == matcher.start(j))
                System.out.print('(');
            System.out.print(input.charAt(i));
            // close a parenthesis for every group that ends after position i
            for (int j = 1; j <= g; j++)
              if (i + 1 == matcher.end(j))
                System.out.print(')');
          }
          System.out.println();
        }
      }
      else
        System.out.println("No match");
    }
  }
}
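
The group-marking loop of Example 12-9 can also be tried without console input. The following is a minimal sketch, not part of the original listing: the class name GroupBoundaryDemo, the method markGroups, and the time-of-day pattern are assumptions chosen to reproduce the sample output ((11):(59))am.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBoundaryDemo
{
  // Returns the input with parentheses around each group of the match,
  // or null if the whole input does not match the pattern.
  static String markGroups(String regex, String input)
  {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    if (!matcher.matches()) return null;
    StringBuilder result = new StringBuilder();
    int g = matcher.groupCount();
    for (int i = 0; i < input.length(); i++)
    {
      for (int j = 1; j <= g; j++)
        if (i == matcher.start(j)) result.append('(');
      result.append(input.charAt(i));
      for (int j = 1; j <= g; j++)
        if (i + 1 == matcher.end(j)) result.append(')');
    }
    return result.toString();
  }

  public static void main(String[] args)
  {
    // hypothetical pattern: hour, colon, minute, then am or pm
    System.out.println(markGroups("((1?[0-9]):([0-5][0-9]))[ap]m", "11:59am"));
    // prints ((11):(59))am
  }
}
```

Group 1 spans the whole time, and groups 2 and 3 span the hour and the minute, which is why two parentheses open at the first character.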

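Usually, you don't want to match the entire input against a regular expression. Instead, you want to find one or more matching substrings in the input. Use the find method of the Matcher class to find the next match. If it returns true, use the start and end methods to determine the extent of the match.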

while (matcher.find())
{
  int start = matcher.start();
  int end = matcher.end();
  String match = input.substring(start, end);
  . . .
}
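
The find loop can be wrapped in a small self-contained program. This is a sketch under stated assumptions: the class name FindAllDemo, the method findAll, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo
{
  // Collects every substring of the input that matches the regular expression.
  static List<String> findAll(String regex, String input)
  {
    List<String> matches = new ArrayList<>();
    Matcher matcher = Pattern.compile(regex).matcher(input);
    while (matcher.find())
    {
      // start() is the index of the first matched character,
      // end() is the index just past the last one
      matches.add(input.substring(matcher.start(), matcher.end()));
    }
    return matches;
  }

  public static void main(String[] args)
  {
    System.out.println(findAll("[0-9]+", "Room 12, floor 3, building 401"));
    // prints [12, 3, 401]
  }
}
```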

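Example 12-10 puts this mechanism to work. It locates all hypertext references in a web page and prints them. To run the program, supply a URL on the command line, such as
java HrefMatch http://www.horstmann.com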

Example 12-10. HrefMatch.java
import java.io.*;
import java.net.*;
import java.util.regex.*;

/**
  This program displays all URLs in a web page by
  matching a regular expression that describes the
  <a href=...> HTML tag. Start the program as
  java HrefMatch URL
*/
public class HrefMatch
{
  public static void main(String[] args)
  {
    try
    {
      // get URL string from command line or use default
      String urlString;
      if (args.length > 0) urlString = args[0];
      else urlString = "http://java.sun.com";

      // read contents into string builder; the reader is closed automatically
      StringBuilder input = new StringBuilder();
      try (InputStreamReader in = new InputStreamReader(new URL(urlString).openStream()))
      {
        int ch;
        while ((ch = in.read()) != -1) input.append((char) ch);
      }

      // search for all occurrences of pattern; the href value is either a
      // quoted string or a run of characters other than whitespace and >
      String patternString = "<a\\s+href\\s*=\\s*(\"[^\"]*\"|[^\\s>]*)\\s*>";
      Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
      Matcher matcher = pattern.matcher(input);

      while (matcher.find())
      {
        int start = matcher.start();
        int end = matcher.end();
        String match = input.substring(start, end);
        System.out.println(match);
      }
    }
    catch (IOException e)
    {
      e.printStackTrace();
    }
    catch (PatternSyntaxException e)
    {
      e.printStackTrace();
    }
  }
}
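
The href pattern can be checked without a network connection by running it against a string. The sketch below is an illustration only: the class name HrefExtractDemo, the method extractHrefs, and the HTML fragment are assumptions. It writes the unquoted alternative as [^\s>]* so that unquoted values of any length match, and it returns group 1 (the href value) instead of the whole tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractDemo
{
  // group 1 is either a quoted value or an unquoted run without whitespace or >
  static final Pattern HREF = Pattern.compile(
    "<a\\s+href\\s*=\\s*(\"[^\"]*\"|[^\\s>]*)\\s*>", Pattern.CASE_INSENSITIVE);

  // Returns the href values of all matching tags, quotes included.
  static List<String> extractHrefs(CharSequence html)
  {
    List<String> result = new ArrayList<>();
    Matcher matcher = HREF.matcher(html);
    while (matcher.find()) result.add(matcher.group(1));
    return result;
  }

  public static void main(String[] args)
  {
    // made-up HTML fragment with one quoted and one unquoted href
    String html = "<p><a href=\"http://www.horstmann.com\">home</a>"
      + " <A HREF=index.html>index</A></p>";
    for (String href : extractHrefs(html)) System.out.println(href);
  }
}
```

Like the original pattern, this sketch only recognizes tags in which href is the sole attribute. Real-world HTML is usually better handled by a parser.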

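The replaceAll method of the Matcher class replaces all occurrences of a regular expression with a replacement string. For example, the following instructions replace all sequences of digits with a # character: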

Pattern pattern = Pattern.compile("[0-9]+");
Matcher matcher = pattern.matcher(input);
String output = matcher.replaceAll("#");
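
A runnable version of these three instructions might look as follows. The class name ReplaceAllDemo, the method maskDigits, and the sample sentence are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllDemo
{
  // Replaces every maximal run of digits with a single # character.
  static String maskDigits(String input)
  {
    Pattern pattern = Pattern.compile("[0-9]+");
    Matcher matcher = pattern.matcher(input);
    return matcher.replaceAll("#");
  }

  public static void main(String[] args)
  {
    System.out.println(maskDigits("Order 66 shipped 2008-05-30"));
    // prints Order # shipped #-#-#
  }
}
```

Because the pattern is [0-9]+ rather than [0-9], each run of digits yields one # rather than one per digit.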

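The replacement string can contain references to groups in the pattern: $n is replaced with the nth group. Use \$ to include a literal $ character in the replacement text.
The replaceFirst method replaces only the first occurrence of the pattern.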
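
Group references, the \$ escape, and replaceFirst can be seen side by side in a short sketch. The class name ReplacementDemo, its method names, and the sample strings are hypothetical.

```java
import java.util.regex.Pattern;

class ReplacementDemo
{
  // Turns "last, first" into "first last" by swapping groups 1 and 2.
  static String swapNames(String input)
  {
    return Pattern.compile("(\\w+), (\\w+)").matcher(input).replaceAll("$2 $1");
  }

  // Prefixes each number with a literal dollar sign.
  // In the replacement, \$ is a literal $ and $0 is the whole match.
  static String addDollarSign(String input)
  {
    return Pattern.compile("[0-9]+").matcher(input).replaceAll("\\$$0");
  }

  // Replaces only the first run of digits.
  static String maskFirstNumber(String input)
  {
    return Pattern.compile("[0-9]+").matcher(input).replaceFirst("#");
  }

  public static void main(String[] args)
  {
    System.out.println(swapNames("Doe, Jane"));            // prints Jane Doe
    System.out.println(addDollarSign("costs 15 and 20"));  // prints costs $15 and $20
    System.out.println(maskFirstNumber("12 and 34"));      // prints # and 34
  }
}
```

When the replacement text comes from user input, Matcher.quoteReplacement can be used to escape any $ and \ characters it contains.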

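Finally, the Pattern class has a split method that works like a string tokenizer. It splits the input into an array of strings, using the regular expression matches as boundaries. For example, the following instructions split the input into tokens, where the delimiters are punctuation marks surrounded by optional whitespace: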

Pattern pattern = Pattern.compile("\\s*\\p{Punct}\\s*");
String[] tokens = pattern.split(input);
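
Here is a small runnable sketch of the split call. The class name SplitDemo, the method tokenize, and the sample input are assumptions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo
{
  // Splits the input at punctuation marks, dropping surrounding whitespace.
  static String[] tokenize(String input)
  {
    Pattern pattern = Pattern.compile("\\s*\\p{Punct}\\s*");
    return pattern.split(input);
  }

  public static void main(String[] args)
  {
    System.out.println(Arrays.toString(tokenize("apples, oranges ;pears:kiwi")));
    // prints [apples, oranges, pears, kiwi]
  }
}
```

Note that whitespace alone is not a delimiter for this pattern, so "red apples, pears" yields the tokens "red apples" and "pears".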

Class

--------------------------------------------------------------------------------
java.util.regex.Pattern 1.4

--------------------------------------------------------------------------------
Methods
static Pattern compile(String expression)
static Pattern compile(String expression, int flags)
Compile the regular expression string into a Pattern object for fast processing of matches.
Parameters:
expression  The regular expression
flags       One or more of the flags CASE_INSENSITIVE, UNICODE_CASE, MULTILINE, UNIX_LINES, DOTALL, and CANON_EQ
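
Several flags are combined with the bitwise OR operator. The following sketch shows the effect of CASE_INSENSITIVE and DOTALL. The class name FlagsDemo, the method matchesWithFlags, and the sample strings are assumptions.

```java
import java.util.regex.Pattern;

class FlagsDemo
{
  // Compiles the expression with the given flags and tests the whole input.
  static boolean matchesWithFlags(String expression, String input, int flags)
  {
    return Pattern.compile(expression, flags).matcher(input).matches();
  }

  public static void main(String[] args)
  {
    String input = "HELLO\nWORLD";
    // no flags: the case differs and . does not match the line terminator
    System.out.println(matchesWithFlags("hello.world", input, 0)); // prints false
    // both flags: letters match regardless of case and . matches \n
    System.out.println(matchesWithFlags("hello.world", input,
      Pattern.CASE_INSENSITIVE | Pattern.DOTALL)); // prints true
  }
}
```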

Matcher matcher(CharSequence input)
Returns a Matcher object that you can use to locate matches of the pattern in the input.

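String[] split(CharSequence input)
String[] split(CharSequence input, int limit)
Split the input string into tokens, with the pattern specifying the form of the delimiters. Return an array of tokens. The delimiters are not part of the tokens.
Parameters:
input  The string to be split into tokens
limit  The maximum number of strings to produce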
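
The limit parameter is easiest to understand by example. With a positive limit n, the array has at most n entries, and the last entry holds the unsplit remainder. With a limit of 0 (the behavior of the one-argument method), trailing empty strings are discarded. With a negative limit, they are kept. The class name SplitLimitDemo and the sample inputs below are assumptions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo
{
  static final Pattern COMMA = Pattern.compile(",");

  // Splits at commas, passing the limit through to Pattern.split.
  static String[] splitAtCommas(String input, int limit)
  {
    return COMMA.split(input, limit);
  }

  public static void main(String[] args)
  {
    System.out.println(Arrays.toString(splitAtCommas("a,b,c,d", 2)));
    // prints [a, b,c,d]
    System.out.println(splitAtCommas("a,b,,", 0).length);   // prints 2
    System.out.println(splitAtCommas("a,b,,", -1).length);  // prints 4
  }
}
```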

--------------------------------------------------------------------------------
Class

--------------------------------------------------------------------------------
java.util.regex.Matcher 1.4

--------------------------------------------------------------------------------
Methods
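boolean matches()
Returns true if the entire input matches the pattern.

boolean lookingAt()
Returns true if the beginning of the input matches the pattern.

boolean find()
boolean find(int start)
Attempt to find the next match and return true if a match is found.
Parameters:
start  The index at which to start searching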
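
The difference between matches, lookingAt, and find shows up when the same pattern is applied to different inputs. This is an illustrative sketch: the class name MatchKindsDemo, the method describe, and the inputs are assumptions. A fresh matcher is created for each call so that the three tests do not affect one another.

```java
import java.util.regex.Pattern;

class MatchKindsDemo
{
  // Reports how the three matching methods judge the same input.
  static String describe(String regex, String input)
  {
    Pattern pattern = Pattern.compile(regex);
    boolean whole = pattern.matcher(input).matches();     // entire input
    boolean prefix = pattern.matcher(input).lookingAt();  // start of input
    boolean anywhere = pattern.matcher(input).find();     // any substring
    return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
  }

  public static void main(String[] args)
  {
    System.out.println(describe("[0-9]+", "123"));
    // prints matches=true lookingAt=true find=true
    System.out.println(describe("[0-9]+", "123abc"));
    // prints matches=false lookingAt=true find=true
    System.out.println(describe("[0-9]+", "abc123"));
    // prints matches=false lookingAt=false find=true
  }
}
```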

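int start()
int end()
Return the start position and the position just past the end of the current match.

String group()
Returns the current match.

int groupCount()
Returns the number of groups in the pattern.

int start(int groupIndex)
int end(int groupIndex)
Return the start position and the position just past the end of a given group in the current match.
Parameters:
groupIndex  The group index (starting with 1), or 0 to indicate the entire match

String group(int groupIndex)
Returns the string matching a given group.
Parameters:
groupIndex  The group index (starting with 1), or 0 to indicate the entire match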
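
The group methods can be exercised with a pattern that has three groups. The following is a sketch only: the class name GroupDemo, the method parseDate, the date pattern, and the input string are assumptions.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo
{
  // Finds the first yyyy-mm-dd date and returns group 0 (the whole match)
  // followed by groups 1 to 3, or null if the input contains no date.
  static String[] parseDate(String input)
  {
    Matcher matcher = Pattern.compile("([0-9]{4})-([0-9]{2})-([0-9]{2})").matcher(input);
    if (!matcher.find()) return null;
    String[] parts = new String[matcher.groupCount() + 1];
    for (int i = 0; i <= matcher.groupCount(); i++) parts[i] = matcher.group(i);
    return parts;
  }

  public static void main(String[] args)
  {
    System.out.println(Arrays.toString(parseDate("Updated 2008-05-30")));
    // prints [2008-05-30, 2008, 05, 30]
  }
}
```

groupCount returns 3 here. Group 0 is not included in the count, which is why the array has groupCount() + 1 entries.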

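String replaceAll(String replacement)
String replaceFirst(String replacement)
Return a string obtained from the matcher input by replacing all matches, or the first match, with the replacement expression.
Parameters:
replacement  The replacement string

Matcher reset()
Matcher reset(CharSequence input)
Reset the matcher state. The second method also makes the matcher work on a different input.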
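
One use of reset(CharSequence) is to reuse a single Matcher object for several inputs. The sketch below assumes a hypothetical class ResetDemo with a method countNumbers. The inputs are made up.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo
{
  // Counts the runs of digits in each input, reusing one matcher.
  static int[] countNumbers(String... inputs)
  {
    Matcher matcher = Pattern.compile("[0-9]+").matcher("");
    int[] counts = new int[inputs.length];
    for (int i = 0; i < inputs.length; i++)
    {
      // switch the matcher to the next input and discard its old state
      matcher.reset(inputs[i]);
      while (matcher.find()) counts[i]++;
    }
    return counts;
  }

  public static void main(String[] args)
  {
    System.out.println(Arrays.toString(countNumbers("a1b22", "none", "3 4 5")));
    // prints [2, 0, 3]
  }
}
```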
