About regular expressions (ZT, reposted)

Longe

Posted 2009-11-9 13:04:38

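Part 1:
-----------------
Regular expressions (REs) are often mistaken for an arcane language that only a few people understand. On the surface they do look chaotic, and if you do not know the syntax, an expression is just a pile of junk characters. In fact, regular expressions are quite simple and easy to understand. After reading this article you will know the common syntax of regular expressions.

Supported on many platforms

Regular expressions were first described by the mathematician Stephen Kleene in 1956, in his work on regular sets and finite automata. With a complete syntax of their own, they were used for pattern matching on characters and were later applied throughout information technology. Since then they have gone through several stages of development, and the current standard has been approved by ISO (the International Organization for Standardization) and adopted by The Open Group.

Regular expressions are not a dedicated programming language. They are a standard way of finding and replacing text in a file or string. The standard comes in two forms: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE plus some additional concepts.

Many programs use regular expressions, including xsh, egrep, sed, vi, and other programs on the UNIX platform. Many languages have adopted them as well, such as HTML and XML, though usually only as a subset of the full standard.

More common than you think
As regular expressions have been ported to cross-platform programming languages, their feature set has become more complete and their use more widespread. Search engines on the web use them, e-mail programs use them, and even if you are not a UNIX programmer you can use regular expressions to simplify your programs and shorten your development time.

Regular expressions 101
Much of the syntax of regular expressions will look familiar even if you have never studied it before. Wildcards, for example, are one kind of RE construct: the repetition operator. Let's first look at the most common basic syntax of the ERE standard. To give examples with a concrete purpose, I will use several different programs.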

Part 2:
----------------------
Character matching

The key to a regular expression is specifying what you want to search for and match. Without that, REs would be useless.

Every expression contains instructions about what to look for, as shown in Table A.

Table A: Character-matching regular expressions
Format of each entry:
---------------
Operator
Explanation
Example
Result
----------------
.
Match any one character
grep '.ord' sample.txt
Will match "ford", "lord", "2ord", etc. in the file sample.txt.
-----------------
[ ]
Match any one character listed between the brackets
grep '[cng]ord' sample.txt
Will match only "cord", "nord", and "gord"
---------------------
[^ ]
Match any one character not listed between the brackets

grep '[^cn]ord' sample.txt
Will match "lord", "2ord", etc. but not "cord" or "nord"

Ranges such as a-z or 0-9 may be used inside the brackets:

grep '[a-zA-Z]ord' sample.txt
Will match "aord", "bord", "Aord", "Bord", etc.

grep '[^0-9]ord' sample.txt
Will match "Aord", "aord", etc. but not "2ord", etc.
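The entries in Table A can be tried out without a sample.txt. The following is only a sketch, assuming Java's java.util.regex package (whose syntax is Perl-like rather than strictly POSIX, but agrees with the table for these operators). The class name CharMatchDemo, the helper found, and the word list are made up for illustration.

```java
import java.util.regex.Pattern;

class CharMatchDemo {
    // Returns true if the regex matches anywhere inside the word,
    // which is roughly how grep decides whether a line matches.
    static boolean found(String regex, String word) {
        return Pattern.compile(regex).matcher(word).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample words standing in for the lines of sample.txt
        String[] words = {"ford", "cord", "nord", "2ord", "Aord"};
        String[] patterns = {".ord", "[cng]ord", "[^cn]ord", "[a-zA-Z]ord", "[^0-9]ord"};
        for (String p : patterns) {
            StringBuilder hits = new StringBuilder();
            for (String w : words) {
                if (found(p, w)) {
                    hits.append(w).append(' ');
                }
            }
            System.out.println(p + " matches: " + hits.toString().trim());
        }
    }
}
```

Each output line lists which of the sample words a pattern accepts, so the "Result" rows of the table can be checked one by one.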

Repetition operators
Repetition operators, or quantifiers, describe how many times a given element is to be matched. They are often combined with the character-matching syntax to match runs of several characters. See Table B.

Table B: Regular expression repetition operators
Format of each entry:
---------------
Operator
Explanation
Example
Result
----------------
?
Match the preceding element zero or one time
egrep '.?erd' sample.txt
Will match "berd", "herd", etc. and "erd"
------------------
*
Match the preceding element zero or more times
egrep 'n.*rd' sample.txt
Will match "nerd", "nrd", "neard", etc.
-------------------
+
Match the preceding element one or more times
egrep '[n]+erd' sample.txt
Will match "nerd", "nnerd", etc., but not "erd"
--------------------
{n}
Match the preceding element exactly n times
egrep -w '[a-z]{2}erd' sample.txt
Will match "cherd", "blerd", etc. but not "nerd", "erd", "buzzerd", etc. (-w requires the match to be a whole word; without it, the "zzerd" inside "buzzerd" would also match)
------------------------
{n,}
Match the preceding element at least n times
egrep '.{2,}erd' sample.txt
Will match "cherd" and "buzzerd", but not "nerd" on its own
------------------------
{n,N}
Match the preceding element at least n times, but not more than N times
egrep 'n[e]{1,2}rd' sample.txt
Will match "nerd" and "neerd"
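To see the quantifiers in Table B at work, here is a small sketch, again assuming java.util.regex. It uses Pattern.matches, which requires the whole word to match, so that the counts are not blurred by substring matches. RepetitionDemo and whole are illustrative names only.

```java
import java.util.regex.Pattern;

class RepetitionDemo {
    // True only if the entire word matches the regex
    // (unlike grep, which looks for a match anywhere in the line).
    static boolean whole(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {".?erd", "erd"}, {".?erd", "berd"},
            {"n.*rd", "nrd"}, {"n.*rd", "neard"},
            {"[n]+erd", "nnerd"}, {"[n]+erd", "erd"},
            {"[a-z]{2}erd", "cherd"}, {"[a-z]{2}erd", "buzzerd"},
            {".{2,}erd", "buzzerd"}, {".{2,}erd", "nerd"},
            {"n[e]{1,2}rd", "neerd"}, {"n[e]{1,2}rd", "neeerd"}
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " vs " + c[1] + ": " + whole(c[0], c[1]));
        }
    }
}
```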

Part 3:
----------------
Anchors
Anchors specify where a pattern has to match, as shown in Table C. They make it easier to find common character combinations in a specific position. For the examples I use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is:

s/pattern_to_match/pattern_to_substitute/

Table C: Regular expression anchors
-------------
Operator
Explanation
Example
Result
---------------
^
Match at the beginning of a line
s/^/blah /
Inserts "blah " at the beginning of the line
---------------
$
Match at the end of a line
s/$/ blah/
Inserts " blah" at the end of the line
---------------
\<
Match at the beginning of a word
s/\</blah/
Inserts "blah" at the beginning of the word

egrep '\<blah' sample.txt
Matches "blahfield", etc.
------------------
\>
Match at the end of a word
s/\>/blah/
Inserts "blah" at the end of the word

egrep 'blah\>' sample.txt
Matches "soupblah", etc.
---------------
\b
Match at the beginning or end of a word
egrep '\bblah' sample.txt
Matches "blahcake"

egrep 'blah\b' sample.txt
Matches "countblah"
-----------------
\B
Match in the middle of a word (where there is no word boundary)
egrep '\Bblah' sample.txt
Matches "sublahper", etc.
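The anchors in Table C can also be tried in Java. This is a sketch under the assumption that java.util.regex is used: it supports ^, $, \b and \B, while \< and \> are not anchors there, so \b stands in for both. AnchorDemo and found are illustrative names.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // True if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Counterparts of s/^/blah / and s/$/ blah/ on a one-line string
        System.out.println("word".replaceAll("^", "blah "));
        System.out.println("word".replaceAll("$", " blah"));

        // \b before the text: "blah" must start a word
        System.out.println(found("\\bblah", "blahcake"));
        System.out.println(found("\\bblah", "countblah"));
        // \b after the text: "blah" must end a word
        System.out.println(found("blah\\b", "countblah"));
        // \B: "blah" must not start a word
        System.out.println(found("\\Bblah", "sublahper"));
        System.out.println(found("\\Bblah", "blahcake"));
    }
}
```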

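Alternation

Another convenient feature of REs is the alternation (or insertion) operator. In effect this operator is an OR statement, written with the | symbol. The following command returns the lines of sample.txt that contain "nerd" or "merd":

egrep '(n|m)erd' sample.txt

Alternation is very powerful, especially when you are looking for different spellings in a file, but in this case you can get the same result with:

egrep '[nm]erd' sample.txt

The real usefulness of alternation shows when you combine it with the more advanced features of REs.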
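A short sketch of the point above, assuming java.util.regex: for single characters a bracket expression and an alternation agree, but only alternation can choose between whole words. The class name and the pattern (nerd|geek)s? are invented for illustration.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // True only if the entire word matches the regex.
    static boolean whole(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"nerd", "merd", "herd"}) {
            // Both patterns give the same answer for every word
            System.out.println(w + ": " + whole("(n|m)erd", w) + " / " + whole("[nm]erd", w));
        }
        // Alternation between whole words, combined with the ? quantifier;
        // a bracket expression cannot express this.
        for (String w : new String[] {"nerd", "geeks", "need"}) {
            System.out.println(w + ": " + whole("(nerd|geek)s?", w));
        }
    }
}
```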

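Part 4:
----------------
Some reserved characters
The last important feature of REs is reserved characters (also called special characters). For example, suppose you want to find the literal strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", which is not what you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash: "n[ei]\*rd". The other reserved characters include:

^ (caret)
. (period)
[ (left bracket)
$ (dollar sign)
( (left parenthesis)
) (right parenthesis)
| (pipe)
* (asterisk)
+ (plus symbol)
? (question mark)
{ (left curly bracket, or left brace)
\ (backslash)
Once you include these characters in your search strings, REs undoubtedly become very hard to read. The following PHP eregi code, for example, is hard to read:

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",$sendto)

As you can see, the intent of the code is hard to grasp. But if you overlook the reserved characters, you will often misread what the code means.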
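The escaping rule can be checked with a few lines of Java. This is only a sketch, assuming java.util.regex. Note that a backslash must itself be doubled inside a Java string literal, and that Pattern.quote is a standard helper that escapes a whole literal at once. EscapeDemo and whole are illustrative names.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True only if the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped: * is a quantifier applied to [ei]
        System.out.println(whole("n[ei]*rd", "neeeeerd"));
        System.out.println(whole("n[ei]*rd", "ne*rd"));
        // Escaped: \* is a literal asterisk (written \\* in Java source)
        System.out.println(whole("n[ei]\\*rd", "ne*rd"));
        System.out.println(whole("n[ei]\\*rd", "neeeeerd"));
        // Pattern.quote turns an arbitrary string into a literal pattern
        System.out.println(whole(Pattern.quote("ne*rd"), "ne*rd"));
    }
}
```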

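Summary
In this article we have lifted the veil of mystery from regular expressions and listed the common syntax of the ERE standard. For the complete description of the rules, see The Open Group's specification, Regular Expressions. You are welcome to post your questions or opinions in the discussion area there.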

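Another article
----------------------------------------
Regular expressions and the Java programming language
-----------------------------------------
Classes and methods

The following classes match character sequences against patterns specified by regular expressions.

Pattern class

An instance of the Pattern class represents a regular expression specified in string form, in a syntax similar to the one used by Perl.

A regular expression specified as a string must first be compiled into an instance of the Pattern class. The resulting pattern is used to create a Matcher object, which matches arbitrary character sequences against the regular expression. Many matchers can share one pattern, because a pattern is stateless.

The compile method compiles the given regular expression into a pattern, and the matcher method then creates a matcher that will match the given input against this pattern. The pattern method returns the regular expression from which the pattern was compiled.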
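The three methods just described fit together roughly as follows. A minimal sketch, using only Pattern.compile, Pattern.matcher and Pattern.pattern from java.util.regex. The class name PatternBasics and the regular expression are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Compiled once; a Pattern holds no match state, so it can be shared.
    static final Pattern WORD_ORD = Pattern.compile("[a-z]+ord");

    public static void main(String[] args) {
        // Two independent matchers created from the same pattern
        Matcher m1 = WORD_ORD.matcher("lord");
        Matcher m2 = WORD_ORD.matcher("2ord");
        System.out.println(m1.matches());
        System.out.println(m2.matches());
        // pattern() returns the source text the Pattern was compiled from
        System.out.println(WORD_ORD.pattern());
    }
}
```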

The split method is a convenience method that splits the given input sequence at the positions that match this pattern. The following example demonstrates it:

/*
 * Uses split to break up an input string that is separated by
 * commas and/or whitespace.
 */
import java.util.regex.*;

public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[,\\s]+");
        // Split input with the pattern
        String[] result =
            p.split("one,two, three four , five");
        // Print each token on its own line
        for (int i = 0; i < result.length; i++)
            System.out.println(result[i]);
    }
}

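Matcher class

An instance of the Matcher class matches character sequences against a given pattern. Input is supplied to the matcher through the CharSequence interface, so that characters from a wide variety of input sources can be matched.

A matcher is created from a pattern by calling the pattern's matcher method. Once created, a matcher can perform three different kinds of match operations:

The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence against the pattern, starting at the beginning.
The find method scans the input sequence, looking for the next place that matches the pattern.

Each of these methods returns a boolean indicating success or failure. If the match succeeds, more information can be obtained by querying the state of the matcher.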
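The difference between the three operations is easiest to see side by side. The following is a small sketch using Matcher.matches, Matcher.lookingAt and Matcher.find on the same input. The class name, helper names and sample sentence are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKindsDemo {
    // matches(): the whole input must match
    static boolean matchesAll(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // lookingAt(): the input must match from its first character,
    // but the match need not reach the end
    static boolean matchesPrefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // find(): scans forward; returns the start index of the first match, or -1
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "one cat two cats";
        System.out.println(matchesAll("one", input));
        System.out.println(matchesPrefix("one", input));
        System.out.println(matchesPrefix("cat", input));
        // After a successful find(), the matcher's state describes the match
        Matcher m = Pattern.compile("cat").matcher(input);
        while (m.find()) {
            System.out.println(m.group() + " at " + m.start() + "-" + m.end());
        }
    }
}
```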

This class also defines methods for replacing matched sequences with new strings, whose contents can, if desired, be computed from the match result.

The appendReplacement method first appends all the characters between the current position and the next match, and then appends the replacement value. appendTail appends the part of the string that follows the last match, through to the end.

For example, in the string blahcatblahcatblah, the first appendReplacement appends blahdog. The second appendReplacement appends blahdog, and then appendTail appends blah, which produces: blahdogblahdogblah. See the example "Simple word replacement".
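The blahcatblahcatblah walk-through can be traced step by step. This is only a sketch: it records the contents of the buffer after each appendReplacement call and after appendTail. AppendDemo and steps are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Returns a snapshot of the output buffer after every append step.
    static List<String> steps() {
        Matcher m = Pattern.compile("cat").matcher("blahcatblahcatblah");
        StringBuffer sb = new StringBuffer();
        List<String> snapshots = new ArrayList<String>();
        while (m.find()) {
            // Copies the text before the match, then the replacement
            m.appendReplacement(sb, "dog");
            snapshots.add(sb.toString());
        }
        // Copies whatever follows the last match
        m.appendTail(sb);
        snapshots.add(sb.toString());
        return snapshots;
    }

    public static void main(String[] args) {
        for (String s : steps()) {
            System.out.println(s);
        }
    }
}
```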
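
CharSequence interface

The CharSequence interface provides uniform, read-only access to many different kinds of character sequences. You supply the data to be searched from different sources. String, StringBuffer, and CharBuffer all implement CharSequence, so it is easy to obtain the data to be searched from them. If none of these available sources is suitable, you can write your own input source by implementing the CharSequence interface.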
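As an illustration of the last point, the sketch below feeds one pattern from a String, a StringBuilder, a CharBuffer and a hand-written CharSequence. The CharArrayView class is hypothetical, written only for this example; it is not part of the JDK.

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

class CharSequenceDemo {
    // Hypothetical custom input source: a read-only view over part of a char array.
    static final class CharArrayView implements CharSequence {
        private final char[] data;
        private final int from;
        private final int to;

        CharArrayView(char[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        public int length() { return to - from; }

        public char charAt(int index) { return data[from + index]; }

        public CharSequence subSequence(int start, int end) {
            return new CharArrayView(data, from + start, from + end);
        }

        @Override
        public String toString() { return new String(data, from, to - from); }
    }

    static final Pattern CAT = Pattern.compile("cat");

    // Accepts any CharSequence, whatever its concrete type.
    static boolean containsCat(CharSequence input) {
        return CAT.matcher(input).find();
    }

    public static void main(String[] args) {
        char[] raw = "a cat sat".toCharArray();
        System.out.println(containsCat("one cat"));
        System.out.println(containsCat(new StringBuilder("one dog")));
        System.out.println(containsCat(CharBuffer.wrap("two cats")));
        System.out.println(containsCat(new CharArrayView(raw, 0, raw.length)));
    }
}
```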

Regex scenarios

The following code samples demonstrate the use of the java.util.regex package in various common situations:

Simple word replacement

/*
 * This code writes "one dog, two dogs in the yard"
 * to the standard-output stream:
 */
import java.util.regex.*;

public class Replacement {
    public static void main(String[] args)
            throws Exception {
        // Create a pattern to match cat
        Pattern p = Pattern.compile("cat");
        // Create a matcher with an input string
        Matcher m = p.matcher("one cat," +
            " two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        // Loop through and create a new String
        // with the replacements
        while (result) {
            m.appendReplacement(sb, "dog");
            result = m.find();
        }
        // Add the last segment of input to
        // the new String
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

Email validation

The following code is an example of checking whether a string of characters is an email address. It is not a complete email validation program that covers every possible case, but it can be extended when needed.

/*
 * Checks for invalid characters
 * in email addresses
 */
import java.util.regex.*;

public class EmailValidation {
    public static void main(String[] args)
            throws Exception {

        String input = "@sun.com";
        // Checks for email addresses starting with
        // inappropriate symbols like dots or @ signs.
        Pattern p = Pattern.compile("^\\.|^\\@");
        Matcher m = p.matcher(input);
        if (m.find())
            System.err.println("Email addresses don't start" +
                " with dots or @ signs.");
        // Checks for email addresses that start with
        // www. and prints a message if it does.
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email addresses don't start" +
                " with \"www.\", only web pages do.");
        }
        // Matches any run of characters that is not allowed
        p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;

        while (result) {
            deletedIllegalChars = true;
            m.appendReplacement(sb, "");
            result = m.find();
        }

        // Add the last segment of input to the new String
        m.appendTail(sb);

        input = sb.toString();

        if (deletedIllegalChars) {
            System.out.println("It contained incorrect characters" +
                " , such as spaces or commas.");
        }
    }
}

Removing control characters from a file

/* This class removes control characters from a named
 * file.
 */
import java.util.regex.*;
import java.io.*;

public class Control {
    public static void main(String[] args)
            throws Exception {

        // Create file objects for the input and output files
        File fin = new File("fileName1");
        File fout = new File("fileName2");
        // Open an input and an output stream
        FileInputStream fis =
            new FileInputStream(fin);
        FileOutputStream fos =
            new FileOutputStream(fout);

        BufferedReader in = new BufferedReader(
            new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(fos));

        // The pattern matches control characters
        // (\p{Cntrl} is the POSIX character class for them)
        Pattern p = Pattern.compile("\\p{Cntrl}");
        Matcher m = p.matcher("");
        String aLine = null;
        while ((aLine = in.readLine()) != null) {
            m.reset(aLine);
            // Replaces control characters with an empty
            // string.
            String result = m.replaceAll("");
            out.write(result);
            out.newLine();
        }
        in.close();
        out.close();
    }
}

Searching a file

/*
 * Prints out the comments found in a .java file.
 */
import java.util.regex.*;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;

public class CharBufferExample {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match comments
        Pattern p =
            Pattern.compile("//.*$", Pattern.MULTILINE);

        // Get a Channel for the source file
        File f = new File("Replacement.java");
        FileInputStream fis = new FileInputStream(f);
        FileChannel fc = fis.getChannel();

        // Get a CharBuffer from the source file
        // (map the whole file read-only, then decode it)
        ByteBuffer bb =
            fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        Charset cs = Charset.forName("ISO-8859-1");
        CharsetDecoder cd = cs.newDecoder();
        CharBuffer cb = cd.decode(bb);

        // Run some matches
        Matcher m = p.matcher(cb);
        while (m.find())
            System.out.println("Found comment: " + m.group());
        fis.close();
    }
}
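
Conclusion
Pattern matching in the Java programming language is now as flexible as in many other programming languages. You can use regular expressions in applications to make sure that data is correctly formatted before it enters a database or is sent on to another part of the application, and regular expressions can also be used for all kinds of administrative tasks. In short, you can use regular expressions anywhere in Java programming that calls for pattern matching.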

Regular expressions in JDK 1.4
written by william chen (06/19/2002)

--------------------------------------------------------------------------------

What are regular expressions?

They are a way of searching and replacing in files and strings by means of a special kind of expression.

Because many system settings on UNIX are stored in text files, network administrators and programmers often need to search and replace,

so a special kind of command called the regular expression was developed.

We can very simply use "s/
so JDK 1.4 provides a regular expression package for everyone to use.

For versions before JDK 1.4, a package with similar functionality is available from http://jakarta.apache.org/oro

The string of symbols just listed, " s/
Regular expression syntax for j2sdk1.4

"." stands for any single character

Regex   Source string   Matching string
.       ab              a
..      abc             ab

"+" stands for one or more occurrences of the preceding element
"*" stands for zero or more occurrences of the preceding element

Regex   Source string   Matching string
.+      ab              ab
.*      abc             abc

"( )" groups

Regex   Source string   Matching string
(ab)*   aabab           abab

Character classes

Regex          Source string   Matching string
[a-dA-D0-9]*   abczA0          abc, A0
[^a-d]*        abe0            e0
[a-d]*         abcdefgh        abcd
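The group and character-class rows above can be reproduced with a find loop. A sketch only, assuming java.util.regex: because a starred pattern also matches the empty string at many positions, the helper keeps only the non-empty matches. ClassTableDemo and nonEmptyMatches are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassTableDemo {
    // Collects every non-empty substring of the input that the regex matches.
    static List<String> nonEmptyMatches(String regex, String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.group().length() > 0) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyMatches("(ab)*", "aabab"));
        System.out.println(nonEmptyMatches("[a-dA-D0-9]*", "abczA0"));
        System.out.println(nonEmptyMatches("[^a-d]*", "abe0"));
        System.out.println(nonEmptyMatches("[a-d]*", "abcdefgh"));
    }
}
```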

Shorthand classes

\d   same as [0-9]              a digit
\D   same as [^0-9]             a non-digit
\s   same as [ \t\n\x0B\f\r]    a whitespace character
\S   same as [^ \t\n\x0B\f\r]   a non-whitespace character
\w   same as [a-zA-Z_0-9]       a letter, a digit, or an underscore
\W   same as [^a-zA-Z_0-9]      anything other than a letter, digit, or underscore
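A few one-liners show what the shorthand classes are typically used for. This is a minimal sketch using String.replaceAll, String.split and Pattern.matches. The class name, method names and sample strings are invented for illustration.

```java
import java.util.regex.Pattern;

class ShorthandDemo {
    // \D matches every non-digit, so removing those leaves only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ matches a run of whitespace (spaces, tabs, newlines).
    static String[] words(String s) {
        return s.split("\\s+");
    }

    // \w+ matches only letters, digits and underscores.
    static boolean isWordChars(String s) {
        return Pattern.matches("\\w+", s);
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-1234"));
        System.out.println(words("one  two\tthree").length);
        System.out.println(isWordChars("user_42"));
        System.out.println(isWordChars("user-42"));
    }
}
```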

Start or end of each line

^ marks the start of each line
$ marks the end of each line
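One detail worth checking in Java: by default ^ and $ match only at the start and end of the whole input, and they match at each line only when the pattern is compiled with Pattern.MULTILINE. The sketch below compares the two. MultilineDemo, findAll and the three-line sample text are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Collects every match of the pattern in the input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        // Default mode: ^ matches only at the very start of the input
        System.out.println(findAll(Pattern.compile("^\\w+"), text));
        // MULTILINE mode: ^ matches at the start of every line
        System.out.println(findAll(Pattern.compile("^\\w+", Pattern.MULTILINE), text));
        // The same holds for $ and line ends
        System.out.println(findAll(Pattern.compile("\\w+$"), text));
        System.out.println(findAll(Pattern.compile("\\w+$", Pattern.MULTILINE), text));
    }
}
```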

--------------------------------------------------------------------------------

Classes related to regular expressions in java.util.regex

Pattern: the class that represents a regular expression
Matcher: the result of matching a pattern against input
PatternSyntaxException: exception thrown while attempting to compile a regular expression

Example 1: replace every "<" character in a string with "&lt;"

import java.io.*;
import java.util.regex.*;

public class Replace01 {
    /**
     * Replaces every "<" character in a string with "&lt;"
     */
    public static void replace01() {
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        Pattern pattern = Pattern.compile("<"); // match every '<' in the string
        try {
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted
                if (line == null)
                    break;
                Matcher a = pattern.matcher(line);
                while (a.find()) {
                    System.out.println("Found character: " + a.group());
                }
                // replaceAll resets the matcher, then replaces every match with &lt;
                System.out.println(a.replaceAll("&lt;"));
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        replace01();
    }
}

Example 2:

import java.io.*;
import java.util.regex.*;

public class Search01 {
    /**
     * Works like StringTokenizer:
     * splits the string on "," and then finds the longest token
     */
    public static void search01() {
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        Pattern pattern = Pattern.compile(",\\s*"); // match each "," and any whitespace after it
        try {
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted
                if (line == null)
                    break;
                String[] words = pattern.split(line);
                // -1 means we haven't found a word yet
                int longest = -1;
                int longestLength = 0;
                for (int i = 0; i < words.length; i++) {
                    System.out.println("Token: " + words[i]);
                    if (words[i].length() > longestLength) {
                        longest = i;
                        longestLength = words[i].length();
                    }
                }
                // Skip lines that produced no non-empty token
                if (longest >= 0)
                    System.out.println("Longest token: " + words[longest]);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        search01();
    }
}

--------------------------------------------------------------------------------

Other regular expression syntax

/^\s* # ignore whitespace at the start of each line
(M(s|r|rs)\.) # matches Ms., Mrs., and Mr. (titles)