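A regular expression is a way of processing strings line by line. With the help of a few special symbols, it lets the user easily search for, delete, or replace specific strings.
Another user's view, which makes some sense, so I quote it directly:
Wildcards work at the system level and are mostly used on file names, for example with find, ls, cp, and so on;
regular expressions, by contrast, need support from the tool itself: egrep, awk, vi, perl. Text-filtering tools such as awk and sed all use regular expressions, and they operate on the contents of files. Not every tool (command) supports regular expressions.
Put simply, some commands support regular expressions and some do not.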
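To make the distinction concrete, here is a minimal sketch. The scratch directory and the file names (notes.txt, notes.log, data.txt) are made up for illustration. The shell expands the wildcard `*.txt` into file names before `ls` even runs, whereas `grep` applies a regular expression to the text it reads.

```shell
# Hypothetical scratch directory and file names, for illustration only.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'apple\n'  > notes.txt
printf 'banana\n' > notes.log
printf 'cherry\n' > data.txt

# Wildcard: the shell expands *.txt to the matching file names.
glob_result=$(ls *.txt)

# Regular expression: grep filters text; here the text is the output of ls.
# '^notes' means "the line starts with notes".
regex_result=$(ls | grep '^notes')

echo "$glob_result"
echo "$regex_result"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

The first command lists data.txt and notes.txt; the second keeps notes.log and notes.txt, because the pattern is matched against each line of text rather than expanded by the shell.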
Different locales order characters differently, for example:
LANG=C, the order is: 0,1,2,3,4....A,B,C,D......Z,a,b,c,d....z
LANG=zh_CN, the order is: 0,1,2,3,4....a,A,b,B,c,C,d,D......z,Z
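The C ordering can be checked with a short sketch like the one below. The four sample characters are arbitrary; `LC_ALL=C` is used because it overrides LANG for that single command. The zh_CN ordering is not shown, since that locale may not be installed on every system.

```shell
# Sort four sample characters under the C locale.
# LC_ALL=C overrides LANG and the other LC_* variables for this one command.
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort | tr -d '\n')

echo "$c_order"   # ABab: all uppercase letters sort before lowercase in C
```

This ordering is the reason a range such as [a-z] can behave differently from one locale to another.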
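Special symbols avoid the influence of the locale. Some commonly used ones:
[:alnum:]  all letters and digits: 0-9, A-Z, a-z
[:alpha:]  all letters: A-Z, a-z
[:blank:]  all horizontal whitespace: space and TAB
[:cntrl:]  all control characters: CR, LF, TAB, DEL, and so on
[:digit:]  all digits: 0-9
[:graph:]  all printable characters except whitespace (space and TAB)
[:lower:]  all lowercase letters: a-z
[:print:]  all printable characters, including the space
[:punct:]  all punctuation characters
[:space:]  all horizontal or vertical whitespace
[:upper:]  all uppercase letters: A-Z
[:xdigit:] all hexadecimal digits: 0-9, A-F, a-f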
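A short sketch of how these classes are used with grep. The sample file and its three lines are invented for illustration. Note that a class name goes inside a bracket expression, so the pattern is written `[[:upper:]]`, not `[:upper:]`.

```shell
# Hypothetical sample data, for illustration only.
sample=$(mktemp)
printf 'apple\nBanana\n42 cherries\n' > "$sample"

# Lines that start with an uppercase letter, independent of the locale.
upper_lines=$(grep -n '^[[:upper:]]' "$sample")

# Lines that contain at least one digit.
digit_lines=$(grep -n '[[:digit:]]' "$sample")

echo "$upper_lines"   # 2:Banana
echo "$digit_lines"   # 3:42 cherries

rm -f "$sample"
```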
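grep [-A] [-B] 'search string' filename
-A: after, followed by a number n. Besides the matching line, also list the n lines after it. Written as -An with no space (GNU grep also accepts -A n).
-B: before, followed by a number n. Besides the matching line, also list the n lines before it. Written as -Bn with no space (GNU grep also accepts -B n).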
:/$ dmesg | grep -n 'eth'
1564:[ 2.427478] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:93:15:12
1565:[ 2.427489] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
1569:[ 2.433153] e1000 0000:02:01.0 ens33: renamed from eth0
:/$ dmesg | grep -n -A3 -B2 'eth'    # the number directly follows -A and -B, no space
1562-[ 2.364718] Console: switching to colour frame buffer device 100x37
1563-[ 2.395386] [drm] Initialized vmwgfx 2.9.0 20150810 for 0000:00:0f.0 on minor 0
1564:[ 2.427478] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:93:15:12
1565:[ 2.427489] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
1566-[ 2.427823] ahci 0000:02:05.0: version 3.0
1567-[ 2.428781] ahci 0000:02:05.0: AHCI 0001.0300 32 slots 30 ports 6 Gbps 0x3fffffff impl SATA mode
1568-[ 2.428784] ahci 0000:02:05.0: flags: 64bit ncq clo only
1569:[ 2.433153] e1000 0000:02:01.0 ens33: renamed from eth0
1570-[ 2.445075] scsi host3: ahci
1571-[ 2.445243] scsi host4: ahci
1572-[ 2.445375] scsi host5: ahci
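The following uses the example file from 鸟哥 (VBird), regular_express.txt.
Note: inside brackets, [^...] means negated selection; outside the brackets, as in ^[...], the ^ anchors the match to the start of the line.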
:~/test$ grep -n '^the' regular_express.txt    # lines that start with "the"
12:the symbol '*' is represented as start.
:~/test$ grep -n '^[a-z]' regular_express.txt    # lines that start with a lowercase letter
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
:~/test$ grep -n '^[^a-zA-Z]' regular_express.txt    # lines that do not start with a letter
1:"Open Source" is a good mechanism to develop programs.
:~/test$ grep -n '\.$' regular_express.txt    # lines that end with a dot; the dot is escaped because . is itself a special character
20:go! go! Let's go.
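* : repeats the previous character zero to infinitely many times. For example, a* means "anywhere from no a at all to infinitely many a". This differs from wildcards, where * stands for zero or more arbitrary characters, so that a* matches a itself or a followed by any characters.
. : exactly one arbitrary character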
:~/test$ grep -n 'g..d' regular_express.txt    # g..d: exactly two arbitrary characters between g and d
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.^M
16:The world <Happy> is the same with "glad".
:~/test$ grep -n 'ooo*' regular_express.txt    # ooo*: two to infinitely many o
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
:~/test$ grep -n 'g.*g' regular_express.txt    # g.*g: a string that starts with g and ends with g; ".*" can be read as zero or more arbitrary characters, equivalent to * in wildcards
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
:~/test$ grep -n 'g*g' regular_express.txt    # g*g: need not start and end with g, because g* means zero to infinitely many g, so a single g is enough to match
1:"Open Source" is a good mechanism to develop programs.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
13:Oh! My god!
14:The gd software is a library for drafting programs.^M
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
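^word : word at the start of the line
word$ : word at the end of the line
. : exactly one arbitrary character
\ : escape
* : zero to infinitely many of the previous character
[list] : one character from list
[n1-n2] : one character within the range
[^list] : negated selection
\{n1,n2\} : n1 to n2 consecutive occurrences of the previous character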
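The interval form \{n1,n2\} is the only entry in this summary without an example above, so here is a minimal sketch. The sample lines are invented for illustration. In grep's default (basic) regular expressions the braces must be escaped with backslashes.

```shell
# Hypothetical sample data, for illustration only.
sample=$(mktemp)
printf 'gg\ngog\ngoog\ngooog\ngoooooog\n' > "$sample"

# o\{2,3\}: two to three consecutive o between the two g characters.
bounded=$(grep -n 'go\{2,3\}g' "$sample")

# o\{2,\}: two or more consecutive o (no upper bound).
open_ended=$(grep -n 'go\{2,\}g' "$sample")

echo "$bounded"      # lines 3 and 4: goog, gooog
echo "$open_ended"   # lines 3, 4 and 5: goog, gooog, goooooog

rm -f "$sample"
```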
A pipeline command that can replace, delete, insert, and select specific lines of data.
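This description fits sed, which is listed earlier among the text-filtering tools; that the line refers to sed is my assumption, since the tool is not named at this point. Under that assumption, the four operations could be sketched as follows, with an invented four-line sample file.

```shell
# Hypothetical sample data, for illustration only.
sample=$(mktemp)
printf 'one\ntwo\nthree\nfour\n' > "$sample"

# Replace: substitute the first "two" on each line with "2".
replaced=$(sed 's/two/2/' "$sample")

# Delete: drop lines 2 through 3.
deleted=$(sed '2,3d' "$sample")

# Insert: append a new line after line 2 (portable a\ form).
inserted=$(sed '2a\
inserted line' "$sample")

# Select: -n suppresses default output, p prints only lines 2 through 3.
selected=$(sed -n '2,3p' "$sample")

echo "$replaced"
echo "$deleted"
echo "$inserted"
echo "$selected"

rm -f "$sample"
```

None of these commands changes the file itself; each one writes the edited text to standard output, which is what makes sed convenient in a pipeline.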
Processes data field by field.
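Field-by-field processing is what awk is typically used for; that this line refers to awk is my assumption, since the tool is not named here. A minimal sketch with invented input:

```shell
# Hypothetical input: whitespace-separated fields (name, score).
# $1 is the first field of each line.
names=$(printf 'alice 90\nbob 75\n' | awk '{print $1}')

# -F sets the field separator; here ':' for colon-separated records.
ids=$(printf 'root:x:0:0\nuser:x:1000:1000\n' | awk -F: '{print $1 "=" $3}')

echo "$names"   # alice, bob (one per line)
echo "$ids"     # root=0, user=1000 (one per line)
```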
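+ : one or more of the previous character
? : zero or one of the previous character
| : or, for example 'glad|good'
() : grouping, for example g(la|oo)d
()+ : one or more repetitions of the group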
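These five operators belong to extended regular expressions, so with grep they need `grep -E` (or egrep). A small sketch, using an invented sample file:

```shell
# Hypothetical sample data, for illustration only.
sample=$(mktemp)
printf 'glad\ngood\ngd\ngod\ngoooood\nxyzxyz\n' > "$sample"

alt=$(grep -nE 'glad|good' "$sample")    # either glad or good
grp=$(grep -nE 'g(la|oo)d' "$sample")    # same matches, written with a group
plus=$(grep -nE 'go+d' "$sample")        # one or more o between g and d
opt=$(grep -nE 'go?d' "$sample")         # zero or one o between g and d
rep=$(grep -nE '(xyz)+' "$sample")       # one or more repetitions of xyz

echo "$alt"    # 1:glad and 2:good
echo "$grp"    # 1:glad and 2:good
echo "$plus"   # 2:good, 4:god, 5:goooood
echo "$opt"    # 3:gd and 4:god
echo "$rep"    # 6:xyzxyz

rm -f "$sample"
```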
Reposted from: https://www.cnblogs.com/liuwanpeng/p/6226656.html