Notes on "The AWK Programming Language": Basic Usage

Introduction to awk

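awk is a convenient and expressive programming language that suits a wide range of computing and data-processing tasks. Every awk program is a sequence of one or more pattern-action statements of the form pattern { action }, normally run as: awk 'pattern { action }' file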
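As a minimal sketch of the pattern-action structure, the commands below run three one-line programs over a small inline sample. The three input lines and the variable names are made up for illustration. When the action is omitted awk prints the matching line, and when the pattern is omitted the action runs for every line.

```shell
# Hypothetical sample input: a name and a score on each line.
data='alice 3
bob 7
carol 5'

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$data" | awk '$2 > 4')

# Action only: with no pattern, the action runs for every line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action together.
both=$(printf '%s\n' "$data" | awk '$2 > 4 { print $1, "passed" }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```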

awk built-in variables

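Variable   Meaning                                            Default
ARGC       number of command-line arguments                   -
ARGV       array of command-line arguments                    -
FILENAME   name of the current input file                     -
FNR        record number within the current input file        -
FS         input field separator                              " "
NF         number of fields in the current record             -
NR         number of records read so far                      -
OFMT       output format for numbers                          "%.6g"
OFS        output field separator                             " "
ORS        output record separator                            "\n"
RLENGTH    length of the string matched by match()            -
RS         input record separator                             "\n"
RSTART     start position of the string matched by match()    -
SUBSEP     subscript separator                                "\034"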
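To make a few of these variables concrete, here is a small sketch that sets FS and OFS in a BEGIN block and prints NR and NF for each record. The colon-separated input and the variable name vars_out are assumptions chosen only for illustration.

```shell
# Hypothetical colon-separated input.
data='root:0:/root
alice:1000:/home/alice'

# FS and OFS are set once in BEGIN; NR and NF are updated for every record.
# Assigning to a field ($2 = $2) makes awk rebuild $0 using OFS.
vars_out=$(printf '%s\n' "$data" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { $2 = $2; print NR, NF, $0 }')

printf '%s\n' "$vars_out"
```

Each output line starts with the record number and the field count, followed by the rebuilt record, all joined with the OFS value.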

Formatted output with awk

$ cat 1.txt
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

$ awk '$3 > 0 {print "total pay for", $1, "is", $2*$3}' 1.txt
total pay for Kathy is 40
total pay for Mark is 100
total pay for Mary is 121
total pay for Susie is 76.5

Formatting awk output with printf

$ awk '$3 > 0 {printf("total pay for %s is %.2f\n",$1,$2*$3 )}' 1.txt
total pay for Kathy is 40.00
total pay for Mark is 100.00
total pay for Mary is 121.00
total pay for Susie is 76.50

printf does not emit spaces or newlines automatically; you have to add them explicitly in the format string.
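A short sketch of that behaviour, using a made-up three-line input: print would end each record with a newline, while printf writes only what the format string contains, so the three values run together unless a separator is added.

```shell
# No separator in the format: the values are written back to back.
joined=$(printf 'a\nb\nc\n' | awk '{ printf("%s", $1) } END { printf("\n") }')

# An explicit space in the format separates the values.
spaced=$(printf 'a\nb\nc\n' | awk '{ printf("%s ", $1) } END { printf("\n") }')

# Brackets make the trailing space of the second result visible.
printf '[%s]\n[%s]\n' "$joined" "$spaced"
```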

Sorting the formatted awk output with sort

$ awk '$3 > 0 {printf("%-8s is %6.2f\n",$1,$2*$3 )}' 1.txt | sort -k 3 -n
Kathy    is  40.00
Susie    is  76.50
Mark     is 100.00
Mary     is 121.00

awk pattern matching

$ awk '$1 ~ /Sus/ {print $0}' 1.txt
Susie 4.25 18
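A few more matching forms, sketched against three lines taken from 1.txt (the shell variable names are hypothetical). The ~ operator tests for a match, !~ tests for no match, and a bare /regex/ pattern is shorthand for $0 ~ /regex/.

```shell
data='Beth 4.00 0
Susie 4.25 18
Mark 5.00 20'

# ^ anchors the regular expression at the start of the field.
starts_with_m=$(printf '%s\n' "$data" | awk '$1 ~ /^M/ { print $1 }')

# !~ selects the lines whose first field does NOT match.
no_capital_s=$(printf '%s\n' "$data" | awk '$1 !~ /S/ { print $1 }')

# A bare regex is matched against the whole line; the dot is escaped
# so that it matches a literal "." only.
whole_line=$(printf '%s\n' "$data" | awk '/4\.25/')

printf '%s\n' "$starts_with_m" "$no_capital_s" "$whole_line"
```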

awk BEGIN/END

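The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last input file has been processed.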

$ awk 'BEGIN {print "NAME RATE HOURS";print ""}{print }END {print "DONE"}' 1.txt
NAME RATE HOURS

Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
DONE
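Because BEGIN runs before any input is read, it is also a common place to set FS. The sketch below assumes a comma-separated variant of two records from 1.txt (the CSV form and the name report are made up for illustration) and uses END to print the record count.

```shell
# BEGIN sets the field separator and prints a header before any input;
# END runs once after the last record, when NR holds the total count.
report=$(printf 'Kathy,4.00,10\nMark,5.00,20\n' |
  awk 'BEGIN { FS = ","; print "NAME PAY" }
       { print $1, $2 * $3 }
       END { print NR, "records" }')

printf '%s\n' "$report"
```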

Computing with awk

$ awk '{pay = pay+$2*$3}END {print "total pay is", pay, "average pay is", pay/NR}' 1.txt
total pay is 337.5 average pay is 56.25

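An awk variable used as a number starts out as 0, and used as a string it starts out as the empty string, so variables do not need to be initialized.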

$ awk '$3 > 15 {emp = emp+1}END {print emp, "employees worked more than 15 hours"}' 1.txt
3 employees worked more than 15 hours
$ awk '{names = names $1 " "}END {print names}' 1.txt
Beth Dan Kathy Mark Mary Susie
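The default values can be checked directly with a BEGIN-only program, which needs no input file. In this sketch x is a hypothetical variable that is never assigned.

```shell
# x is never assigned: it behaves as 0 in arithmetic and as "" in strings.
defaults=$(awk 'BEGIN { print x + 1; print "[" x "]"; print length(x) }')

printf '%s\n' "$defaults"
```

The three lines printed are 1, [] and 0.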

awk control flow

if/else/while/for

awk provides an if-else statement for decisions, along with loop statements. They can be used only inside an action.

$ awk '{for(i=0;i<$2;i=i+1) if(i==4){print $0, count} else{ count = count + 1} count = 0}' 1.txt
Mark 5.00 20 4
Mary 5.50 22 4
Susie 4.25 18 4
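For comparison, here is a sketch of the same statements spread over several lines, this time with a while loop. The numeric input and the threshold of 10 are made up for illustration.

```shell
# Sum every field of each line with a while loop, then classify the total.
totals=$(printf '1 2 3\n10 20\n' | awk '{
  sum = 0
  i = 1
  while (i <= NF) {        # NF is the number of fields on the current line
    sum += $i              # $i is the i-th field
    i++
  }
  if (sum > 10) {
    print sum, "large"
  } else {
    print sum, "small"
  }
}')

printf '%s\n' "$totals"
```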

awk arrays

An awk array stores a set of related values.

Counting occurrences with an array
$ cat 2.txt
1
2
3
4
5
1
2
4
5
7
$ awk '{count[$1]++}END{for(i in count) {printf( "%d appears %d times\n", i,count[i])}}' 2.txt | sort -n
1 appears 2 times
2 appears 2 times
3 appears 1 times
4 appears 2 times
5 appears 2 times
7 appears 1 times

Combining arrays with pattern matching
$ cat countries
USSR 8649 275 Asia
Canada 3852 25 North America
China 3705 1032 Asia
USA 3615 237 North America
Brazil 3286 134 South America
India 1267 746 Asia
Mexico 762 78 North America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
$ awk '$4 ~ /Asia/ {pop["Asia"] += $3}; $4 ~ /Europe/ {pop["Europe"] += $3} END {print "Asian population is", pop["Asia"], "million"; print "European population is", pop["Europe"], "million"}' countries
Asian population is 2173 million
European population is 172 million
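Since array subscripts are strings, the region name itself can serve as the index, which avoids one pattern per region. This is a sketch over four lines from the countries file, limited to single-word regions because the default field separator would split "North America" into two fields. The order of a for (key in array) loop is unspecified, so the output is piped through sort.

```shell
data='China 3705 1032 Asia
India 1267 746 Asia
France 211 55 Europe
Germany 96 61 Europe'

# pop is indexed by the region name in field 4 and accumulates field 3.
pop_by_region=$(printf '%s\n' "$data" |
  awk '{ pop[$4] += $3 }
       END { for (region in pop) print region, pop[region] }' | sort)

printf '%s\n' "$pop_by_region"
```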
