Notes on The AWK Programming Language: Basic Usage

Introduction to awk

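awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data-processing tasks. Every awk program is a sequence of one or more pattern-action statements of the form pattern { action }, usually run from the command line as awk 'pattern { action }' file.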
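As a minimal sketch of the pattern-action structure, the program below contains two statements. The sample data and the wording of the messages are made up for illustration.

```shell
# Hypothetical sample data: name, hourly rate, hours worked.
data='Beth 4.00 0
Kathy 4.00 10
Mark 5.00 20'

# Each input line is tested against both patterns in turn;
# an action runs only for the lines its pattern matches.
out=$(printf '%s\n' "$data" | awk '
    $3 == 0 { print $1, "did not work" }
    $3 > 0  { print $1, "worked", $3, "hours" }
')
printf '%s\n' "$out"
```

If the pattern is omitted, the action runs for every line; if the action is omitted, matching lines are printed unchanged.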

awk built-in variables

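Variable   Meaning                                                     Default
ARGC       number of command-line arguments                            -
ARGV       array of command-line arguments                             -
FILENAME   name of the current input file                              -
FNR        record number within the current input file                 -
FS         input field separator                                       " "
NF         number of fields in the current record                      -
NR         number of records read so far                               -
OFMT       output format for numbers                                   "%.6g"
OFS        output field separator                                      " "
ORS        output record separator                                     "\n"
RLENGTH    length of the string matched by the match function          -
RS         input record separator                                      "\n"
RSTART     start position of the string matched by the match function  -
SUBSEP     subscript separator                                         "\034"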
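A short sketch of a few of these variables in use. The colon-separated input and the "hello world" string are invented for the example.

```shell
# FS and OFS are set in BEGIN, before any input is read.
fields=$(printf 'a:b:c\nd:e\n' |
    awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')
printf '%s\n' "$fields"

# match() sets RSTART and RLENGTH as a side effect.
pos=$(printf 'hello world\n' |
    awk '{ if (match($0, /wor/)) print RSTART, RLENGTH }')
printf '%s\n' "$pos"
```

The first command prints the record number, the field count, and the first and last fields joined by OFS; the second prints where the match starts and how long it is.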

Formatted output in awk

$ cat 1.txt
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

$ awk '$3 > 0 {print "total pay for", $1, "is", $2*$3}' 1.txt
total pay for Kathy is 40
total pay for Mark is 100
total pay for Mary is 121
total pay for Susie is 76.5

Formatting awk output with printf

$ awk '$3 > 0 {printf("total pay for %s is %.2f\n", $1, $2*$3)}' 1.txt
total pay for Kathy is 40.00
total pay for Mark is 100.00
total pay for Mary is 121.00
total pay for Susie is 76.50

printf does not add spaces or newlines automatically; you have to include them explicitly in the format string.
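To illustrate, here is a small sketch in which the per-line printf has no "\n", so all records end up on one output line. The two input lines are taken from 1.txt; the "name:hours" format is an arbitrary choice.

```shell
# No "\n" in the per-record format, so output stays on one line;
# the END action adds the single final newline.
out=$(printf 'Kathy 4.00 10\nMark 5.00 20\n' |
    awk '{ printf("%s:%d ", $1, $3) } END { printf("\n") }')
printf '%s\n' "$out"
```

Note that the trailing space after each item is also there only because the format string contains it.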

Sorting awk's formatted output with sort

$ awk '$3 > 0 {printf("%-8s is %6.2f\n", $1, $2*$3)}' 1.txt | sort -k 3 -n
Kathy    is  40.00
Susie    is  76.50
Mark     is 100.00
Mary     is 121.00
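A possible variation, sketched here on three lines of the same data: the key can be limited to the third field alone and reversed, so the highest pay comes first. The -k3,3nr form is standard sort syntax, but choosing it is a matter of taste rather than something the transcript above requires.

```shell
# -k3,3 restricts the sort key to field 3; n = numeric, r = reverse.
out=$(printf 'Kathy 4.00 10\nMark 5.00 20\nSusie 4.25 18\n' |
    awk '{ printf("%-8s is %6.2f\n", $1, $2 * $3) }' |
    sort -k3,3nr)
printf '%s\n' "$out"
```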

Pattern matching in awk

$ awk '$1 ~ /Sus/ {print $0}' 1.txt
Susie 4.25 18
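A few more pattern forms, sketched against the same data as 1.txt (inlined here so the block runs on its own). The specific patterns are illustrative choices.

```shell
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# Anchored regular expression: names that start with "Ma".
starts=$(printf '%s\n' "$data" | awk '$1 ~ /^Ma/ { print $1 }')

# !~ selects lines whose field does NOT match the expression.
no_a=$(printf '%s\n' "$data" | awk '$1 !~ /a/ { print $1 }')

# Comparisons can be combined with && and ||.
both=$(printf '%s\n' "$data" | awk '$2 > 4 && $3 < 20 { print $1 }')

printf '%s\n' "$starts" "$no_a" "$both"
```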

awk BEGIN/END

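The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last input file has been processed.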

$ awk 'BEGIN {print "NAME RATE HOURS"; print ""} {print} END {print "DONE"}' 1.txt
NAME RATE HOURS

Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
DONE
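Because BEGIN runs before any input is read, it is also a natural place to set variables such as FS. A minimal sketch, assuming comma-separated input that is not part of the original files:

```shell
# BEGIN sets the field separator before the first record is split;
# END runs once, after NR has reached its final value.
out=$(printf 'Beth,4.00,0\nDan,3.75,0\n' |
    awk 'BEGIN { FS = "," } { print $1 } END { print NR, "records" }')
printf '%s\n' "$out"
```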

Computing with awk

$ awk '{pay = pay + $2*$3} END {print "total pay is", pay, "average pay is", pay/NR}' 1.txt
total pay is 337.5 average pay is 56.25

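When an awk variable is used as a number, its default initial value is 0; when used as a string, it defaults to the empty string. Variables therefore need no explicit initialization.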

$ awk '$3 > 15 {emp = emp + 1} END {print emp, "employees worked more than 15 hours"}' 1.txt
3 employees worked more than 15 hours
$ awk '{names = names $1 " "} END {print names}' 1.txt
Beth Dan Kathy Mark Mary Susie 
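The same default-value behavior makes a running maximum easy to write. The sketch below uses the data from 1.txt inline; the variable names maxrate and maxemp are chosen for the example.

```shell
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# maxrate starts out as 0 without being initialized, so the
# first positive rate replaces it; maxemp starts out empty.
out=$(printf '%s\n' "$data" | awk '
    $2 > maxrate { maxrate = $2; maxemp = $1 }
    END { print "highest hourly rate:", maxrate, "for", maxemp }
')
printf '%s\n' "$out"
```

This relies on all rates being positive; with negative values the implicit starting value of 0 would give a wrong answer.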

Control flow in awk

if/else/while/for

awk provides an if-else statement for making decisions and several loop statements. They can only be used inside actions.

$ awk '{for(i=0;i<$2;i=i+1) if(i==4){print $0, count} else{ count = count + 1} count = 0}' 1.txt
Mark 5.00 20 4
Mary 5.50 22 4
Susie 4.25 18 4
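For a more direct look at while and if-else, here are two small sketches. The numeric input and the message texts are invented for illustration.

```shell
# while: add up the fields of each line.
sums=$(printf '1 2 3\n4 5\n' | awk '{
    i = 1; sum = 0
    while (i <= NF) {
        sum = sum + $i
        i = i + 1
    }
    print sum
}')
printf '%s\n' "$sums"

# if-else: avoid dividing by zero when no line matches.
avg=$(printf 'Beth 4.00 0\nDan 3.75 0\n' | awk '
    $3 > 0 { n = n + 1; pay = pay + $2 * $3 }
    END {
        if (n > 0) {
            print "average pay is", pay / n
        } else {
            print "no employee worked any hours"
        }
    }')
printf '%s\n' "$avg"
```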

awk arrays

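An awk array stores a set of related values. Arrays are associative: subscripts are strings, and neither arrays nor their elements need to be declared.

Counting occurrences with an array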
$ cat 2.txt
1
2
3
4
5
1
2
4
5
7
$ awk '{count[$1]++} END {for (i in count) {printf("%d appears %d times\n", i, count[i])}}' 2.txt | sort -n
1 appears 2 times
2 appears 2 times
3 appears 1 times
4 appears 2 times
5 appears 2 times
7 appears 1 times
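The order in which for (i in count) visits subscripts is not specified, which is why the output above is piped through sort. When order matters, numeric subscripts can be walked explicitly. A minimal sketch with made-up input:

```shell
# Store every line under its record number, then print last to first.
rev=$(printf 'one\ntwo\nthree\n' |
    awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i = i - 1) print line[i] }')
printf '%s\n' "$rev"

# The in operator tests for a subscript without creating the element.
seen=$(printf '1\n2\n1\n' |
    awk '{ count[$1]++ }
         END { if (3 in count) { print "3 seen" } else { print "3 not seen" } }')
printf '%s\n' "$seen"
```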

Combining arrays with pattern matching
$ cat countries
USSR 8649 275 Asia 
Canada 3852 25 North America 
China 3705 1032 Asia 
USA 3615 237 North America 
Brazil 3286 134 South America 
India 1267 746 Asia
Mexico 762 78 North America 
France 211 55 Europe 
Japan 144 120 Asia 
Germany 96 61 Europe 
England 94 56 Europe
$ awk '$4 ~ /Asia/ {pop["Asia"] += $3}; $4 ~ /Europe/ {pop["Europe"] += $3} END {print "Asian population is", pop["Asia"], "million"; print "European population is", pop["Europe"], "million"}' countries
Asian population is 2173 million
European population is 172 million
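One way to generalize this is to use the continent itself as the subscript, so that no continent has to be named in the program. The sketch below assumes the space-separated layout shown above, where a continent such as "North America" spans two fields and is therefore rebuilt from field 4 onward; the variable name cont is chosen for the example.

```shell
data='USSR 8649 275 Asia
Canada 3852 25 North America
China 3705 1032 Asia
USA 3615 237 North America
Brazil 3286 134 South America
India 1267 746 Asia
Mexico 762 78 North America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe'

# Join fields 4..NF into the continent name, then sum population per name.
out=$(printf '%s\n' "$data" | awk '
    {
        cont = $4
        for (i = 5; i <= NF; i++) cont = cont " " $i
        pop[cont] += $3
    }
    END { for (c in pop) print c, pop[c] }
' | sort)
printf '%s\n' "$out"
```

The totals for Asia and Europe agree with the output above; sort is used because the traversal order of for (c in pop) is unspecified.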
