Text processing is a cornerstone of Linux shell scripting, enabling users to manipulate, analyze, and transform textual data efficiently. Among the most powerful and commonly used tools for text processing are grep, sed, and awk.
Each serves distinct purposes: grep for pattern matching and searching, sed for stream editing and substitution, and awk for advanced data manipulation and reporting.
Using these tools effectively requires understanding their syntax, options, and integration with shell commands, making complex text processing feasible with concise scripts.
grep: Pattern Matching and Searching
grep (Global Regular Expression Print) searches input text for lines matching a specified pattern and prints those lines. It supports regular expressions, case sensitivity controls, and line context options.
Basic command syntax:
grep [options] pattern [file...]
Examples:
1. Search for "ERROR" in a file:
grep "ERROR" logfile.txt
2. Case-insensitive search:
grep -i "error" logfile.txt
3. Display lines with line numbers:
grep -n "pattern" filename
4. Show lines before or after a match for context:
grep -B 3 -A 3 "ERROR" logfile.txt
grep is invaluable for quickly filtering logs, configuration files, or command outputs.
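The grep options above can be exercised end-to-end. The sketch below builds a small sample log (the filename and log lines are made up for illustration) and runs each search shown:

```shell
#!/bin/sh
# Build a small sample log to search (contents are illustrative).
cat > sample.log <<'EOF'
2024-01-01 INFO  service started
2024-01-01 ERROR disk full
2024-01-02 error retrying write
2024-01-02 INFO  recovery complete
EOF

# Exact, case-sensitive match: prints only the uppercase ERROR line.
grep "ERROR" sample.log

# Case-insensitive match: finds both the ERROR and error lines.
grep -i "error" sample.log

# Prefix each matching line with its line number.
grep -n "ERROR" sample.log

# Show one line of context before (-B) and after (-A) each match.
grep -B 1 -A 1 "ERROR" sample.log

rm -f sample.log
```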
sed: Stream Editor for Text Transformation
sed is a non-interactive text editor used for filtering and transforming text streams or files. It works line-by-line with powerful commands for substitutions, deletions, insertions, and complex pattern-based editing.
Basic substitution syntax:
sed 's/pattern/replacement/flags' inputfile
Common flags:
g - replace every match on the line, not just the first
i - case-insensitive matching (a GNU sed extension)
Examples:
1. Replace first occurrence of "foo" with "bar" in each line:
sed 's/foo/bar/' file.txt
2. Replace all occurrences of "foo" with "bar":
sed 's/foo/bar/g' file.txt
3. Delete lines matching a pattern:
sed '/pattern/d' file.txt
4. Apply multiple commands:
sed -e 's/foo/bar/g' -e '/baz/d' file.txt
sed excels in on-the-fly editing within scripts and pipelines without creating intermediate files.
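A runnable sketch of the sed examples above, using a throwaway sample file (the filename and contents are illustrative):

```shell
#!/bin/sh
# Sample input file (contents are illustrative).
printf 'foo one\nfoo foo two\nbaz three\n' > sample.txt

# First occurrence per line: "foo foo two" becomes "bar foo two".
sed 's/foo/bar/' sample.txt

# All occurrences: "foo foo two" becomes "bar bar two".
sed 's/foo/bar/g' sample.txt

# Delete every line matching "baz".
sed '/baz/d' sample.txt

# Combine both edits in a single pass with -e.
sed -e 's/foo/bar/g' -e '/baz/d' sample.txt

rm -f sample.txt
```

Note that none of these commands modify sample.txt itself; sed writes the edited stream to standard output, which is what makes it safe to use in pipelines.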
awk: Data Extraction and Reporting Language
awk is a versatile programming language designed for data extraction and reporting. It processes text files line-by-line, splitting each line into fields, allowing condition-based actions, arithmetic, and string operations.
Basic syntax:
awk 'pattern { action }' filename
Fields: $0 - whole line, $1 - first field, $2 - second field, etc.
Examples:
1. Print all lines where the second field equals "ERROR":
awk '$2 == "ERROR"' logfile.txt
2. Print the first and third fields of each line:
awk '{ print $1, $3 }' file.txt
3. Sum values in the third column:
awk '{ sum += $3 } END { print sum }' file.txt
4. Conditional output with formatted printing:
awk '$3 > 100 { printf "High value: %s\n", $0 }' data.txt
awk combines powerful pattern matching with programming constructs, enabling creation of detailed reports and data transformations directly within shell scripts.
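The awk examples above can be run against a small sample dataset; the sketch below uses made-up whitespace-delimited records to demonstrate each one:

```shell
#!/bin/sh
# Sample whitespace-delimited data: name, status, value (illustrative).
printf 'a ERROR 50\nb OK 120\nc ERROR 200\n' > data.txt

# Pattern with no action: the default action prints matching lines,
# so this outputs the two lines whose second field is exactly ERROR.
awk '$2 == "ERROR"' data.txt

# Select the first and third fields of every line.
awk '{ print $1, $3 }' data.txt

# Accumulate column 3 per line, then print the total (370) in END.
awk '{ sum += $3 } END { print sum }' data.txt

# Formatted output only for lines where the value exceeds 100.
awk '$3 > 100 { printf "High value: %s\n", $0 }' data.txt

rm -f data.txt
```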
Integrating grep, sed, and awk
These tools are often combined in pipelines for advanced text processing:
Example: Filtering log errors, replacing terms, and formatting output:
grep -i "error" logfile.txt | sed 's/[Ee]rror/ERROR/g' | awk '{ print $1, $2, $5 }'
This pipeline extracts error lines regardless of case, normalizes the keyword to "ERROR," and prints the first, second, and fifth fields.
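A minimal runnable sketch of this kind of pipeline, assuming a hypothetical log layout where field 1 is the date, field 2 the time, and field 5 the error keyword (the filename and log lines are made up for illustration):

```shell
#!/bin/sh
# Sample log with one lowercase and one uppercase error entry.
printf '2024-01-01 10:00 app core Error timeout\n2024-01-01 10:05 app core ERROR crash\n' > logfile.txt

# Stage 1: grep -i keeps error lines in any case.
# Stage 2: sed normalizes "Error"/"error" to "ERROR".
# Stage 3: awk prints fields 1, 2, and 5 (date, time, keyword).
grep -i "error" logfile.txt | sed 's/[Ee]rror/ERROR/g' | awk '{ print $1, $2, $5 }'

rm -f logfile.txt
```

Each stage reads the previous stage's standard output, so no intermediate files are needed; this filter-transform-report pattern is the typical division of labor among the three tools.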