
Text Processing Integration

Lesson 6/40 | Study Time: 20 Min

Text processing is a cornerstone of Linux shell scripting, enabling users to manipulate, analyze, and transform textual data efficiently. Among the most powerful and commonly used tools for text processing are grep, sed, and awk.

Each serves distinct purposes: grep for pattern matching and searching, sed for stream editing and substitution, and awk for advanced data manipulation and reporting. 

Using these tools effectively requires understanding their syntax, options, and integration with shell commands, making complex text processing feasible with concise scripts. 

grep: Pattern Matching and Searching

grep (Global Regular Expression Print) searches input text for lines matching a specified pattern and prints those lines. It supports regular expressions, case sensitivity controls, and line context options.


Basic command syntax:

bash
grep [options] pattern [file...]


Examples:


1. Search for "ERROR" in a file:

bash
grep "ERROR" logfile.txt


2. Case-insensitive search:

bash
grep -i "error" logfile.txt


3. Display lines with line numbers:

bash
grep -n "pattern" filename


4. Show lines before or after a match for context:

bash
grep -B 3 -A 3 "ERROR" logfile.txt


grep is invaluable for quickly filtering logs, configuration files, or command outputs.
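Because grep reads standard input when no file is named, it drops straight into pipelines. A minimal sketch, using invented sample log lines fed in with printf:

```bash
# Sample log lines piped straight into grep; no file is needed.
printf 'INFO start\nERROR disk full\nINFO done\nERROR timeout\n' | grep "ERROR"
# Prints:
# ERROR disk full
# ERROR timeout
```

The same pattern works with any command's output, e.g. `dmesg | grep -i "usb"`.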

sed: Stream Editor for Text Transformation

sed is a non-interactive text editor used for filtering and transforming text streams or files. It works line-by-line with powerful commands for substitutions, deletions, insertions, and complex pattern-based editing.


Basic substitution syntax:

bash
sed 's/pattern/replacement/flags' inputfile


Common flags:


g - global replacement: replace every match on the line, not just the first

i (or I) - case-insensitive matching (a GNU sed extension)


Examples:


1. Replace first occurrence of "foo" with "bar" in each line:

bash
sed 's/foo/bar/' file.txt


2. Replace all occurrences of "foo" with "bar":

bash
sed 's/foo/bar/g' file.txt


3. Delete lines matching a pattern:

bash
sed '/pattern/d' file.txt


4. Apply multiple commands:

bash
sed -e 's/foo/bar/g' -e '/baz/d' file.txt


sed excels in on-the-fly editing within scripts and pipelines without creating intermediate files.
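For instance, sed can rewrite text mid-pipeline, with the data never touching a file on disk (the sample sentence is invented for the demo):

```bash
# Substitute in-flight: sed reads stdin and writes the edited line to stdout.
echo "foo is not bar" | sed 's/foo/bar/'
# Prints: bar is not bar
```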

awk: Data Extraction and Reporting Language

awk is a versatile programming language designed for data extraction and reporting. It processes text files line-by-line, splitting each line into fields, allowing condition-based actions, arithmetic, and string operations.


Basic syntax:

bash
awk 'pattern { action }' filename


Fields: $0 - whole line, $1 - first field, $2 - second field, etc.
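Field splitting is easy to check from the command line; the built-in variable NF holds the number of fields on the current line:

```bash
# NF is the field count; $1 and $3 are the first and third fields.
echo "alpha beta gamma" | awk '{ print NF, $1, $3 }'
# Prints: 3 alpha gamma
```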


Examples:


1. Print all lines where the second field equals "ERROR":

bash
awk '$2 == "ERROR"' logfile.txt


2. Print the first and third fields of each line:

bash
awk '{ print $1, $3 }' file.txt


3. Sum values in the third column:

bash
awk '{ sum += $3 } END { print sum }' file.txt


4. Conditional output with formatted printing:

bash
awk '$3 > 100 { printf "High value: %s\n", $0 }' data.txt


awk combines powerful pattern matching with programming constructs, enabling creation of detailed reports and data transformations directly within shell scripts.
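A small self-contained report along these lines, using made-up numbers piped in with printf: the per-line block accumulates a sum, and the END block prints the total alongside the average (NR is the number of records read):

```bash
# Sum a column, then report total and average in the END block.
printf '1\n2\n3\n4\n' | awk '{ sum += $1 } END { print sum, sum / NR }'
# Prints: 10 2.5
```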

Integrating grep, sed, and awk

These tools are often combined in pipelines for advanced text processing:

Example: filtering log lines for errors in any case, normalizing the spelling, and printing selected fields:

bash
grep -i "error" logfile.txt | sed 's/Error/ERROR/g' | awk '{ print $1, $2, $5 }'

This pipeline keeps lines containing "error" regardless of case, rewrites "Error" as "ERROR" so the spelling is uniform, and prints the first, second, and fifth fields of each matching line.
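A self-contained variant of such a pipeline, run against fabricated five-field log lines (date, time, level, code, message) so it can be tried without a real log file:

```bash
# Fabricated log lines; grep -i catches both "Error" and "ERROR".
printf '2024-01-01 10:00 Error 42 disk\n2024-01-01 10:05 INFO 7 ok\n' \
  | grep -i "error" | sed 's/Error/ERROR/g' | awk '{ print $1, $2, $5 }'
# Prints: 2024-01-01 10:00 disk
```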