Text processing is a cornerstone of Linux shell scripting, enabling users to manipulate, analyze, and transform textual data efficiently. Among the most powerful and commonly used tools for text processing are grep, sed, and awk.
Each serves distinct purposes: grep for pattern matching and searching, sed for stream editing and substitution, and awk for advanced data manipulation and reporting.
Using these tools effectively requires understanding their syntax, options, and integration with shell commands, making complex text processing feasible with concise scripts.
grep: Pattern Matching and Searching
grep (Global Regular Expression Print) searches input text for lines matching a specified pattern and prints those lines. It supports regular expressions, case sensitivity controls, and line context options.
Basic command syntax:
grep [options] pattern [file...]
Examples:
1. Search for "ERROR" in a file:
grep "ERROR" logfile.txt
2. Case-insensitive search:
grep -i "error" logfile.txt
3. Display lines with line numbers:
grep -n "pattern" filename
4. Show lines before or after a match for context:
grep -B 3 -A 3 "ERROR" logfile.txt
grep is invaluable for quickly filtering logs, configuration files, or command outputs.
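The grep options above can be exercised end-to-end. The sketch below builds a small sample log (the filename and log lines are made up for illustration) and runs each search shown:

```shell
#!/bin/sh
# Build a small sample log to search (contents are illustrative).
cat > sample.log <<'EOF'
2024-01-01 INFO  service started
2024-01-01 ERROR disk full
2024-01-02 error retrying write
2024-01-02 INFO  recovery complete
EOF

# Exact, case-sensitive match: prints only the uppercase ERROR line.
grep "ERROR" sample.log

# Case-insensitive match: finds both the ERROR and error lines.
grep -i "error" sample.log

# Prefix each matching line with its line number.
grep -n "ERROR" sample.log

# Show one line of context before (-B) and after (-A) each match.
grep -B 1 -A 1 "ERROR" sample.log

rm -f sample.log
```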
sed: Stream Editor for Text Transformation
sed is a non-interactive text editor used for filtering and transforming text streams or files. It works line-by-line with powerful commands for substitutions, deletions, insertions, and complex pattern-based editing.
Basic substitution syntax:
sed 's/pattern/replacement/flags' inputfile
Common flags:
g - replace every match on the line, not just the first
i - case-insensitive matching (a GNU sed extension)
Examples:
1. Replace first occurrence of "foo" with "bar" in each line:
sed 's/foo/bar/' file.txt
2. Replace all occurrences of "foo" with "bar":
sed 's/foo/bar/g' file.txt
3. Delete lines matching a pattern:
sed '/pattern/d' file.txt
4. Apply multiple commands:
sed -e 's/foo/bar/g' -e '/baz/d' file.txt
sed excels in on-the-fly editing within scripts and pipelines without creating intermediate files.
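A runnable sketch of the sed examples above, using a throwaway sample file (the filename and contents are illustrative):

```shell
#!/bin/sh
# Sample input file (contents are illustrative).
printf 'foo one\nfoo foo two\nbaz three\n' > sample.txt

# First occurrence per line: "foo foo two" becomes "bar foo two".
sed 's/foo/bar/' sample.txt

# All occurrences: "foo foo two" becomes "bar bar two".
sed 's/foo/bar/g' sample.txt

# Delete every line matching "baz".
sed '/baz/d' sample.txt

# Combine both edits in a single pass with -e.
sed -e 's/foo/bar/g' -e '/baz/d' sample.txt

rm -f sample.txt
```

Note that none of these commands modify sample.txt itself; sed writes the edited stream to standard output, which is what makes it safe to use in pipelines.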
awk: Data Extraction and Reporting Language
awk is a versatile programming language designed for data extraction and reporting. It processes text files line-by-line, splitting each line into fields, allowing condition-based actions, arithmetic, and string operations.
Basic syntax:
awk 'pattern { action }' filename
Fields: $0 - whole line, $1 - first field, $2 - second field, etc.
Examples:
1. Print all lines where the second field equals "ERROR":
awk '$2 == "ERROR"' logfile.txt
2. Print the first and third fields of each line:
awk '{ print $1, $3 }' file.txt
3. Sum values in the third column:
awk '{ sum += $3 } END { print sum }' file.txt
4. Conditional output with formatted printing:
awk '$3 > 100 { printf "High value: %s\n", $0 }' data.txt
awk combines powerful pattern matching with programming constructs, enabling creation of detailed reports and data transformations directly within shell scripts.
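The awk examples above can be run against a small sample dataset; the sketch below uses made-up whitespace-delimited records to demonstrate each one:

```shell
#!/bin/sh
# Sample whitespace-delimited data: name, status, value (illustrative).
printf 'a ERROR 50\nb OK 120\nc ERROR 200\n' > data.txt

# Pattern with no action: the default action prints matching lines,
# so this outputs the two lines whose second field is exactly ERROR.
awk '$2 == "ERROR"' data.txt

# Select the first and third fields of every line.
awk '{ print $1, $3 }' data.txt

# Accumulate column 3 per line, then print the total (370) in END.
awk '{ sum += $3 } END { print sum }' data.txt

# Formatted output only for lines where the value exceeds 100.
awk '$3 > 100 { printf "High value: %s\n", $0 }' data.txt

rm -f data.txt
```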
Integrating grep, sed, and awk
These tools are often combined in pipelines for advanced text processing:
Example: Filtering log errors, replacing terms, and formatting output:
grep -i "error" logfile.txt | sed 's/[Ee]rror/ERROR/g' | awk '{ print $1, $2, $5 }'
This pipeline extracts error lines regardless of case, normalizes the keyword to "ERROR," and prints the first, second, and fifth fields.
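A minimal runnable sketch of this kind of pipeline, assuming a hypothetical log layout where field 1 is the date, field 2 the time, and field 5 the error keyword (the filename and log lines are made up for illustration):

```shell
#!/bin/sh
# Sample log with one lowercase and one uppercase error entry.
printf '2024-01-01 10:00 app core Error timeout\n2024-01-01 10:05 app core ERROR crash\n' > logfile.txt

# Stage 1: grep -i keeps error lines in any case.
# Stage 2: sed normalizes "Error"/"error" to "ERROR".
# Stage 3: awk prints fields 1, 2, and 5 (date, time, keyword).
grep -i "error" logfile.txt | sed 's/[Ee]rror/ERROR/g' | awk '{ print $1, $2, $5 }'

rm -f logfile.txt
```

Each stage reads the previous stage's standard output, so no intermediate files are needed; this filter-transform-report pattern is the typical division of labor among the three tools.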