Text transformation and reporting are essential tasks in Linux system administration and data processing. Tools like sed, awk, and grep let administrators manipulate streams of text, generate structured reports, and extract data through pattern matching. The same commands work on one file or many, which makes them well suited to automation, analysis, and auditing.
Stream Editing with sed
sed is a stream editor that filters and transforms text arriving on standard input (for example, in a pipeline) or read from files. Common uses include substitution, deletion, insertion, and complex pattern-based editing.
1. Syntax for substitution:
sed 's/pattern/replacement/g' inputfile
The g flag replaces all occurrences on a line.
2. Deleting lines matching a pattern:
sed '/pattern/d' file.txt
3. Multiple commands with -e:
sed -e 's/old/new/g' -e '/^$/d' file.txt
4. Supports regular expressions for powerful matching.
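The three sed forms above can be tried on a small throwaway file. The following is a minimal sketch: the file name sample.txt, its contents, and the variable names are made up for illustration.

```shell
# Hypothetical sample input; file name and contents are invented for this sketch.
tmpdir=$(mktemp -d)
printf 'old value, old key\n\nkeep this line\ndebug: drop me\n' > "$tmpdir/sample.txt"

# 1. Substitution: replace every "old" with "new" (g = all occurrences per line).
sed 's/old/new/g' "$tmpdir/sample.txt"

# 2. Deletion: drop every line that contains "debug".
sed '/debug/d' "$tmpdir/sample.txt"

# 3. Multiple commands: substitute, remove empty lines, and remove debug lines.
cleaned=$(sed -e 's/old/new/g' -e '/^$/d' -e '/debug/d' "$tmpdir/sample.txt")
printf '%s\n' "$cleaned"
```

None of these commands modifies sample.txt; sed writes the edited text to standard output, so the result can be redirected to a new file or piped onward.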
Text Processing and Reporting with awk
awk is a domain-specific language designed for text processing and reporting. It splits each input line into fields ($1, $2, and so on) and runs an action on every line that matches the action's condition.
Example: Print the first and third fields of a CSV file:
awk -F',' '{ print $1, $3 }' file.csv
Summation example:
awk '{ sum += $2 } END { print sum }' data.txt
awk also supports built-in variables, conditional statements, loops, and formatted output (printf).
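To show a condition, a running sum, and printf working together, here is a small sketch of a report. The file staff.csv, its three columns (name, team, hours), and the variable names are assumptions made for the example; the data has no quoted commas, which a plain -F',' split would not handle.

```shell
# Hypothetical data: name,team,hours (invented for this sketch).
tmpdir=$(mktemp -d)
printf 'alice,eng,120\nbob,ops,80\ncarol,eng,100\n' > "$tmpdir/staff.csv"

# Print the first and third fields of every record.
awk -F',' '{ print $1, $3 }' "$tmpdir/staff.csv"

# Sum the third field, but only for records whose second field is "eng".
total=$(awk -F',' '$2 == "eng" { sum += $3 } END { print sum }' "$tmpdir/staff.csv")

# Formatted report: left-aligned name, right-aligned number, totals in END.
report=$(awk -F',' '
    $2 == "eng" { sum += $3; printf "%-6s %5d\n", $1, $3 }
    END         { printf "%-6s %5d\n", "total", sum }
' "$tmpdir/staff.csv")
printf '%s\n' "$report"
```

The END block runs once after the last input line, which is why it is the natural place for totals and summary rows.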
Complex Pattern Matching with grep
grep searches input for lines that match a pattern written as a regular expression. Common options include case-insensitive matching (-i), context lines after or before each match (-A, -B), counting matching lines (-c), and recursive directory search (-r).
Example:
grep -i "error" /var/log/*.log
This is useful for quickly extracting and filtering information across files.
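The options listed above can be compared side by side on a small log. This sketch uses an invented file app.log; the -A, -B, and -r options are assumed to be available, as they are in GNU and BSD grep.

```shell
# Hypothetical log file; name and contents are invented for this sketch.
tmpdir=$(mktemp -d)
printf 'INFO start\nError: disk full\nINFO retry\nerror: timeout\nINFO done\n' > "$tmpdir/app.log"

# -i: match "error" regardless of case.
grep -i "error" "$tmpdir/app.log"

# -c: print the number of matching lines instead of the lines themselves.
count=$(grep -ic "error" "$tmpdir/app.log")

# -A 1 / -B 1: include one line after / before each match.
after=$(grep -i -A 1 "timeout" "$tmpdir/app.log")
before=$(grep -i -B 1 "timeout" "$tmpdir/app.log")

# -r: search every file under a directory; each hit is prefixed with its path.
grep -ri "error" "$tmpdir"
```

Note that -c counts matching lines, so a line containing the pattern twice is still counted once.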
Combining Tools in Pipelines
Combine tools in pipelines to extract and format data:
Example: Extract users from /etc/passwd and sort:
awk -F: '{ print $1 }' /etc/passwd | sort
Use sed to clean or modify data before reporting.
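As a sketch of sed cleaning the data before awk reports on it, the pipeline below runs against a made-up passwd-style file rather than the real /etc/passwd, so the result is predictable. The file name passwd.sample, its entries, and the comment line are assumptions for illustration.

```shell
# Hypothetical passwd-style file with a comment and a blank line to clean up.
tmpdir=$(mktemp -d)
cat > "$tmpdir/passwd.sample" <<'EOF'
# sample accounts
root:x:0:0:root:/root:/bin/bash

daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
EOF

# sed removes comment and empty lines, awk extracts the user name, sort orders it.
users=$(sed -e '/^#/d' -e '/^$/d' "$tmpdir/passwd.sample" \
    | awk -F: '{ print $1 }' \
    | sort)
printf '%s\n' "$users"
```

Each stage does one job, so a stage can be swapped or removed without touching the others.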
Multi-file Processing
awk and sed accept multiple files as arguments, either listed explicitly or expanded by the shell from wildcards.
Example with awk:
awk '/pattern/ {print FILENAME ": " $0}' *.log
xargs or shell loops combined with these tools enable bulk data transformations.
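The three approaches mentioned here (a wildcard argument list, a shell loop, and xargs) might look like the following sketch. The files a.log and b.log, the .out suffix, and the variable names are invented for the example, and the xargs line assumes file paths without spaces or newlines.

```shell
# Hypothetical log files, invented for this sketch.
tmpdir=$(mktemp -d)
printf 'ok\nfail: disk\n' > "$tmpdir/a.log"
printf 'fail: net\nok\n' > "$tmpdir/b.log"

# Wildcard: awk reads each file in turn; FILENAME names the current one.
matches=$(awk '/fail/ { print FILENAME ": " $0 }' "$tmpdir"/*.log)
printf '%s\n' "$matches"

# Shell loop: write a transformed copy of each file next to the original.
for f in "$tmpdir"/*.log; do
    sed 's/fail/FAIL/' "$f" > "$f.out"
done

# xargs: hand a list of files produced by find to grep, counting matches per file.
find "$tmpdir" -name '*.log' -print | xargs grep -c 'fail'
```

The loop is the easier choice when each file needs its own output name; xargs fits when the file list comes from another command.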