Text File Processing and Search

Lesson 3/32 | Study Time: 25 Min

Text file processing and searching are fundamental tasks in Linux development and system administration. Linux offers a rich set of command-line tools specifically designed to help users manipulate, analyze, and search text files efficiently.

Whether dealing with configuration files, logs, source code, or structured data, mastering these tools empowers developers to automate workflows, troubleshoot issues, and extract meaningful insights from large datasets. 

cat: Displaying and Concatenating Files

The cat command is widely used to display the entire content of a text file in the terminal. Additionally, it can concatenate multiple files and create or append content to files.


Key uses:


  • Display file content on the terminal:
cat filename.txt


  • Concatenate multiple files:
cat file1.txt file2.txt > combined.txt


  • Append text to an existing file:
echo "New line" >> filename.txt


Hidden characters and line endings can be viewed with cat -A, which reveals tabs, line endings, and other non-printable characters.
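
For example, with the GNU coreutils version of cat (filename.txt is just a placeholder file):

cat -A filename.txt

Tabs are shown as ^I and each line ending is marked with a trailing $, which makes stray whitespace easy to spot.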

grep: Searching for Patterns in Files

The grep command allows searching for specific text patterns within files, supporting regular expressions and case sensitivity options.


Important options:


grep "pattern" filename searches for "pattern" in a file.

-i enables case-insensitive search.

-r (recursive) searches files inside directories and subdirectories.

-n displays line numbers of matching lines.

-v inverts the search, displaying lines that do NOT match the pattern.

Color highlighting for matches can be enabled with --color=auto.


Example:

grep -inr "error" /var/log/

searches all files recursively in /var/log/ for case-insensitive occurrences of "error" and shows line numbers.
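
As an additional sketch (config.txt here is a hypothetical file), the -v option is handy for hiding comment lines:

grep -v "^#" config.txt

prints only the lines that do not start with the # character.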

awk: Versatile Text Processing and Reporting

awk is a powerful pattern scanning and text processing language, primarily used for extracting and manipulating columns or fields in structured data like CSV or log files.


Basic usage:

awk '{print $1, $3}' filename.txt

prints the first and third fields (columns) of each line.


Example:

awk -F: '{print $1}' /etc/passwd

lists all usernames from the passwd file by splitting on the colon delimiter.
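
awk can also apply a condition before printing. On many Linux distributions, regular user accounts have numeric IDs of 1000 or above, so as a rough sketch:

awk -F: '$3 >= 1000 {print $1}' /etc/passwd

prints only the usernames whose third field (the UID) is at least 1000.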

sed: Stream Editing and Transformation

sed is a stream editor used to perform basic text transformations on input streams or files, including search and replace, insertion, and deletion.


Common usage:


1. Substituting text:

sed 's/oldword/newword/g' filename.txt

replaces all instances of "oldword" with "newword" and writes the result to standard output; the file itself is unchanged unless in-place editing is requested with -i.


2. Deleting lines containing a pattern:

sed '/pattern/d' filename.txt


3. Inserting or appending lines conditionally.
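
A minimal sketch of the third case, using the one-line form accepted by GNU sed (portable scripts write i\ and a\ with the text on the following line):

sed '/pattern/i Inserted before every matching line' filename.txt
sed '/pattern/a Appended after every matching line' filename.txt

Both commands write the modified text to standard output.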


sed operates non-interactively and can be used in shell scripts to automate editing tasks.
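
For instance, a script might flip a setting directly in a file; app.conf is a hypothetical configuration file, and -i requests in-place editing (GNU sed syntax; BSD/macOS sed expects a backup suffix such as -i ''):

sed -i 's/^LogLevel=debug$/LogLevel=info/' app.conf

Without -i, the edited text would only be printed to the terminal.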

cut and paste: Extracting and Combining Text Columns

cut extracts specific sections or columns from lines in a file, useful for CSV or delimited text data.


Examples:


  • Extract the first column separated by a delimiter:
cut -d ',' -f 1 data.csv


  • Extract character ranges:
cut -c 1-10 filename.txt


paste combines lines from multiple files horizontally, merging columns side by side.
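
A minimal sketch, assuming two hypothetical files names.txt and emails.txt with one entry per line:

paste names.txt emails.txt > contacts.txt

joins the first line of names.txt with the first line of emails.txt (separated by a tab by default), the second with the second, and so on; -d ',' would use a comma as the separator instead.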

wc: Counting Lines, Words, and Characters

The wc command counts the number of lines, words, and bytes in a file and helps assess file size and content volume.


Usage:

wc filename.txt


Outputs counts: lines, words, and bytes.


Options can include:


  • -l for lines only

  • -w for words only

  • -c for bytes only (use -m to count characters)
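
For example, counting the entries in the system password file:

wc -l /etc/passwd

prints just the number of lines (one per account) followed by the filename.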

Piping and Redirecting Data

Linux allows combining these commands with pipes (|) to build powerful data-processing pipelines for complex tasks.


Example:

grep "error" logfile.txt | wc -l


counts how many lines in logfile.txt contain the word "error".

Similarly,

cat file.txt | awk '{print $2}' | sort | uniq -c | sort -nr


extracts the second column, counts unique occurrences, and sorts them by frequency.