uniq Command in Linux: Eliminating Duplicate Lines
Summary
The uniq command in Linux is a simple but powerful tool used to filter out adjacent matching lines from an input file or stream. It's commonly used to clean up data, identify unique entries, and prepare data for further analysis.
Introduction
The uniq command is a line-oriented filter that reads standard input or a specified file, compares adjacent lines, and writes one copy of each run of identical lines to standard output. Because uniq only compares adjacent lines, removing every duplicate in a file usually requires sorting it first with the sort command. uniq is particularly useful when dealing with large datasets or log files where identifying unique entries is essential.
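The adjacency rule is easiest to see on a tiny input. The sketch below feeds a few made-up lines through a pipe; the sample data is an assumption chosen only for illustration.

```shell
# Illustrative input: "a" appears three times, but only the first two copies are adjacent.
printf 'a\na\nb\na\n' | uniq
# prints a, b, a: the last "a" survives because "b" separates it from the first run

# Sorting first makes all copies adjacent, so every duplicate is collapsed.
printf 'a\na\nb\na\n' | sort | uniq
# prints a, b
```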
Use Case and Examples
Removing Duplicate Lines from a File
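A minimal sketch of the invocation, assuming a small sample file.txt that is created here only so the example can be run as-is:

```shell
# Hypothetical sample data; any text file works.
printf 'apple\napple\nbanana\napple\n' > file.txt

# Collapse runs of identical adjacent lines and print the result.
uniq file.txt
# prints apple, banana, apple
```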
This command reads file.txt, removes adjacent duplicate lines, and prints the result to the terminal. The original file.txt remains unchanged.
Removing Duplicate Lines and Saving to a New File
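One way to write this, sketched with shell redirection and an illustrative sample file.txt (the file contents are an assumption):

```shell
# Hypothetical sample data.
printf 'apple\napple\nbanana\n' > file.txt

# Write the filtered lines to output.txt instead of the terminal.
uniq file.txt > output.txt

# uniq also accepts the output file as a second operand:
#   uniq file.txt output.txt
```

Do not use the same name for input and output with redirection; the shell truncates the target before uniq reads it.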
This command reads file.txt, removes adjacent duplicate lines, and saves the result to a new file named output.txt.
Counting the Number of Occurrences of Unique Lines
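A short sketch using the -c flag on an illustrative sample file.txt (contents assumed for the example):

```shell
# Hypothetical sample data: "apple" occurs in two separate runs.
printf 'apple\napple\nbanana\napple\n' > file.txt

# Prefix each output line with the length of its run.
uniq -c file.txt
# counts are 2 (apple), 1 (banana), 1 (apple); padding before the count varies by implementation
```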
This command reads file.txt, removes adjacent duplicate lines, and prefixes each remaining line with the number of times it appeared consecutively.
Ignoring Case When Comparing Lines
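A sketch with the -i flag, assuming a sample file.txt whose first three lines differ only in letter case:

```shell
# Hypothetical sample data.
printf 'Line\nline\nLINE\nother\n' > file.txt

# Compare lines case-insensitively; the three spellings form one run.
uniq -i file.txt
# two lines remain: one for the Line/line/LINE run, then "other"
```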
This command reads file.txt and treats adjacent lines as identical regardless of case (e.g., "Line" and "line" are considered the same).
Sorting and Removing Duplicates from a File
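The pipeline can be sketched as follows, with an illustrative file.txt in which the duplicates are deliberately not adjacent:

```shell
# Hypothetical sample data: duplicates are scattered through the file.
printf 'banana\napple\nbanana\ncherry\napple\n' > file.txt

# Sort so identical lines become adjacent, then collapse them.
sort file.txt | uniq
# prints apple, banana, cherry
```

For this plain case, sort -u file.txt gives the same result in one step; the pipeline form is still needed when uniq flags such as -c or -d are wanted.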
This command first sorts the lines in file.txt and then pipes the sorted output to uniq, which removes all duplicate lines, even if they were not originally adjacent.
Keeping Only Duplicate Lines
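A sketch with the -d flag on an illustrative sample file.txt (contents assumed):

```shell
# Hypothetical sample data: "apple" and "cherry" repeat, "banana" does not.
printf 'apple\napple\nbanana\ncherry\ncherry\ncherry\n' > file.txt

# Print one copy of each line that occurs more than once in a row.
uniq -d file.txt
# prints apple, cherry
```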
This command reads file.txt and prints only the lines that are repeated in adjacent runs, one copy per run.
Commonly Used Flags
Flag | Description | Example |
---|---|---|
-c | Prefix each output line with the count of the number of times the line occurred. | uniq -c file.txt |
-d | Only print duplicate lines, one for each group of identical lines. | uniq -d file.txt |
-i | Ignore differences in case when comparing lines. | uniq -i file.txt |
-u | Only print lines that are not repeated, i.e. lines with no adjacent duplicate. | uniq -u file.txt |
-s, --skip-chars=N | Avoid comparing the first N characters. | uniq -s 3 file.txt (Skips the first 3 characters of each line when comparing.) |
-w, --check-chars=N | Compare no more than N characters in lines. | uniq -w 5 file.txt (Compares only the first 5 characters of each line.) |
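
The last three flags in the table are easier to follow on concrete input. The sketch below assumes GNU coreutils uniq, as shipped on most Linux systems (-w in particular is not available everywhere), and the sample lines are made up for illustration.

```shell
# -u: keep only lines that have no adjacent duplicate.
printf 'apple\napple\nbanana\n' | uniq -u
# prints banana

# -s 3: ignore the first 3 characters (here a numeric prefix and a space) when comparing.
printf '01 start\n02 start\n03 stop\n' | uniq -s 3
# prints "01 start" and "03 stop"

# -w 5: compare only the first 5 characters of each line.
printf 'error: disk\nerror: net\nwarn: cpu\n' | uniq -w 5
# prints "error: disk" and "warn: cpu"
```

In each case the first line of a run is the one that is printed, which is why "01 start" and "error: disk" appear rather than the later lines they match.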