`uniq` Command in Linux: Eliminating Duplicate Lines

Summary

The uniq command in Linux filters adjacent duplicate lines out of an input file or stream. It is commonly used, usually together with sort, to clean up data, identify unique entries, and prepare data for further analysis.

Introduction

The uniq command is a line-oriented filter that reads standard input or a specified file, compares each line with the one before it, and writes a single copy of each run of identical lines to standard output. Because uniq only compares adjacent lines, duplicates that are separated by other lines are not detected. To remove every duplicate in a file, sort the file first with the sort command so that identical lines end up next to each other. uniq is particularly useful for large datasets or log files where identifying unique entries is essential.

Use Case and Examples

Removing Duplicate Lines from a File

uniq file.txt

This command reads file.txt, collapses each run of adjacent duplicate lines into a single line, and prints the result to the terminal. The original file.txt remains unchanged.
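As a minimal sketch of the adjacency rule, the snippet below pipes a small, made-up input into uniq instead of reading file.txt. The sample lines and the variable name `result` are illustrative assumptions.

```shell
# Hypothetical input: "apple" appears twice in a row, then once more later.
result=$(printf 'apple\napple\nbanana\napple\n' | uniq)
printf '%s\n' "$result"
# apple
# banana
# apple
```

The final "apple" survives because it is not adjacent to the first two.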

Removing Duplicate Lines and Saving to a New File

uniq file.txt output.txt

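This command reads file.txt, removes adjacent duplicate lines, and writes the result to output.txt instead of the terminal. The second argument is always treated as the output file: it is created if it does not exist and overwritten if it does.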
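The two-argument form can be tried safely in a scratch directory. This sketch assumes `mktemp -d` is available; the directory, the sample lines, and the variable name `saved` are illustrative only.

```shell
# Work in a temporary directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'a\na\nb\n' > "$tmpdir/file.txt"

# First argument is the input file, second is the output file.
uniq "$tmpdir/file.txt" "$tmpdir/output.txt"

saved=$(cat "$tmpdir/output.txt")
printf '%s\n' "$saved"
# a
# b

rm -r "$tmpdir"
```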

Counting the Number of Occurrences of Unique Lines

uniq -c file.txt

This command reads file.txt, removes adjacent duplicate lines, and prefixes each unique line with the number of times it appeared consecutively.
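A small sketch of the counting behavior, using made-up input. The width of the padding before each count differs between uniq implementations, so this example passes the output through awk to normalize it; that step is an addition for readability, not something uniq requires.

```shell
# Count consecutive occurrences, then normalize the leading padding with awk.
counts=$(printf 'apple\napple\nbanana\napple\n' | uniq -c | awk '{print $1, $2}')
printf '%s\n' "$counts"
# 2 apple
# 1 banana
# 1 apple
```

Note that "apple" is reported twice, once per run, because the input was not sorted.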

Ignoring Case When Comparing Lines

uniq -i file.txt

This command reads file.txt and treats lines as identical regardless of case (e.g., "Line" and "line" are considered the same).
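To illustrate, the following sketch feeds three case variants of the same word to uniq -i. The input is invented for the example, and the output shown assumes an implementation such as GNU coreutils, which keeps the first line of each group.

```shell
# "Line", "line" and "LINE" compare equal under -i; the first one is kept.
folded=$(printf 'Line\nline\nLINE\nother\n' | uniq -i)
printf '%s\n' "$folded"
# Line
# other
```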

Sorting and Removing Duplicates from a File

sort file.txt | uniq

This command first sorts the lines in file.txt and then pipes the sorted output to uniq. Sorting places identical lines next to each other, so uniq removes every duplicate, including those that were not adjacent in the original file.
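The sketch below shows the sort-then-uniq pipeline on unsorted, made-up input, followed by a common extension of the same idea: combining it with -c and a numeric reverse sort to rank lines by frequency. The sample data and variable names are assumptions for illustration.

```shell
# Hypothetical unsorted input with scattered duplicates.
input='cherry
apple
banana
apple
cherry
apple'

# Sorting first makes all duplicates adjacent, so uniq removes them all.
deduped=$(printf '%s\n' "$input" | sort | uniq)
printf '%s\n' "$deduped"
# apple
# banana
# cherry

# Frequency ranking: count each line, then sort by count, highest first.
ranked=$(printf '%s\n' "$input" | sort | uniq -c | sort -rn | awk '{print $1, $2}')
printf '%s\n' "$ranked"
# 3 apple
# 2 cherry
# 1 banana
```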

Keeping Only Duplicate Lines

uniq -d file.txt

This command reads file.txt and prints one copy of each line that is repeated consecutively. Lines that appear only once in a row are omitted. As with the other options, only adjacent duplicates are detected.
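A short sketch with invented input: "a" and "c" are repeated consecutively, while "b" appears once.

```shell
# Only lines that occur more than once in a row are printed, once per group.
dups=$(printf 'a\na\nb\nc\nc\nc\n' | uniq -d)
printf '%s\n' "$dups"
# a
# c
```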

Commonly Used Flags

Flag	Description	Example
`-c`	Prefix each output line with the number of consecutive times the line occurred.	`uniq -c file.txt`
`-d`	Only print repeated lines, one copy for each group of identical adjacent lines.	`uniq -d file.txt`
`-i`	Ignore differences in case when comparing lines.	`uniq -i file.txt`
`-u`	Only print lines that are not repeated, that is, lines with no adjacent duplicate.	`uniq -u file.txt`
`-s, --skip-chars=N`	Avoid comparing the first N characters.	`uniq -s 3 file.txt` (Skips the first 3 characters of each line when comparing.)
`-w, --check-chars=N`	Compare no more than N characters in lines.	`uniq -w 5 file.txt` (Compares only the first 5 characters of each line.)
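The flags not covered in the examples above can be sketched the same way. The inputs below are made up, and -w in particular is assumed to be available as in GNU coreutils; it may be missing from other uniq implementations.

```shell
# -u: print only lines that have no adjacent duplicate.
only_unique=$(printf 'a\na\nb\nc\nc\n' | uniq -u)
printf '%s\n' "$only_unique"
# b

# -s 3: skip the first 3 characters (here a hypothetical "NN " prefix) when comparing.
skipped=$(printf '01 error\n02 error\n03 ok\n' | uniq -s 3)
printf '%s\n' "$skipped"
# 01 error
# 03 ok

# -w 5: compare at most the first 5 characters of each line.
limited=$(printf 'alpha1\nalpha2\nbeta1\n' | uniq -w 5)
printf '%s\n' "$limited"
# alpha1
# beta1
```

In each case the first line of a matching group is the one that is kept.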
