join Command in Linux
Summary
The join command in Linux is used to combine lines from two files based on a common field. It's essentially a simplified database join operation performed on plain text files.
Introduction
The join command merges data from two files that share a common key. It reads both files in step and prints one output line for each pair of lines whose join fields match. For that reason, both files must be sorted on the join field (for example with sort); otherwise matching lines can fail to pair. By default, the join field is the first field, with fields separated by blanks, but you can choose a different join field for each file. This makes join handy for combining data extracted from system logs, configuration files, or any other text-based source.
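Because join reads its two inputs in step, unsorted input is the most common reason that matching lines fail to pair. The following is a minimal sketch that sorts each file on its join field before joining. It assumes a POSIX shell with sort and join available, and the file names are hypothetical. Setting LC_ALL=C is a precaution so that sort and join use the same byte-wise ordering.

```shell
# Hypothetical input files whose keys are out of order.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '3 cherry red\n1 apple red\n2 banana yellow\n' > unsorted1.txt
printf '2 price_banana 0.50\n3 price_cherry 1.50\n1 price_apple 1.00\n' > unsorted2.txt

# Use the same byte-wise collation for sort and join so their orderings agree.
export LC_ALL=C

# Sort each file on its join field (field 1 here) before joining.
sort -k1,1 unsorted1.txt > sorted1.txt
sort -k1,1 unsorted2.txt > sorted2.txt

# Default join: first field of each file, blank-separated.
join sorted1.txt sorted2.txt > joined.txt
cat joined.txt
```

The -k1,1 option restricts the sort key to the join field alone, rather than the whole line.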
Use Cases and Examples
Basic Join: Joining two files on the first field
# file1.txt:
# 1 apple red
# 2 banana yellow
# 3 cherry red
# file2.txt:
# 1 price_apple 1.00
# 2 price_banana 0.50
# 3 price_cherry 1.50
join file1.txt file2.txt
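This joins file1.txt and file2.txt on the first field (1, 2, 3), combining each pair of matching lines. Output:
1 apple red price_apple 1.00
2 banana yellow price_banana 0.50
3 cherry red price_cherry 1.50
Joining on different fields: Using -1 and -2 flags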
# file1.txt:
# user1 Alice Smith 123-456-7890
# user2 Bob Johnson 987-654-3210
# file2.txt:
# 123-456-7890 [email protected]
# 987-654-3210 [email protected]
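join -1 4 -2 1 file1.txt file2.txt
This joins file1.txt (field 4) and file2.txt (field 1), using the phone number as the common field. The join field is printed first, followed by the remaining fields of file1.txt and then those of file2.txt. Output:
123-456-7890 user1 Alice Smith [email protected]
987-654-3210 user2 Bob Johnson [email protected]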
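When the join field is not the first field, each input has to be sorted on its own join field rather than on the start of the line. In the sample file1.txt, the phone number is the fourth whitespace-separated field. The sketch below therefore sorts that file on field 4 before calling join. It assumes a POSIX shell, and the files users.txt and teams.txt and their contents are hypothetical.

```shell
# Hypothetical sample data: a user list with the phone number in field 4,
# deliberately out of order, and a team list keyed by phone number in field 1.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'user2 Bob Johnson 987-654-3210\nuser1 Alice Smith 123-456-7890\n' > users.txt
printf '123-456-7890 sales\n987-654-3210 support\n' > teams.txt

# Same collation for sort and join.
export LC_ALL=C

# File 1 is joined on field 4, so sort it on field 4
# (-b skips the blanks that precede the field).
sort -b -k4,4 users.txt > users.sorted
# File 2 is joined on field 1.
sort -k1,1 teams.txt > teams.sorted

# Output: join field first, then the other fields of file 1, then those of file 2.
join -1 4 -2 1 users.sorted teams.sorted > by_phone.txt
cat by_phone.txt
```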
Specifying a different field separator: Using -t flag
# file1.csv:
# 1,apple,red
# 2,banana,yellow
# file2.csv:
# 1,price_apple,1.00
# 2,price_banana,0.50
join -t ',' file1.csv file2.csv
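This joins file1.csv and file2.csv using ',' as the field separator, which is also used to separate fields in the output. Output:
1,apple,red,price_apple,1.00
2,banana,yellow,price_banana,0.50
Handling unpairable lines: Using -a flag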
# file1.txt:
# 1 apple red
# 2 banana yellow
# 3 cherry red
# 4 grape purple
# file2.txt:
# 1 price_apple 1.00
# 2 price_banana 0.50
# 3 price_cherry 1.50
join -a 1 file1.txt file2.txt
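This joins file1.txt and file2.txt, and the -a 1 option tells join to also output unpairable lines from the first file. Output:
1 apple red price_apple 1.00
2 banana yellow price_banana 0.50
3 cherry red price_cherry 1.50
4 grape purple
Commonly used flags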
Flag | Description | Example |
---|---|---|
-1 FIELD | Specify the join field for the first file. FIELD is the field number (starting from 1). | join -1 2 file1.txt file2.txt (joins on the second field of file1.txt) |
-2 FIELD | Specify the join field for the second file. FIELD is the field number (starting from 1). | join -1 1 -2 3 file1.txt file2.txt (joins on field 1 of file1 and field 3 of file2) |
-t CHAR | Specify the field separator character. Default is whitespace. | join -t ',' file1.csv file2.csv (uses comma as the field separator) |
-a FILENUM | Print unpairable lines in addition to paired lines. FILENUM is either 1 or 2, specifying which file to include unpaired lines from. | join -a 1 file1.txt file2.txt (prints unpaired lines from file1.txt) |
-o FORMAT | Build each output line from the fields listed in FORMAT, a comma-separated list of FILENUM.FIELD entries (0 stands for the join field). | join -o 1.2,2.3 file1.txt file2.txt (outputs the 2nd field of file 1 and the 3rd field of file 2) |
-e STRING | Replace missing output fields with STRING. It applies to fields named with -o, typically together with -a. | join -a 1 -e "N/A" -o 0,1.2,2.3 file1.txt file2.txt (fills the missing price of unpaired lines with "N/A") |
-i | Perform case-insensitive comparisons when finding matches. | join -i file1.txt file2.txt (joins ignoring case) |
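Several of the flags above can be combined. As a minimal sketch, the following performs the equivalent of a left outer join on CSV data. It assumes a shell with a POSIX-style join, and the file names fruits.csv and prices.csv are hypothetical. The -a 1 option keeps unmatched rows from the first file, -o selects the output columns, and -e fills in the column that an unmatched row lacks. The -e string only affects fields named in the -o list.

```shell
# Hypothetical CSV inputs, both already sorted on field 1.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '1,apple,red\n2,banana,yellow\n3,cherry,red\n4,grape,purple\n' > fruits.csv
printf '1,price_apple,1.00\n2,price_banana,0.50\n3,price_cherry,1.50\n' > prices.csv

export LC_ALL=C

# -t ','        : fields are comma-separated (input and output)
# -a 1          : also print lines of file 1 that have no match
# -e 'N/A'      : text to use for output fields missing from the input
# -o 0,1.2,2.3  : join field, field 2 of file 1, field 3 of file 2
join -t ',' -a 1 -e 'N/A' -o 0,1.2,2.3 fruits.csv prices.csv > report.csv
cat report.csv
```

The grape row has no price, so it comes out as 4,grape,N/A. Because -t ',' splits on every comma, this approach does not handle quoted CSV fields that themselves contain commas.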