join Command in Linux

Summary

The join command in Linux is used to combine lines from two files based on a common field. It's essentially a simplified database join operation performed on plain text files.

Introduction

The join command merges data from two files that share a common key. It reads both files in order and writes one output line for each pair of input lines whose join fields match. By default, the join field is the first field of each file, with fields delimited by blanks; other fields can be selected with -1 and -2. Both inputs must already be sorted on the join field, otherwise join can miss matches (GNU join may print a warning about unsorted input). It is especially useful for combining data extracted from system logs, configuration files, or other text-based data sources.
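
Because join expects both inputs to be ordered on the join field, unsorted data is usually passed through sort first. The following is a minimal sketch of that step; the file names and sample rows are hypothetical, and LC_ALL=C is set on the assumption that sort and join should agree on one collation order.

```shell
# Hypothetical sample data, deliberately out of order on the key (field 1).
workdir=$(mktemp -d)
printf '%s\n' '3 cherry red' '1 apple red' '2 banana yellow' > "$workdir/fruits.txt"
printf '%s\n' '2 0.50' '3 1.50' '1 1.00' > "$workdir/prices.txt"

# Sort both inputs on the join field before joining.
LC_ALL=C sort -k1,1 "$workdir/fruits.txt" > "$workdir/fruits.sorted"
LC_ALL=C sort -k1,1 "$workdir/prices.txt" > "$workdir/prices.sorted"

# Join the sorted copies on field 1 (the default join field).
joined=$(LC_ALL=C join "$workdir/fruits.sorted" "$workdir/prices.sorted")
printf '%s\n' "$joined"

rm -rf "$workdir"
```

With this sample data the three keys pair up and the result is ordered by key: 1, 2, 3.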

Use Cases and Examples

Basic Join: Joining two files on the first field

# file1.txt:
# 1 apple red
# 2 banana yellow
# 3 cherry red

# file2.txt:
# 1 price_apple 1.00
# 2 price_banana 0.50
# 3 price_cherry 1.50

join file1.txt file2.txt
This joins file1.txt and file2.txt based on the first field (1, 2, 3). The output combines matching lines. Output:
1 apple red price_apple 1.00
2 banana yellow price_banana 0.50
3 cherry red price_cherry 1.50

Joining on different fields: Using -1 and -2 flags

# file1.txt:
# user1 Alice Smith 123-456-7890
# user2 Bob Johnson 987-654-3210

# file2.txt:
# 123-456-7890 [email protected]
# 987-654-3210 [email protected]

join -1 4 -2 1 file1.txt file2.txt
This joins file1.txt (field 4) and file2.txt (field 1), using the phone number as the common field. The join field is printed first, followed by the remaining fields of each file in their original order. Output:
123-456-7890 user1 Alice Smith [email protected]
987-654-3210 user2 Bob Johnson [email protected]
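
When the join field is not the first field, the file has to be sorted on that field rather than on the start of the line. The sketch below illustrates this with hypothetical files (users.txt, contacts.txt) and made-up rows; only the sort and join options themselves are standard.

```shell
workdir=$(mktemp -d)
# Hypothetical data: the phone number is field 4 in users.txt and field 1 in
# contacts.txt. users.txt is ordered by user ID, not by phone number.
printf '%s\n' 'user1 Bob Johnson 987-654-3210' 'user2 Alice Smith 123-456-7890' > "$workdir/users.txt"
printf '%s\n' '123-456-7890 Berlin' '987-654-3210 Madrid' > "$workdir/contacts.txt"

# Sort users.txt on field 4 only (-k4,4) so its order matches the join field.
LC_ALL=C sort -k4,4 "$workdir/users.txt" > "$workdir/users.sorted"

# Join field 4 of the first file with field 1 of the second.
by_phone=$(LC_ALL=C join -1 4 -2 1 "$workdir/users.sorted" "$workdir/contacts.txt")
printf '%s\n' "$by_phone"

rm -rf "$workdir"
```

Each output line starts with the phone number, followed by the other fields of users.txt and then the other fields of contacts.txt.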

Specifying a different field separator: Using -t flag

# file1.csv:
# 1,apple,red
# 2,banana,yellow

# file2.csv:
# 1,price_apple,1.00
# 2,price_banana,0.50

join -t ',' file1.csv file2.csv
This joins file1.csv and file2.csv using ',' as the field separator. Output:
1,apple,red,price_apple,1.00
2,banana,yellow,price_banana,0.50
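
The -t flag also matters when field values contain spaces, since the default blank-delimited splitting would break them apart. As a small sketch with hypothetical tab-separated files, a literal tab can be passed to -t like this. Note that join splits on the separator character only and does not interpret CSV-style quoting.

```shell
workdir=$(mktemp -d)
# A literal tab character, produced portably with printf.
tab=$(printf '\t')

# Hypothetical tab-separated data whose second field contains spaces.
printf '1\tgranny smith\tgreen\n2\tblood orange\tred\n' > "$workdir/names.tsv"
printf '1\t1.20\n2\t0.90\n' > "$workdir/prices.tsv"

# Use the tab as both the input and the output field separator.
tsv_joined=$(join -t "$tab" "$workdir/names.tsv" "$workdir/prices.tsv")
printf '%s\n' "$tsv_joined"

rm -rf "$workdir"
```

The multi-word names stay intact as a single field because only tabs are treated as separators.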

Handling unpairable lines: Using -a flag

# file1.txt:
# 1 apple red
# 2 banana yellow
# 3 cherry red
# 4 grape purple

# file2.txt:
# 1 price_apple 1.00
# 2 price_banana 0.50
# 3 price_cherry 1.50

join -a 1 file1.txt file2.txt
This joins file1.txt and file2.txt, and the -a 1 option tells join to also output unpairable lines from the first file. Output:
1 apple red price_apple 1.00
2 banana yellow price_banana 0.50
3 cherry red price_cherry 1.50
4 grape purple
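
The -a flag can be given for both files at once, and the related -v flag prints only the unpairable lines. The following sketch, using hypothetical files left.txt and right.txt, shows both variants side by side.

```shell
workdir=$(mktemp -d)
# Hypothetical data: key 4 exists only on the left, key 5 only on the right.
printf '%s\n' '1 apple' '2 banana' '4 grape' > "$workdir/left.txt"
printf '%s\n' '1 1.00' '2 0.50' '5 2.00' > "$workdir/right.txt"

# -a 1 -a 2: paired lines plus the unpairable lines from both files.
outer=$(join -a 1 -a 2 "$workdir/left.txt" "$workdir/right.txt")
printf '%s\n' "$outer"

# -v 1: only the lines of the first file that have no match in the second.
only_left=$(join -v 1 "$workdir/left.txt" "$workdir/right.txt")
printf '%s\n' "$only_left"

rm -rf "$workdir"
```

The first command behaves like a full outer join, while the second acts as a filter for keys missing from the other file.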

Commonly used flags

| Flag | Description | Example |
|---|---|---|
| -1 FIELD | Join on field FIELD of the first file. Fields are numbered from 1. | join -1 2 file1.txt file2.txt (joins on the second field of file1.txt) |
| -2 FIELD | Join on field FIELD of the second file. Fields are numbered from 1. | join -1 1 -2 3 file1.txt file2.txt (joins on field 1 of file1.txt and field 3 of file2.txt) |
| -t CHAR | Use CHAR as the input and output field separator. By default, input fields are separated by runs of blanks and output fields by a single space. | join -t ',' file1.csv file2.csv (uses a comma as the field separator) |
| -a FILENUM | Print unpairable lines from file FILENUM (1 or 2) in addition to the paired lines. | join -a 1 file1.txt file2.txt (also prints unpaired lines from file1.txt) |
| -o FORMAT | Build each output line from the comma-separated list FORMAT. Each entry is FILENUM.FIELD, or 0 for the join field. | join -o 1.2,2.3 file1.txt file2.txt (outputs the 2nd field of file 1 and the 3rd field of file 2) |
| -e STRING | Replace missing output fields with STRING. Typically combined with -a and -o, since only fields requested in the output can be missing. | join -a 1 -e "N/A" -o 0,1.2,2.3 file1.txt file2.txt (prints "N/A" where an unmatched line of file1.txt has no field from file2.txt) |
| -i | Ignore case when comparing join fields (available in GNU join). | join -i file1.txt file2.txt (joins ignoring case) |
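
The -a, -o, and -e flags are often used together to produce a fixed-width report in which missing values are filled in. The sketch below assumes two hypothetical files, fruits.txt and prices.txt, where one fruit has no price entry.

```shell
workdir=$(mktemp -d)
# Hypothetical data: key 4 has no matching line in prices.txt.
printf '%s\n' '1 apple red' '2 banana yellow' '4 grape purple' > "$workdir/fruits.txt"
printf '%s\n' '1 price_apple 1.00' '2 price_banana 0.50' > "$workdir/prices.txt"

# -o selects the output columns (0 is the join field), -a 1 keeps unmatched
# lines from the first file, and -e fills in the fields those lines lack.
report=$(join -a 1 -e 'N/A' -o 0,1.2,2.3 "$workdir/fruits.txt" "$workdir/prices.txt")
printf '%s\n' "$report"

rm -rf "$workdir"
```

Every output line then has exactly three columns: the key, the fruit name, and either the price or the placeholder.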

