comm

Original author: Lee E. McMahon
Developers: AT&T Bell Laboratories, Richard Stallman, David MacKenzie
Initial release: November 1973
Written in: C
Operating system: Unix, Unix-like, Plan 9, Inferno
Platform: Cross-platform
Type: Command
License: coreutils: GPLv3+; Plan 9: MIT License

    comm is a shell command for comparing two files for common and distinct lines. It reads the files as lines of text and outputs text as three columns. The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. Columns are typically separated with the tab character. If the input text contains lines beginning with the separator character, the output columns can become ambiguous.
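The ambiguity mentioned above can be reproduced with a minimal sketch. The file names `one` and `two` and the variable `ambiguous` are illustrative assumptions, and `LC_ALL=C` is set only to make the byte-wise ordering explicit.

```shell
# Sketch: a line that itself starts with a tab looks exactly like a
# column-2 line in comm's default output.
tmp=$(mktemp -d)
printf '\tx\n' > "$tmp/one"   # file 1: a single line, "<TAB>x"
printf 'x\n'   > "$tmp/two"   # file 2: a single line, "x"

# In the C locale the tab byte sorts before "x", so file 1's line is
# emitted first (column 1, no prefix), then file 2's line (column 2,
# one tab prefix).
ambiguous=$(LC_ALL=C comm "$tmp/one" "$tmp/two")

# Show each tab as ">" so it is visible.
printf '%s\n' "$ambiguous" | tr '\t' '>'
# prints:
# >x
# >x
rm -rf "$tmp"
```

Both output lines are byte-for-byte identical, although one came from column 1 and the other from column 2.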

    For efficiency, standard implementations of comm expect both input files to be sorted in the same collation order. The sort command can be used for this purpose. The comm algorithm uses the collating sequence of the current locale. If either file is not sorted in accordance with the current locale, the result is undefined.
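A minimal sketch of preparing unsorted input, assuming hypothetical files `first` and `second`. Setting `LC_ALL=C` for both `sort` and `comm` is one way to make sure the two steps agree on the collating sequence; any single locale used consistently would serve the same purpose.

```shell
tmp=$(mktemp -d)
printf 'zucchini\napple\nbanana\n' > "$tmp/first"    # unsorted input
printf 'banana\ncherry\napple\n'   > "$tmp/second"   # unsorted input

# Sort and compare under one locale so both steps use the same ordering.
LC_ALL=C sort "$tmp/first"  > "$tmp/first.sorted"
LC_ALL=C sort "$tmp/second" > "$tmp/second.sorted"
result=$(LC_ALL=C comm "$tmp/first.sorted" "$tmp/second.sorted")

# Show each tab as ">" so the columns are visible.
printf '%s\n' "$result" | tr '\t' '>'
# prints:
# >>apple
# >>banana
# >cherry
# zucchini
rm -rf "$tmp"
```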

    The command is specified in the POSIX standard. It has been widely available on Unix-like operating systems since the mid to late 1980s. Originally implemented by Lee E. McMahon, the command first appeared in Version 4 Unix.[1] The version in GNU coreutils was written by Richard Stallman and David MacKenzie.[2]

    Example

    $ cat foo
    apple
    banana
    eggplant
    $ cat bar
    apple
    banana
    banana
    zucchini
    $ comm foo bar
                      apple
                      banana
              banana
    eggplant
              zucchini
    

    This shows that both files contain apple and one banana, that only foo contains eggplant, and that only bar contains zucchini and a second banana.

    In more detail, the output consists of the characters shown in the following table, where each row is an output line and each column is a character position. The column that a line belongs to is determined by its number of leading tab characters. Here \t represents a tab character and \n represents a newline.

         0   1   2   3   4   5   6   7   8   9
    0    \t  \t  a   p   p   l   e   \n
    1    \t  \t  b   a   n   a   n   a   \n
    2    \t  b   a   n   a   n   a   \n
    3    e   g   g   p   l   a   n   t   \n
    4    \t  z   u   c   c   h   i   n   i   \n
    Limits


    Up to a full line must be buffered from each input file during line comparison, before the next output line is written.

    Some implementations read lines with the function readlinebuffer(), which does not impose any line length limit as long as system memory suffices.

    Other implementations read lines with the function fgets(). This function requires a fixed-size buffer. For these implementations, the buffer is often sized according to the POSIX macro LINE_MAX.
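When portability to fixed-buffer implementations matters, the input can be checked against LINE_MAX beforehand. The sketch below is one possible approach: `fits_line_max` is a hypothetical helper name, and the fallback of 2048 (the minimum value POSIX allows for LINE_MAX) is an assumption used only when `getconf` is unavailable.

```shell
# LINE_MAX is measured in bytes and includes the trailing newline.
limit=$(getconf LINE_MAX 2>/dev/null || echo 2048)

# Hypothetical helper: succeeds if every line of file $1, plus its
# newline, fits within $limit bytes. LC_ALL=C makes awk count bytes.
fits_line_max() {
    LC_ALL=C awk -v limit="$limit" '
        length($0) + 1 > limit { too_long = 1; exit }
        END { exit too_long + 0 }
    ' "$1"
}

sample=$(mktemp)
printf 'short line\n' > "$sample"
if fits_line_max "$sample"; then
    echo "all lines fit within LINE_MAX ($limit bytes)"
fi
rm -f "$sample"
```

A file that fails this check may still be handled correctly by implementations that allocate line buffers dynamically.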

    Comparison to diff


    Although also a file comparison command, diff reports significantly different information than comm. In general, diff is more powerful than comm. The simpler comm is best suited for use in scripts.

    The primary distinction between comm and diff is that comm requires sorted input, so any information about the original order of the lines is discarded by sorting before the comparison takes place.

    A minor difference between comm and diff is that comm does not try to indicate that a line has changed between the two files; each line is shown in exactly one of the "from file #1", "from file #2", or "in both" columns. This can be useful if one wishes two lines to be considered different even if they differ only subtly.

    Unlike diff, comm does not use its exit code to indicate whether the files match. As is typical, 0 indicates success, and a positive value indicates an error.
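The difference in exit codes can be observed directly. The following sketch uses hypothetical files `a` and `b`. It also shows one way to derive a match test from comm that is not described above: the POSIX option `-3` suppresses the column of common lines, so empty output means that no line is unique to either file.

```shell
tmp=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmp/a"
printf 'apple\ncherry\n' > "$tmp/b"

# comm succeeds (status 0) even though the files differ.
comm_status=0
LC_ALL=C comm "$tmp/a" "$tmp/b" > /dev/null || comm_status=$?

# diff reports the difference through its exit status (1).
diff_status=0
diff "$tmp/a" "$tmp/b" > /dev/null || diff_status=$?

# With comm, the output has to be inspected instead of the exit status.
if [ -z "$(LC_ALL=C comm -3 "$tmp/a" "$tmp/b")" ]; then
    verdict=same
else
    verdict=different
fi

printf 'comm=%s diff=%s verdict=%s\n' "$comm_status" "$diff_status" "$verdict"
# prints: comm=0 diff=1 verdict=different
rm -rf "$tmp"
```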
