C


General


Nature: systems language; procedural language


History: C was developed from 1969-1972 by Dennis Ritchie (with assistance from Brian W. Kernighan) of Bell Telephone Laboratories for use in systems programming for UNIX.
“C is a computer programming language. It was developed out of the construction of the UNIX operating system. It has a modular programming structure and is thus useful in object oriented programming, as well as in developing graphical user interfaces. C++ is a superset of C. Other dialects include Small-C and Visual C.” - Language Finger, Maureen and Mike Mansfield Library, University of Montana.

Hello World example


#include <stdio.h>

int main(void)
{
   printf("Hello World\n");
   return 0;
}

 

Structure


Format: free form

Lexical elements


Source code character set: A C compiler may use any character set that includes at least the following characters: the 52 upper case and lower case alphabetic characters ( A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z ), the 10 decimal digits ( 0 1 2 3 4 5 6 7 8 9 ), the blank or space character, and 29 designated graphic characters ( ! # % ^ & * ( ) - _ + = ~ [ ] \ | ; : ' " { } , . < > / ? ). Five formatting characters (backspace, horizontal tab, vertical tab, form feed, and carriage return) are often used in C (formatting characters are treated as spaces). The dollar sign ($) and the at sign (@) are also commonly used (but not required by the standard). Some form of line separator is required, but it doesn’t have to be an actual character or character sequence.

Execution character set: The execution character set for C is required to have the standard characters of the source code character set, plus a null character and a newline character. The null character must have the value 0 and is used to mark the end of strings. The newline character is used to divide character streams into lines during input or output. Run time libraries may convert between the newline character and some other character(s) (or lack of characters) during execution (such as compacting the carriage return/line feed combination into the newline character, generating the newline character at the end of a logical record, or transforming between various record separators and the newline character).

Whitespace: White space in C includes the blank (space character), horizontal tab, end-of-line, vertical tab, form feed, and comments. White space is ignored by the compiler (except when required to separate tokens or when used in a character or string constant), and therefore can be used freely by the programmer to make the program easy for a human to read. Some implementations of C treat nonstandard source characters as either white space or line breaks.

Line termination: Each line in a C source program is terminated with an end-of-line character or character sequence. Optionally, certain formatting characters (such as carriage return, form feed or vertical tab) can also terminate lines. An empty line is a line that consists of only a terminating character or character sequence, or of white space and line termination. A logical source line can be continued past a line termination by using the backslash character (\) or the ANSI C trigraph ??/ immediately before the line termination. String constants and preprocessor command lines can cross line breaks through the use of logical source lines. In some implementations of C, tokens can also cross line breaks through the use of logical source lines.

Line length: Many C compilers impose a maximum line length (both for physical source lines and for logical source lines). ANSI C requires logical source lines of at least 509 characters.

Escape characters: The backslash character (\) is used as an escape character, allowing a programmer to include characters that would normally have a special meaning for the compiler.

Alternative characters: ANSI C includes nine trigraphs (three-character sequences) for encoding required characters outside of the ISO 646-1983 Invariant Code Set. These trigraphs always start with two consecutive question marks. These are:

Trigraph   Normal
??(        [
??)        ]
??<        {
??>        }
??/        \
??!        |
??'        ^
??-        ~
??=        #

Multibyte characters: ANSI C supports both wide characters and multibyte characters.

Wide characters are binary characters that are more than one byte, typically used for expressing large alphabets.

Multibyte characters are the external representation of a wide character, in either the source or execution character set.

Comments: Comments are started by the occurrence of the two-character sequence /* at any time other than within a character or string constant. Comments are terminated by the two-character sequence */. ANSI C requires that comments be replaced with a single space character, but many C compilers remove comments without inserting a space character. Some non-UNIX C compilers allow “nestable comments”, which violates both original and ANSI C. To comment out large sections of a C program, use preprocessor commands:

#if 0
 
#endif

Tokens


A C compiler always collects characters into the longest possible tokens, even if the result is not valid C. White space always divides tokens. White space can be used to prevent misinterpretation of C source code (for example, x--y would be tokenized as the illegal x -- y [combining the two hyphens into a single token], while x - -y would be tokenized as the valid x - - y). White space must be used to separate an identifier, reserved word, integer constant, or floating point constant from a following identifier, reserved word, integer constant, or floating point constant. Although ANSI C requires that comments be replaced by white space, many compilers don’t, which can lead to unwanted token merging.

Operators: C has 15 simple operators ( ! % ^ & * - + = ~ | . < > / ? ), 10 compound assignment operators ( += -= *= /= %= <<= >>= &= ^= |= ), and 11 other compound operators ( -> ++ -- << >> <= >= == != && || ).

Separators: C has 9 separator tokens ( ( ) [ ] { } , ; : ).

 

Creating a program


The typical steps for compiling a program in C on a UNIX machine are:

step | command | input | output
create source code | ed, emacs (use any text editor) | type from keyboard or terminal | source code
check (for lexical errors) | lint | source code file | listing with warnings
preprocess | cc (or cpp) | source code file | c code file
compile (convert to assembly for specific hardware platform) | cc2 | c code file | assembly source code file
assemble (for specific hardware platform) | asm (or as) (or masm) | assembly language file | a.out (object code file)
link | link | object code file | executable code
run | program name | file with executable code | results of program

 

Porting

www.digital.com/info/porting_assistant “The Digital Porting Assistant (available for Digital UNIX 3.2, and shipped as part of the developer toolkit on Digital UNIX 4.0) is a graphical environment which aids in the porting process. In addition to doing lint-like checking of C and Fortran code, it also contains extensive on-line help regarding developing software on Digital UNIX.”