How-to: Enable Syntax Highlighting

Syntax highlighting is enabled through a lexical specification, which consists of regular expressions that recognize the tokens of the language. These tokens are then mapped to color classes, which are used by Visual Studio to highlight the source. Implementing syntax highlighting is done in the following three steps (see the individual procedures that follow):

Defining language tokens
Mapping tokens to color classes
Defining the lexical specification

Note

You must provide your own lex/yacc-compatible tools; these tools are not included in the VS SDK and the language service project will not build without them. To build the language service using the sample grammar and parser for the sample My C Package, you must use version 1.24 or later of Bison and version 2.5.4a or later of Flex. Place the Bison and Flex executables in the Babel Tools folder, for example, <InstallPath>\VisualStudioIntegration\Babel\Tools. These version numbers are specific to the My C sample.

Defining language tokens


Use the following procedure to define language tokens.

To define language tokens

In Solution Explorer, expand the bservice project and then expand the Source Files folder. Double-click the parser.y file to open it in the editor.

Language tokens are defined using the %token definitions of Bison. The My C language contains the following definitions:

  %token IDENTIFIER NUMBER 
  %token KWIF KWELSE KWWHILE KWFOR KWCONTINUE KWBREAK KWRETURN
  %token KWEXTERN KWSTATIC KWAUTO KWINT KWVOID
  %token ',' ';' '(' ')' '{' '}' '=' 
  %token '+' '-' '*' '/' '!' '&' '|' '^'
  %token EQ NEQ GT GTE LT LTE AMPAMP BARBAR 
  %token LEX_WHITE LEX_COMMENT LEX_LINE_COMMENT

Replace the tokens in the parser.y file with the tokens for your language, or, to build the My C language, use the tokens already in the file. The named tokens are recognized by rules in the lexer.lex file, which is processed by Flex; those rules define what text each token matches. This is covered in the "Defining the lexical specification" section in this topic.
Build the solution by choosing Build Solution from the Build menu or pressing Ctrl+Shift+B.

When the service is built, Bison automatically generates a file called parser.cpp that contains the C definitions of the tokens.

Mapping tokens to color classes

The tokens in a language must be mapped to a specific color class. For example, in the My C language, the KWIF token maps to the ClassKeyword color class and the NUMBER token maps to the ClassNumber color class (see ColorClass for details).

In service.cpp, mapping information is provided by the Service::getTokenInfo method. This method returns a pointer to a static TokenInfo array (the TokenInfo structure is defined in the Babel common file, stdservice.h, which is found in the Babel Common folder, for example, [drive]\Program Files\VSIP 8.0\EnvSDK\Babel\Common). There is an entry for each defined token. Each entry in the TokenInfo structure contains the following information:

Element	Example
Token name	IDENTIFIER, NUMBER, ';'
Color Class	ClassIdentifier, ClassNumber, ClassComment
Description	identifier, number, comment
Character Class	CharIdentifier, CharLiteral, CharComment
Trigger Class (optional)	For more information, see Adding Triggers in How-to: Provide Automatic Brace Matching.

The character class typically coincides with the color class. The essential difference is that the color classes can be extended (as is described later) while the character classes are fixed. This allows the Babel package to use the character class for implementing search and navigation functions of the Visual Studio environment.

Example of Mapping Tokens to Color Classes for My C

The token table for the My C language is as follows:

  static TokenInfo tokenInfoTable[] =
  {
    //TODO: Add your own token information here.
    { IDENTIFIER, ClassIdentifier,    "identifier '%s'", CharIdentifier },
    { NUMBER,     ClassNumber,        "number ('%s')"  , CharLiteral    },
    { KWIF,       ClassKeyword,       "if"             , CharKeyword },
    { KWELSE,     ClassKeyword,       "else"           , CharKeyword },
    { KWWHILE,    ClassKeyword,       "while"          , CharKeyword }, 
  
    { LEX_WHITE,  ClassText,          "white space"    , CharWhiteSpace },
    { LEX_LINE_COMMENT, ClassComment, "comment"        , CharLineComment },
    { LEX_COMMENT,ClassComment,       "comment"        , CharComment },

    //Always end with the 'TokenEnd' token.
    { TokenEnd,     ClassText,      "<unknown>" }
  };
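
The way a colorizer might consult such a table can be sketched in a few lines. This is a minimal illustration, not the Babel implementation: the TokenInfo layout, the enumerator values, the CharDefault class, and the lookupTokenInfo helper are all assumptions made for this example (the real definitions are in stdservice.h and Babelservice.idl and may differ). It only shows why the table must always end with the TokenEnd entry: that entry acts as both the terminator of the scan and the fallback for unknown tokens.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the Babel types; values are assumptions.
enum ColorClass { ClassText, ClassKeyword, ClassComment,
                  ClassIdentifier, ClassString, ClassNumber };
enum CharClass  { CharDefault, CharIdentifier, CharLiteral, CharKeyword,
                  CharWhiteSpace, CharComment, CharLineComment };
enum Token      { TokenEnd = 0, IDENTIFIER = 258, NUMBER, KWIF,
                  LEX_WHITE, LEX_COMMENT };

// Assumed layout: token, color class, description, character class.
struct TokenInfo {
    int         token;
    ColorClass  colorClass;
    const char* description;
    CharClass   charClass;
};

static const TokenInfo tokenInfoTable[] = {
    { IDENTIFIER,  ClassIdentifier, "identifier '%s'", CharIdentifier },
    { NUMBER,      ClassNumber,     "number ('%s')",   CharLiteral    },
    { KWIF,        ClassKeyword,    "if",              CharKeyword    },
    { LEX_WHITE,   ClassText,       "white space",     CharWhiteSpace },
    { LEX_COMMENT, ClassComment,    "comment",         CharComment    },
    // The terminating entry doubles as the fallback for unknown tokens.
    { TokenEnd,    ClassText,       "<unknown>",       CharDefault    }
};

// Hypothetical helper: linear scan that stops at the first matching entry
// or at the TokenEnd terminator, whichever comes first.
const TokenInfo* lookupTokenInfo(const TokenInfo* table, int token) {
    assert(table != nullptr);
    const TokenInfo* entry = table;
    while (entry->token != TokenEnd && entry->token != token) {
        ++entry;
    }
    return entry;
}
```

With this sketch, lookupTokenInfo(tokenInfoTable, KWIF) yields the ClassKeyword entry, and any token that is missing from the table resolves to the "<unknown>" entry with ClassText.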

The My C language uses default color classes. These are defined in Babelservice.idl, as follows:

  enum DefaultColorClass
  {
    ClassText,
    ClassKeyword,
    ClassComment,
    ClassIdentifier,
    ClassString,
    ClassNumber
  };

It is also possible to use custom color classes. For more information, see How-to: Provide Custom Color Classes.

Defining the lexical specification

After you have defined language tokens and mapped these tokens to color classes, you can add the lexical specification for the language. This specification is used by the colorizer to provide syntax highlighting. The syntax checker later uses the same specification to provide tokens for the grammar. Use the following procedure to add the lexical specification.

To add the lexical specification

In Solution Explorer, expand the bservice project and then expand the Source Files folder. Double-click the lexer.lex file to open it in the editor.
Add the lex rules for your language.

Most languages already have a lex specification. If you have one, you can simply copy the rules into the lexer.lex file. However, to meet the requirements of the colorizer, there are some constraints on the form of the lex rules. These constraints, in turn, depend on the implementation of the colorizer, which automatically saves the state of the lexer at every line of text. Whenever the line colors change, the lexer is restarted at this state to colorize the specific line again. Each time a token is returned by the lexer, the colorizer locates the color class in the table provided by the getTokenInfo method in the Service class. All the characters scanned for this token are colored with this color class. To make this work, the colorizer assumes two things from the lexical specification:

The lexer returns a token at the end of a line.
No scanned characters are skipped or added during scanning.

The first constraint ensures that the lexer is in a known state at the end of the line. The second constraint is used to determine how many characters are colored. To satisfy the first constraint, add a rule at each lexer state that returns a token for each new line. For example, many lexical specifications contain the following rule for discarding white space:

  [ \t\r\n]+    { /* ignore */ }

Because this rule does not directly return a token on a new line, it should be re-defined to:

  [ \t\r]+    { /* ignore */ }
  \n          { /* ignore */ }

However, this version still does not satisfy the second constraint: the white space characters are scanned but discarded without returning a token, so the colorizer cannot account for them. The final and correct definition returns a white space token:

  [ \t\r]+    { return LEX_WHITE; }
  \n          { return LEX_WHITE; }

Because the token mapping specifies the CharWhiteSpace character class for LEX_WHITE, only the colorizer knows about the white space. The parser ignores all white space tokens, although you can change this behavior by overriding the Service::isGrammarToken method.
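
The two constraints can be illustrated outside of Flex with a small hand-written scanner. The sketch below is not generated lexer code and does not use any Babel API; the SketchToken names, the Span structure, and the scanAll and coloredLength functions are hypothetical and exist only for this example. It mirrors the corrected rules above: a newline always ends a token of its own, and every scanned character belongs to exactly one returned token, so the token lengths add up to the length of the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token names for this sketch; not the Bison-generated values.
enum SketchToken { SK_WHITE, SK_NUMBER, SK_IDENTIFIER, SK_OTHER };

struct Span {
    SketchToken token;   // token returned for this run of characters
    std::size_t length;  // number of characters the token covers
};

static bool isBlank(char c) { return c == ' ' || c == '\t' || c == '\r'; }
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool isWordStart(char c) { return c == '_' || std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool isWordPart(char c) { return isWordStart(c) || isDigit(c); }

// Scans the whole text and returns one Span per token. White space is
// returned as a token instead of being skipped.
std::vector<Span> scanAll(const std::string& text) {
    std::vector<Span> spans;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t start = i;
        SketchToken token;
        if (text[i] == '\n') {
            ++i;                       // a newline is always its own token
            token = SK_WHITE;
        } else if (isBlank(text[i])) {
            while (i < text.size() && isBlank(text[i])) ++i;
            token = SK_WHITE;
        } else if (isDigit(text[i])) {
            while (i < text.size() && isDigit(text[i])) ++i;
            token = SK_NUMBER;
        } else if (isWordStart(text[i])) {
            while (i < text.size() && isWordPart(text[i])) ++i;
            token = SK_IDENTIFIER;
        } else {
            ++i;                       // any other single character
            token = SK_OTHER;
        }
        assert(i > start);             // every token consumes at least one character
        spans.push_back(Span{token, i - start});
    }
    return spans;
}

// Sums the token lengths; equals text.size() when no characters are lost.
std::size_t coloredLength(const std::vector<Span>& spans) {
    std::size_t total = 0;
    for (const Span& s : spans) total += s.length;
    return total;
}
```

For any input, coloredLength(scanAll(text)) equals text.size(), which is the property the colorizer relies on when it decides how many characters to color for each token.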

To enforce the second constraint, give every lexical rule an action so that no characters are lost, and end that action in one of only three ways:

return Token;

Indicates that the token has been scanned and its name has been returned. This is the typical way to end an action.

yymore();

This approach is used to avoid losing characters in yytext (the currently scanned text) while continuing to scan. Use this in situations where you only want to return the comment token at the end of a comment or line.

yyless(0);

Indicates that something has been scanned and the state has changed, but the input should be scanned again. Here, yyless(0) is used to prevent duplication of characters.

During lexing, the global variable g_service points to the current language service, that is, your language service. The complete interface definition of a language service can be found in the babelservice.idl file in the Babel folder (for example, [drive]\Program Files\VSIP 8.0\EnvSDK\Babel\Babelservice.idl) and in the stdservice.h header file in the Babel Common folder (for example, [drive]\Program Files\VSIP 8.0\EnvSDK\Babel\Common\stdservice.h). The following functions are important during lexical analysis:

void setLexState( in LexState state);

Sets the lexer state. Use this in place of the Flex BEGIN macro.

void lexicalError( in Severity sev, in const char* msg );

Reports a lexical error. These error messages show up at run time in the task list. This method affects only syntax checking.

A hint is a type of error that is used to set TODO messages in the task list. A hint is given by setting the severity to SevHint (see the Severity enumeration). Start the error message with the hint token, not including the comment start. For an example, see the My C sample.

void enterComment(in LexState commentState = 0);

If the commentState parameter is provided, the lexer is put into that state and the current state is saved. Each time this method is called, the comment nesting level is incremented.

void leaveComment();

Each time this method is called, the comment nesting level is decremented. When the level reaches zero, the lexer is returned to the state it was in when the enterComment method was first called.
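
The nesting behavior described for enterComment and leaveComment can be modeled with a small class. This is only a sketch of the documented behavior, not the Babel source: the CommentTracker class, its accessors, and the use of int for LexState are assumptions made for this example. It also assumes that the state is saved only on the outermost enterComment call and that a commentState of 0 means "not provided".

```cpp
#include <cassert>

typedef int LexState;  // assumption: lexer states are small integers

// Hypothetical model of the comment-nesting bookkeeping.
class CommentTracker {
public:
    explicit CommentTracker(LexState initial)
        : state_(initial), saved_(initial), depth_(0) {}

    // Stand-in for setLexState: switches the current lexer state.
    void setLexState(LexState state) { state_ = state; }

    // Saves the current state on the outermost call, optionally switches
    // to the comment state, and increments the nesting level.
    void enterComment(LexState commentState = 0) {
        if (depth_ == 0) {
            saved_ = state_;
        }
        if (commentState != 0) {
            state_ = commentState;
        }
        ++depth_;
    }

    // Decrements the nesting level; restores the saved state at zero.
    void leaveComment() {
        assert(depth_ > 0 && "leaveComment without matching enterComment");
        if (depth_ == 0) {
            return;  // unbalanced call: ignored in this sketch
        }
        if (--depth_ == 0) {
            state_ = saved_;
        }
    }

    LexState state() const { return state_; }
    int depth() const { return depth_; }

private:
    LexState state_;  // current lexer state
    LexState saved_;  // state to restore when nesting returns to zero
    int depth_;       // current comment nesting level
};
```

In this model, two nested enterComment calls followed by two leaveComment calls leave the tracker in the state it had before the first comment started, which is what allows nested comments to be colored correctly across lines.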

After you add the lexical rules and rebuild the solution, the colorizer works in Visual Studio. You can test it by loading source files written in your language into the Visual Studio environment.