Code::Blocks  SVN r11506
Tokenizer Class Reference

This is just a simple lexer class.

#include <tokenizer.h>


Classes

struct  ExpandedMacro
 Replaced buffer information, with an example of how macros are expanded.
 

Public Member Functions

 Tokenizer (TokenTree *tokenTree, const wxString &filename=wxEmptyString)
 Tokenizer constructor.

 ~Tokenizer ()
 Tokenizer destructor.

bool Init (const wxString &filename=wxEmptyString, LoaderBase *loader=0)
 Initialize the buffer by opening a file through a loader. The contents of the loader's buffer are copied into the Tokenizer's own buffer, so the loader can be safely deleted after this call.

bool InitFromBuffer (const wxString &buffer, const wxString &fileOfBuffer=wxEmptyString, size_t initLineNumber=0)
 Initialize the buffer directly from a wxString's content.

wxString GetToken ()
 Consume and return the current token string.

wxString PeekToken ()
 Do a "look ahead" and return the next token string.

void UngetToken ()
 Undo the GetToken() call.

void SetTokenizerOption (bool wantPreprocessor, bool storeDocumentation)
 Choose whether to handle conditional preprocessor directives and whether to store documentation.

void SetState (TokenizerState state)
 Set the Tokenizer skipping options.

TokenizerState GetState ()
 Return the token reading options value.

const wxString & GetFilename () const
 Return the name of the opened file.

unsigned int GetLineNumber () const
 Return the line number of the current token string.

unsigned int GetNestingLevel () const
 Return the brace "{}" level.

void SaveNestingLevel ()
 Save the brace "{" level; the parser might need to ignore the nesting level in some cases.

void RestoreNestingLevel ()
 Restore the brace level.

bool IsOK () const
 Return true if the buffer was loaded correctly.

wxString ReadToEOL (bool stripUnneeded=true)
 Return the string from the current position to the end of the current line. In most cases this function is used when handling #define; use with care outside this class!

void ReadParentheses (wxString &str)
 Read a string from '(' to ')'; note that inner parentheses are taken into account.

bool SkipToEOL ()
 Skip from the current position to the end of the line; use with care outside this class!

bool SkipToInlineCommentEnd ()
 Skip to the end of the C++ style comment.

bool IsEOF () const
 Check whether the Tokenizer has reached the end of the buffer (file).

bool NotEOF () const
 Return true if it is not the end of the buffer.

bool ReplaceBufferText (const wxString &target, const Token *macro=0)
 Backward buffer replacement for re-parsing.

bool ReplaceMacroUsage (const Token *tk)
 Get the expanded text for the current macro usage, then replace the buffer text for re-parsing.

int GetFirstTokenPosition (const wxString &buffer, const wxString &target)
 Search for "target" in the buffer and return its first position.

int GetFirstTokenPosition (const wxChar *buffer, const size_t bufferLen, const wxChar *key, const size_t keyLen)
 Find the sub-string key in the whole buffer and return the first position of the key.

int KMP_Find (const wxChar *text, const wxChar *pattern, const int patternLen)
 KMP find: return the first position of the pattern, or -1 if nothing is found. See https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.

void SetLastTokenIdx (int tokenIdx)
 Called when a Token is added; associates doxygen style documentation (comments before the variables) with the Token.
 

Protected Member Functions

void BaseInit ()
 Initialize some member variables.

wxString DoGetToken ()
 Do the actual lexical analysis; both GetToken() and PeekToken() call this function internally.

bool CheckMacroUsageAndReplace ()
 Check whether m_Lex is an identifier-like token and, if it is a macro usage, replace it.

bool Lex ()
 Only moves m_TokenIndex, gets a lexeme and stores it in m_Lex; m_Lex is then checked further to see whether it is a macro usage.

bool ReadFile ()
 Read a file and fill m_Buffer.

bool IsEscapedChar ()
 Check whether the current character is a C escape character in a string.

bool SkipToChar (const wxChar &ch)
 Skip characters until we meet ch.

bool SkipUnwanted ()
 Skip comments, spaces and preprocessor branches.

bool SkipWhiteSpace ()
 Skip any "tab" or "white-space" characters.

bool SkipComment ()
 Skip the C/C++ comment.

bool SkipPreprocessorBranch ()
 Skip a C preprocessor directive such as #ifdef xxxx. Only the conditional preprocessor directives are handled here; the others, such as #include or #warning and all kinds of ptOthers (see PreprocessorType), will be passed to the Parserthread class.

bool SkipString ()
 Skip the string literal (enclosed in double quotes) or character literal (enclosed in single quotes).

bool SkipToStringEnd (const wxChar &ch)
 Move to the end of the string literal or character literal; m_TokenIndex will point at the closing quote character.

bool MoveToNextChar ()
 Move to the next character in the buffer.

wxChar CurrentChar () const
 Return the current character indexed (pointed to) by m_TokenIndex in m_Buffer.

wxChar CurrentCharMoveNext ()
 Do the previous two functions sequentially.

wxChar NextChar () const
 Return (peek) the next character.

wxChar PreviousChar () const
 Return (peek) the previous character.
 

Private Member Functions

bool CharInString (const wxChar ch, const wxChar *chars) const
 Check whether ch matches any character in the wxChar array.

bool IsBackslashBeforeEOL ()
 Check whether the char before the EOL is a backslash. Call this function only when CurrentChar is '\n'; there are two cases to handle.

bool CalcConditionExpression ()
 For "#if xxxx", calculate the value of "xxxx".

bool IsMacroDefined ()
 Return true if the next token string is a macro definition. This is used when reading conditional preprocessor directives, for example when checking whether a macro is defined.

void HandleDefines ()
 Handle the macro definition statement: #define XXXXX.

void HandleUndefs ()
 Handle the statement: #undef XXXXX.

void AddMacroDefinition (wxString name, int line, wxString para, wxString substitues)
 Add a macro definition to the Token database, for example: #define AAA(x,y) x+y.

void SkipToNextConditionPreprocessor ()
 Skip to the next conditional preprocessor directive branch.

void SkipToEndConditionPreprocessor ()
 Skip to the #endif conditional preprocessor directive.

PreprocessorType GetPreprocessorType ()
 Get the current conditional preprocessor type.

void HandleConditionPreprocessor (const PreprocessorType type)
 Handle the preprocessor directive: #ifdef XXX or #endif or #if or #elif or...

bool SplitArguments (wxArrayString &results)
 Split the macro arguments and store them in results. When calling this function, m_TokenIndex is expected to point to the opening '(', or to one space char before the opening '('.

bool GetMacroExpandedText (const Token *tk, wxString &expandedText)
 Get the full expanded text.

void KMP_GetNextVal (const wxChar *pattern, int next[])
 Used in the KMP find function.
 

Private Attributes

TokenizerOptions m_TokenizerOptions
 Tokenizer options specify the token reading option.

TokenTree * m_TokenTree
 The Token tree that stores the macro definitions; the token tree is shared with Parserthread.

wxString m_Filename
 Filename of the buffer.

unsigned int m_FileIdx
 File index, useful when parsing documentation.

wxString m_Buffer
 Buffer content; all the lexical analysis operates on this member variable.

unsigned int m_BufferLen
 Buffer length.

wxString m_Lex
 A lexeme string returned by the Lex() function. This is a candidate token string, which may be replaced if it is a macro usage.

wxString m_Token
 These variables define the current token string and its auxiliary information, such as the token name, the line number of the token, and the current brace nesting level.

unsigned int m_TokenIndex
 Index offset in the buffer when parsing a buffer.

unsigned int m_LineNumber
 Line offset in the buffer; note that it is 1-based, not 0-based.

unsigned int m_NestLevel
 Keeps track of block nesting { }.

unsigned int m_UndoTokenIndex
 Backup of the previous Token information.

unsigned int m_UndoLineNumber

unsigned int m_UndoNestLevel

bool m_PeekAvailable
 Peek token information.

wxString m_PeekToken

unsigned int m_PeekTokenIndex

unsigned int m_PeekLineNumber

unsigned int m_PeekNestLevel

unsigned int m_SavedTokenIndex
 Saved token info (for PeekToken()). m_TokenIndex is moved forward or backward when either DoGetToken() or SkipUnwanted() is called, so m_TokenIndex should be saved before it gets modified.

unsigned int m_SavedLineNumber

unsigned int m_SavedNestingLevel

bool m_IsOK
 Specifies whether the buffer is ready for parsing.

TokenizerState m_State
 Tokenizer state specifies the token reading option.

LoaderBase * m_Loader
 File loader; it loads the content into m_Buffer, either from the hard disk or from memory.

std::stack< bool > m_ExpressionResult
 Preprocessor branch stack. If we meet a #if 1, the value true is pushed onto the stack; when we skip the #endif, the true value should be popped.

std::list< ExpandedMacro > m_ExpandedMacros
 This serves as a macro replacement stack. In the above example, if AAA is replaced by BBBB, we store the macro definition of AAA in m_ExpandedMacros, and if BBBB is also defined as ...

wxString m_NextTokenDoc
 Normally, this records the doxygen style comments for the next token definition, for example, here is a comment ...

int m_LastTokenIdx
 Stores the index of the most recently added token, for example, here is a comment ...

bool m_ReadingMacroDefinition
 Indicates whether we are reading a macro definition. This variable affects how the doxygen comments are associated with the Token.
 

Detailed Description

This is just a simple lexer class.

A Tokenizer does the lexical analysis on a buffer. The buffer is either a wxString loaded from a local source/header file or a wxString already in memory (e.g. the scintilla text buffer). The main public interface consists of two member functions: GetToken() and PeekToken(). The former consumes one token string from the buffer; the latter does a "look ahead" on the buffer and returns the next token string (the peeked string). The peeked string is cached until the next GetToken() call, which improves performance. The Tokenizer class also does some handling of macro expansion on the buffer, so from this point of view it is a kind of preprocessor. Furthermore, it handles some conditional preprocessor directives (like "#if xxx").
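The GetToken()/PeekToken() caching described above can be sketched roughly as follows. This is an illustration only, not the Code::Blocks implementation: MiniTokenizer and Scan are hypothetical names, std::string stands in for wxString, and tokens are simply whitespace-separated words.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the real class: whitespace-separated tokens only.
class MiniTokenizer {
public:
    explicit MiniTokenizer(std::string buffer) : m_Buffer(std::move(buffer)) {}

    // Consume and return the next token; reuse the cached peek if there is one.
    std::string GetToken() {
        if (m_PeekAvailable) {
            m_PeekAvailable = false;
            m_TokenIndex = m_PeekTokenIndex;
            return m_PeekToken;
        }
        return Scan(m_TokenIndex);
    }

    // Look ahead without consuming; the buffer is scanned at most once
    // between two GetToken() calls.
    std::string PeekToken() {
        if (!m_PeekAvailable) {
            m_PeekTokenIndex = m_TokenIndex;
            m_PeekToken = Scan(m_PeekTokenIndex);
            m_PeekAvailable = true;
        }
        return m_PeekToken;
    }

private:
    // Skip spaces, read up to the next space, and advance pos past the token.
    std::string Scan(std::size_t& pos) const {
        assert(pos <= m_Buffer.size());
        while (pos < m_Buffer.size() &&
               std::isspace(static_cast<unsigned char>(m_Buffer[pos])))
            ++pos;
        const std::size_t start = pos;
        while (pos < m_Buffer.size() &&
               !std::isspace(static_cast<unsigned char>(m_Buffer[pos])))
            ++pos;
        return m_Buffer.substr(start, pos - start);
    }

    std::string m_Buffer;
    std::size_t m_TokenIndex = 0;
    bool m_PeekAvailable = false;
    std::string m_PeekToken;
    std::size_t m_PeekTokenIndex = 0;
};
```

Calling PeekToken() twice in a row returns the same string without rescanning, and the following GetToken() only copies the cached position and token.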

Definition at line 64 of file tokenizer.h.

Constructor & Destructor Documentation

◆ Tokenizer()

Tokenizer::Tokenizer ( TokenTree * tokenTree,
const wxString & filename = wxEmptyString 
)

Tokenizer constructor.

Parameters
filename    the file to be opened.

Definition at line 91 of file tokenizer.cpp.

References Init(), wxString::IsEmpty(), m_Filename, m_TokenizerOptions, TokenizerOptions::storeDocumentation, and TokenizerOptions::wantPreprocessor.

◆ ~Tokenizer()

Tokenizer::~Tokenizer ( )

Tokenizer destructor.

Definition at line 122 of file tokenizer.cpp.

Member Function Documentation

◆ AddMacroDefinition()

void Tokenizer::AddMacroDefinition ( wxString  name,
int  line,
wxString  para,
wxString  substitues 
)
private

Add a macro definition to the Token database, for example: #define AAA(x,y) x+y

Parameters
name        macro name, which is "AAA"
line        the line number of the macro definition
para        the formal parameters, which are "(x,y)"
substitues  the definition, which is "x+y"

Definition at line 2027 of file tokenizer.cpp.

References TokenTree::at(), TokenTree::insert(), Token::m_Args, m_FileIdx, Token::m_FullType, Token::m_Index, Token::m_ParentIndex, Token::m_TokenKind, TokenTree::m_TokenTicketCount, m_TokenTree, SetLastTokenIdx(), tkMacroDef, TokenTree::TokenExists(), and wxNOT_FOUND.

Referenced by HandleDefines().

◆ BaseInit()

void Tokenizer::BaseInit ( )
protected

◆ CalcConditionExpression()

bool Tokenizer::CalcConditionExpression ( )
private

◆ CharInString()

bool Tokenizer::CharInString ( const wxChar  ch,
const wxChar * chars 
) const
inline private

Check whether ch matches any character in the wxChar array.

Definition at line 366 of file tokenizer.h.

References wxStrlen().

Referenced by Lex().

◆ CheckMacroUsageAndReplace()

bool Tokenizer::CheckMacroUsageAndReplace ( )
protected

Check whether m_Lex is an identifier-like token and, if it is a macro usage, replace it.

Returns
true if some text replacement happens in m_Buffer, otherwise false

Definition at line 1069 of file tokenizer.cpp.

References TokenTree::at(), m_Lex, m_TokenTree, ReplaceMacroUsage(), tkMacroDef, and TokenTree::TokenExists().

Referenced by DoGetToken().

◆ CurrentChar()

wxChar Tokenizer::CurrentChar ( ) const
inline protected

◆ CurrentCharMoveNext()

wxChar Tokenizer::CurrentCharMoveNext ( )
inline protected

Do the previous two functions sequentially.

Definition at line 339 of file tokenizer.h.

◆ DoGetToken()

wxString Tokenizer::DoGetToken ( )
protected

Do the actual lexical analysis; both GetToken() and PeekToken() call this function internally.

It just moves m_TokenIndex one step forward and returns the lexeme before m_TokenIndex.

Definition at line 944 of file tokenizer.cpp.

References CheckMacroUsageAndReplace(), Lex(), m_Lex, and SkipUnwanted().

Referenced by CalcConditionExpression(), GetToken(), PeekToken(), and ReadParentheses().

◆ GetFilename()

const wxString& Tokenizer::GetFilename ( ) const
inline

◆ GetFirstTokenPosition() [1/2]

int Tokenizer::GetFirstTokenPosition ( const wxString & buffer,
const wxString & target 
)
inline

Search for "target" in the buffer and return its first position.

It is used to find the formal argument in the macro definition body.

Parameters
buffer      the content
target      the search key

Definition at line 240 of file tokenizer.h.

References wxString::GetData(), and wxString::Len().

Referenced by GetMacroExpandedText().

◆ GetFirstTokenPosition() [2/2]

int Tokenizer::GetFirstTokenPosition ( const wxChar * buffer,
const size_t  bufferLen,
const wxChar * key,
const size_t  keyLen 
)

Find the sub-string key in the whole buffer and return the first position of the key.

Parameters
buffer      the content of the string
bufferLen   length of the string
key         the search key (sub-string)
keyLen      the search key length

Definition at line 1908 of file tokenizer.cpp.

References _T, KMP_Find(), and wxIsalnum().

◆ GetLineNumber()

unsigned int Tokenizer::GetLineNumber ( ) const
inline

◆ GetMacroExpandedText()

bool Tokenizer::GetMacroExpandedText ( const Token * tk,
wxString & expandedText 
)
private

Get the full expanded text.

Parameters
tk            the macro definition token, usually a function-like macro definition
expandedText  an output string

Call this function only when we have just detected that the current token is a macro usage, as in the situation below, where "ABC" is a macro usage:
......ABC(abc, (def)).....
^--------m_TokenIndex
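How actual arguments might be substituted for the formal parameters of a function-like macro can be sketched as follows. This is an illustration under stated assumptions, not the code at tokenizer.cpp: ExpandMacroBody and IsIdentChar are hypothetical names, std::string stands in for wxString, only whole identifiers are replaced, and the # and ## operators are ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for characters that can be part of an identifier.
static bool IsIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical sketch: replace every whole-word occurrence of a formal
// parameter in the macro body with the matching actual argument.
std::string ExpandMacroBody(const std::string& body,
                            const std::vector<std::string>& formals,
                            const std::vector<std::string>& actuals) {
    assert(formals.size() == actuals.size());
    std::string out;
    std::size_t i = 0;
    while (i < body.size()) {
        if (!IsIdentChar(body[i])) {      // operators, spaces, parentheses
            out += body[i++];
            continue;
        }
        std::size_t end = i;              // read one identifier-like word
        while (end < body.size() && IsIdentChar(body[end]))
            ++end;
        const std::string word = body.substr(i, end - i);
        std::string replacement = word;   // unchanged unless it is a formal
        for (std::size_t k = 0; k < formals.size(); ++k) {
            if (formals[k] == word) {
                replacement = actuals[k];
                break;
            }
        }
        out += replacement;
        i = end;
    }
    return out;
}
```

With the macro `#define ABC(x,y) x+y` and the usage `ABC(abc, (def))`, calling `ExpandMacroBody("x+y", {"x", "y"}, {"abc", "(def)"})` yields `abc+(def)`.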

Definition at line 1730 of file tokenizer.cpp.

References _T, wxString::Alloc(), wxString::Find(), wxArrayString::GetCount(), wxString::GetData(), GetFirstTokenPosition(), wxString::IsEmpty(), wxString::Len(), Token::m_Args, Token::m_FullType, m_Lex, Token::m_Name, wxString::Remove(), ReplaceBufferText(), wxString::SetChar(), wxString::size(), SplitArguments(), TRACE, wxString::wx_str(), wxIsalpha(), and wxNOT_FOUND.

Referenced by ReplaceMacroUsage().

◆ GetNestingLevel()

unsigned int Tokenizer::GetNestingLevel ( ) const
inline

Return the brace "{}" level.

The value increases by one when we meet a "{" and decreases by one when we meet a "}".
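The nesting-level bookkeeping, together with the save/restore pair documented further down, can be sketched as follows. BraceTracker and Feed are hypothetical names used only for this illustration, and the guard against going below zero is an assumption rather than documented behaviour.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the Tokenizer's nesting-level members.
class BraceTracker {
public:
    // Update the level for every brace found in text.
    void Feed(const std::string& text) {
        for (char c : text) {
            if (c == '{')
                ++m_NestLevel;
            else if (c == '}' && m_NestLevel > 0)  // underflow guard: an assumption
                --m_NestLevel;
        }
    }

    unsigned int GetNestingLevel() const { return m_NestLevel; }

    void SaveNestingLevel() {
        m_SavedNestingLevel = m_NestLevel;
        m_HasSaved = true;
    }

    void RestoreNestingLevel() {
        assert(m_HasSaved);  // restoring without a prior save is a usage error here
        m_NestLevel = m_SavedNestingLevel;
    }

private:
    unsigned int m_NestLevel = 0;
    unsigned int m_SavedNestingLevel = 0;
    bool m_HasSaved = false;
};
```

A parser can save the level before scanning a region whose braces it wants to ignore, then restore it afterwards.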

Definition at line 135 of file tokenizer.h.

Referenced by ParserThread::HandleEnum(), ParserThread::SkipBlock(), and ParserThread::SkipToOneOfChars().

◆ GetPreprocessorType()

PreprocessorType Tokenizer::GetPreprocessorType ( )
private

◆ GetState()

TokenizerState Tokenizer::GetState ( )
inline

◆ GetToken()

wxString Tokenizer::GetToken ( )

◆ HandleConditionPreprocessor()

void Tokenizer::HandleConditionPreprocessor ( const PreprocessorType  type)
private

◆ HandleDefines()

void Tokenizer::HandleDefines ( )
private

◆ HandleUndefs()

void Tokenizer::HandleUndefs ( )
private

◆ Init()

bool Tokenizer::Init ( const wxString & filename = wxEmptyString,
LoaderBase * loader = 0 
)

Initialize the buffer by opening a file through a loader. The contents of the loader's buffer are copied into the Tokenizer's own buffer, so the loader can be safely deleted after this call.

Definition at line 126 of file tokenizer.cpp.

References _T, BaseInit(), TokenTree::GetFileIndex(), wxString::IsEmpty(), m_BufferLen, m_FileIdx, m_Filename, m_IsOK, m_Loader, m_TokenTree, ReadFile(), wxString::Replace(), TRACE, TRACE2, TRACE2_SET_FLAG, wxString::wx_str(), and wxFileExists().

Referenced by ParserThread::InitTokenizer(), and Tokenizer().

◆ InitFromBuffer()

bool Tokenizer::InitFromBuffer ( const wxString & buffer,
const wxString & fileOfBuffer = wxEmptyString,
size_t  initLineNumber = 0 
)

Initialize the buffer directly from a wxString's content.

Parameters
initLineNumber  the start line of the buffer; usually the parser tries to parse a function body, and this keeps the line information of each local variable token correct.
buffer          text content used for parsing
fileOfBuffer    the name of the file the buffer comes from.

Definition at line 174 of file tokenizer.cpp.

References _T, BaseInit(), TokenTree::GetFileIndex(), wxString::Length(), m_Buffer, m_BufferLen, m_FileIdx, m_Filename, m_IsOK, m_LineNumber, m_TokenTree, and wxString::Replace().

Referenced by NativeParserBase::ComputeCallTip(), ParserThread::HandleConditionalArguments(), ParserThread::HandleForLoopArguments(), ParserThread::InitTokenizer(), ParserThread::ParseBufferForNamespaces(), and ParserThread::ParseBufferForUsingNamespace().

◆ IsBackslashBeforeEOL()

bool Tokenizer::IsBackslashBeforeEOL ( )
inline private

Check whether the char before the EOL is a backslash. Call this function only when CurrentChar is '\n'. There are two cases:

......\ \ \r \n......
^--current char, this is DOS style EOL
......\ \ \n......
^--current char, this is Linux style EOL
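The two cases above can be handled along these lines. This is a sketch with assumptions: it is written as a free function taking the buffer and index explicitly (the real member works on m_Buffer and m_TokenIndex), and std::string stands in for wxString.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical free-function version: index must point at a '\n' in buffer.
bool IsBackslashBeforeEOL(const std::string& buffer, std::size_t index) {
    assert(index < buffer.size() && buffer[index] == '\n');
    if (index == 0)
        return false;                  // nothing before the EOL
    std::size_t last = index - 1;      // char just before '\n'
    if (buffer[last] == '\r') {        // DOS style EOL: step over the '\r'
        if (last == 0)
            return false;
        --last;
    }
    return buffer[last] == '\\';       // line continuation if it is a backslash
}
```

A true result means the logical line continues on the next physical line, which matters for multi-line #define bodies and for C++ style comments ending in a backslash.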

Definition at line 386 of file tokenizer.h.

References _T.

Referenced by ReadToEOL(), SkipComment(), SkipToEOL(), and SkipToInlineCommentEnd().

◆ IsEOF()

bool Tokenizer::IsEOF ( ) const
inline

Check whether the Tokenizer reaches the end of the buffer (file)

Definition at line 177 of file tokenizer.h.

Referenced by Lex(), MoveToNextChar(), ReadToEOL(), SkipComment(), SkipString(), SkipToEOL(), SkipToInlineCommentEnd(), SkipToStringEnd(), and SkipWhiteSpace().

◆ IsEscapedChar()

bool Tokenizer::IsEscapedChar ( )
protected

Check whether the current character is a C escape character in a string.

Definition at line 280 of file tokenizer.cpp.

References wxString::GetChar(), m_Buffer, m_BufferLen, m_TokenIndex, and PreviousChar().

Referenced by SkipToStringEnd().

◆ IsMacroDefined()

bool Tokenizer::IsMacroDefined ( )
private

Return true if the next token string is a macro definition. This is used when reading conditional preprocessor directives, for example when checking whether a macro is defined, as below.

#ifdef xxx
^-----m_TokenIndex is here

Then we check whether "xxx" is a macro definition; we don't need to expand "xxx" here.

Definition at line 1171 of file tokenizer.cpp.

References _T, Lex(), m_Lex, m_TokenTree, SkipComment(), SkipWhiteSpace(), tkMacroDef, and TokenTree::TokenExists().

Referenced by CalcConditionExpression(), and HandleConditionPreprocessor().

◆ IsOK()

bool Tokenizer::IsOK ( ) const
inline

Return true if the buffer was loaded correctly.

Definition at line 153 of file tokenizer.h.

Referenced by ParserThread::Parse(), ParserThread::ParseBufferForNamespaces(), and ParserThread::ParseBufferForUsingNamespace().

◆ KMP_Find()

int Tokenizer::KMP_Find ( const wxChar * text,
const wxChar * pattern,
const int  patternLen 
)

KMP find: return the first position of the pattern, or -1 if nothing is found. See https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.
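For readers unfamiliar with the algorithm, a compact Knuth-Morris-Pratt search looks like the sketch below. It is not the code at tokenizer.cpp: KmpFind and BuildFailureTable are hypothetical names, std::string replaces the wxChar pointers, and the failure-table convention may differ from the one KMP_GetNextVal() uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// fail[i] is the length of the longest proper prefix of pattern[0..i]
// that is also a suffix of pattern[0..i].
std::vector<std::size_t> BuildFailureTable(const std::string& pattern) {
    std::vector<std::size_t> fail(pattern.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];           // fall back to a shorter border
        if (pattern[i] == pattern[k])
            ++k;
        fail[i] = k;
    }
    return fail;
}

// Return the index of the first occurrence of pattern in text, or -1.
int KmpFind(const std::string& text, const std::string& pattern) {
    if (pattern.empty())
        return 0;
    const std::vector<std::size_t> fail = BuildFailureTable(pattern);
    std::size_t k = 0;                 // number of pattern chars matched so far
    for (std::size_t i = 0; i < text.size(); ++i) {
        assert(k < pattern.size());    // invariant: a full match returns at once
        while (k > 0 && text[i] != pattern[k])
            k = fail[k - 1];           // reuse the already matched prefix
        if (text[i] == pattern[k])
            ++k;
        if (k == pattern.size())
            return static_cast<int>(i + 1 - k);
    }
    return -1;
}
```

The text index never moves backward, so the search runs in time linear in the combined length of text and pattern.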

Definition at line 1673 of file tokenizer.cpp.

References _T, KMP_GetNextVal(), and TRACE.

Referenced by GetFirstTokenPosition().

◆ KMP_GetNextVal()

void Tokenizer::KMP_GetNextVal ( const wxChar * pattern,
int  next[] 
)
private

Used in the KMP find function.

Definition at line 1653 of file tokenizer.cpp.

References _T.

Referenced by KMP_Find().

◆ Lex()

bool Tokenizer::Lex ( )
protected

This function only moves m_TokenIndex, gets a lexeme and stores it in m_Lex; m_Lex is then checked further to see whether it is a macro usage.

Returns
true if it is an identifier-like token. Note that an identifier-like token must be checked to see whether it is a macro usage.

Definition at line 965 of file tokenizer.cpp.

References _T, wxString::assign(), CharInString(), TokenizerConsts::colon, TokenizerConsts::colon_colon, CurrentChar(), TokenizerConsts::equal, IsEOF(), m_Buffer, m_ExpandedMacros, m_Lex, m_NestLevel, m_TokenIndex, wxString::Mid(), MoveToNextChar(), NextChar(), NotEOF(), SkipString(), wxEmptyString, wxIsalnum(), wxIsalpha(), and wxIsdigit().

Referenced by DoGetToken(), GetPreprocessorType(), HandleDefines(), HandleUndefs(), IsMacroDefined(), and SplitArguments().

◆ MoveToNextChar()

bool Tokenizer::MoveToNextChar ( )
protected

◆ NextChar()

wxChar Tokenizer::NextChar ( ) const
inline protected

Return (peek) the next character.

Definition at line 347 of file tokenizer.h.

Referenced by Lex(), ReadToEOL(), SkipComment(), SkipToEndConditionPreprocessor(), SkipToEOL(), SkipToInlineCommentEnd(), and SkipToNextConditionPreprocessor().

◆ NotEOF()

bool Tokenizer::NotEOF ( ) const
inline

◆ PeekToken()

wxString Tokenizer::PeekToken ( )

◆ PreviousChar()

wxChar Tokenizer::PreviousChar ( ) const
inline protected

Return (peek) the previous character.

Definition at line 356 of file tokenizer.h.

Referenced by IsEscapedChar(), MoveToNextChar(), ReadToEOL(), SkipToEOL(), and SkipToInlineCommentEnd().

◆ ReadFile()

bool Tokenizer::ReadFile ( )
protected

Read a file, and fill the m_Buffer.

Definition at line 212 of file tokenizer.cpp.

References _T, cbRead(), LoaderBase::FileName(), LoaderBase::GetData(), LoaderBase::GetLength(), wxString::Length(), m_Buffer, m_BufferLen, m_Filename, m_Loader, wxEmptyString, and wxFileExists().

Referenced by Init().

◆ ReadParentheses()

void Tokenizer::ReadParentheses ( wxString & str)

Read a string from '(' to ')'; note that inner parentheses are taken into account.

Parameters
str         the returned string

Definition at line 496 of file tokenizer.cpp.

References _T, DoGetToken(), wxString::Last(), NotEOF(), wxIsalnum(), and wxIsalpha().

Referenced by GetToken(), and PeekToken().

◆ ReadToEOL()

wxString Tokenizer::ReadToEOL ( bool  stripUnneeded = true)

Return the string from the current position to the end of the current line. In most cases this function is used when handling #define; use with care outside this class!

Parameters
stripUnneeded  true if you want to remove comments and compress spaces (two or more spaces become one space)

Definition at line 367 of file tokenizer.cpp.

References _T, wxString::Append(), CurrentChar(), IsBackslashBeforeEOL(), IsEOF(), m_Buffer, m_LineNumber, m_ReadingMacroDefinition, m_TokenIndex, wxString::Mid(), MoveToNextChar(), NextChar(), NotEOF(), PreviousChar(), SkipComment(), SkipString(), SkipToEOL(), TRACE, and wxString::wx_str().

Referenced by HandleDefines().

◆ ReplaceBufferText()

bool Tokenizer::ReplaceBufferText ( const wxString & target,
const Token * macro = 0 
)

Backward buffer replacement for re-parsing.

Parameters
target      the new text that is going to replace some text in m_Buffer
macro       if it is a macro expansion, we need to remember the referenced (used) macro token so that we can avoid recursive macro expansion, such as in the code below:
#define X Y
#define Y X
int X;
http://forums.codeblocks.org/index.php/topic,13384.msg90391.html#msg90391

Macro expansion just replaces some characters in m_Buffer.

xxxxxxxxxAAAA(u,v)yyyyyyyyy
                  ^------ m_TokenIndex (anchor point)

For example, the above is a wxChar array (m_Buffer) in which a macro usage "AAAA(u,v)" is detected and needs to be expanded. We just do a "backward" text replacement here. Before the replacement, m_TokenIndex points to the char after ")" in "AAAA(u,v)" (we call this the anchor point). After the replacement, the new buffer becomes:

xxxNNNNNNNNNNNNNNNyyyyyyyyy
   ^ <----------- ^
   m_TokenIndex was moved backward

Note that "NNNNNNNNNNNNNNN" is the expanded new text. m_TokenIndex was moved backward to the beginning of the newly added text. If the new text is small enough, m_Buffer's length does not need to increase. m_Buffer's length needs to increase only when the new text is so long that the part of the buffer before the anchor point cannot hold it; in that case m_Buffer's length is adjusted, as shown below:

NNNNNNNNNNNNNNNNNNNNNNyyyyyyyyy
^--- m_TokenIndex
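The backward replacement can be sketched in a few lines. This is a simplified illustration, not the actual implementation: ReplaceBackward is a hypothetical name, std::string stands in for wxString, and the recursion guard, the line-number bookkeeping and the peek/undo state that the real function also updates are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: overwrite the text that ends at `anchor` with
// `replacement`, growing the buffer at the front when the space before the
// anchor is too small. Returns the new token index, which is the start of
// the replacement text.
std::size_t ReplaceBackward(std::string& buffer, std::size_t anchor,
                            const std::string& replacement) {
    assert(anchor <= buffer.size());
    if (replacement.size() > anchor) {
        // Not enough room before the anchor: prepend the missing characters.
        const std::size_t extra = replacement.size() - anchor;
        buffer.insert(0, extra, ' ');
        anchor += extra;
    }
    const std::size_t start = anchor - replacement.size();
    buffer.replace(start, replacement.size(), replacement);
    return start;
}
```

Text to the left of the returned index is stale and never read again, because lexing resumes at the returned index; this is why only the characters directly before the anchor have to be overwritten.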

Definition at line 1546 of file tokenizer.cpp.

References _T, wxString::GetChar(), wxString::insert(), wxString::IsEmpty(), wxString::Len(), m_Buffer, m_BufferLen, Tokenizer::ExpandedMacro::m_End, m_ExpandedMacros, m_LineNumber, Tokenizer::ExpandedMacro::m_Macro, m_NestLevel, m_PeekAvailable, m_SavedLineNumber, m_SavedNestingLevel, m_SavedTokenIndex, m_TokenIndex, m_UndoLineNumber, m_UndoNestLevel, m_UndoTokenIndex, s_MaxMacroReplaceDepth, wxString::SetChar(), TRACE, and wxString::wx_str().

Referenced by GetMacroExpandedText(), and ReplaceMacroUsage().

◆ ReplaceMacroUsage()

bool Tokenizer::ReplaceMacroUsage ( const Token * tk)

Get the expanded text for the current macro usage, then replace the buffer text for re-parsing.

Parameters
tk          the macro definition token
Returns
true if the macro expansion succeeded; in that case the buffer is actually changed, m_TokenIndex is moved backward a bit, and the peek status is cleared. Both "function-like macro" and "variable-like macro" usages can be handled by this function.

Definition at line 1635 of file tokenizer.cpp.

References GetMacroExpandedText(), m_ExpandedMacros, and ReplaceBufferText().

Referenced by CheckMacroUsageAndReplace().

◆ RestoreNestingLevel()

void Tokenizer::RestoreNestingLevel ( )
inline

Restore the brace level.

Definition at line 147 of file tokenizer.h.

◆ SaveNestingLevel()

void Tokenizer::SaveNestingLevel ( )
inline

Save the brace "{" level, the parser might need to ignore the nesting level in some cases.

Definition at line 141 of file tokenizer.h.

◆ SetLastTokenIdx()

void Tokenizer::SetLastTokenIdx ( int  tokenIdx)

Called when a Token is added; associates doxygen style documentation (comments before the variables) with the Token.

Definition at line 1719 of file tokenizer.cpp.

References TokenTree::AppendDocumentation(), wxString::clear(), wxString::IsEmpty(), m_ExpressionResult, m_FileIdx, m_LastTokenIdx, m_NextTokenDoc, and m_TokenTree.

Referenced by AddMacroDefinition(), and ParserThread::DoAddToken().

◆ SetState()

void Tokenizer::SetState ( TokenizerState  state)
inline

Set the Tokenizer skipping options.

E.g. normally we read the parentheses as a whole token, but sometimes we should disable this option.

See also
TokenizerState for more details.

Definition at line 109 of file tokenizer.h.

Referenced by ParserThread::CalcEnumExpression(), ParserThread::DoParse(), ParserThread::GetTemplateArgs(), ParserThread::HandleClass(), ParserThread::HandleEnum(), ParserThread::HandleNamespace(), ParserThread::ParseBufferForNamespaces(), ParserThread::SkipAngleBraces(), and ParserThread::SkipBlock().

◆ SetTokenizerOption()

void Tokenizer::SetTokenizerOption ( bool  wantPreprocessor,
bool  storeDocumentation 
)
inline

Choose whether to handle conditional preprocessor directives and whether to store documentation.

Definition at line 100 of file tokenizer.h.

References TokenizerOptions::storeDocumentation, and TokenizerOptions::wantPreprocessor.

Referenced by ParserThread::ParserThread().

◆ SkipComment()

bool Tokenizer::SkipComment ( )
protected

◆ SkipPreprocessorBranch()

bool Tokenizer::SkipPreprocessorBranch ( )
protected

Skip a C preprocessor directive such as #ifdef xxxx. Only the conditional preprocessor directives are handled here; the others, such as #include or #warning and all kinds of ptOthers (see PreprocessorType), will be passed to the Parserthread class.

Returns
true if we do move m_TokenIndex

Definition at line 804 of file tokenizer.cpp.

References _T, CurrentChar(), GetPreprocessorType(), HandleConditionPreprocessor(), m_TokenIndex, and ptOthers.

Referenced by SkipUnwanted().

◆ SkipString()

bool Tokenizer::SkipString ( )
protected

Skip the string literal (enclosed in double quotes) or character literal (enclosed in single quotes).

Definition at line 349 of file tokenizer.cpp.

References _T, CurrentChar(), IsEOF(), MoveToNextChar(), and SkipToStringEnd().

Referenced by Lex(), ReadToEOL(), SkipToEndConditionPreprocessor(), and SkipToNextConditionPreprocessor().

◆ SkipToChar()

bool Tokenizer::SkipToChar ( const wxChar & ch)
protected

Skip characters until we meet ch.

Definition at line 302 of file tokenizer.cpp.

References CurrentChar(), MoveToNextChar(), and NotEOF().

Referenced by SkipComment(), and SkipToInlineCommentEnd().

◆ SkipToEndConditionPreprocessor()

void Tokenizer::SkipToEndConditionPreprocessor ( )
private

Skip to the #endif conditional preprocessor directive.

for example:

#if 1
// active statements
#elif x
// skipped statements
#else
// skipped statements
#endif

If we see a "#if 1" branch, we need to skip the next two branches and go to "#endif".

Definition at line 1239 of file tokenizer.cpp.

References _T, CurrentChar(), MoveToNextChar(), NextChar(), SkipComment(), SkipString(), SkipToEOL(), and SkipWhiteSpace().

Referenced by HandleConditionPreprocessor(), and SkipToNextConditionPreprocessor().

◆ SkipToEOL()

bool Tokenizer::SkipToEOL ( )

Skip from the current position to the end of line, use with care outside this class!

Definition at line 555 of file tokenizer.cpp.

References _T, CurrentChar(), IsBackslashBeforeEOL(), IsEOF(), m_LineNumber, MoveToNextChar(), NextChar(), NotEOF(), PreviousChar(), SkipComment(), and TRACE.

Referenced by CalcConditionExpression(), ParserThread::DoParse(), HandleConditionPreprocessor(), HandleUndefs(), ReadToEOL(), and SkipToEndConditionPreprocessor().

◆ SkipToInlineCommentEnd()

bool Tokenizer::SkipToInlineCommentEnd ( )

Skip to the end of the C++ style comment.

Definition at line 588 of file tokenizer.cpp.

References _T, CurrentChar(), IsBackslashBeforeEOL(), IsEOF(), m_LineNumber, MoveToNextChar(), NextChar(), NotEOF(), PreviousChar(), SkipToChar(), and TRACE.

Referenced by SkipComment().

◆ SkipToNextConditionPreprocessor()

void Tokenizer::SkipToNextConditionPreprocessor ( )
private

Skip to the next conditional preprocessor directive branch.

for example:

#if 0
// skipped statements
#elif 0
// skipped statements
#else
// active statements
#endif

If we see a "#if 0", we need to jump to the next "#elif xxx".
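The branch selection shown in the two examples (skip to the next branch after a false condition, skip to #endif once a branch has been taken) can be sketched with a small stack. This is an illustration under loud assumptions: ActiveLines, Branch and StartsWith are hypothetical names, the input is already split into lines, and a condition counts as true only when it is the literal 1. The real class evaluates expressions in CalcConditionExpression() and works directly on m_Buffer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-#if record.
struct Branch {
    bool parentActive;  // was the enclosing region active?
    bool taken;         // has any branch of this #if chain been taken?
    bool active;        // is the current branch active?
};

static bool StartsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Return the lines that survive conditional preprocessing.
std::vector<std::string> ActiveLines(const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    std::vector<Branch> stack;
    for (const std::string& line : lines) {
        const bool enclosing = stack.empty() || stack.back().active;
        if (StartsWith(line, "#if ")) {
            const bool value = enclosing && line.substr(4) == "1";
            stack.push_back({enclosing, value, value});
        } else if (StartsWith(line, "#elif ") && !stack.empty()) {
            Branch& b = stack.back();
            b.active = b.parentActive && !b.taken && line.substr(6) == "1";
            b.taken = b.taken || b.active;
        } else if (line == "#else" && !stack.empty()) {
            Branch& b = stack.back();
            b.active = b.parentActive && !b.taken;
            b.taken = true;
        } else if (line == "#endif" && !stack.empty()) {
            stack.pop_back();
        } else if (enclosing) {
            out.push_back(line);  // ordinary line in an active region
        }
    }
    return out;
}
```

For the "#if 0 / #elif 0 / #else" example only the statements under #else survive; for the "#if 1 / #elif x / #else" example only the first branch does.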

Definition at line 1199 of file tokenizer.cpp.

References _T, CurrentChar(), m_LineNumber, m_TokenIndex, MoveToNextChar(), NextChar(), SkipComment(), SkipString(), SkipToEndConditionPreprocessor(), and SkipWhiteSpace().

Referenced by HandleConditionPreprocessor().
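As a rough illustration of jumping to the next branch, the sketch below looks for the next #elif, #else or #endif that belongs to the current block and steps over nested blocks. It assumes a std::string buffer and a hypothetical function name FindNextBranch, and it does not treat comments or string literals specially, which the real implementation does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Tokenizer): starting at 'pos' inside a
// skipped branch, return the index of the '#' that begins the next #elif,
// #else or #endif of the same block, or std::string::npos if there is none.
std::size_t FindNextBranch(const std::string& text, std::size_t pos)
{
    int depth = 0; // nested conditional blocks opened since 'pos'
    while (pos < text.size())
    {
        std::size_t eol = text.find('\n', pos);
        if (eol == std::string::npos)
            eol = text.size();

        const std::size_t hash = text.find_first_not_of(" \t", pos);
        if (hash < eol && text[hash] == '#')
        {
            const std::size_t word = text.find_first_not_of(" \t", hash + 1);
            if (word < eol)
            {
                if (text.compare(word, 2, "if") == 0)
                    ++depth; // #if, #ifdef, #ifndef open a nested block
                else if (depth > 0)
                {
                    if (text.compare(word, 5, "endif") == 0)
                        --depth; // nested block closed
                }
                else if (text.compare(word, 4, "elif") == 0 ||
                         text.compare(word, 4, "else") == 0 ||
                         text.compare(word, 5, "endif") == 0)
                    return hash; // next branch of the current block
            }
        }
        pos = eol + 1;
    }
    return std::string::npos;
}
```

The caller can then evaluate the condition of the directive found at the returned index and either parse that branch or search again.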

◆ SkipToStringEnd()

bool Tokenizer::SkipToStringEnd ( const wxChar ch)
protected

Move to the end of a string literal or character literal; afterwards, m_TokenIndex points at the closing quote character.

Parameters
ch  is a character, either a double quote or a single quote
Returns
true if we reach the closing quote character

Definition at line 314 of file tokenizer.cpp.

References CurrentChar(), IsEOF(), IsEscapedChar(), and MoveToNextChar().

Referenced by SkipString().

◆ SkipUnwanted()

bool Tokenizer::SkipUnwanted ( )
protected

Skips comments, whitespace, and preprocessor branches.

Definition at line 831 of file tokenizer.cpp.

References NotEOF(), SkipComment(), SkipPreprocessorBranch(), and SkipWhiteSpace().

Referenced by DoGetToken(), GetToken(), and PeekToken().

◆ SkipWhiteSpace()

bool Tokenizer::SkipWhiteSpace ( )
protected

◆ SplitArguments()

bool Tokenizer::SplitArguments ( wxArrayString results)
private

Split the macro arguments and store them in results. When this function is called, m_TokenIndex is expected to point to the opening '(', or to the single space character before the opening '('.

For example:

..... ABC ( xxx, yyy ) zzz .....
         ^--------m_TokenIndex
Parameters
results  in the above example, the result contains two items (xxx and yyy)
Returns
false if arguments (the parenthesis) are not found.

Definition at line 1486 of file tokenizer.cpp.

References _T, wxArrayString::Add(), wxString::Clear(), CurrentChar(), wxString::IsEmpty(), wxString::Last(), Lex(), m_Lex, m_NestLevel, m_State, MoveToNextChar(), NotEOF(), SkipComment(), SkipWhiteSpace(), and tsRawExpression.

Referenced by GetMacroExpandedText().
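The argument splitting can be approximated with the sketch below. It is an assumption-laden illustration rather than the actual implementation: it uses std::string and std::vector in place of wxString and wxArrayString, the names Trim and SplitMacroArguments are hypothetical, comments are not skipped, and returning false for unbalanced parentheses is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing blanks.
static std::string Trim(const std::string& s)
{
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return std::string();
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical stand-in for the member function: 'pos' points at the opening
// '(' or at blanks just before it. Top-level commas separate arguments;
// nested parentheses are kept inside the current argument.
bool SplitMacroArguments(const std::string& text, std::size_t pos,
                         std::vector<std::string>& results)
{
    while (pos < text.size() && (text[pos] == ' ' || text[pos] == '\t'))
        ++pos;
    if (pos >= text.size() || text[pos] != '(')
        return false; // no argument list found

    int depth = 0;
    std::string current;
    for (; pos < text.size(); ++pos)
    {
        const char c = text[pos];
        if (c == '(')
        {
            if (depth++ > 0)
                current += c; // nested '(' belongs to the argument
        }
        else if (c == ')')
        {
            if (--depth == 0)
            {
                const std::string last = Trim(current);
                if (!last.empty() || !results.empty())
                    results.push_back(last);
                return true;
            }
            current += c;
        }
        else if (c == ',' && depth == 1)
        {
            results.push_back(Trim(current));
            current.clear();
        }
        else
            current += c;
    }
    return false; // closing ')' missing (assumption of this sketch)
}
```

For the buffer shown in the example, starting at the space before '(' yields the two items xxx and yyy.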

◆ UngetToken()

void Tokenizer::UngetToken ( )

Member Data Documentation

◆ m_Buffer

wxString Tokenizer::m_Buffer
private

Buffer content, all the lexical analysis is operating on this member variable.

Definition at line 502 of file tokenizer.h.

Referenced by BaseInit(), InitFromBuffer(), IsEscapedChar(), Lex(), ReadFile(), ReadToEOL(), and ReplaceBufferText().

◆ m_BufferLen

unsigned int Tokenizer::m_BufferLen
private

◆ m_ExpandedMacros

std::list<ExpandedMacro> Tokenizer::m_ExpandedMacros
private

This serves as a macro replacement stack. In the example given for ExpandedMacro, if AAA is replaced by BBBB, the macro definition of AAA is stored in m_ExpandedMacros. Suppose BBBB is in turn defined as

#define BBBB CCC + DDD
#define CCC 1

When expanding BBBB, the new m_Buffer becomes

....CCC + DDD..................[EOF]
    ^

Here, m_TokenIndex is moved back to the beginning of CCC, and the macro replacement stack m_ExpandedMacros becomes

top -> macro BBBB
    -> macro AAA

Next, if CCC is expanded to 1, the buffer becomes

......1 + DDD..................[EOF]
      ^

and the stack becomes

top -> macro CCC
    -> macro BBBB
    -> macro AAA

Once 1 has been parsed and the next token '+' is read, the CCC entry on top is popped.

When expanding a macro usage, the stack can be searched to see whether the macro is already being expanded; the C preprocessor does not allow the same macro to be expanded recursively. Since std::stack does not allow iterating over its elements, std::list is used instead.

Definition at line 626 of file tokenizer.h.

Referenced by Lex(), ReplaceBufferText(), and ReplaceMacroUsage().
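The reason for choosing std::list over std::stack can be shown with a small sketch. The type ExpansionStack and its members are hypothetical, and the entries are plain macro names rather than the ExpandedMacro records the class actually stores.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Hypothetical illustration: a stack of macros currently being expanded,
// kept in a std::list so that it can be searched (std::stack cannot be
// iterated). The front of the list is the top of the stack.
struct ExpansionStack
{
    std::list<std::string> active;

    // True if 'name' is already somewhere on the stack.
    bool IsExpanding(const std::string& name) const
    {
        return std::find(active.begin(), active.end(), name) != active.end();
    }

    // Push 'name' unless it is already being expanded; returns false when
    // the expansion must be refused to avoid recursion.
    bool Push(const std::string& name)
    {
        if (IsExpanding(name))
            return false;
        active.push_front(name);
        return true;
    }

    // Pop the top entry once its replacement text has been consumed.
    void Pop()
    {
        if (!active.empty())
            active.pop_front();
    }
};
```

With AAA, BBBB and CCC pushed in that order, a second attempt to push BBBB is refused, which mirrors the rule that a macro is not expanded again inside its own expansion.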

◆ m_ExpressionResult

std::stack<bool> Tokenizer::m_ExpressionResult
private

Preprocessor branch stack. When a "#if 1" is met, the value true is pushed onto the stack; when the matching #endif is passed, that value is popped.

Definition at line 558 of file tokenizer.h.

Referenced by HandleConditionPreprocessor(), SetLastTokenIdx(), and SkipComment().
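One way to picture the branch stack is the sketch below. Only the push on #if and the pop on #endif come from the description; the handling of #elif and #else, and the names BranchStack, OnIf, OnElif, OnElse and OnEndif, are assumptions made for illustration.

```cpp
#include <cassert>
#include <stack>

// Hypothetical illustration of a preprocessor branch stack. Each entry
// records whether some branch of the corresponding #if block has already
// been taken.
struct BranchStack
{
    std::stack<bool> results;

    // #if: remember the result of the condition.
    void OnIf(bool condition) { results.push(condition); }

    // #elif: the branch is parsed only if no earlier branch of the block
    // was taken and its own condition holds (assumed behaviour).
    bool OnElif(bool condition)
    {
        if (results.empty() || results.top())
            return false;
        results.top() = condition;
        return condition;
    }

    // #else: behaves like an #elif whose condition is always true.
    bool OnElse() { return OnElif(true); }

    // #endif: the block is finished, drop its entry.
    void OnEndif()
    {
        if (!results.empty())
            results.pop();
    }
};
```

For "#if 0 / #elif 0 / #else / #endif" only the #else branch is reported as active, while for "#if 1 / #elif 1 / #else / #endif" both later branches are skipped.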

◆ m_FileIdx

unsigned int Tokenizer::m_FileIdx
private

File index, useful when parsing documentation.

See also
SkipComment

Definition at line 500 of file tokenizer.h.

Referenced by AddMacroDefinition(), Init(), InitFromBuffer(), SetLastTokenIdx(), and SkipComment().

◆ m_Filename

wxString Tokenizer::m_Filename
private

Filename of the buffer.

Definition at line 498 of file tokenizer.h.

Referenced by HandleUndefs(), Init(), InitFromBuffer(), ReadFile(), and Tokenizer().

◆ m_IsOK

bool Tokenizer::m_IsOK
private

Boolean flag that specifies whether the buffer is ready for parsing.

Definition at line 550 of file tokenizer.h.

Referenced by BaseInit(), Init(), and InitFromBuffer().

◆ m_LastTokenIdx

int Tokenizer::m_LastTokenIdx
private

Stores the index of the most recently added token. For example, given a trailing comment:

int aaa; //!< description of aaa

The token "aaa" is added to the token tree before its description is read. The token index is stored at that point, so that when "description of aaa" is read later, the documentation can be attached to the token.

Definition at line 648 of file tokenizer.h.

Referenced by BaseInit(), SetLastTokenIdx(), and SkipComment().

◆ m_Lex

wxString Tokenizer::m_Lex
private

A lexeme string returned by the Lex() function. This is a candidate token string, which may be replaced if it is a macro usage.

Definition at line 509 of file tokenizer.h.

Referenced by CheckMacroUsageAndReplace(), DoGetToken(), GetMacroExpandedText(), GetPreprocessorType(), HandleDefines(), HandleUndefs(), IsMacroDefined(), Lex(), and SplitArguments().

◆ m_LineNumber

unsigned int Tokenizer::m_LineNumber
private

◆ m_Loader

LoaderBase* Tokenizer::m_Loader
private

File loader; it loads the content into m_Buffer, either from the hard disk or from memory.

Definition at line 554 of file tokenizer.h.

Referenced by Init(), and ReadFile().

◆ m_NestLevel

unsigned int Tokenizer::m_NestLevel
private

keep track of block nesting { }

Definition at line 527 of file tokenizer.h.

Referenced by BaseInit(), GetPreprocessorType(), GetToken(), HandleConditionPreprocessor(), Lex(), PeekToken(), ReplaceBufferText(), SplitArguments(), and UngetToken().

◆ m_NextTokenDoc

wxString Tokenizer::m_NextTokenDoc
private

Normally, this records the Doxygen-style comment for the next token definition. For example, when a documentation comment reading "description of aaa" precedes the declaration

int aaa;

the text "description of aaa" is stored in this variable. When the token "aaa" is added to the TokenTree, the documentation is associated with the token.

Definition at line 637 of file tokenizer.h.

Referenced by BaseInit(), SetLastTokenIdx(), and SkipComment().

◆ m_PeekAvailable

bool Tokenizer::m_PeekAvailable
private

Peek token information.

Definition at line 535 of file tokenizer.h.

Referenced by GetToken(), PeekToken(), ReplaceBufferText(), and UngetToken().

◆ m_PeekLineNumber

unsigned int Tokenizer::m_PeekLineNumber
private

Definition at line 538 of file tokenizer.h.

Referenced by BaseInit(), GetToken(), PeekToken(), and UngetToken().

◆ m_PeekNestLevel

unsigned int Tokenizer::m_PeekNestLevel
private

Definition at line 539 of file tokenizer.h.

Referenced by BaseInit(), GetToken(), PeekToken(), and UngetToken().

◆ m_PeekToken

wxString Tokenizer::m_PeekToken
private

Definition at line 536 of file tokenizer.h.

Referenced by GetToken(), PeekToken(), and UngetToken().

◆ m_PeekTokenIndex

unsigned int Tokenizer::m_PeekTokenIndex
private

Definition at line 537 of file tokenizer.h.

Referenced by BaseInit(), GetToken(), PeekToken(), and UngetToken().

◆ m_ReadingMacroDefinition

bool Tokenizer::m_ReadingMacroDefinition
private

Indicates whether a macro definition is being read. This variable affects how Doxygen comments are associated with the Token.

See also
Tokenizer::SkipComment for details

Definition at line 654 of file tokenizer.h.

Referenced by ReadToEOL(), and SkipComment().

◆ m_SavedLineNumber

unsigned int Tokenizer::m_SavedLineNumber
private

Definition at line 546 of file tokenizer.h.

Referenced by BaseInit(), HandleConditionPreprocessor(), PeekToken(), and ReplaceBufferText().

◆ m_SavedNestingLevel

unsigned int Tokenizer::m_SavedNestingLevel
private

Definition at line 547 of file tokenizer.h.

Referenced by BaseInit(), HandleConditionPreprocessor(), PeekToken(), and ReplaceBufferText().

◆ m_SavedTokenIndex

unsigned int Tokenizer::m_SavedTokenIndex
private

Saved token info (for PeekToken()). m_TokenIndex is moved forward or backward when either DoGetToken() or SkipUnwanted() is called, so m_TokenIndex should be saved before it gets modified.

Definition at line 545 of file tokenizer.h.

Referenced by BaseInit(), HandleConditionPreprocessor(), PeekToken(), and ReplaceBufferText().
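The interplay between the current, peeked, and undo positions can be illustrated with a much reduced tokenizer. MiniTokenizer is a hypothetical class written for this sketch: it splits on whitespace only, keeps a single index per state instead of the index, line number and nesting level the real class tracks, and its UngetToken() simply rewinds, which may differ from the actual implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, whitespace-only tokenizer showing why the index is saved
// before a look-ahead and restored afterwards.
class MiniTokenizer
{
public:
    explicit MiniTokenizer(std::string buffer) : m_Buffer(std::move(buffer)) {}

    // Consume and return the current token.
    std::string GetToken()
    {
        m_UndoIndex = m_Index; // remember where this token started
        if (m_PeekAvailable)
        {
            m_Index = m_PeekIndex; // reuse the look-ahead result
            m_PeekAvailable = false;
            return m_PeekToken;
        }
        return Lex();
    }

    // Look ahead without consuming: lex once, then restore the index.
    std::string PeekToken()
    {
        if (!m_PeekAvailable)
        {
            const std::size_t saved = m_Index;
            m_PeekToken = Lex();
            m_PeekIndex = m_Index;
            m_Index = saved;
            m_PeekAvailable = true;
        }
        return m_PeekToken;
    }

    // Undo the last GetToken() by rewinding to the saved position.
    void UngetToken()
    {
        m_Index = m_UndoIndex;
        m_PeekAvailable = false;
    }

private:
    // Skip blanks, then read up to the next blank; the index ends up on the
    // character just after the token.
    std::string Lex()
    {
        while (m_Index < m_Buffer.size() &&
               std::isspace(static_cast<unsigned char>(m_Buffer[m_Index])))
            ++m_Index;
        const std::size_t start = m_Index;
        while (m_Index < m_Buffer.size() &&
               !std::isspace(static_cast<unsigned char>(m_Buffer[m_Index])))
            ++m_Index;
        return m_Buffer.substr(start, m_Index - start);
    }

    std::string m_Buffer;
    std::size_t m_Index = 0;
    std::size_t m_UndoIndex = 0;
    std::size_t m_PeekIndex = 0;
    std::string m_PeekToken;
    bool m_PeekAvailable = false;
};
```

Peeking leaves the read position untouched, so a following GetToken() returns the same token, and UngetToken() makes the last consumed token available again.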

◆ m_State

TokenizerState Tokenizer::m_State
private

Tokeniser state specifies the token reading option.

Definition at line 552 of file tokenizer.h.

Referenced by CalcConditionExpression(), GetToken(), PeekToken(), and SplitArguments().

◆ m_Token

wxString Tokenizer::m_Token
private

These variables define the current token string and its auxiliary information, such as the token name, the line number of the token, the current brace nest level.

token name

Definition at line 514 of file tokenizer.h.

Referenced by GetToken(), and UngetToken().

◆ m_TokenIndex

unsigned int Tokenizer::m_TokenIndex
private

Index offset in the buffer. When parsing a buffer such as

....... namespace std { int a; .......
                     ^ --- m_TokenIndex, m_Token = "std"

m_TokenIndex always points to the character just after a valid token; in the above example, it points to the space character after "std".

Definition at line 523 of file tokenizer.h.

Referenced by BaseInit(), CalcConditionExpression(), GetPreprocessorType(), GetToken(), HandleConditionPreprocessor(), IsEscapedChar(), Lex(), MoveToNextChar(), PeekToken(), ReadToEOL(), ReplaceBufferText(), SkipPreprocessorBranch(), SkipToNextConditionPreprocessor(), and UngetToken().

◆ m_TokenizerOptions

TokenizerOptions Tokenizer::m_TokenizerOptions
private

Tokenizer options specify the token reading option.

Definition at line 492 of file tokenizer.h.

Referenced by HandleConditionPreprocessor(), SkipComment(), and Tokenizer().

◆ m_TokenTree

TokenTree* Tokenizer::m_TokenTree
private

The token tree that stores the macro definitions; it is shared with ParserThread.

Definition at line 495 of file tokenizer.h.

Referenced by AddMacroDefinition(), CheckMacroUsageAndReplace(), HandleUndefs(), Init(), InitFromBuffer(), IsMacroDefined(), SetLastTokenIdx(), and SkipComment().

◆ m_UndoLineNumber

unsigned int Tokenizer::m_UndoLineNumber
private

◆ m_UndoNestLevel

unsigned int Tokenizer::m_UndoNestLevel
private

◆ m_UndoTokenIndex

unsigned int Tokenizer::m_UndoTokenIndex
private

Backup the previous Token information.

Definition at line 530 of file tokenizer.h.

Referenced by BaseInit(), GetToken(), HandleConditionPreprocessor(), ReplaceBufferText(), and UngetToken().


The documentation for this class was generated from the following files: