Code::Blocks
SVN r11506
This is just a simple lexer class. More...
#include <tokenizer.h>
Classes | |
struct | ExpandedMacro |
Replaced buffer information. Here is an example of how macros are expanded. More... | |
Public Member Functions | |
Tokenizer (TokenTree *tokenTree, const wxString &filename=wxEmptyString) | |
Tokenizer constructor. More... | |
~Tokenizer () | |
Tokenizer destructor. More... | |
bool | Init (const wxString &filename=wxEmptyString, LoaderBase *loader=0) |
Initialize the buffer by opening a file through a loader. This function copies the contents from the loader's buffer to its own buffer, so the loader can safely be deleted after this call. More... | |
bool | InitFromBuffer (const wxString &buffer, const wxString &fileOfBuffer=wxEmptyString, size_t initLineNumber=0) |
Initialize the buffer by directly using a wxString's content. More... | |
wxString | GetToken () |
Consume and return the current token string. More... | |
wxString | PeekToken () |
Do a "look ahead", and return the next token string. More... | |
void | UngetToken () |
Undo the GetToken. More... | |
void | SetTokenizerOption (bool wantPreprocessor, bool storeDocumentation) |
Set whether conditional preprocessor directives are handled and whether documentation is stored. More... | |
void | SetState (TokenizerState state) |
Set the Tokenizer skipping options. More... | |
TokenizerState | GetState () |
Return the token reading options value. More... | |
const wxString & | GetFilename () const |
Return the opened file's name. More... | |
unsigned int | GetLineNumber () const |
Return the line number of the current token string. More... | |
unsigned int | GetNestingLevel () const |
Return the brace "{}" level. More... | |
void | SaveNestingLevel () |
Save the brace "{" level, the parser might need to ignore the nesting level in some cases. More... | |
void | RestoreNestingLevel () |
Restore the brace level. More... | |
bool | IsOK () const |
If the buffer is correctly loaded, this function returns true. More... | |
wxString | ReadToEOL (bool stripUnneeded=true) |
Return the string from the current position to the end of the current line. In most cases this function is used in handling #define; use with care outside this class! More... | |
void | ReadParentheses (wxString &str) |
Read a string from '(' to ')'; note that inner parentheses are taken into account. More... | |
bool | SkipToEOL () |
Skip from the current position to the end of line, use with care outside this class! More... | |
bool | SkipToInlineCommentEnd () |
Skip to the end of the C++ style comment. More... | |
bool | IsEOF () const |
Check whether the Tokenizer reaches the end of the buffer (file) More... | |
bool | NotEOF () const |
Return true if it is not the end of the buffer. More... | |
bool | ReplaceBufferText (const wxString &target, const Token *macro=0) |
Backward buffer replacement for re-parsing. More... | |
bool | ReplaceMacroUsage (const Token *tk) |
Get expanded text for the current macro usage, then replace buffer for re-parsing. More... | |
int | GetFirstTokenPosition (const wxString &buffer, const wxString &target) |
Search "target" in the buffer, return first position in buffer. More... | |
int | GetFirstTokenPosition (const wxChar *buffer, const size_t bufferLen, const wxChar *key, const size_t keyLen) |
Find the sub-string key in the whole buffer and return the first position of the key. More... | |
int | KMP_Find (const wxChar *text, const wxChar *pattern, const int patternLen) |
KMP find: get the first position of the pattern, or return -1 if nothing is found. See https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm. More... | |
void | SetLastTokenIdx (int tokenIdx) |
Called when a Token is added; associates the Doxygen-style documentation (comments before the variables) with the Token. More... | |
Protected Member Functions | |
void | BaseInit () |
Initialize some member variables. More... | |
wxString | DoGetToken () |
Do the actual lexical analysis, both GetToken() and PeekToken() will internally call this function. More... | |
bool | CheckMacroUsageAndReplace () |
Check m_Lex to see whether it is an identifier-like token, and if it is a macro usage, replace it. More... | |
bool | Lex () |
This function only moves m_TokenIndex, gets a lexeme, and stores it in m_Lex; m_Lex is then checked further to see whether it is a macro usage. More... | |
bool | ReadFile () |
Read a file, and fill the m_Buffer. More... | |
bool | IsEscapedChar () |
Check whether the current character is a C escape character in a string. More... | |
bool | SkipToChar (const wxChar &ch) |
Skip characters until we meet a ch. More... | |
bool | SkipUnwanted () |
Skip comments, spaces, and preprocessor branches. More... | |
bool | SkipWhiteSpace () |
Skip any tab or white-space characters. More... | |
bool | SkipComment () |
Skip the C/C++ comment. More... | |
bool | SkipPreprocessorBranch () |
Skip the C preprocessor directive, such as #ifdef xxxx only the conditional preprocessor directives are handled here, the others such as #include or #warning and all kinds of ptOthers(. More... | |
bool | SkipString () |
Skip the string literal(enclosed in double quotes) or character literal(enclosed in single quotes). More... | |
bool | SkipToStringEnd (const wxChar &ch) |
Move to the end of string literal or character literal, the m_TokenIndex will point at the closing quote character. More... | |
bool | MoveToNextChar () |
Move to the next character in the buffer. More... | |
wxChar | CurrentChar () const |
Return the current character indexed(pointed) by m_TokenIndex in the m_Buffer. More... | |
wxChar | CurrentCharMoveNext () |
Do the previous two functions sequentially. More... | |
wxChar | NextChar () const |
Return (peek) the next character. More... | |
wxChar | PreviousChar () const |
Return (peek) the previous character. More... | |
Private Member Functions | |
bool | CharInString (const wxChar ch, const wxChar *chars) const |
Check if a ch matches any characters in the wxChar array. More... | |
bool | IsBackslashBeforeEOL () |
Check whether the char before EOL is a backslash. Call this function when CurrentChar is '\n'; here we have two cases: More... | |
bool | CalcConditionExpression () |
For "#if xxxx", calculate the value of "xxxx". More... | |
bool | IsMacroDefined () |
Return true if the next token string is a macro definition. This is used when reading conditional preprocessor directives, such as checking whether a macro is defined, like below. More... | |
void | HandleDefines () |
Handle the macro definition statement: #define XXXXX More... | |
void | HandleUndefs () |
Handle the statement: #undef XXXXX More... | |
void | AddMacroDefinition (wxString name, int line, wxString para, wxString substitues) |
Add a macro definition to the Token database, for example: #define AAA(x,y) x+y More... | |
void | SkipToNextConditionPreprocessor () |
Skip to the next conditional preprocessor directive branch. More... | |
void | SkipToEndConditionPreprocessor () |
Skip to the #endif conditional preprocessor directive. More... | |
PreprocessorType | GetPreprocessorType () |
Get the current conditional preprocessor type. More... | |
void | HandleConditionPreprocessor (const PreprocessorType type) |
Handle the preprocessor directive: #ifdef XXX or #endif or #if or #elif or... More... | |
bool | SplitArguments (wxArrayString &results) |
Split the macro arguments and store them in results. When calling this function, we expect m_TokenIndex to point to the opening '(', or to one space char before the opening '('. More... | |
bool | GetMacroExpandedText (const Token *tk, wxString &expandedText) |
Get the full expanded text. More... | |
void | KMP_GetNextVal (const wxChar *pattern, int next[]) |
Helper used in the KMP find function. More... | |
Private Attributes | |
TokenizerOptions | m_TokenizerOptions |
Tokenizer options specify the token reading option. More... | |
TokenTree * | m_TokenTree |
The Token tree used to store macro definitions; the token tree is shared with ParserThread. More... | |
wxString | m_Filename |
Filename of the buffer. More... | |
unsigned int | m_FileIdx |
File index, useful when parsing documentation. More... | |
wxString | m_Buffer |
Buffer content, all the lexical analysis is operating on this member variable. More... | |
unsigned int | m_BufferLen |
Buffer length. More... | |
wxString | m_Lex |
A lexeme string returned by the Lex() function; this is a candidate token string, which may be replaced if it is a macro usage. More... | |
wxString | m_Token |
These variables define the current token string and its auxiliary information, such as the token name, the line number of the token, the current brace nest level. More... | |
unsigned int | m_TokenIndex |
Index offset in the buffer when parsing a buffer. More... | |
unsigned int | m_LineNumber |
Line offset in the buffer; note that it is 1-based, not 0-based. More... | |
unsigned int | m_NestLevel |
Keep track of block nesting { }. More... | |
unsigned int | m_UndoTokenIndex |
Backup the previous Token information. More... | |
unsigned int | m_UndoLineNumber |
unsigned int | m_UndoNestLevel |
bool | m_PeekAvailable |
Peek token information. More... | |
wxString | m_PeekToken |
unsigned int | m_PeekTokenIndex |
unsigned int | m_PeekLineNumber |
unsigned int | m_PeekNestLevel |
unsigned int | m_SavedTokenIndex |
Saved token info (for PeekToken()). m_TokenIndex is moved forward or backward when either DoGetToken() or SkipUnwanted() is called, so we should save m_TokenIndex before it gets modified. More... | |
unsigned int | m_SavedLineNumber |
unsigned int | m_SavedNestingLevel |
bool | m_IsOK |
Bool variable that specifies whether the buffer is ready for parsing. More... | |
TokenizerState | m_State |
Tokenizer state, which specifies the token reading option. More... | |
LoaderBase * | m_Loader |
File loader; it loads the content into m_Buffer, either from the hard disk or from memory. More... | |
std::stack< bool > | m_ExpressionResult |
Preprocessor branch stack: if we meet a #if 1, the value true is pushed to the stack, and if we skip the #endif, the true value should be popped. More... | |
std::list< ExpandedMacro > | m_ExpandedMacros |
this serves as a macro replacement stack, in the above example, if AAA is replaced by BBBB, we store the macro definition of AAA in the m_ExpandedMacros, and if BBBB is also defined as More... | |
wxString | m_NextTokenDoc |
Normally, this records the Doxygen-style comments for the next token definition, for example, here is a comment More... | |
int | m_LastTokenIdx |
Store the index of the most recently added token, for example, here is a comment More... | |
bool | m_ReadingMacroDefinition |
Indicates whether we are reading a macro definition. This variable affects how the Doxygen comments are associated with the Token. More... | |
This is just a simple lexer class.
A Tokenizer does the lexical analysis on a buffer. The buffer is either a wxString loaded from a local source/header file or a wxString already in memory (e.g. the scintilla text buffer). The main public interfaces are two member functions: GetToken() and PeekToken(). The former consumes one token string from the buffer; the latter does a "look ahead" on the buffer and returns the next token string (the peeked string). The peeked string is cached until the next GetToken() call, which improves performance. The Tokenizer class also performs some macro expansion on the buffer, so from this point of view it is a kind of preprocessor. Furthermore, it handles some conditional preprocessor directives (like "#if xxx").
Definition at line 64 of file tokenizer.h.
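The GetToken()/PeekToken() caching described above can be sketched in standard C++. This is only an illustration under simplifying assumptions: MiniTokenizer and its members are hypothetical names, it uses std::string instead of wxString, and it does no macro expansion or preprocessor handling.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical miniature lexer (not the Code::Blocks Tokenizer): a token is
// either a run of identifier characters or a single punctuation character.
class MiniTokenizer {
public:
    explicit MiniTokenizer(std::string buffer) : m_buffer(std::move(buffer)) {}

    // Consume and return the next token, reusing the cached peek if present.
    std::string GetToken() {
        if (m_peekAvailable) {
            m_peekAvailable = false;
            m_index = m_peekIndex;  // skip what PeekToken() already lexed
            return m_peekToken;
        }
        return Lex(m_index);
    }

    // Look ahead without consuming; the result stays cached until GetToken().
    std::string PeekToken() {
        if (!m_peekAvailable) {
            m_peekIndex = m_index;
            m_peekToken = Lex(m_peekIndex);
            m_peekAvailable = true;
        }
        return m_peekToken;
    }

private:
    static bool IsIdentChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    }

    // Lex one token starting at pos and advance pos past it.
    // Returns an empty string at the end of the buffer.
    std::string Lex(std::size_t& pos) const {
        assert(pos <= m_buffer.size());
        while (pos < m_buffer.size() &&
               std::isspace(static_cast<unsigned char>(m_buffer[pos])) != 0)
            ++pos;
        if (pos == m_buffer.size()) return std::string();
        const std::size_t start = pos;
        if (IsIdentChar(m_buffer[pos])) {
            while (pos < m_buffer.size() && IsIdentChar(m_buffer[pos])) ++pos;
        } else {
            ++pos;
        }
        return m_buffer.substr(start, pos - start);
    }

    std::string m_buffer;
    std::size_t m_index = 0;
    bool m_peekAvailable = false;
    std::string m_peekToken;
    std::size_t m_peekIndex = 0;
};
```

Calling PeekToken() twice in a row lexes only once; the following GetToken() returns the cached string and jumps the read index past it.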
Tokenizer::Tokenizer (TokenTree *tokenTree, const wxString &filename=wxEmptyString)
Tokenizer constructor.
filename | the file to be opened. |
Definition at line 91 of file tokenizer.cpp.
References Init(), wxString::IsEmpty(), m_Filename, m_TokenizerOptions, TokenizerOptions::storeDocumentation, and TokenizerOptions::wantPreprocessor.
Tokenizer::~Tokenizer ()
Tokenizer destructor.
Definition at line 122 of file tokenizer.cpp.
void Tokenizer::AddMacroDefinition (wxString name, int line, wxString para, wxString substitues) [private]
Add a macro definition to the Token database, for example: #define AAA(x,y) x+y
name | macro name which is "AAA" |
line | the line number of the macro definition |
para | the formal parameters, which is "(x,y)" |
substitues | the definition which is "x+y" |
Definition at line 2027 of file tokenizer.cpp.
References TokenTree::at(), TokenTree::insert(), Token::m_Args, m_FileIdx, Token::m_FullType, Token::m_Index, Token::m_ParentIndex, Token::m_TokenKind, TokenTree::m_TokenTicketCount, m_TokenTree, SetLastTokenIdx(), tkMacroDef, TokenTree::TokenExists(), and wxNOT_FOUND.
Referenced by HandleDefines().
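To make the three pieces (name, para, substitues) concrete, here is a minimal sketch that splits the text following "#define" into those parts. SplitDefine and MacroParts are hypothetical names for illustration only; the real HandleDefines() works on the tokenizer buffer through Lex() and ReadToEOL(), not on a std::string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: the three strings passed to AddMacroDefinition().
struct MacroParts {
    std::string name;         // e.g. "AAA"
    std::string para;         // e.g. "(x,y)", empty for object-like macros
    std::string substitutes;  // e.g. "x+y"
};

// Split the text that follows "#define ", e.g. "AAA(x,y) x+y".
MacroParts SplitDefine(const std::string& text) {
    MacroParts parts;
    std::size_t i = 0;
    while (i < text.size() &&
           (std::isalnum(static_cast<unsigned char>(text[i])) != 0 || text[i] == '_'))
        ++i;
    parts.name = text.substr(0, i);

    // A function-like macro has '(' immediately after the name.
    if (i < text.size() && text[i] == '(') {
        const std::size_t close = text.find(')', i);
        if (close != std::string::npos) {
            parts.para = text.substr(i, close - i + 1);
            i = close + 1;
        }
    }

    // Skip the blanks that separate the head from the replacement text.
    while (i < text.size() && std::isspace(static_cast<unsigned char>(text[i])) != 0)
        ++i;
    assert(i <= text.size());
    parts.substitutes = text.substr(i);
    return parts;
}
```

For "AAA(x,y) x+y" this yields the same values as the parameter table above; for an object-like macro such as "PI 3.14" the para field stays empty.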
void Tokenizer::BaseInit () [protected]
Initialize some member variables.
Definition at line 191 of file tokenizer.cpp.
References wxString::Clear(), wxString::clear(), m_Buffer, m_BufferLen, m_IsOK, m_LastTokenIdx, m_LineNumber, m_NestLevel, m_NextTokenDoc, m_PeekLineNumber, m_PeekNestLevel, m_PeekTokenIndex, m_SavedLineNumber, m_SavedNestingLevel, m_SavedTokenIndex, m_TokenIndex, m_UndoLineNumber, m_UndoNestLevel, and m_UndoTokenIndex.
Referenced by Init(), and InitFromBuffer().
bool Tokenizer::CalcConditionExpression () [private]
For "#if xxxx", calculate the value of "xxxx".
Definition at line 1093 of file tokenizer.cpp.
References _T, Expression::AddToInfixExpression(), Expression::CalcPostfix(), Expression::ConvertInfixToPostfix(), DoGetToken(), wxString::Format(), Expression::GetResult(), Expression::GetStatus(), IsMacroDefined(), wxString::Len(), m_BufferLen, m_LineNumber, m_State, m_TokenIndex, SkipComment(), SkipToEOL(), SkipWhiteSpace(), wxString::StartsWith(), wxString::ToLong(), TRACE, tsRawExpression, and wxIsalnum().
Referenced by HandleConditionPreprocessor().
bool Tokenizer::CharInString (const wxChar ch, const wxChar *chars) const [inline, private]
Check whether ch matches any character in the wxChar array.
Definition at line 366 of file tokenizer.h.
References wxStrlen().
Referenced by Lex().
bool Tokenizer::CheckMacroUsageAndReplace () [protected]
Check m_Lex to see whether it is an identifier-like token, and if it is a macro usage, replace it.
Definition at line 1069 of file tokenizer.cpp.
References TokenTree::at(), m_Lex, m_TokenTree, ReplaceMacroUsage(), tkMacroDef, and TokenTree::TokenExists().
Referenced by DoGetToken().
wxChar Tokenizer::CurrentChar () const [inline, protected]
Return the current character indexed (pointed to) by m_TokenIndex in m_Buffer.
Definition at line 331 of file tokenizer.h.
Referenced by Lex(), ReadToEOL(), SkipComment(), SkipPreprocessorBranch(), SkipString(), SkipToChar(), SkipToEndConditionPreprocessor(), SkipToEOL(), SkipToInlineCommentEnd(), SkipToNextConditionPreprocessor(), SkipToStringEnd(), SkipWhiteSpace(), and SplitArguments().
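wxChar Tokenizer::CurrentCharMoveNext () [inline, protected]
Return the current character and then move to the next one, i.e. do CurrentChar() and MoveToNextChar() sequentially.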
Definition at line 339 of file tokenizer.h.
wxString Tokenizer::DoGetToken () [protected]
Do the actual lexical analysis, both GetToken() and PeekToken() will internally call this function.
It just moves m_TokenIndex one step forward and returns the lexeme before m_TokenIndex.
Definition at line 944 of file tokenizer.cpp.
References CheckMacroUsageAndReplace(), Lex(), m_Lex, and SkipUnwanted().
Referenced by CalcConditionExpression(), GetToken(), PeekToken(), and ReadParentheses().
const wxString & Tokenizer::GetFilename () const [inline]
Return the opened file's name.
Definition at line 121 of file tokenizer.h.
Referenced by ParserThread::HandleConditionalArguments(), ParserThread::HandleForLoopArguments(), ParserThread::ReadClsNames(), and ParserThread::ReadVarNames().
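int Tokenizer::GetFirstTokenPosition (const wxString &buffer, const wxString &target) [inline]
Search "target" in the buffer and return its first position in the buffer.
It is used to find the formal argument in the macro definition body.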
buffer | the content |
target | the search key |
Definition at line 240 of file tokenizer.h.
References wxString::GetData(), and wxString::Len().
Referenced by GetMacroExpandedText().
int Tokenizer::GetFirstTokenPosition (const wxChar *buffer, const size_t bufferLen, const wxChar *key, const size_t keyLen)
Find the sub-string key in the whole buffer and return the first position of the key.
buffer | the content of the string |
bufferLen | length of the string |
key | the search key(sub-string) |
keyLen | the search key length |
Definition at line 1908 of file tokenizer.cpp.
References _T, KMP_Find(), and wxIsalnum().
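Since this search is used to locate a formal argument inside a macro body, a plain sub-string match is not enough: "x" must not match inside "xx" or "max". The sketch below shows one way to add that identifier-boundary check. It is an assumption about the intent (suggested by the wxIsalnum() reference), not the actual implementation; FindWholeWord is a hypothetical name, and std::string::find stands in for KMP_Find().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <limits>
#include <string>

// True for characters that can be part of a C/C++ identifier.
bool IsIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper: first position where key occurs as a whole identifier
// in buffer, or -1 if there is no such occurrence.
int FindWholeWord(const std::string& buffer, const std::string& key) {
    // The result is an int, so the buffer must be small enough to index with one.
    assert(buffer.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    if (key.empty()) return -1;

    std::size_t pos = buffer.find(key);
    while (pos != std::string::npos) {
        const std::size_t end = pos + key.size();
        const bool leftOk = pos == 0 || !IsIdentChar(buffer[pos - 1]);
        const bool rightOk = end == buffer.size() || !IsIdentChar(buffer[end]);
        if (leftOk && rightOk) return static_cast<int>(pos);
        pos = buffer.find(key, pos + 1);  // keep looking after a partial hit
    }
    return -1;
}
```

With the macro body "xx+x*y" and the formal argument "x", the first two hits are rejected because they sit inside the identifier "xx", and the function reports position 3.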
unsigned int Tokenizer::GetLineNumber () const [inline]
Return the line number of the current token string.
Definition at line 127 of file tokenizer.h.
Referenced by ParserThread::DoParse(), ParserThread::HandleClass(), ParserThread::HandleConditionalArguments(), HandleDefines(), ParserThread::HandleEnum(), ParserThread::HandleForLoopArguments(), ParserThread::HandleFunction(), ParserThread::HandleNamespace(), ParserThread::HandleTypedef(), ParserThread::ParseBufferForNamespaces(), ParserThread::ReadClsNames(), and ParserThread::ReadVarNames().
bool Tokenizer::GetMacroExpandedText (const Token *tk, wxString &expandedText) [private]
Get the full expanded text.
tk | the macro definition token, usually a function-like macro definition |
expandedText | an output string |
Call this function when we have just detected that the current token is a macro usage, such as in the condition below, where "ABC" is a macro usage: ......ABC(abc, (def))..... ^--------m_TokenIndex
Definition at line 1730 of file tokenizer.cpp.
References _T, wxString::Alloc(), wxString::Find(), wxArrayString::GetCount(), wxString::GetData(), GetFirstTokenPosition(), wxString::IsEmpty(), wxString::Len(), Token::m_Args, Token::m_FullType, m_Lex, Token::m_Name, wxString::Remove(), ReplaceBufferText(), wxString::SetChar(), wxString::size(), SplitArguments(), TRACE, wxString::wx_str(), wxIsalpha(), and wxNOT_FOUND.
Referenced by ReplaceMacroUsage().
unsigned int Tokenizer::GetNestingLevel () const [inline]
Return the brace "{}" level.
The value increases by one when we meet a "{" and decreases by one when we meet a "}".
Definition at line 135 of file tokenizer.h.
Referenced by ParserThread::HandleEnum(), ParserThread::SkipBlock(), and ParserThread::SkipToOneOfChars().
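The brace counting rule can be sketched as a small standalone function. NestingLevelAt is a hypothetical helper for illustration; unlike the real tokenizer it scans raw characters, so braces inside comments or string literals are not filtered out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: brace nesting level just before position pos.
// Braces inside comments or string literals are NOT filtered out here.
unsigned int NestingLevelAt(const std::string& buffer, std::size_t pos) {
    assert(pos <= buffer.size());
    unsigned int level = 0;
    for (std::size_t i = 0; i < pos; ++i) {
        if (buffer[i] == '{')
            ++level;
        else if (buffer[i] == '}' && level > 0)
            --level;  // ignore a stray '}' instead of wrapping around
    }
    return level;
}
```

A caller such as a block skipper can remember the level at the opening brace and keep reading tokens until the level drops back to that value.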
PreprocessorType Tokenizer::GetPreprocessorType () [private]
Get the current conditional preprocessor type.
Definition at line 1275 of file tokenizer.cpp.
References TokenizerConsts::kw_define, TokenizerConsts::kw_elif, TokenizerConsts::kw_elifdef, TokenizerConsts::kw_elifndef, TokenizerConsts::kw_else, TokenizerConsts::kw_endif, TokenizerConsts::kw_if, TokenizerConsts::kw_ifdef, TokenizerConsts::kw_ifndef, TokenizerConsts::kw_undef, wxString::Len(), Lex(), m_Lex, m_LineNumber, m_NestLevel, m_TokenIndex, MoveToNextChar(), ptDefine, ptElif, ptElifdef, ptElifndef, ptElse, ptEndif, ptIf, ptIfdef, ptIfndef, ptOthers, ptUndef, SkipComment(), and SkipWhiteSpace().
Referenced by SkipPreprocessorBranch().
TokenizerState Tokenizer::GetState () [inline]
Return the token reading options value.
Definition at line 115 of file tokenizer.h.
Referenced by ParserThread::CalcEnumExpression(), ParserThread::DoParse(), ParserThread::GetTemplateArgs(), ParserThread::HandleClass(), ParserThread::HandleEnum(), ParserThread::SkipAngleBraces(), and ParserThread::SkipBlock().
wxString Tokenizer::GetToken ()
Consume and return the current token string.
Definition at line 839 of file tokenizer.cpp.
References _T, wxString::Clear(), DoGetToken(), m_LineNumber, m_NestLevel, m_PeekAvailable, m_PeekLineNumber, m_PeekNestLevel, m_PeekToken, m_PeekTokenIndex, m_State, m_Token, m_TokenIndex, m_UndoLineNumber, m_UndoNestLevel, m_UndoTokenIndex, ReadParentheses(), SkipUnwanted(), and tsRawExpression.
Referenced by ParserThread::CalcEnumExpression(), NativeParserBase::ComputeCallTip(), ParserThread::DoParse(), ParserThread::GetTemplateArgs(), ParserThread::HandleClass(), ParserThread::HandleConditionalArguments(), ParserThread::HandleEnum(), ParserThread::HandleForLoopArguments(), ParserThread::HandleFunction(), ParserThread::HandleIncludes(), ParserThread::HandleNamespace(), ParserThread::HandleTypedef(), ParserThread::ParseBufferForNamespaces(), ParserThread::ParseBufferForUsingNamespace(), ParserThread::ReadAngleBrackets(), ParserThread::ReadClsNames(), ParserThread::ReadVarNames(), ParserThread::SkipAngleBraces(), ParserThread::SkipBlock(), and ParserThread::SkipToOneOfChars().
void Tokenizer::HandleConditionPreprocessor (const PreprocessorType type) [private]
Handle the preprocessor directive: #ifdef XXX or #endif or #if or #elif or...
If conditional preprocessor handles correctly, return true, otherwise return false.
Definition at line 1340 of file tokenizer.cpp.
References _T, CalcConditionExpression(), HandleDefines(), HandleUndefs(), IsMacroDefined(), m_ExpressionResult, m_LineNumber, m_NestLevel, m_SavedLineNumber, m_SavedNestingLevel, m_SavedTokenIndex, m_TokenIndex, m_TokenizerOptions, m_UndoLineNumber, m_UndoNestLevel, m_UndoTokenIndex, ptDefine, ptElif, ptElifdef, ptElifndef, ptElse, ptEndif, ptIf, ptIfdef, ptIfndef, ptOthers, ptUndef, SkipToEndConditionPreprocessor(), SkipToEOL(), SkipToNextConditionPreprocessor(), TRACE, and TokenizerOptions::wantPreprocessor.
Referenced by SkipPreprocessorBranch().
void Tokenizer::HandleDefines () [private]
Handle the macro definition statement: #define XXXXX
Definition at line 1949 of file tokenizer.cpp.
References _T, AddMacroDefinition(), wxString::GetChar(), GetLineNumber(), wxString::IsEmpty(), wxString::Left(), wxString::Len(), Lex(), m_Lex, ReadToEOL(), wxString::Right(), SkipComment(), SkipWhiteSpace(), and wxT.
Referenced by HandleConditionPreprocessor().
void Tokenizer::HandleUndefs () [private]
Handle the statement: #undef XXXXX
Definition at line 2009 of file tokenizer.cpp.
References _T, TokenTree::erase(), F(), wxString::IsEmpty(), Lex(), m_Filename, m_Lex, m_LineNumber, m_TokenTree, SkipComment(), SkipToEOL(), SkipWhiteSpace(), tkMacroDef, TokenTree::TokenExists(), TRACE, wxString::wx_str(), and wxNOT_FOUND.
Referenced by HandleConditionPreprocessor().
bool Tokenizer::Init (const wxString &filename=wxEmptyString, LoaderBase *loader=0)
Initialize the buffer by opening a file through a loader. This function copies the contents from the loader's buffer to its own buffer, so the loader can safely be deleted after this call.
Definition at line 126 of file tokenizer.cpp.
References _T, BaseInit(), TokenTree::GetFileIndex(), wxString::IsEmpty(), m_BufferLen, m_FileIdx, m_Filename, m_IsOK, m_Loader, m_TokenTree, ReadFile(), wxString::Replace(), TRACE, TRACE2, TRACE2_SET_FLAG, wxString::wx_str(), and wxFileExists().
Referenced by ParserThread::InitTokenizer(), and Tokenizer().
bool Tokenizer::InitFromBuffer (const wxString &buffer, const wxString &fileOfBuffer=wxEmptyString, size_t initLineNumber=0)
Initialize the buffer by directly using a wxString's content.
initLineNumber | the start line of the buffer; usually the parser is parsing a function body, and this keeps the line information of each local variable token correct. |
buffer | text content used for parsing |
fileOfBuffer | the file name where the buffer comes from. |
Definition at line 174 of file tokenizer.cpp.
References _T, BaseInit(), TokenTree::GetFileIndex(), wxString::Length(), m_Buffer, m_BufferLen, m_FileIdx, m_Filename, m_IsOK, m_LineNumber, m_TokenTree, and wxString::Replace().
Referenced by NativeParserBase::ComputeCallTip(), ParserThread::HandleConditionalArguments(), ParserThread::HandleForLoopArguments(), ParserThread::InitTokenizer(), ParserThread::ParseBufferForNamespaces(), and ParserThread::ParseBufferForUsingNamespace().
bool Tokenizer::IsBackslashBeforeEOL () [inline, private]
Check whether the char before EOL is a backslash. Call this function when CurrentChar is '\n'; here we have two cases:
Definition at line 386 of file tokenizer.h.
References _T.
Referenced by ReadToEOL(), SkipComment(), SkipToEOL(), and SkipToInlineCommentEnd().
bool Tokenizer::IsEOF () const [inline]
Check whether the Tokenizer has reached the end of the buffer (file).
Definition at line 177 of file tokenizer.h.
Referenced by Lex(), MoveToNextChar(), ReadToEOL(), SkipComment(), SkipString(), SkipToEOL(), SkipToInlineCommentEnd(), SkipToStringEnd(), and SkipWhiteSpace().
bool Tokenizer::IsEscapedChar () [protected]
Check whether the current character is a C escape character in a string.
Definition at line 280 of file tokenizer.cpp.
References wxString::GetChar(), m_Buffer, m_BufferLen, m_TokenIndex, and PreviousChar().
Referenced by SkipToStringEnd().
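A common way to decide whether a quote inside a string literal is escaped is to count the backslashes immediately before it: an odd count means the character is escaped, an even count means the backslashes escape each other. The sketch below assumes that approach; IsEscapedAt is a hypothetical name, and the real function works on m_Buffer and m_TokenIndex rather than on arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if buffer[index] is preceded by an odd number of
// consecutive backslashes, i.e. the character itself is escaped.
bool IsEscapedAt(const std::string& buffer, std::size_t index) {
    assert(index <= buffer.size());
    std::size_t backslashes = 0;
    while (backslashes < index && buffer[index - backslashes - 1] == '\\')
        ++backslashes;
    return backslashes % 2 == 1;
}
```

When scanning for the end of a string literal, a closing quote only counts if this check returns false for its position.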
bool Tokenizer::IsMacroDefined () [private]
Return true if the next token string is a macro definition. This is used when reading conditional preprocessor directives, such as checking whether a macro is defined, like below.
Here we only check whether "xxx" is a macro definition; we don't need to expand "xxx".
Definition at line 1171 of file tokenizer.cpp.
References _T, Lex(), m_Lex, m_TokenTree, SkipComment(), SkipWhiteSpace(), tkMacroDef, and TokenTree::TokenExists().
Referenced by CalcConditionExpression(), and HandleConditionPreprocessor().
bool Tokenizer::IsOK () const [inline]
If the buffer is correctly loaded, this function returns true.
Definition at line 153 of file tokenizer.h.
Referenced by ParserThread::Parse(), ParserThread::ParseBufferForNamespaces(), and ParserThread::ParseBufferForUsingNamespace().
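int Tokenizer::KMP_Find (const wxChar *text, const wxChar *pattern, const int patternLen)
KMP find: get the first position of the pattern, or return -1 if nothing is found. See https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.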
Definition at line 1673 of file tokenizer.cpp.
References _T, KMP_GetNextVal(), and TRACE.
Referenced by GetFirstTokenPosition().
void Tokenizer::KMP_GetNextVal (const wxChar *pattern, int next[]) [private]
Helper used in the KMP find function.
Definition at line 1653 of file tokenizer.cpp.
References _T.
Referenced by KMP_Find().
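For readers unfamiliar with Knuth-Morris-Pratt, here is a textbook sketch of the two steps (table construction, then search). It uses std::string and the classic failure-function table; the names BuildFailureTable and KmpFind are hypothetical, and the table layout is not necessarily the same as the next[] array filled by KMP_GetNextVal().

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// fail[i] = length of the longest proper prefix of pattern[0..i] that is
// also a suffix of pattern[0..i].
std::vector<int> BuildFailureTable(const std::string& pattern) {
    std::vector<int> fail(pattern.size(), 0);
    int k = 0;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Return the first position of pattern in text, or -1 if it does not occur.
int KmpFind(const std::string& text, const std::string& pattern) {
    // Positions are reported as int, so the text must fit in that range.
    assert(text.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    if (pattern.empty()) return 0;

    const std::vector<int> fail = BuildFailureTable(pattern);
    const int patternLen = static_cast<int>(pattern.size());
    int k = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == patternLen) return static_cast<int>(i) + 1 - patternLen;
    }
    return -1;
}
```

The table lets the search resume from a shorter partial match after a mismatch instead of restarting, so the text is scanned only once.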
bool Tokenizer::Lex () [protected]
This function only moves m_TokenIndex, gets a lexeme, and stores it in m_Lex; m_Lex is then checked further to see whether it is a macro usage.
Definition at line 965 of file tokenizer.cpp.
References _T, wxString::assign(), CharInString(), TokenizerConsts::colon, TokenizerConsts::colon_colon, CurrentChar(), TokenizerConsts::equal, IsEOF(), m_Buffer, m_ExpandedMacros, m_Lex, m_NestLevel, m_TokenIndex, wxString::Mid(), MoveToNextChar(), NextChar(), NotEOF(), SkipString(), wxEmptyString, wxIsalnum(), wxIsalpha(), and wxIsdigit().
Referenced by DoGetToken(), GetPreprocessorType(), HandleDefines(), HandleUndefs(), IsMacroDefined(), and SplitArguments().
bool Tokenizer::MoveToNextChar () [protected]
Move to the next character in the buffer.
Definition at line 334 of file tokenizer.cpp.
References _T, IsEOF(), m_BufferLen, m_LineNumber, m_TokenIndex, and PreviousChar().
Referenced by GetPreprocessorType(), Lex(), ReadToEOL(), SkipComment(), SkipString(), SkipToChar(), SkipToEndConditionPreprocessor(), SkipToEOL(), SkipToInlineCommentEnd(), SkipToNextConditionPreprocessor(), SkipToStringEnd(), SkipWhiteSpace(), and SplitArguments().
wxChar Tokenizer::NextChar () const [inline, protected]
Return (peek) the next character.
Definition at line 347 of file tokenizer.h.
Referenced by Lex(), ReadToEOL(), SkipComment(), SkipToEndConditionPreprocessor(), SkipToEOL(), SkipToInlineCommentEnd(), and SkipToNextConditionPreprocessor().
bool Tokenizer::NotEOF () const [inline]
Return true if it is not the end of the buffer.
Definition at line 183 of file tokenizer.h.
Referenced by ParserThread::DoParse(), Lex(), ParserThread::ParseBufferForNamespaces(), ParserThread::ParseBufferForUsingNamespace(), ParserThread::ReadAngleBrackets(), ReadParentheses(), ReadToEOL(), SkipToChar(), SkipToEOL(), SkipToInlineCommentEnd(), SkipUnwanted(), and SplitArguments().
wxString Tokenizer::PeekToken ()
Do a "look ahead", and return the next token string.
Definition at line 869 of file tokenizer.cpp.
References _T, wxString::Clear(), DoGetToken(), m_LineNumber, m_NestLevel, m_PeekAvailable, m_PeekLineNumber, m_PeekNestLevel, m_PeekToken, m_PeekTokenIndex, m_SavedLineNumber, m_SavedNestingLevel, m_SavedTokenIndex, m_State, m_TokenIndex, ReadParentheses(), SkipUnwanted(), and tsRawExpression.
Referenced by NativeParserBase::ComputeCallTip(), ParserThread::DoParse(), ParserThread::HandleClass(), ParserThread::HandleConditionalArguments(), ParserThread::HandleEnum(), ParserThread::HandleForLoopArguments(), ParserThread::HandleFunction(), ParserThread::HandleNamespace(), ParserThread::HandleTypedef(), ParserThread::ParseBufferForNamespaces(), and ParserThread::ParseBufferForUsingNamespace().
wxChar Tokenizer::PreviousChar () const [inline, protected]
Return (peek) the previous character.
Definition at line 356 of file tokenizer.h.
Referenced by IsEscapedChar(), MoveToNextChar(), ReadToEOL(), SkipToEOL(), and SkipToInlineCommentEnd().
bool Tokenizer::ReadFile () [protected]
Read a file, and fill the m_Buffer.
Definition at line 212 of file tokenizer.cpp.
References _T, cbRead(), LoaderBase::FileName(), LoaderBase::GetData(), LoaderBase::GetLength(), wxString::Length(), m_Buffer, m_BufferLen, m_Filename, m_Loader, wxEmptyString, and wxFileExists().
Referenced by Init().
void Tokenizer::ReadParentheses (wxString &str)
Read a string from '(' to ')'; note that inner parentheses are taken into account.
str | the returned string |
Definition at line 496 of file tokenizer.cpp.
References _T, DoGetToken(), wxString::Last(), NotEOF(), wxIsalnum(), and wxIsalpha().
Referenced by GetToken(), and PeekToken().
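Taking inner parentheses into account means tracking the nesting depth and stopping only at the ')' that closes the first '('. The following character-based sketch shows that idea. It is a simplification: the name and signature are illustrative, and the real ReadParentheses() assembles the string from tokens returned by DoGetToken().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical character-based variant: pos must point at '('. Returns the
// text up to and including the matching ')' and leaves pos just after it.
// If the input ends early, everything up to the end is returned.
std::string ReadParentheses(const std::string& buffer, std::size_t& pos) {
    assert(pos < buffer.size() && buffer[pos] == '(');
    const std::size_t start = pos;
    int depth = 0;
    while (pos < buffer.size()) {
        const char c = buffer[pos++];
        if (c == '(') {
            ++depth;
        } else if (c == ')' && --depth == 0) {
            break;  // this ')' closes the opening '('
        }
    }
    return buffer.substr(start, pos - start);
}
```

For "(a, (b + c), d) rest" the inner pair raises the depth to 2 and lowers it again, so reading stops after the final ')' rather than after "c)".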
wxString Tokenizer::ReadToEOL (bool stripUnneeded=true)
Return the string from the current position to the end of the current line. In most cases this function is used in handling #define; use with care outside this class!
stripUnneeded | true if you want to remove comments and compress spaces (two or more spaces become one space) |
Definition at line 367 of file tokenizer.cpp.
References _T, wxString::Append(), CurrentChar(), IsBackslashBeforeEOL(), IsEOF(), m_Buffer, m_LineNumber, m_ReadingMacroDefinition, m_TokenIndex, wxString::Mid(), MoveToNextChar(), NextChar(), NotEOF(), PreviousChar(), SkipComment(), SkipString(), SkipToEOL(), TRACE, and wxString::wx_str().
Referenced by HandleDefines().
bool Tokenizer::ReplaceBufferText (const wxString &target, const Token *macro=0)
Backward buffer replacement for re-parsing.
target | the new text that replaces some text in m_Buffer |
macro | if this is a macro expansion, we need to remember the referenced (used) macro token so that we can avoid recursive macro expansion, as in the code below |
#define X Y
#define Y X
int X;
Macro expansion just replaces some characters in m_Buffer.
For example, suppose m_Buffer is a wxChar array in which a macro usage "AAAA(u,v)" is detected and needs to be expanded. We just do a "backward" text replacement here. Before replacement, m_TokenIndex points to the char after the ")" in "AAAA(u,v)" (we call this the anchor point). After replacement, the expanded new text (written as "NNNNNNNNNNNNNNN") ends at the anchor point, and m_TokenIndex is moved backward to the beginning of the newly added text.
If the new text is small enough, m_Buffer's length does not need to increase. m_Buffer's length needs to increase only when the new text is too long for the part of the buffer before the anchor point to hold it; in that case m_Buffer's length is adjusted, like below:
NNNNNNNNNNNNNNNNNNNNNNyyyyyyyyy
^---m_TokenIndex
Definition at line 1546 of file tokenizer.cpp.
References _T, wxString::GetChar(), wxString::insert(), wxString::IsEmpty(), wxString::Len(), m_Buffer, m_BufferLen, Tokenizer::ExpandedMacro::m_End, m_ExpandedMacros, m_LineNumber, Tokenizer::ExpandedMacro::m_Macro, m_NestLevel, m_PeekAvailable, m_SavedLineNumber, m_SavedNestingLevel, m_SavedTokenIndex, m_TokenIndex, m_UndoLineNumber, m_UndoNestLevel, m_UndoTokenIndex, s_MaxMacroReplaceDepth, wxString::SetChar(), TRACE, and wxString::wx_str().
Referenced by GetMacroExpandedText(), and ReplaceMacroUsage().
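The backward replacement can be sketched with a std::string buffer and an explicit index. This is a minimal illustration under assumptions: ReplaceBackward is a hypothetical name, and the sketch leaves out the recursion guard (m_ExpandedMacros) and the line/nesting bookkeeping that the real function performs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: overwrite the characters just before the anchor point
// `index` with `text`, growing the buffer at the front when there is not
// enough room. Returns the new read index (the start of the inserted text).
std::size_t ReplaceBackward(std::string& buffer, std::size_t index,
                            const std::string& text) {
    assert(index <= buffer.size());
    if (text.size() > index) {
        // The space before the anchor cannot hold the text: pad the front.
        const std::size_t extra = text.size() - index;
        buffer.insert(0, extra, ' ');
        index += extra;
    }
    const std::size_t start = index - text.size();
    buffer.replace(start, text.size(), text);  // same length, so no shifting
    return start;
}
```

In the common case the expansion is no longer than the already consumed text, so the buffer keeps its length and only the read index moves back; whatever precedes the new index is stale and never read again.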
bool Tokenizer::ReplaceMacroUsage (const Token *tk)
Get expanded text for the current macro usage, then replace buffer for re-parsing.
tk | the macro definition token |
Definition at line 1635 of file tokenizer.cpp.
References GetMacroExpandedText(), m_ExpandedMacros, and ReplaceBufferText().
Referenced by CheckMacroUsageAndReplace().
void Tokenizer::RestoreNestingLevel () [inline]
Restore the brace level.
Definition at line 147 of file tokenizer.h.
void Tokenizer::SaveNestingLevel () [inline]
Save the brace "{" level; the parser might need to ignore the nesting level in some cases.
Definition at line 141 of file tokenizer.h.
void Tokenizer::SetLastTokenIdx (int tokenIdx)
Called when a Token is added; associates the Doxygen-style documentation (comments before the variables) with the Token.
Definition at line 1719 of file tokenizer.cpp.
References TokenTree::AppendDocumentation(), wxString::clear(), wxString::IsEmpty(), m_ExpressionResult, m_FileIdx, m_LastTokenIdx, m_NextTokenDoc, and m_TokenTree.
Referenced by AddMacroDefinition(), and ParserThread::DoAddToken().
void Tokenizer::SetState (TokenizerState state) [inline]
Set the Tokenizer skipping options.
E.g. normally we read the parentheses as a whole token, but sometimes we should disable this option.
Definition at line 109 of file tokenizer.h.
Referenced by ParserThread::CalcEnumExpression(), ParserThread::DoParse(), ParserThread::GetTemplateArgs(), ParserThread::HandleClass(), ParserThread::HandleEnum(), ParserThread::HandleNamespace(), ParserThread::ParseBufferForNamespaces(), ParserThread::SkipAngleBraces(), and ParserThread::SkipBlock().
|
inline |
Set whether to handle conditional preprocessor directives and whether to store documentation.
Definition at line 100 of file tokenizer.h.
References TokenizerOptions::storeDocumentation, and TokenizerOptions::wantPreprocessor.
Referenced by ParserThread::ParserThread().
|
protected |
Skip the C/C++ comment.
valid documents
invalid documents
Definition at line 612 of file tokenizer.cpp.
References _T, TokenTree::AppendDocumentation(), CurrentChar(), IsBackslashBeforeEOL(), IsEOF(), m_ExpressionResult, m_FileIdx, m_LastTokenIdx, m_LineNumber, m_NextTokenDoc, m_ReadingMacroDefinition, m_TokenizerOptions, m_TokenTree, MoveToNextChar(), NextChar(), wxString::size(), SkipToChar(), SkipToInlineCommentEnd(), SkipWhiteSpace(), TokenizerOptions::storeDocumentation, and TRACE.
Referenced by CalcConditionExpression(), GetPreprocessorType(), HandleDefines(), HandleUndefs(), IsMacroDefined(), ReadToEOL(), SkipToEndConditionPreprocessor(), SkipToEOL(), SkipToNextConditionPreprocessor(), SkipUnwanted(), and SplitArguments().
|
protected |
Skip the C preprocessor directive, such as #ifdef xxxx. Only the conditional preprocessor directives are handled here; the others, such as #include or #warning, and all kinds of ptOthers(.
Definition at line 804 of file tokenizer.cpp.
References _T, CurrentChar(), GetPreprocessorType(), HandleConditionPreprocessor(), m_TokenIndex, and ptOthers.
Referenced by SkipUnwanted().
|
protected |
Skip the string literal (enclosed in double quotes) or character literal (enclosed in single quotes).
Definition at line 349 of file tokenizer.cpp.
References _T, CurrentChar(), IsEOF(), MoveToNextChar(), and SkipToStringEnd().
Referenced by Lex(), ReadToEOL(), SkipToEndConditionPreprocessor(), and SkipToNextConditionPreprocessor().
|
protected |
Skip characters until we meet a ch.
Definition at line 302 of file tokenizer.cpp.
References CurrentChar(), MoveToNextChar(), and NotEOF().
Referenced by SkipComment(), and SkipToInlineCommentEnd().
|
private |
Skip to the #endif conditional preprocessor directive.
for example:
if we see a "#if 1" branch we need to skip the next two branches, and go to "#endif"
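A minimal sketch of the idea, assuming the source has already been split into lines. The helper names directiveOf and skipToEndif are hypothetical, and the real function works on m_Buffer character by character rather than on a vector of lines. The point shown is that nested conditionals must be counted so that the matching "#endif" is found.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Return the directive name of a line ("if", "elif", "endif", ...),
// or an empty string if the line is not a preprocessor directive.
static std::string directiveOf(const std::string& line)
{
    std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line[i] != '#')
        return std::string();
    i = line.find_first_not_of(" \t", i + 1);
    if (i == std::string::npos)
        return std::string();
    std::size_t end = i;
    while (end < line.size() && std::isalpha(static_cast<unsigned char>(line[end])))
        ++end;
    return line.substr(i, end - i);
}

// `start` is the first line inside the conditional. Returns the index of the
// matching "#endif" line, or lines.size() if there is none.
std::size_t skipToEndif(const std::vector<std::string>& lines, std::size_t start)
{
    assert(start <= lines.size());
    int depth = 0; // nesting depth of conditionals opened after `start`
    for (std::size_t i = start; i < lines.size(); ++i)
    {
        const std::string d = directiveOf(lines[i]);
        if (d == "if" || d == "ifdef" || d == "ifndef")
            ++depth;
        else if (d == "endif")
        {
            if (depth == 0)
                return i;
            --depth;
        }
    }
    return lines.size();
}
```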
Definition at line 1239 of file tokenizer.cpp.
References _T, CurrentChar(), MoveToNextChar(), NextChar(), SkipComment(), SkipString(), SkipToEOL(), and SkipWhiteSpace().
Referenced by HandleConditionPreprocessor(), and SkipToNextConditionPreprocessor().
bool Tokenizer::SkipToEOL | ( | ) |
Skip from the current position to the end of line, use with care outside this class!
Definition at line 555 of file tokenizer.cpp.
References _T, CurrentChar(), IsBackslashBeforeEOL(), IsEOF(), m_LineNumber, MoveToNextChar(), NextChar(), NotEOF(), PreviousChar(), SkipComment(), and TRACE.
Referenced by CalcConditionExpression(), ParserThread::DoParse(), HandleConditionPreprocessor(), HandleUndefs(), ReadToEOL(), and SkipToEndConditionPreprocessor().
bool Tokenizer::SkipToInlineCommentEnd | ( | ) |
Skip to the end of the C++ style comment.
Definition at line 588 of file tokenizer.cpp.
References _T, CurrentChar(), IsBackslashBeforeEOL(), IsEOF(), m_LineNumber, MoveToNextChar(), NextChar(), NotEOF(), PreviousChar(), SkipToChar(), and TRACE.
Referenced by SkipComment().
|
private |
Skip to the next conditional preprocessor directive branch.
for example:
if we see a "#if 0", we need to jump to the next "#elif xxx"
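The jump to the next branch can be sketched in the same line-based way. Again this is an assumption-laden illustration: directiveOf and skipToNextBranch are hypothetical names, and the real function scans m_Buffer directly. A nested "#if ... #endif" block inside the skipped branch must not be mistaken for the next branch of the outer conditional.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Return the directive name of a line, or an empty string if it has none.
static std::string directiveOf(const std::string& line)
{
    std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line[i] != '#')
        return std::string();
    i = line.find_first_not_of(" \t", i + 1);
    if (i == std::string::npos)
        return std::string();
    std::size_t end = i;
    while (end < line.size() && std::isalpha(static_cast<unsigned char>(line[end])))
        ++end;
    return line.substr(i, end - i);
}

// `start` is the first line after a false "#if". Returns the index of the next
// "#elif", "#else" or "#endif" that belongs to that "#if", or lines.size().
std::size_t skipToNextBranch(const std::vector<std::string>& lines, std::size_t start)
{
    assert(start <= lines.size());
    int depth = 0; // nesting depth of conditionals opened after `start`
    for (std::size_t i = start; i < lines.size(); ++i)
    {
        const std::string d = directiveOf(lines[i]);
        if (d == "if" || d == "ifdef" || d == "ifndef")
            ++depth;
        else if (depth == 0 && (d == "elif" || d == "else" || d == "endif"))
            return i;
        else if (d == "endif")
            --depth; // closes a nested conditional
    }
    return lines.size();
}
```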
Definition at line 1199 of file tokenizer.cpp.
References _T, CurrentChar(), m_LineNumber, m_TokenIndex, MoveToNextChar(), NextChar(), SkipComment(), SkipString(), SkipToEndConditionPreprocessor(), and SkipWhiteSpace().
Referenced by HandleConditionPreprocessor().
|
protected |
Move to the end of a string literal or character literal; m_TokenIndex will point at the closing quote character.
ch | is a character, either a double quote or a single quote |
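As a rough sketch of this scan (not the code in tokenizer.cpp): the hypothetical function below works on a std::string and treats a backslash as escaping the next character, which is an assumption about how escapes are recognized.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: `index` is the position just after the opening quote.
// Returns the position of the matching closing quote, or std::string::npos
// if the buffer ends first.
std::size_t skipToStringEnd(const std::string& buffer, std::size_t index, char quote)
{
    assert(quote == '"' || quote == '\'');
    while (index < buffer.size())
    {
        if (buffer[index] == '\\')
            index += 2; // skip the backslash and the character it escapes
        else if (buffer[index] == quote)
            return index; // points at the closing quote, as described above
        else
            ++index;
    }
    return std::string::npos;
}
```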
Definition at line 314 of file tokenizer.cpp.
References CurrentChar(), IsEOF(), IsEscapedChar(), and MoveToNextChar().
Referenced by SkipString().
|
protected |
Skip comments, white space, and preprocessor branches.
Definition at line 831 of file tokenizer.cpp.
References NotEOF(), SkipComment(), SkipPreprocessorBranch(), and SkipWhiteSpace().
Referenced by DoGetToken(), GetToken(), and PeekToken().
|
protected |
Skip any tab or white-space characters.
Definition at line 263 of file tokenizer.cpp.
References _T, CurrentChar(), IsEOF(), and MoveToNextChar().
Referenced by CalcConditionExpression(), GetPreprocessorType(), HandleDefines(), HandleUndefs(), IsMacroDefined(), SkipComment(), SkipToEndConditionPreprocessor(), SkipToNextConditionPreprocessor(), SkipUnwanted(), and SplitArguments().
|
private |
Split the macro arguments and store them in results. When calling this function, we expect m_TokenIndex to point to the opening '(', or to one space character before the opening '('.
such as below
results | in the above example, the result contains two items (xxx and yyy) |
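A simplified sketch of such a split, assuming a std::string buffer and plain character scanning. The real function goes through Lex() and also skips comments. The names splitArguments and trim are hypothetical. Commas inside nested parentheses are kept as part of one argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Trim leading and trailing spaces/tabs from one argument.
static std::string trim(const std::string& s)
{
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return std::string();
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// `index` points at the opening '(' or at white space before it.
// On return, `index` is just past the closing ')'.
std::vector<std::string> splitArguments(const std::string& buffer, std::size_t& index)
{
    assert(index <= buffer.size());
    std::vector<std::string> results;
    while (index < buffer.size() && (buffer[index] == ' ' || buffer[index] == '\t'))
        ++index;
    if (index >= buffer.size() || buffer[index] != '(')
        return results; // no argument list here
    ++index;            // consume '('

    int depth = 1;      // parenthesis nesting level
    std::string piece;  // argument currently being collected
    while (index < buffer.size())
    {
        const char ch = buffer[index++];
        if (ch == '(')
            ++depth;
        else if (ch == ')' && --depth == 0)
        {
            results.push_back(trim(piece));
            break;
        }
        if (ch == ',' && depth == 1)
        {
            results.push_back(trim(piece)); // top-level comma ends an argument
            piece.clear();
        }
        else
            piece += ch;
    }
    return results;
}
```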
Definition at line 1486 of file tokenizer.cpp.
References _T, wxArrayString::Add(), wxString::Clear(), CurrentChar(), wxString::IsEmpty(), wxString::Last(), Lex(), m_Lex, m_NestLevel, m_State, MoveToNextChar(), NotEOF(), SkipComment(), SkipWhiteSpace(), and tsRawExpression.
Referenced by GetMacroExpandedText().
void Tokenizer::UngetToken | ( | ) |
Undo the GetToken.
Definition at line 914 of file tokenizer.cpp.
References m_LineNumber, m_NestLevel, m_PeekAvailable, m_PeekLineNumber, m_PeekNestLevel, m_PeekToken, m_PeekTokenIndex, m_Token, m_TokenIndex, m_UndoLineNumber, m_UndoNestLevel, and m_UndoTokenIndex.
Referenced by ParserThread::CalcEnumExpression(), ParserThread::DoParse(), ParserThread::GetTemplateArgs(), ParserThread::HandleClass(), ParserThread::HandleEnum(), ParserThread::HandleFunction(), ParserThread::HandleNamespace(), ParserThread::HandleTypedef(), ParserThread::ReadClsNames(), and ParserThread::SkipAngleBraces().
|
private |
Buffer content, all the lexical analysis is operating on this member variable.
Definition at line 502 of file tokenizer.h.
Referenced by BaseInit(), InitFromBuffer(), IsEscapedChar(), Lex(), ReadFile(), ReadToEOL(), and ReplaceBufferText().
|
private |
Buffer length.
Definition at line 504 of file tokenizer.h.
Referenced by BaseInit(), CalcConditionExpression(), Init(), InitFromBuffer(), IsEscapedChar(), MoveToNextChar(), ReadFile(), and ReplaceBufferText().
|
private |
This serves as a macro replacement stack. In the above example, if AAA is replaced by BBBB, we store the macro definition of AAA in m_ExpandedMacros, and if BBBB is also defined as
If 1 is parsed and the next token we get is '+', the CCC at the top is popped.
When we try to expand a macro usage, we can look in the stack to see whether the macro is already in use. The C preprocessor does not allow the same macro to be expanded recursively. Since std::stack does not allow us to iterate over all its elements, we use std::list.
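The design choice can be illustrated with a small sketch. It stores only macro names, whereas the real list holds ExpandedMacro entries, and the class ExpansionStack and its methods are hypothetical. A std::list is used like a stack through its front, and it can still be searched before a new expansion is pushed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical wrapper around a std::list used as a searchable stack.
class ExpansionStack
{
public:
    // Returns false (and pushes nothing) if `name` is already being expanded.
    bool push(const std::string& name)
    {
        if (std::find(m_Names.begin(), m_Names.end(), name) != m_Names.end())
            return false; // recursive expansion of the same macro is refused
        m_Names.push_front(name);
        return true;
    }

    // Pop the macro on top once its replacement text has been consumed.
    void pop()
    {
        assert(!m_Names.empty());
        m_Names.pop_front();
    }

    std::size_t depth() const { return m_Names.size(); }

private:
    std::list<std::string> m_Names; // front() is the top of the stack
};
```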
Definition at line 626 of file tokenizer.h.
Referenced by Lex(), ReplaceBufferText(), and ReplaceMacroUsage().
|
private |
Preprocessor branch stack: if we meet a #if 1, the value true is pushed onto the stack; when we skip the #endif, the true value should be popped.
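A minimal sketch of such a branch stack, assuming a std::stack<bool>. The class BranchStack and its method names are hypothetical, and an unmatched "#endif" is simply treated as a programming error here.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>

// Hypothetical model of a preprocessor branch stack.
class BranchStack
{
public:
    // Called for "#if <expr>" with the evaluated result of <expr>.
    void onIf(bool value) { m_Results.push(value); }

    // Called when the matching "#endif" is passed.
    void onEndif()
    {
        assert(!m_Results.empty());
        m_Results.pop();
    }

    // True if the innermost open conditional evaluated to true.
    bool inTrueBranch() const { return !m_Results.empty() && m_Results.top(); }

    std::size_t depth() const { return m_Results.size(); }

private:
    std::stack<bool> m_Results;
};
```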
Definition at line 558 of file tokenizer.h.
Referenced by HandleConditionPreprocessor(), SetLastTokenIdx(), and SkipComment().
|
private |
File index, useful when parsing documentation.
Definition at line 500 of file tokenizer.h.
Referenced by AddMacroDefinition(), Init(), InitFromBuffer(), SetLastTokenIdx(), and SkipComment().
|
private |
Filename of the buffer.
Definition at line 498 of file tokenizer.h.
Referenced by HandleUndefs(), Init(), InitFromBuffer(), ReadFile(), and Tokenizer().
|
private |
Boolean variable that specifies whether the buffer is ready for parsing.
Definition at line 550 of file tokenizer.h.
Referenced by BaseInit(), Init(), and InitFromBuffer().
|
private |
Stores the index of the most recently added token. For example, here is a comment
The token "aaa" is added to the token tree before its description is read. The token index is stored at that point, and later, when we read the "description of aaa", we attach the documentation to the token.
Definition at line 648 of file tokenizer.h.
Referenced by BaseInit(), SetLastTokenIdx(), and SkipComment().
|
private |
A lexeme string returned by the Lex() function; this is a candidate token string, which may be replaced if it is a macro usage.
Definition at line 509 of file tokenizer.h.
Referenced by CheckMacroUsageAndReplace(), DoGetToken(), GetMacroExpandedText(), GetPreprocessorType(), HandleDefines(), HandleUndefs(), IsMacroDefined(), Lex(), and SplitArguments().
|
private |
Line offset in buffer; please note that it is 1-based, not 0-based.
Definition at line 525 of file tokenizer.h.
Referenced by BaseInit(), CalcConditionExpression(), GetPreprocessorType(), GetToken(), HandleConditionPreprocessor(), HandleUndefs(), InitFromBuffer(), MoveToNextChar(), PeekToken(), ReadToEOL(), ReplaceBufferText(), SkipComment(), SkipToEOL(), SkipToInlineCommentEnd(), SkipToNextConditionPreprocessor(), and UngetToken().
|
private |
File loader; it loads the content into m_Buffer, either from the hard disk or from memory.
Definition at line 554 of file tokenizer.h.
Referenced by Init(), and ReadFile().
|
private |
keep track of block nesting { }
Definition at line 527 of file tokenizer.h.
Referenced by BaseInit(), GetPreprocessorType(), GetToken(), HandleConditionPreprocessor(), Lex(), PeekToken(), ReplaceBufferText(), SplitArguments(), and UngetToken().
|
private |
Normally, this records the Doxygen-style comments for the next token definition. For example, here is a comment
Then the "description of aaa" is stored in this variable. When the token "aaa" is added to the TokenTree, the documentation is associated with the token.
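The interplay between the pending documentation and the last token index might be modeled as below. This is a sketch under assumptions: DocCollector and its members are hypothetical, a std::map stands in for the TokenTree, and the two comment placements (a comment before the declaration, and a trailing comment after it) are inferred from the descriptions of m_NextTokenDoc and m_LastTokenIdx.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: token index -> documentation text.
struct DocCollector
{
    std::map<int, std::string> docs; // stands in for the TokenTree
    std::string nextTokenDoc;        // comment read before the token exists
    int lastTokenIdx = -1;           // most recently added token, -1 if none

    // A comment placed before a declaration: keep it until the token is added.
    void onLeadingComment(const std::string& text) { nextTokenDoc += text; }

    // A comment placed after a declaration: attach it to the last token.
    void onTrailingComment(const std::string& text)
    {
        if (lastTokenIdx != -1)
            docs[lastTokenIdx] += text;
    }

    // Called when a token has been added: flush the pending documentation.
    void setLastTokenIdx(int idx)
    {
        assert(idx >= 0);
        lastTokenIdx = idx;
        if (!nextTokenDoc.empty())
        {
            docs[idx] += nextTokenDoc;
            nextTokenDoc.clear();
        }
    }
};
```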
Definition at line 637 of file tokenizer.h.
Referenced by BaseInit(), SetLastTokenIdx(), and SkipComment().
|
private |
Peek token information.
Definition at line 535 of file tokenizer.h.
Referenced by GetToken(), PeekToken(), ReplaceBufferText(), and UngetToken().
|
private |
Definition at line 538 of file tokenizer.h.
Referenced by BaseInit(), GetToken(), PeekToken(), and UngetToken().
|
private |
Definition at line 539 of file tokenizer.h.
Referenced by BaseInit(), GetToken(), PeekToken(), and UngetToken().
|
private |
Definition at line 536 of file tokenizer.h.
Referenced by GetToken(), PeekToken(), and UngetToken().
|
private |
Definition at line 537 of file tokenizer.h.
Referenced by BaseInit(), GetToken(), PeekToken(), and UngetToken().
|
private |
Indicates whether we are reading a macro definition. This variable affects how the Doxygen comments are associated with the Token.
Definition at line 654 of file tokenizer.h.
Referenced by ReadToEOL(), and SkipComment().
|
private |
Definition at line 546 of file tokenizer.h.
Referenced by BaseInit(), HandleConditionPreprocessor(), PeekToken(), and ReplaceBufferText().
|
private |
Definition at line 547 of file tokenizer.h.
Referenced by BaseInit(), HandleConditionPreprocessor(), PeekToken(), and ReplaceBufferText().
|
private |
Saved token info (for PeekToken()). m_TokenIndex is moved forward or backward whenever DoGetToken() or SkipUnwanted() is called, so we should save m_TokenIndex before it gets modified.
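To show why the index has to be saved, here is a toy tokenizer over space-separated words. It is only a sketch: MiniTokenizer is hypothetical, it tracks just the index (the real class also saves line numbers and nesting levels), and its Lex() is far simpler than the real one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy tokenizer with get / peek / unget.
class MiniTokenizer
{
public:
    explicit MiniTokenizer(std::string buffer) : m_Buffer(std::move(buffer)) {}

    std::string GetToken()
    {
        m_UndoIndex = m_Index; // remember the position for UngetToken()
        if (m_PeekAvailable)
        {
            m_PeekAvailable = false; // reuse the token that was peeked
            m_Index = m_PeekIndex;
            m_Token = m_PeekToken;
        }
        else
            m_Token = Lex();
        return m_Token;
    }

    std::string PeekToken()
    {
        if (!m_PeekAvailable)
        {
            const std::size_t saved = m_Index; // save before Lex() moves it
            m_PeekToken = Lex();
            m_PeekIndex = m_Index;
            m_Index = saved;                   // a peek must not consume input
            m_PeekAvailable = true;
        }
        return m_PeekToken;
    }

    void UngetToken()
    {
        assert(m_UndoIndex <= m_Index);
        m_PeekToken = m_Token; // the token just returned becomes the peek
        m_PeekIndex = m_Index;
        m_PeekAvailable = true;
        m_Index = m_UndoIndex;
    }

private:
    std::string Lex()
    {
        while (m_Index < m_Buffer.size() && m_Buffer[m_Index] == ' ')
            ++m_Index;
        const std::size_t start = m_Index;
        while (m_Index < m_Buffer.size() && m_Buffer[m_Index] != ' ')
            ++m_Index;
        return m_Buffer.substr(start, m_Index - start);
    }

    std::string m_Buffer;
    std::string m_Token;
    std::string m_PeekToken;
    std::size_t m_Index = 0;
    std::size_t m_UndoIndex = 0;
    std::size_t m_PeekIndex = 0;
    bool m_PeekAvailable = false;
};
```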
Definition at line 545 of file tokenizer.h.
Referenced by BaseInit(), HandleConditionPreprocessor(), PeekToken(), and ReplaceBufferText().
|
private |
Tokeniser state specifies the token reading option.
Definition at line 552 of file tokenizer.h.
Referenced by CalcConditionExpression(), GetToken(), PeekToken(), and SplitArguments().
|
private |
These variables define the current token string and its auxiliary information, such as the token name, the line number of the token, the current brace nest level.
token name
Definition at line 514 of file tokenizer.h.
Referenced by GetToken(), and UngetToken().
|
private |
index offset in buffer, when parsing a buffer
m_TokenIndex always points to the character just after a valid token; in the above example, it points to the space character after "std".
Definition at line 523 of file tokenizer.h.
Referenced by BaseInit(), CalcConditionExpression(), GetPreprocessorType(), GetToken(), HandleConditionPreprocessor(), IsEscapedChar(), Lex(), MoveToNextChar(), PeekToken(), ReadToEOL(), ReplaceBufferText(), SkipPreprocessorBranch(), SkipToNextConditionPreprocessor(), and UngetToken().
|
private |
Tokenizer options specify the token reading option.
Definition at line 492 of file tokenizer.h.
Referenced by HandleConditionPreprocessor(), SkipComment(), and Tokenizer().
|
private |
The token tree that stores the macro definitions; the token tree is shared with ParserThread.
Definition at line 495 of file tokenizer.h.
Referenced by AddMacroDefinition(), CheckMacroUsageAndReplace(), HandleUndefs(), Init(), InitFromBuffer(), IsMacroDefined(), SetLastTokenIdx(), and SkipComment().
|
private |
Definition at line 531 of file tokenizer.h.
Referenced by BaseInit(), GetToken(), HandleConditionPreprocessor(), ReplaceBufferText(), and UngetToken().
|
private |
Definition at line 532 of file tokenizer.h.
Referenced by BaseInit(), GetToken(), HandleConditionPreprocessor(), ReplaceBufferText(), and UngetToken().
|
private |
Backup the previous Token information.
Definition at line 530 of file tokenizer.h.
Referenced by BaseInit(), GetToken(), HandleConditionPreprocessor(), ReplaceBufferText(), and UngetToken().