Patch #3494 2013-08-31 11:14

hovercraft

CodeCompletion: fixes UnnamedStruct in the classBrowser
Download
3494-CodeCompletion.patch (10.2 KB)
Category
Plugin::Refinement
Status
Open
Close date
 
Assigned to
ollydbg
Index: src/plugins/codecompletion/parser/tokentree.cpp
===================================================================
--- src/plugins/codecompletion/parser/tokentree.cpp    (revision 9443)
+++ src/plugins/codecompletion/parser/tokentree.cpp    (working copy)
@@ -115,7 +115,7 @@
     if (!newToken)
         return -1;
 
-    return AddToken(newToken, -1);
+    return AddToken(newToken, -1); // -1 means we add a new slot to the Token list (vector)
 }
 
 int TokenTree::insert(int loc, Token* newToken)
@@ -332,7 +332,33 @@
     return result.size();
 }
 
-int TokenTree::AddToken(Token* newToken, int fileIdx)
+void TokenTree::RenameToken(Token* token, const wxString& newName)
+{
+    if (!token)
+        return;
+    // remove the old token index from the TokenIdxSet mapped by old name.
+    int slotNo = m_Tree.GetItemNo(token->m_Name);
+    if (slotNo)
+    {
+        TokenIdxSet& curList = m_Tree.GetItemAtPos(slotNo);
+    // Note: As we have no way to actually delete keys in the TokenSearchTree,
+    // the previous name index path of the token will still exist, as well as its TokenIdxSet slot,
+    // but this slot will be empty and as result will lead to nothing.
+    // This is the same thing the RemoveToken procedure does.
+        curList.erase(token->m_Index);
+    };
+    token->m_Name = newName;
+
+    static TokenIdxSet tmpTokens = TokenIdxSet();
+
+    size_t tokenIdx = m_Tree.AddItem(newName, tmpTokens);
+    TokenIdxSet& curList = m_Tree.GetItemAtPos(tokenIdx);
+
+    // add the old token index to the TokenIdxSet mapped by new name, note Token index is not changed
+    curList.insert(token->m_Index);
+}
+
+int TokenTree::AddToken(Token* newToken, int forceIdx)
 {
     if (!newToken)
         return -1;
@@ -345,10 +371,10 @@
     size_t tokenIdx = m_Tree.AddItem(name, tmpTokens);
     TokenIdxSet& curList = m_Tree.GetItemAtPos(tokenIdx);
 
-    int newItem = AddTokenToList(newToken, fileIdx);
+    int newItem = AddTokenToList(newToken, forceIdx);
     curList.insert(newItem);
 
-    size_t fIdx = (fileIdx<0) ? newToken->m_FileIdx : (size_t)fileIdx;
+    size_t fIdx = newToken->m_FileIdx;
     m_FileMap[fIdx].insert(newItem);
 
     // Add Token (if applicable) to the namespaces indexes
@@ -464,7 +490,11 @@
 
     int result = -1;
 
-    if (forceidx >= 0) // Reading from cache?
+    // if the token index is specified, then just replace the specified slot to the newToken, this
+    // usually happens we construct the whole TokenTree from cache.
+    // other wise, we just append one to the vector or reused free slots stored in m_FreeTokens
+    // so, it is normal cases in any parsing stages.
+    if (forceidx >= 0)
     {
         if ((size_t)forceidx >= m_Tokens.size())
         {
@@ -474,7 +504,7 @@
         m_Tokens[forceidx] = newToken;
         result = forceidx;
     }
-    else // For real-time parsing
+    else
     {
         if (m_FreeTokens.size())
         {
Index: src/plugins/codecompletion/parser/parserthread.cpp
===================================================================
--- src/plugins/codecompletion/parser/parserthread.cpp    (revision 9443)
+++ src/plugins/codecompletion/parser/parserthread.cpp    (working copy)
@@ -139,6 +139,7 @@
     const wxString kw__C_          (_T("\"C\""));
     const wxString kw_for          (_T("for"));
     // length: 4
+    const wxString kw___at         (_T("__at"));
     const wxString kw_else         (_T("else"));
     const wxString kw_enum         (_T("enum"));
     const wxString kw_elif         (_T("elif"));
@@ -727,6 +728,10 @@
                 m_Str.Clear();
                 SkipToOneOfChars(ParserConsts::colon, true, true);
             }
+            else if (token == ParserConsts::kw___at)
+            {
+                m_Tokenizer.GetToken(); // skip arguments
+            }
             else
                 switchHandled = false;
             break;
@@ -1859,6 +1864,8 @@
             newToken->m_ImplLine = lineNr;
             newToken->m_ImplLineStart = m_Tokenizer.GetLineNumber();
 
+            newToken->m_IsUnnamed = true;
+
             DoParse(); // recursion
 
             m_LastParent     = lastParent;
@@ -2380,7 +2387,10 @@
         }
 
         if (!newEnum) // either named or first unnamed enum
+            {
             newEnum = DoAddToken(tkEnum, token, lineNr);
+            newEnum->m_IsUnnamed = true;
+            };
         level = m_Tokenizer.GetNestingLevel();
         m_Tokenizer.GetToken(); // skip {
     }
@@ -2762,6 +2772,10 @@
                   token.wx_str(), m_Str.wx_str(),
                   (m_LastParent ? m_LastParent->m_Name.wx_str() : _T("<no-parent>")));
 
+            // Detects anonymous ancestor and gives him a name based on the first found alias.
+            if (m_Str.StartsWith(g_UnnamedSymbol))
+                RefineUnnamedTypeToken(tkUndefined, token);
+
             Token* newToken = DoAddToken(tkVariable, token, m_Tokenizer.GetLineNumber());
             if (!newToken)
download for full patch...
hovercraft 2013-08-31 11:38

Current codecompletion plugin implementation expects structure definition to be in standard form: structure TAG_NAME {....} name;

However in many sources the TAG_NAME omitted, in which cases the class browser shows UnnamedStruct instead of actual structure name.

This patch proposes workaround for this issue, replacing UnnamedStruct with actual structure name prefixed with word 'struct'.

Also keyword __at added to better support SDCC compiler.

tpetrov 2013-10-01 20:36

@ollydbg: Can you please look at this see if it is good for inclusion.

ollydbg 2013-10-03 08:51

@hovercraft

Here is my test code:

struct

{

int x;

float y;

} abc;

I don't see your patch handle this. I can set a bp in the code between // hovercraft 2013-8-30 start and // hovercraft 2013-8-30 end, but they never hit.

What do you want to solve? I don't understand. Can you give more details? Thanks.

ollydbg 2013-10-03 08:57

BTW: please indent by 4 spaces (not tabs), I'm not sure whether berlios automatically does this change, thanks.

hovercraft 2013-10-03 13:45

Apparently it is working only with typedefs only now. Look at this:

typedef struct

{

int x;

float y;

} CBA;

struct

{

int x;

float y;

} abc;

CBA will be shown as srtuct_CBA and abc will be shown as __UnnamedStruct41 or something, whereas without the patch both cases would be shown as __UnnamedStruct.. :)

Also there should be some other place in the code for other cases (not typedefs) to fix similar way, but I personally don't know about it. Can you please point me?

ollydbg 2013-10-03 16:14

@hovercraft

OK, I see.

From this link: http://stackoverflow.com/questions/4082972/struct-and-typedef-in-c-versus-c?rq=1

typedef struct

{

int x;

float y;

} CBA;

CBA is an alias to an unnamed structure, right? In your patch, you give the unnamed structure a name "struct_CBA", is that necessary? In symbol browser tree, you click on "CBA", it will bring you to it's declaration, when you need code-completion, it will correctly show it's member.

ollydbg 2013-10-03 16:19

typedef struct

{

int x;

float y;

} CBA, ZZZ, *PPP;

struct

{

int x;

float y;

} abc;

int main()

{

ZZZ a;

CBA b;

PPP c;

return 0;

}

Now, what's the unnamed struct? "struct_CBA" or "struct_ZZZ" or "struct_*PPP"?

hovercraft 2013-10-03 17:33

You are writing "In your patch, you give the unnamed structure a name "struct_CBA", is that necessary?"

Answer is YES!

Why? Because without it current implementation of CC will always show __UnnamedStructXX instead of CBA In the symbol browser tree, so user cannot have a clue what this structure actually is until him double-clicked it and opened corresponding source location.

In fact, current implementation also gives the unnamed structure a fictitious name ( it is __UnnamedStructXX) and shows it in the browser tree, but this name is absolutely non informative.

In your last example when we place mouse over variable a, b, or c we will see tooltip "ZZZ main::a", "CBA main::b" or "PPP main::c" respectively and that`s ok.

Again, if we hover for example the "ZZZ" in the "ZZZ a;" declaration string, and patch is applied, we will see "typedef struct_CBA ZZZ" and this is also ok for me, because it gives some information about what this structure is. Remember "CBA" is a structure alias which usually given some informative meaning by the author.

Instead, without a patch, we will see "__UnnamedStruct11 ZZZ" in the tooltip there - and that's not ok.

hovercraft 2013-10-03 18:28

Strictly speaking, in your example of unnamed structure with several aliases, we have several opportunities how represent it to the user.

Firstly we can display all of them as sub-entities of anonymous structure. That is exactly as Eclipse CDT behave in this situation.

This is clearly the bets solution, but currently I don't see a way how to implement it in the CB.

In my opinion, the best we can do right now is to choose some of these aliases and display it in the symbol browser tree - hereof this patch.

At least this is much better than see in some projects tons of __UnnamedStruct and no more info.

ollydbg 2013-10-04 07:49

@hovercraft, thanks for the explanation. I understand the reason, I think this is a good improvement. But your patch does not works correctly when you are going to show the member of b, like:

int main()

{

ZZZ a;

CBA b;

PPP c;

b. //<----code completion does not work here once your patch applied.

return 0;

}

This is because you manually change the Token->m_Name, this is the search key value for the Patricia tree, so they are not allowed to be changed once it is created.

What I suggest is: delay struct_CBA Token creating, or delete the __UnnamedStruct1 Token, and create another one Token.

hovercraft 2013-10-04 09:56

Thank you for pointing this out. I will evaluate your suggestions and try to make corrections!

hovercraft 2013-10-09 06:19
ollydbg, please advice: what do you think about this modification:
            // hovercraft 2013-10-09 start
            Token* parent = m_LastParent;
            if (parent)
            {
                wxString sParrent = parent->m_Name;
                if (!sParrent.Cmp(tempAncestor))
                    parent = m_LastParent->GetTree()->at(m_LastParent->m_ParentIndex);
            };
            Token* newTokenPrep = TokenExists(tempAncestor, parent, tkTypedef | tkClass);
            if (newTokenPrep && tempAncestor.StartsWith(_T("__Unnamed")))
            {
                if (tempAncestor.StartsWith(_T("__UnnamedUnion")))
                    tempAncestor = _T("union_") + token;
                else if (tempAncestor.StartsWith(_T("__UnnamedStruct")))
                    tempAncestor = _T("struct_") + token;
                else
                    tempAncestor = _T("tag_") + token;
                newTokenPrep->m_Name = tempAncestor;
                m_TokenTree->insert(newTokenPrep->m_Index, newTokenPrep);
                m_Str = tempAncestor;
                ancestor = tempAncestor;
            };
            // hovercraft 2013-10-09 end

The idea behind this is to update the index node without affecting the token object itself. Unfortunately, current implementation of the index classes doesn't offer an update function. So now I simply add the new node with changed name using m_TokenTree->insert and leave the old one (with the name __Unnamed..) as is. This has a disadvantage of wasting the memory with unused index entries, but still could be useful for while, because the amount of memory wasted should be relatively small.
I could try to implement kind of update method in the TokenTree and it's base classes hierarchy, because I think it's good idea to have such a functionality there, but this may require a time, so I need to know your opinion before doing this.
ollydbg 2013-10-09 07:50

original tree: (-> means one to one, ->> means one to many)

name -> ItemIdx ->TokenIdxSet ->> TokenIdx -> Token* -> Token

What you did is, Keep Token* and Token, then

new_name -> newItemIdx -> newTokenIdxSet ->> newTokenIdx -> Token* -> Token

So, two roads that targets to the same Token.

Normally, I think that there are many empty TokensIdxSet in the Tree, this is because some times Tokens are removed from the tree, but our searchTree can only grows (you can not remove key string in the BasicSearchTree class), so some key string point to empty TokenIdxSet.

In you case, the things becomes more complicated. You know, you have occupy two TokenIdx for one Token, the former slot is actually dead. If you manually remove a Token, these slots can be reused (see the m_FreeTokens).

From my point of view, I think the simple way is just: remove the dead slot in TokenIdxSet, so this looks like we have already delete a Token, but infact, we have never delete and new a Token, we just change its name. Then the dead slot can go to m_FreeTokens for reuse.

The best way is as you said: the basicsearchtree can reduce it self, I event don't know how to do it, if you can do this, its better.

BTW: I have some local comments about TokenTree, which I can show here or commit to svn to let you understand those classes more clear. do you?

hovercraft 2013-10-10 08:27

About two roads leading to single target - yes, I did it intentionally, because I think it does exactly what we need with minimal effort. I agree, it's good idea to reuse the slot, if I properly understand what is the slot. I'll try and put results here.

And yes, please commit your local comments to svn, or you can mail me at list.ru with my nickname.

ollydbg 2013-10-10 14:06

I commit the comments in revision 9394, the TokenTree related code was written by rick22 around year 2006, so I'm not fully review all the code, in-fact I'm not fully understand the code.

There is also a wiki page you can take a reference: http://wiki.codeblocks.org/index.php?title=Code_Completion_Design.

About the slot, I mean an element in a vector, it is referenced by the an index value.

hovercraft 2013-11-05 03:25
Hello!

Ollydbg, after further studying the sources I now believe the problem with dead slots is seriously far-fetched. Only reducing BasicSearchTree itself makes sense. Without it we cannot achieve any sensible improvement  over my last patch variant, so I'd recommend to apply it as is for a while, because altering the BasicSearchTree may be a long enough story and it goes beyond the scope of this particular patch.
1) The m_FreeTokens doesn't denote the TokenIdxSet, but rather directly specify the position of our token pointer under which it is stored in the m_Tokens. We don't have a dead slot in the m_Tokens, because it still points to our token and we does reuse it, and as such, there is nothing to put into the free token pointers list.
2) We can empty the TokenIdxSet object corresponding to the previous token name ("__Unnamed.."), or even reuse it, but this will result in little if any memory saving. Most of the wasted memory will still reside in the BasicSearchTree, and the same true for the current implementation of RemoveToken procedure. For example we can use the following method:
void TokenTree::rename(Token* token, const wxString& newName)
{
    if (!token)
        return;
    int slotNo = m_Tree.GetItemNo(token->m_Name);
    if (slotNo)
    {
        TokenIdxSet& curList = m_Tree.GetItemAtPos(slotNo);
        // The previous name index path of the token will still exist, as well as its TokenIdxSet slot,
        // but this slot will be empty and as result will lead to nothing.
        // This is the same sing the RemoveToken procedure dues.
        curList.erase(token->m_Index);
        // Actually we can save some additional memory by reusing this slot,
        // this can be easily implemented on the BasicSearchTree level,
        // but is it worth the efforts? Most of the memory will still be wasted by the BasicSearchTree itself.
    };
    token->m_Name = newName;
    AddToken(token, token->m_Index);

}
ollydbg 2013-11-06 06:01
Hi, hovercraft, I agree with you.
What I want is to let the basicsearchtree have some "delete" functions.
I have take some time to read/debug the tree related code, I see that there are many things coupled in the basic search tree.

First, tree node is reference by index, I think this is OK.
Second, all the node have an item-map, which is a map of (depth->itemno). For example, we have a tree:
 *   - "" (0)
 *         \- "p" (4)
 *                 +- "hysi" (2)
 *                 |          +- "cs" (1)
 *                 |          \- "ology" (3)
 *                 \- "sychic" (5)
Now, the node "hysi" (2) can have up to four elements, such as:
depth -> (actually string) -> itemno
5        ->  "ph"                  ->  3
6        ->  "phy"                ->  5
7        ->  "phys"               ->  9
8        ->  "physi"               ->7

Here, the itemno is type of "size_t", and itemno is in-fact an index of the m_Points in BasicSearchTree class.
Also, itemno is also the index of the general type T in std::vector<T> m_Items in class template <class T> class SearchTree.

In CC, the T is typedef std::set< int,    std::less<int>    > TokenIdxSet;

Here, I think both BasicSearchTree::m_Points and SearchTree::m_Items are useless, if we directly hold the information in SearchTreeNode::m_Items.

This means in the SearchTreeNode::m_Items (item map of each node), we can have:

depth -> (actually string) -> T(TokenIdxSet)
5        ->  "ph"                  ->  TokenIdxSet of all the Token which has the name "ph"
6        ->  "phy"                ->  TokenIdxSet of all the Token which has the name "phy"
7        ->  "phys"              ->  TokenIdxSet of all the Token which has the name "phys"
8        ->  "physi"             ->  TokenIdxSet of all the Token which has the name "physi"

Now, in this way, remove/delete an key(string) in the basicsearchtree is much easier.
We just remove some tree nodes, and as Nodes are stored in typedef std::vector<SearchTreeNode*> SearchTreeNodesArray, we may have some slot left empty in the vector, but that's not a problem, because there empty slot (node index) can be reused later.

When you want to remove a Token, you just find the T(TokenIdxSet), and remove the index of the Token in the TokenIdxSet, if the set becomes empty (e.g. index set for string "ph" becomes empty).  then there is no such Token named "ph", so there is no need to keep this element in the SearchTreeNode::m_Items, we can directly remove this element in the map.

If the SearchTreeNode::m_Items becomes empty, we know that this node can be deleted.

I'm not sure my analysis is good, comments are welcome.


ollydbg 2013-11-06 06:04
Some correction for the previous comment:

Now, the node "hysi" (2) can have up to four elements of the item-map, such as:
depth -> (actually string) -> itemno
2        ->  "ph"                  ->  3
3        ->  "phy"                ->  5
4        ->  "phys"               ->  9
5        ->  "physi"              ->  7

The depth is the string from "root" to the current key string.

ollydbg 2013-11-06 08:24

There is one reason that delete the key in the basicsearchtree are not a good idea when parsing.

E.g. The piece of code was parsed again and again, this is the case the CC need to parse the buffer to retrieve local information, delete and add the string key in the basicsearchtree can surely have performance issue. I will put more comments for the tree related code in the C::B trunk.

hovercraft 2013-11-07 06:28

Yes, I agree, deletion of the key may affect performance, if we do it often. This surely is a subject to the usual performance/memory trade-off. May be we should not do anything in this respect at all, if the memory constraints not so important.

What I did in this patch - I found a way to add aliases for some token names if we need so. In principle, anonymous tokens can benefit from it, would it be possible to display several aliases of a token in the browser tree in the future. I.e. we can refer the anonymous object both as __UnnamedStruct and as struct_blabla for example.

ollydbg 2013-11-07 14:51

Ok, two question remains:

1, void TokenTree::rename(Token* token, const wxString& newName) is needed or not?

2, __at keyword improvement need some comments, in-fact, I have never see a code which use __at keyword

hovercraft 2013-11-07 16:40

1) It depends on what we prefer. The TokenTree::rename method terminates an "__Unnamed.." index path, makes it point to nothing. Whereas variant as of 2013-10-09 leave the "__Unnamed.." index path operational (implicitly produces an alias). And finally I just made an explicit void TokenTree::alias(Token* token, const wxString& newName) and corresponding size_t BasicSearchTree::alias(const wxString& s, size_t itemno) methods. I'd prefer the explicit method, despite it introduces more changes than others.

There was one question in the meantime: if the CC need to parse a piece of code over and over, would it be possible that it will introduce the new __UnnamedXX (where XX is a number) alias with each pass? If this is the case, it's bad. I didn't tested this yet.

hovercraft 2013-11-07 16:46

2) An "__at" keyword is for SDCC compiler (and possibly for other small and embedded systems). Look at this typical excerpt from the Microchip MCU (PIC18F..) related source:

extern __sfr __at (0xfb2) TMR3L;

extern __sfr __at (0xfb3) TMR3H;

extern __sfr __at (0xfba) CCP2CON;

typedef union {

struct {

unsigned CCP2M0:1;

unsigned CCP2M1:1;

unsigned CCP2M2:1;

unsigned CCP2M3:1;

unsigned DCCP2Y:1;

unsigned DCCP2X:1;

unsigned :1;

unsigned :1;

};

} __CCP2CONbits_t;

extern volatile __CCP2CONbits_t __at (0xfba) CCP2CONbits;

hovercraft 2013-11-07 17:14
My "alias" methods looks as follows:
size_t BasicSearchTree::alias(const wxString& s, size_t itemno)
{
    SearchTreePoint resultpos = AddNode(s, 0); //the second argument 0 means the root node
    // add a pair (resultpos.depth -> itemno) to the result node
    size_t result = m_Nodes[resultpos.n]->AddItemNo(resultpos.depth, itemno);
    m_Points[result] = resultpos;
    return result;
}

void TokenTree::alias(Token* token, const wxString& newName)
{
    if (!token)
        return;
    size_t slotNo = m_Tree.GetItemNo(token->m_Name);
    wxString oldName = token->m_Name;
    if (slotNo)
    {
        token->m_Name = newName;
        // Actually slotNo should not change there:
        slotNo = m_Tree.alias(token->m_Name, slotNo);
        TokenIdxSet& curList = m_Tree.GetItemAtPos(slotNo);
        curList.insert(token->m_Index);
        wxString log(F(_T("TokenTree::alias() : Added alias '%s' for '%s'."), newName.c_str(), oldName.c_str()));
        CCLogger::Get()->Log(log);
        CCLogger::Get()->DebugLog(log);
    };
}
ollydbg 2013-11-08 01:11

Quote: There was one question in the meantime: if the CC need to parse a piece of code over and over, would it be possible that it will introduce the new __UnnamedXX (where XX is a number) alias with each pass? If this is the case, it's bad. I didn't tested this yet. Quote End.

Oh, I can confirm this is a bug.

If you have such code:

struct

{

int x;

float y;

} abc;

int main (int argc, char** argv)

{

}

Once you edit this code, and press Save button, the parser runs again, and look at the symbol tree, the __UnnamedStructXXX will increase by one. I also see the dumped basicsearchtree, and it is growing.

This is a serious bug, and should be fixed. I file a bug report here: http://developer.berlios.de/bugs/?func=detailbug&bug_id=19186&group_id=5358

hovercraft 2013-11-10 06:29

I currently working on the following function prototype to be used in place of simple unnamed token increment:

/**

* Until index key removal been implemented,

* uses index scan within given prefix to find first (previously generated)

* token name which can be reused (e.g. if token deleted but its name still exists).

* If that fails, generates a new unique (i.e. which not yet present in this tree) key string

* to be used as token name or alias

* based on given prefix and an integer sequence_val,

* incrementing the sequence_val by one.

*/

wxString TokenTree::GenerateUniqueName(const wxString& prefix, size_t &sequence_val);

It should be fast enough, but to further improve its speed making the "_Unnamed" strings to be file-related may be helpful, i.e. _UnnamedStruct_FF_XXX,

where FF is the file index where anonymous token resides and XXX usual incremental number. This will reduce index scan to single file subtree.

ollydbg 2013-11-10 06:46

FYI:

Rev9443 and Rev9438 should fix the unnecessary increased token name problem.

hovercraft 2013-11-10 12:19

Ollydbg, thank you for info, I overlooked this.

I finally decided to go with variant as of 2013-Oct-09 15:19 - it involves less patching :). Should I upload revised patch?

And are you still interested in the index key removal functionality?

ollydbg 2013-11-10 15:49

QUOTE: I finally decided to go with variant as of 2013-Oct-09 15:19 - it involves less patching :). Should I upload revised patch?

I personally like the way "void TokenTree::rename(Token* token, const wxString& newName)" better. See the reasons below:

Obf(another dev) has the similar option, see: http://forums.codeblocks.org/index.php/topic,18542.msg126892.html#msg126892, so we expect to remove the ugly __UnnamedStructXXXXXXX names, although it looks like it's hard to remove the ugly name in the basicsearchtree.

I think people will get confused when they see two steps (in your patch of 2013-Oct-09)

1, change a Token's name.

2, re-insert the Token to the tree. (this in-fact cause two roads go to the same Token)

I think it is much clean/better that we have a explicit function named "RenameToken()", that is we break the old road as the name really changes, and construct the new road.

What's your opinion? If it is OK, would you mind to file the new patch? Thanks for the contribution.

hovercraft 2013-11-11 05:38

Glad to make something useful (hope). Thank you for patience. :) And decision to get rid off the __Unnamed objects is indeed good news : at least for me they were the only one really annoying thing in the C::B :).

It is possible to do something similar in the ReadVarNames() method for non-typedef structures (and I actually did it in my copy). I believe you will find a better way to do it soon. Nevertheless I can append it to this patch if you wish. What do you think?

hovercraft 2013-11-11 06:25

Patch updated (with non-typedef struct fix for your reference).

ollydbg 2013-11-13 00:31
Thanks for the patch.

In the function:
void TokenTree::RenameToken(Token* token, const wxString& newName)
{
    if (!token)
        return;
    int slotNo = m_Tree.GetItemNo(token->m_Name);
    if (slotNo)
    {
        TokenIdxSet& curList = m_Tree.GetItemAtPos(slotNo);
    // Note: As we have no way to actually delete keys in the TokenSearchTree,
    // the previous name index path of the token will still exist, as well as its TokenIdxSet slot,
    // but this slot will be empty and as result will lead to nothing.
    // This is the same thing the RemoveToken procedure does.
        curList.erase(token->m_Index);
    };
    token->m_Name = newName;
    AddToken(token, token->m_Index);
}

I think it should be: AddToken(token, token->m_FileIdx);
I still need time to test this patch.
ollydbg 2013-11-13 00:46
I found there may be a bug in the function:
int TokenTree::AddToken(Token* newToken, int fileIdx)
{
    if (!newToken)
        return -1;

    const wxString & name = newToken->m_Name;

    static TokenIdxSet tmpTokens = TokenIdxSet();

    // Insert the token's name and the token in the (inserted?) list
    size_t tokenIdx = m_Tree.AddItem(name, tmpTokens);
    TokenIdxSet& curList = m_Tree.GetItemAtPos(tokenIdx);

    int newItem = AddTokenToList(newToken, fileIdx);
    curList.insert(newItem);

    size_t fIdx = (fileIdx<0) ? newToken->m_FileIdx : (size_t)fileIdx;
    m_FileMap[fIdx].insert(newItem);

    // Add Token (if applicable) to the namespaces indexes
    if (newToken->m_ParentIndex < 0)
    {
        newToken->m_ParentIndex = -1;
        m_GlobalNameSpaces.insert(newItem);
        if (newToken->m_TokenKind == tkNamespace)
            m_TopNameSpaces.insert(newItem);
    }

    // All done!
    return newItem;
}

The line:
int newItem = AddTokenToList(newToken, fileIdx);
I think it should be:
int newItem = AddTokenToList(newToken, tokenIdx);

Since the function  AddTokenToList is just add a Token pointer to a std::vector<Token*>, I'm not sure why here the fileIdx is used????


ollydbg 2013-11-13 02:48
Go back to 2006-01-11, this functions looks like below:
int TokensTree::AddToken(Token* newToken,int forceidx)
{
    if (!newToken)
        return -1;

    const wxString & name = newToken->m_Name;

    static TokenIdxSet tmp_tokens = TokenIdxSet();
    // tmp_tokens.clear();

    // Insert the token's name and the token in the (inserted?) list
    size_t idx2 = m_Tree.AddItem(name,tmp_tokens,false);
    TokenIdxSet& curlist = m_Tree.GetItemAtPos(idx2);

    int newitem = AddTokenToList(newToken,forceidx);
    curlist.insert(newitem);
    m_FilesMap[newToken->m_FileIdx].insert(newitem);

    // Add Token (if applicable) to the namespaces indexes
    if (newToken->m_ParentIndex < 0)
    {
        newToken->m_ParentIndex = -1;
        m_GlobalNameSpace.insert(newitem);
        if (newToken->m_TokenKind == tkNamespace)
            m_TopNameSpaces.insert(newitem);
    }

    // All done!
    return newitem;
}
Look, the forceidx should be a TokenIndex, not file index, right?
ollydbg 2013-11-13 05:41

I open a discussion here: http://forums.codeblocks.org/index.php/topic,18565.msg127086/topicseen.html#msg127086

I believe this is another bug.

hovercraft 2013-11-13 06:54

Yes, I also noticed that second parameter in AddToken should be indeed the Toke::m_Index, but may be somewhere mismatched with file index. I had in mind to check it more closely, but you outran me :)

ollydbg 2013-11-13 07:12

I think generally the function in your patch: RenameToken(Token* token, const wxString& newName) need to function:

AddToken(token, token->m_Index);

Here, the first argument is the Token pointer

the second argument is the forced index, which means we want to re-used the slot in the std::vector<Token*>. In-fact in most cases AddToken function was called like AddToken(token, -1), this means the token will take one new slot in the std::vector<Token*> (or an empty slot).

The more simple way is we just add the token->m_Index to a TokenIdxSet which is mapped by the newName.

I will post a patch on the C::B forum follow the discussion later.

ollydbg 2013-11-13 07:36

I think only the first half part of the AddToken() function is need. which is:

if (!newToken)

return -1;

const wxString & name = newToken->m_Name;

static TokenIdxSet tmpTokens = TokenIdxSet();

// Insert the token's name and the token in the (inserted?) list

size_t tokenIdx = m_Tree.AddItem(name, tmpTokens);

TokenIdxSet& curList = m_Tree.GetItemAtPos(tokenIdx);

int newItem = AddTokenToList(newToken, forceIdx); // No need here.

curList.insert(newItem);

We just get the curList which is the TokenIdxSet of the the new String.

So, we just curList.insert(token->m_Index);

ollydbg 2013-11-13 07:42
The changed function look like below:

void TokenTree::RenameToken(Token* token, const wxString& newName)
{
    if (!token)
        return;
    // remove the old token index from the TokenIdxSet mapped by old name.
    int slotNo = m_Tree.GetItemNo(token->m_Name);
    if (slotNo)
    {
        TokenIdxSet& curList = m_Tree.GetItemAtPos(slotNo);
    // Note: As we have no way to actually delete keys in the TokenSearchTree,
    // the previous name index path of the token will still exist, as well as its TokenIdxSet slot,
    // but this slot will be empty and as result will lead to nothing.
    // This is the same thing the RemoveToken procedure does.
        curList.erase(token->m_Index);
    };
    token->m_Name = newName;
    if (!newToken)
        return -1;

    static TokenIdxSet tmpTokens = TokenIdxSet();

    size_t tokenIdx = m_Tree.AddItem(newName, tmpTokens);
    TokenIdxSet& curList = m_Tree.GetItemAtPos(tokenIdx);

    // add the old token index to the TokenIdxSet mapped by new name, note Token index is not changed
    curList.insert(token->m_Index);
}
ollydbg 2013-11-13 07:51
ollydbg 2013-11-13 14:51

Can you try the updated patch in http://forums.codeblocks.org/index.php/topic,18565.msg127089.html#msg127089

I just did some code refactoring (create a function to do the token name change in ParserThread class)

When try to test the code below:

typedef struct

{

int x;

float y;

} CBA, ZZZ, *PPP;

struct

{

int x;

float y;

} abc;

int main()

{

ZZZ a;

CBA b;

PPP c;

abc.;

return 0;

}

I see two x and two y in suggestion list when I hit the dot after "abc", I don't know why. (quite busy now, so if you have some time, you can help me to find the reason, thanks. )

hovercraft 2013-11-14 07:33

Strange enough - I see only one x and one y with this test case. My sources are 9447 checkout from svn with my only this my patch applied.

Sorry, I cannot find your updated patch at http://forums.codeblocks.org/index.php/topic,18565.msg127089.html#msg127089. The last post just says:

"Ok, a patch to fix this issue (include the <REFERENCE TO THIS SAME Patch #3494>) EDIT: update to version 2 of the patch".

Where the "version 2" can be found ?

By a way, I registered on that forum yesterday (hovercraft) but not received confirmation yet - "Your account is still awaiting admin approval".

ollydbg 2013-11-14 14:05

my patch was in the attachment of the post, I guess only a approved C::B forum user can see it, never mind, I will send you by private email right now.

hovercraft 2013-11-14 14:16

Thank you i got it.

ollydbg 2013-11-14 14:54

I found the reason.

I have two cpp file in one project. Both cpp file contain such code:

struct

{

int x;

float y;

} abc;

So, originally, the token name is: __UnnamedStruct6_0 and __UnnamedStruct7_0, note 6 and 7 are file index which means they are in different source files.

But when they get renamed, we get the same token name: struct_abc. So, finally, I get abc. which show two x and two y in suggesting list.

Because abc's type is: struct_abc, but we have two tokens for both cpp files.

I don't think this is a big problem (it looks like gcc can compile such code)

Maybe, we can change the token name to "struct6_abc" and "struct7_abc" to solve this problem?

hovercraft 2013-11-15 15:12

Have nothing against your last variant, I just added your suggestion about the file number inside the token name and one comment.

ollydbg 2013-11-17 03:24
Quote:  I just added your suggestion about the file number inside the token name and one comment.

So, you add the only one line comment here? see below:

//This block almost certainly is a needless extra precaution, but i still not sure.


The code snippet I guess was originally extract from the common code in either ReadClsNames() and ReadVarNames() functions.
+    Token* parent = m_LastParent;
+    if (parent)
+    {
+        //This block almost certainly is a needless extra precaution, but i still not sure.
+        wxString sParrent = parent->m_Name;
+        if (!sParrent.Cmp(m_Str))
+            parent = m_LastParent->GetTree()->at(m_LastParent->m_ParentIndex);
+    };

I will look into it to see what does these code do.


ollydbg 2013-11-17 06:10
Question: What does the code means?

+            if (m_Str.StartsWith(g_UnnamedSymbol))
+            {
+                Token* parent = m_LastParent;
+                if (parent)
+                {
+                    wxString sParrent = parent->m_Name;
+                    if (!sParrent.Cmp(m_Str))
+                        parent = m_LastParent->GetTree()->at(m_LastParent->m_ParentIndex);
+                };
+                Token* unnamedAncestor = TokenExists(m_Str, parent, tkUndefined); //tkTypedef | tkClass);

Why the parent's token name is needed to compare with the "m_Str"? If they are different, you get the parent's parent.

I see these code exist in your version one of the patch.

hovercraft 2013-11-17 08:57

Quote: If they are different, you get the parent's parent.

No, If they are SAME, I get the parent's parent.

Well, in the copy of source where I originally created this patch, just passing m_LastParent to the TokenExists method surprisingly could lead to incorrect result: no token found. That's why this block was inserted. In the current sources I cannot reproduce this situation. May be the original source copy has been broken or just too outdated.

Now I think we can pass just a nullptr instead of m_LastParent to the TokenExists. Mostly It was an extra precautions due to my insufficient knowledge of the code.

Patch updated.

ollydbg 2013-11-17 14:54

I think it should be:

Token* unnamedAncestor = TokenExists(m_Str, m_LastParent, typeMask);

I post the new patch on C::B forum, the same thread as I mentioned before.

Here is the test code(works fine with my new patch, but may failed if you use nullptr in TokenExists function:

typedef struct

{

int x;

float y;

} CBA, ZZZ, *PPP;

struct

{

int x;

float y;

} abc;

class ABCD

{public:

struct

{

int x;

float y;

} abcd;

};

int main()

{

ZZZ a;

CBA b;

PPP c;

//abc.;

ABCD e;

e.

return 0;

}

Please note that an anonymous type in a class definition of ABCD.

Please help to test it and give some feedback, I hope this is the final patch before I commit to the trunk, thanks.

hovercraft 2013-11-17 16:31

I just thought anonymous object inside classes (and other nested cases) may be subject to further investigation and would be better to leave them as is for a while.

For example if we add a global

struct

{

int x;

float y;

} abcd;

somewhere in your test case, we will see duplicated CC suggestions after typing "ABCD.abcd." as you noticed before.

Also after typing "ABCD." we will see the "struct1_abcd" item in the list, which is not as good. That's why i decided substitute nullptr.

ollydbg 2013-11-18 00:23

QUOTE: I just thought anonymous object inside classes (and other nested cases) may be subject to further investigation and would be better to leave them as is for a while.

Well, I think it anonymous class inside class can be solve by such method:

Suppose we have the following code snippet in either class and global space

struct

{

int x;

float y;

} abcd;

Normally, we will get two type tokens: _UnnamedStruct2_3 (in class) and _UnnamedStruct2_3 (in global namespace)

Now, with the rename method, we will get:

_UnnamedStruct2_3_abcd (in class) and _UnnamedStruct2_3_abcd (in global namespace)

OR

struct2_3_abcd (in class) and struct2_3_abcd (in global namespace)

We put the alias after the the token index.

What do you think about this method?

hovercraft 2013-11-18 02:29

Ok - let's do this way.

And one more point: would it be the right idea to introduce into the Token class some flag or something to mark anonymous tokens to be able distinguish them reliably? For example in the future different gui parts (e.g. plugins) can rely on this flag if they want to change the way such objects been represented to the end user. Maybe a simple public boolean variable Token::m_IsUnnamed?

ollydbg 2013-11-18 07:23

Hi, hovercraft, thanks. Yes, it is possible to add another bool variable in the Token.

But currently there are two many bool variables in the Token class, this make Token class a lot bigger, I would think change them to bit field, something like:

class Token

{

unsigned int isTemplate : 1;

unsigned int isConst : 1;

unsigned int isUnnamed: 1;

......

};

But not easy, because there are too many places which use those bools.

I have some new idea when I read your comments, is it possible to leave the Token name unchanged (_UnnamedStructXX_YYY), and later when we want to show them, just change the string in the GUI by checking the Token's name(I mean: avoiding adding the isUnnamed bit field or bool variable, just check the name)?

hovercraft 2013-11-18 09:18

This is possible of course. But having a boolean or bit field much better. This way we can avoid string compare operation every time we need to know object kind, which in turn usually should refer the g_UnnamedSymbol constant..

We can add a boolean as temporary measure and make it hidden (protected or private), but add the public interface function to read its value as boolean.

Later we can redesign the code so internally these values will be represented as bit fields without changing class interface! We can try do this with other boolean members I believe - maybe even without changing the consuming code.

I just afraid, if we will switch to redesign gui right now, wouldn't it leave us for a long period yet with what we have?

ollydbg 2013-11-18 15:09
I agree with you.
Quote: I just afraid, if we will switch to redesign gui right now, wouldn't it leave us for a long period yet with what we have?

Sorry, I don't understand this sentence, can you explain?

I just look at our Token class definition:

class Token
{
friend class TokenTree;
public:
    Token(const wxString& name, unsigned int file, unsigned int line, size_t ticket);
    ~Token();

    bool AddChild(int childIdx);
    bool DeleteAllChildren();
    bool HasChildren() const { return !m_Children.empty(); }
    wxString GetNamespace() const;
    bool InheritsFrom(int idx) const;
    wxString DisplayName() const;
    wxString GetTokenKindString() const;
    wxString GetTokenScopeString() const;
    wxString GetFilename() const;
    wxString GetImplFilename() const;
    wxString GetFormattedArgs() const; // remove all '\n'
    wxString GetStrippedArgs() const; // remove all default value
    size_t GetTicket() const { return m_Ticket; }
    bool MatchesFiles(const TokenFileSet& files);

    TokenTree* GetTree() const { return m_TokenTree; }
    bool IsValidAncestor(const wxString& ancestor);

    wxString                     m_FullType; // this is the full return value (if any): e.g. const wxString&
    wxString                     m_BaseType; // this is what the parser believes is the actual return value: e.g. wxString
    wxString                     m_Name;     // Token's name, it can be searched in the TokenTree
    wxString                     m_Args;     // If it is a function Token, then this value is function arguments
                                             // e.g.   (int arg1 = 10, float arg2 = 9.0)
                                             // If it is an enumerator, then this is the assigned (inferred) value
    wxString                     m_BaseArgs; // stripped arguments e.g. (int arg1, float arg2)

    wxString                     m_AncestorsString; // all ancestors comma-separated list

    unsigned int                 m_FileIdx;       // File index in TokenTree
    unsigned int                 m_Line;          // Line index where the token was met, which is 0/1 based
    unsigned int                 m_ImplFileIdx;   // function implementation file index
    unsigned int                 m_ImplLine;      // function implementation line index
    unsigned int                 m_ImplLineStart; // if token is impl, opening brace line
    unsigned int                 m_ImplLineEnd;   // if token is impl, closing brace line

    TokenScope                   m_Scope;         // public? private? protected?
    TokenKind                    m_TokenKind;     // See TokenKind class
    bool                         m_IsOperator;    // is operator overload function?
    bool                         m_IsLocal;       // found in a local source file, otherwise in wxString
    bool                         m_IsTemp;        // local (automatic) variable
    bool                         m_IsConst;       // the member method is const (yes/no)

    int                          m_Index;         // current Token index in the tree
    int                          m_ParentIndex;   // Parent Token index
    TokenIdxSet                  m_Children;      // if it is a class kind token, then it contains all the member tokens
    TokenIdxSet                  m_Ancestors;     // all the ancestors in the inheritance hierarchy
    TokenIdxSet                  m_DirectAncestors; //the neatest ancestors
    TokenIdxSet                  m_Descendants;   // all the descendants in the inheritance hierarchy

    wxArrayString                m_Aliases;       // used for namespace aliases

    wxString                     m_TemplateArgument;
    wxArrayString                m_TemplateType;    //for a class template, this is the formal template argument list
                                                    //but for a variable Token, this is the actual template arguments.
    std::map<wxString, wxString> m_TemplateMap;

    wxString                     m_TemplateAlias;   // alias for templates, e.g. template T1 T2;
    void*                        m_UserData;        // custom user-data (the classbrowser expects it to be a pointer to a cbProject)

protected:
    TokenTree*                   m_TokenTree;       // a pointer to TokenTree
    size_t                       m_Ticket;          // This is used in classbrowser to avoid duplication
};

You see, there are already a lot of bool variables. Also, there are some variables which only used for special kind of Token type. For example, m_TemplateMap, which is only for a class template specification, m_IsOperator is only used for functions.

So, maybe, a better way is: create a TokenBase class, which implement all the basic interface, then any special Token kind like functions, classes, structs...... should all be derived from the TokenBase, and have their own member variables, we can still hold all the pointers in the vector like std::vector<TokenBase*>.

But I think it is not easy/soon to implement this.

Now, lets return to our topic, if you add a bool variable m_IsUnnamed, what should we do?

Change the code:
        if (m_Str.Contains(_T("Union")))
            m_Str = _T("union");
        else if (m_Str.Contains(_T("Struct")))
            m_Str = _T("struct");
        else
            m_Str = _T("tag");

to something like:
Token* typeToken= TokenExists(m_Str, m_LastParent, typeMask);
typeToken->IsUnnamed == true;
m_Str =...

???

Any other changes???





hovercraft 2013-11-22 17:15

Hello, sorry for the late reply - was very busy at my work..

Quote: Sorry, I don't understand this sentence, can you explain?

Never mind. I just thought it is good idea to finish this patch first. And then we can start re-design the user interface and related redesign of the Token class.

Adding the m_IsUnnamed member doesn't influence the way how we would rename anonymous tokens but gives us possibility to distinguish them easily afterwards.

Patch updated.

hovercraft 2013-11-23 08:51

One more time..

ollydbg 2013-11-26 07:28

I committed with a slightly modified patch in r9466. Thanks for your contribution and happy discussion with you.

I just test that a bool variable always take one byte (release build or debug build), but a bit-field variable only takes one bit, so maybe, we need to start the refactoring now.

hovercraft 2013-11-26 13:14

Ok, are you going to start a new branch for refactoring? Or how to arrange this work properly? Anyway I'm going to participate to the extent of my possibilities.

ollydbg 2013-11-26 14:37

Hi, hovercraft, thanks.

In-fact, I'm not sure what is the way to do such refactoring. When I was not a C::B developer, I remember that I sent several patches to Morten(C::B developer), and he maintained a SVN branch, and applied my patches on the branch. Currently I don't know how to maintain a SVN branch(my knowledge of SVN branch/merge is quite limited), maybe another approach is we can work on a git clone, OBF(C::B developer) has a git clone of C::B source on github, using git for branch works much simpler.

When I looked at the Token class, I see there are about five or six bool variables, so if we convert them to bit field, we can save about five or four bytes, this looks not much appealing. Since some member variables are only for specific kind of Tokens, such as: m_TemplateMap. I still think we can create a base class like TokenBase, and define the common interface(by virtual functions), and create many kind of specific Tokens by inheritance, what do you think about this method? Maybe, we can discuss this on C::B forum.

hovercraft 2013-11-27 04:41

I agree, subclassing is a common approach, and discuss this in the forum is also the right idea.

ollydbg 2013-11-27 14:19