Richard Laughlin

Rants about Software, Compilers, C++ and Rust.

Understanding LArrow Using Clang

In my previous post, I discussed how to build a clang tool which allows for inspection of the Clang AST. We then built up to a simple tool which dumps the AST as text to STDOUT. Even this simple tool is useful for understanding C++ which is otherwise complicated (or at least surprising). To demonstrate this, let's take a deep dive into Étienne Laurin's excellent post "C++ left arrow operator".

This blog post provides some C++ which seemingly uses a new operator, left arrow (<-) which is the inverse of the usual right arrow (->).

struct C {
    void f() { std::cout << "foo\n"; }    
};
 
int main() {
    C x;
    (&C::f)<-x;
}

The code declares an operator<, and an operator- which is the first hint as to how this works. To get an even clearer picture, let's look at the clang AST dump we get from our ast_dump plugin.

Clang to the Rescue

To make the output more readable, we're going to modify the original code slightly. Instead of importing iostream for std::cout, we'll just have the function add two numbers. Since we won't be using the output anyway, this makes the AST we get much smaller.

Bonus exercise: try this on the original and see how much additional AST gets generated!

So instead of:

struct C {
    void f() { std::cout << "foo\n"; }    
};

We'll have:

struct C {
    void f() { 3+5; }    
};

AST Dump Walkthrough

The full output is available in this gist.

I encourage readers to try to understand each block on their own before reading my walkthrough. To help, I'll provide the doxygen links for all of the AST nodes which are used in each block.

Boilerplate

First, we find the TranslationUnitDecl, along with some built-in typedefs which get emitted into every AST by clang. This can be ignored since we don't use any of it.


BuiltinType ConstantArrayType PointerType RecordType TranslationUnitDecl TypedefDecl

TranslationUnitDecl 0x19ed5b8 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x19ede20 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x19edb80 '__int128'
|-TypedefDecl 0x19ede90 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x19edba0 'unsigned __int128'
|-TypedefDecl 0x19ee208 <<invalid sloc>> <invalid sloc> implicit __NSConstantString '__NSConstantString_tag'
| `-RecordType 0x19edf80 '__NSConstantString_tag'
|   `-CXXRecord 0x19edee8 '__NSConstantString_tag'
|-TypedefDecl 0x19ee2a0 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x19ee260 'char *'
|   `-BuiltinType 0x19ed660 'char'
|-TypedefDecl 0x1a37d28 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list '__va_list_tag[1]'
| `-ConstantArrayType 0x1a37cd0 '__va_list_tag[1]' 1 
|   `-RecordType 0x19ee390 '__va_list_tag'
|     `-CXXRecord 0x19ee2f8 '__va_list_tag'

struct larrow

Now we get to the real stuff: The template definition of larrow. It is a ClassTemplateDecl, which represents the declaration of a class or struct template. Inside it has a single TemplateTypeParamDecl corresponding to the template argument T.


ClassTemplateDecl TemplateTypeParmDecl

|-ClassTemplateDecl 0x1a37ed0 <larrow.cc:1:1, line:5:1> line:2:8 larrow
| |-TemplateTypeParmDecl 0x1a37d80 <line:1:10, col:16> col:16 referenced class depth 0 index 0 T

Then comes a CXXRecordDecl definition for larrow. This is the object which will be copied for each instantiation of the template.


CXXRecordDecl

| |-CXXRecordDecl 0x1a37e40 <line:2:1, line:5:1> line:2:8 struct larrow definition

What follows is a DefinitionData node. This is not a real AST node, but instead a way to display a lot of information succinctly. It's an implementation detail of clang::TextNodeDumper. If you're interested in understanding this fully, see the source here to trace each term to the CXXRecordDecl method which determines it.

| | |-DefinitionData standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
| | | |-DefaultConstructor
| | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveConstructor exists simple trivial needs_implicit
| | | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveAssignment exists simple trivial needs_implicit
| | | `-Destructor simple irrelevant trivial needs_implicit

Next, we find a perplexing CXXRecordDecl inside the earlier CXXRecordDecl. Why? This is an implementation detail of clang to simplify the logic for implementing name lookups. When you're working with the AST, you can test for this extraneous record by calling Decl::isImplicit().


CXXRecordDecl

| | |-CXXRecordDecl 0x1a38110 <col:1, col:8> col:8 implicit referenced struct larrow

Finally, we arrive at the body of the larrow class. We have a CXXConstructorDecl and FieldDecl for the constructor and field, respectively.


CompoundStmt CXXConstructorDecl CXXCtorInitializer DeclRefExpr FieldDecl ParenListExpr ParmVarDecl

| | |-CXXConstructorDecl 0x1a38338 <line:3:5, col:29> col:5 larrow<T> 'void (T *)'
| | | |-ParmVarDecl 0x1a38208 <col:12, col:15> col:15 referenced a_ 'T *'
| | | |-CXXCtorInitializer Field 0x1a383f8 'a' 'T *'
| | | | `-ParenListExpr 0x1a38480 <col:22, col:25> 'NULL TYPE'
| | | |   `-DeclRefExpr 0x1a38460 <col:23> 'T *' lvalue ParmVar 0x1a38208 'a_' 'T *'
| | | `-CompoundStmt 0x1a384c8 <col:27, col:29>
| | `-FieldDecl 0x1a383f8 <line:4:5, col:8> col:8 a 'T *'

What's this? Wow, it's the template larrow<T> instantiated for the type struct C!

Yep, the AST contains not only the original template, but all of the specializations the compiler determines are needed based on the uses it finds.


ClassTemplateSpecializationDecl CompoundStmt CXXConstructorDecl CXXCtorInitializer CXXDestructorDecl CXXRecordDecl CXXStaticCastExpr DeclRefExpr FieldDecl ImplicitCastExpr MemberExpr ParmVarDecl RecordType TemplateArgument

| `-ClassTemplateSpecializationDecl 0x1a6c6e8 <line:1:1, line:5:1> line:2:8 struct larrow definition
|   |-DefinitionData pass_in_registers standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
|   | |-DefaultConstructor
|   | |-CopyConstructor simple trivial has_const_param implicit_has_const_param
|   | |-MoveConstructor exists simple trivial
|   | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveAssignment exists simple trivial needs_implicit
|   | `-Destructor simple irrelevant trivial
|   |-TemplateArgument type 'C'
|   | `-RecordType 0x1a69030 'C'
|   |   `-CXXRecord 0x1a68f98 'C'
|   |-CXXRecordDecl 0x1a6cbc8 <col:1, col:8> col:8 implicit struct larrow
|   |-CXXConstructorDecl 0x1a6cdf0 <line:3:5, col:29> col:5 used larrow 'void (C *)'
|   | |-ParmVarDecl 0x1a6ccc8 <col:12, col:15> col:15 used a_ 'C *'
|   | |-CXXCtorInitializer Field 0x1a6cec0 'a' 'C *'
|   | | `-ImplicitCastExpr 0x1a6f3f8 <col:23> 'C *' <LValueToRValue>
|   | |   `-DeclRefExpr 0x1a6f3b8 <col:23> 'C *' lvalue ParmVar 0x1a6ccc8 'a_' 'C *'
|   | `-CompoundStmt 0x1a384c8 <col:27, col:29>
|   |-FieldDecl 0x1a6cec0 <line:4:5, col:8> col:8 referenced a 'C *'
|   |-CXXDestructorDecl 0x1a6cf28 <line:2:8> col:8 implicit referenced ~larrow 'void () noexcept' inline default trivial
|   |-CXXConstructorDecl 0x1a6eae8 <col:8> col:8 implicit constexpr larrow 'void (const larrow<C> &)' inline default trivial noexcept-unevaluated 0x1a6eae8
|   | `-ParmVarDecl 0x1a6ebf8 <col:8> col:8 'const larrow<C> &'
|   `-CXXConstructorDecl 0x1a6eca8 <col:8> col:8 implicit used constexpr larrow 'void (larrow<C> &&) noexcept' inline default trivial
|     |-ParmVarDecl 0x1a6edb8 <col:8> col:8 used 'larrow<C> &&'
|     |-CXXCtorInitializer Field 0x1a6cec0 'a' 'C *'
|     | `-ImplicitCastExpr 0x1a6f030 <col:8> 'C *' <LValueToRValue>
|     |   `-MemberExpr 0x1a6f000 <col:8> 'C *' xvalue .a 0x1a6cec0
|     |     `-CXXStaticCastExpr 0x1a6efd0 <col:8> 'larrow<C>' xvalue static_cast<struct larrow<struct C> &&> <NoOp>
|     |       `-DeclRefExpr 0x1a6efa0 <col:8> 'larrow<C>' lvalue ParmVar 0x1a6edb8 '' 'larrow<C> &&'
|     `-CompoundStmt 0x1a6f070 <col:8>

Notice that the original template only had one constructor, but this one has three:

  1. void (C *)
    This is the constructor which the user wrote down. This will show up later because it's (ab)using one of the unfortunate defaults of C++: single argument constructors become implicit conversions.

  2. void (const larrow<C> &)
    This is the copy constructor. Notice that it does not have a definition. This is because this translation unit doesn't ever invoke it.

  3. void (larrow<C> &&)
    This is the move constructor. It has a definition because it may be invoked in a few places. Those places are all marked as being elidable, but it's generated by clang in case it is needed.


operator< and operator-

There isn't anything particularly surprising in this next section; it's the same template & specialization structure with nodes that correspond roughly one-to-one with the source.


BinaryOperator BuiltinType.html CallExpr CompoundStmt CXXDependentScopeMemberExpr CXXMemberCallExpr DeclRefExpr FunctionDecl FunctionTemplateDecl ImplicitCastExpr MemberExpr ParenExpr ParmVarDecl RecordType ReturnStmt TemplateArgument TemplateTypeParmDecl

|-FunctionTemplateDecl 0x1a38a48 <line:7:1, line:10:1> line:8:3 operator<
| |-TemplateTypeParmDecl 0x1a384d8 <line:7:11, col:17> col:17 referenced class depth 0 index 0 T
| |-TemplateTypeParmDecl 0x1a38558 <col:20, col:26> col:26 referenced class depth 0 index 1 R
| |-FunctionDecl 0x1a389a8 <line:8:1, line:10:1> line:8:3 operator< 'R (R (T::*)(), larrow<T>)'
| | |-ParmVarDecl 0x1a38798 <col:13, col:24> col:21 referenced f 'R (T::*)()'
| | |-ParmVarDecl 0x1a38890 <col:27, col:37> col:37 referenced it 'larrow<T>'
| | `-CompoundStmt 0x1a38c68 <col:41, line:10:1>
| |   `-ReturnStmt 0x1a38c58 <line:9:5, col:23>
| |     `-CallExpr 0x1a38c38 <col:12, col:23> '<dependent type>'
| |       `-ParenExpr 0x1a38c18 <col:12, col:21> '<dependent type>'
| |         `-BinaryOperator 0x1a38bf8 <col:13, col:20> '<dependent type>' '->*'
| |           |-CXXDependentScopeMemberExpr 0x1a38b90 <col:13, col:16> '<dependent type>' lvalue .a
| |           | `-DeclRefExpr 0x1a38b70 <col:13> 'larrow<T>' lvalue ParmVar 0x1a38890 'it' 'larrow<T>'
| |           `-DeclRefExpr 0x1a38bd8 <col:20> 'R (T::*)()' lvalue ParmVar 0x1a38798 'f' 'R (T::*)()'
| `-FunctionDecl 0x1a6d398 <line:8:1, line:10:1> line:8:3 used operator< 'void (void (C::*)(), larrow<C>)'
|   |-TemplateArgument type 'C'
|   | `-RecordType 0x1a69030 'C'
|   |   `-CXXRecord 0x1a68f98 'C'
|   |-TemplateArgument type 'void'
|   | `-BuiltinType 0x19ed620 'void'
|   |-ParmVarDecl 0x1a6d1a8 <col:13, col:24> col:21 used f 'void (C::*)()'
|   |-ParmVarDecl 0x1a6d280 <col:27, col:37> col:37 used it 'larrow<C>':'larrow<C>'
|   `-CompoundStmt 0x1a6f548 <col:41, line:10:1>
|     `-ReturnStmt 0x1a6f538 <line:9:5, col:23>
|       `-CXXMemberCallExpr 0x1a6f518 <col:12, col:23> 'void':'void'
|         `-ParenExpr 0x1a6f4f8 <col:12, col:21> '<bound member function type>'
|           `-BinaryOperator 0x1a6f4d8 <col:13, col:20> '<bound member function type>' '->*'
|             |-ImplicitCastExpr 0x1a6f4a8 <col:13, col:16> 'C *' <LValueToRValue>
|             | `-MemberExpr 0x1a6f458 <col:13, col:16> 'C *' lvalue .a 0x1a6cec0
|             |   `-DeclRefExpr 0x1a6f438 <col:13> 'larrow<C>':'larrow<C>' lvalue ParmVar 0x1a6d280 'it' 'larrow<C>':'larrow<C>'
|             `-ImplicitCastExpr 0x1a6f4c0 <col:20> 'void (C::*)()' <LValueToRValue>
|               `-DeclRefExpr 0x1a6f488 <col:20> 'void (C::*)()' lvalue ParmVar 0x1a6d1a8 'f' 'void (C::*)()'


CompoundStmt CXXConstructExpr CXXFunctionalCastExpr CXXUnresolvedConstructExpr DeclRefExpr ExprWithCleanups FunctionDecl FunctionTemplateDecl MaterializeTemporaryExpr ParmVarDecl RecordType ReturnStmt TemplateArgument TemplateTypeParmDecl UnaryOperator

|-FunctionTemplateDecl 0x1a68da8 <line:12:1, line:15:1> line:13:11 operator-
| |-TemplateTypeParmDecl 0x1a38c80 <line:12:10, col:16> col:16 referenced class depth 0 index 0 T
| |-FunctionDecl 0x1a68d08 <line:13:1, line:15:1> line:13:11 operator- 'larrow<T> (T &)'
| | |-ParmVarDecl 0x1a68c08 <col:21, col:24> col:24 referenced a 'T &'
| | `-CompoundStmt 0x1a68f80 <col:27, line:15:1>
| |   `-ReturnStmt 0x1a68f70 <line:14:5, col:24>
| |     `-CXXUnresolvedConstructExpr 0x1a68f48 <col:12, col:24> 'larrow<T>' 'larrow<T>'
| |       `-UnaryOperator 0x1a68f30 <col:22, col:23> '<dependent type>' prefix '&' cannot overflow
| |         `-DeclRefExpr 0x1a68f10 <col:23> 'T' lvalue ParmVar 0x1a68c08 'a' 'T &'
| `-FunctionDecl 0x1a6c998 <line:13:1, line:15:1> line:13:11 used operator- 'larrow<C> (C &)'
|   |-TemplateArgument type 'C'
|   | `-RecordType 0x1a69030 'C'
|   |   `-CXXRecord 0x1a68f98 'C'
|   |-ParmVarDecl 0x1a6c898 <col:21, col:24> col:24 used a 'C &'
|   `-CompoundStmt 0x1a6f3a0 <col:27, line:15:1>
|     `-ReturnStmt 0x1a6f390 <line:14:5, col:24>
|       `-ExprWithCleanups 0x1a6f378 <col:12, col:24> 'larrow<C>':'larrow<C>'
|         `-CXXConstructExpr 0x1a6f348 <col:12, col:24> 'larrow<C>':'larrow<C>' 'void (larrow<C> &&) noexcept' elidable
|           `-MaterializeTemporaryExpr 0x1a6f300 <col:12, col:24> 'larrow<C>':'larrow<C>' xvalue
|             `-CXXFunctionalCastExpr 0x1a6f2d8 <col:12, col:24> 'larrow<C>':'larrow<C>' functional cast to larrow<struct C> <ConstructorConversion>
|               `-CXXConstructExpr 0x1a6f2a8 <col:12, col:24> 'larrow<C>':'larrow<C>' 'void (C *)'
|                 `-UnaryOperator 0x1a6f268 <col:22, col:23> 'C *' prefix '&' cannot overflow
|                   `-DeclRefExpr 0x1a6f220 <col:23> 'C':'C' lvalue ParmVar 0x1a6c898 'a' 'C &'


struct C

Based on what we saw earlier, this is also straightforward:


BinaryOperator CompoundStmt CXXConstructorDecl CXXMethodDecl IntegerLiteral ParmVarDecl

|-CXXRecordDecl 0x1a68f98 <line:17:1, line:19:1> line:17:8 referenced struct C definition
| |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| | |-DefaultConstructor exists trivial constexpr defaulted_is_constexpr
| | |-CopyConstructor simple trivial has_const_param implicit_has_const_param
| | |-MoveConstructor exists simple trivial
| | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| | |-MoveAssignment exists simple trivial needs_implicit
| | `-Destructor simple irrelevant trivial needs_implicit
| |-CXXRecordDecl 0x1a690b8 <col:1, col:8> col:8 implicit struct C
| |-CXXMethodDecl 0x1a691c8 <line:18:5, col:21> col:10 used f 'void ()'
| | `-CompoundStmt 0x1a692e0 <col:14, col:21>
| |   `-BinaryOperator 0x1a692c0 <col:16, col:18> 'int' '+'
| |     |-IntegerLiteral 0x1a69280 <col:16> 'int' 3
| |     `-IntegerLiteral 0x1a692a0 <col:18> 'int' 5
| |-CXXConstructorDecl 0x1a694a8 <line:17:8> col:8 implicit used constexpr C 'void () noexcept' inline default trivial
| | `-CompoundStmt 0x1a69928 <col:8>
| |-CXXConstructorDecl 0x1a695a8 <col:8> col:8 implicit constexpr C 'void (const C &)' inline default trivial noexcept-unevaluated 0x1a695a8
| | `-ParmVarDecl 0x1a696b8 <col:8> col:8 'const C &'
| `-CXXConstructorDecl 0x1a69768 <col:8> col:8 implicit constexpr C 'void (C &&)' inline default trivial noexcept-unevaluated 0x1a69768
|   `-ParmVarDecl 0x1a69878 <col:8> col:8 'C &&'

int main()

Finally we arrive at the interesting part!

We have a FunctionDecl named main with type int (). Its body is a CompoundStmt (as always) with two parts:

  1. A DeclStmt that declares a variable named x of type C. The default constructor for C will be invoked to initialize it.

  2. An ExprWithCleanups that corresponds to the interesting line in the code.

    Simplifying the inner structure a bit, we see code that amounts to: operator<(&C::f, operator-(x)).


CompoundStmt CXXConstructExpr CXXOperatorCallExpr DeclRefExpr DeclStmt ExprWithCleanups FunctionDecl ImplicitCastExpr MaterializeTemporaryExpr ParenExpr UnaryOperator VarDecl

`-FunctionDecl 0x1a69350 <line:21:1, line:24:1> line:21:5 main 'int ()'
  `-CompoundStmt 0x1a6f190 <col:12, line:24:1>
    |-DeclStmt 0x1a69a20 <line:22:5, col:8>
    | `-VarDecl 0x1a69428 <col:5, col:7> col:7 used x 'C' callinit
    |   `-CXXConstructExpr 0x1a699f8 <col:7> 'C' 'void () noexcept'
    `-ExprWithCleanups 0x1a6f178 <line:23:5, col:14> 'void':'void'
      `-CXXOperatorCallExpr 0x1a6f140 <col:5, col:14> 'void':'void' '<'
        |-ImplicitCastExpr 0x1a6f128 <col:12> 'void (*)(void (C::*)(), larrow<C>)' <FunctionToPointerDecay>
        | `-DeclRefExpr 0x1a6f0b0 <col:12> 'void (void (C::*)(), larrow<C>)' lvalue Function 0x1a6d398 'operator<' 'void (void (C::*)(), larrow<C>'
        |-ParenExpr 0x1a6c638 <col:5, col:11> 'void (C::*)()'
        | `-UnaryOperator 0x1a6c620 <col:6, col:10> 'void (C::*)()' prefix '&' cannot overflow
        |   `-DeclRefExpr 0x1a69a88 <col:7, col:10> 'void ()' CXXMethod 0x1a691c8 'f' 'void ()'
        `-CXXConstructExpr 0x1a6f080 <col:13, col:14> 'larrow<C>':'larrow<C>' 'void (larrow<C> &&) noexcept' elidable
          `-MaterializeTemporaryExpr 0x1a6ee38 <col:13, col:14> 'larrow<C>':'larrow<C>' xvalue
            `-CXXOperatorCallExpr 0x1a6cb30 <col:13, col:14> 'larrow<C>':'larrow<C>' '-'
              |-ImplicitCastExpr 0x1a6cb18 <col:13> 'larrow<C> (*)(C &)' <FunctionToPointerDecay>
              | `-DeclRefExpr 0x1a6ca98 <col:13> 'larrow<C> (C &)' lvalue Function 0x1a6c998 'operator-' 'larrow<C> (C &)'
              `-DeclRefExpr 0x1a6c658 <col:14> 'C' lvalue Var 0x1a69428 'x' 'C'


Next Steps

This exercise has demonstrated several things:

  1. The Clang AST is an excellent way to understand C++ source code. It exposes aspects of the language that are otherwise implicit.

  2. The dump() output is OK, but provides a very static view of the AST data. It would be better if it were somehow interactive. Following references is particularly painful.

  3. The clang documentation for many AST nodes is sparse, and a much more in-depth reference guide is needed to better explain what the nodes are, where they come from, and unique capabilities they have.