Symbolication: Beyond the basics
Discover how you can achieve maximum performance and insightful debugging with your app. Symbolication is at the center of tools such as Instruments and LLDB to help bridge the layers between your application's runtime and your source code. Learn how this process works and the steps you can take to gain the most insight into your app.
♪ Bass music playing ♪
Alejandro Lucena: Hello, everyone.
Thank you for attending this session on symbolication.
While symbolication may seem like a vague term, we'll see the instrumental role it plays in helping you to quickly identify the root causes of bugs, crashes, and performance bottlenecks.
We'll gain a deeper intuition as to how symbolication works and cover several tools that you have at your disposal to follow along firsthand.
Along the way, we'll discuss the various sources of debug information that are necessary for a rich symbolication experience and how you can configure your app to best utilize this information.
Let's get started by familiarizing ourselves with a concrete definition and example of symbolication.
Fundamentally, symbolication is a mechanism to convert or translate how our devices see our apps at runtime — which is in terms of memory addresses and instructions — back to how we as developers see our apps — which is in terms of functions, names, and files.
Without this bridging layer, diagnosing a bug becomes vastly more complicated, even for a few lines of code.
As an example, let's consider this Swift code.
Here I have a function, `generateMagicNumber()`, that selects a particular number from a list of candidate numbers.
To do this, we first call into `numberChoices()` that returns an array of 10 randomly generated numbers.
Then, we pass that array into `selectMagicNumber(choices:)`, which returns the number at a particular index.
This seems like a reasonable program, but I encounter a crash when I first run it.
My first recourse is to check the crash log, which is rather unfruitful.
All I can tell from the thread backtrace is that my MagicNumbers app crashed somewhere.
Thanks, but I already knew that.
I have no idea what any of the registers are referring to either.
I can try to step through the app with the debugger and identify the crash, but what if this only happens in particular circumstances that I can't reproduce? Using the debugger won't necessarily pinpoint the problem for me in that case.
Or I can try to look at the disassembly, but it's much more difficult to keep track of things.
This clearly isn't a viable way to diagnose the problem, and more importantly, with the help of symbolication, we don't have to debug from this starting point.
The Xcode Organizer tells me I can download the dSYM for this app, which will reprocess the crash log.
In doing so, Xcode applies the concepts of symbolication so that I can diagnose the issue with a much nicer crash log, where I can not only see all of my functions actually being called, but I can also determine the file and line number to refer back to in my code.
This updated crash log also tells me that we tried to access an index out of range.
Alternatively, if I already have the dSYM, I can use the `atos` command to get the same information.
Looking back at my code, I come to realize that `MAGIC_CHOICE` is far outside the bounds of our 10-element array.
In another instance, I'm interested in profiling my app to deliver the fastest user experience.
Here, Instruments shows me that the app cycles through periods of high utilization and low utilization.
If we focus on a period of low utilization, Instruments tells us that the app was writing some content to a file.
However, when I inspect a period of high utilization, I get the same exact backtrace.
How is this possible? Isn't this executing the same exact code? As we'll see, this Instruments trace is only partially symbolicated.
For instance, I don't see any file names or line numbers in the backtrace as I did with the updated crash log.
As a result, it's missing some information.
With that in mind, I can similarly locate my dSYM in Instruments.
After doing this, my new Instruments trace shows that the high-utilization regions were indeed writing to a file, but they were specifically within a debug code path that I left in the program.
The low-utilization areas avoid this and represent how my app behaves in production.
Just as Xcode utilized the dSYM to symbolicate a rather uninformative crash log, Instruments also used the dSYM to enrich a partially symbolicated trace and tell me the exact cause of the performance issue.
Now, while it's great that these tools leverage symbolication to pinpoint problem areas in my code, it naturally raises a few questions.
How does this all work? Where else can I apply this? And is this all about dSYMs? To answer these questions and unlock symbolication's capabilities, we'll need to take a deeper dive into the details.
This may seem a little overwhelming, but these are important concepts to understand.
There are many tools to aid in debugging and profiling that build upon symbolication.
`atos` alone already gave us the exact root cause of a crash, and there are many more tools built into Xcode.
Further, we specified flags like `-o`, `-l`, and `-i` to `atos`, but what do they mean? Do we always use the same set of flags? What if we don't have one of the values available? You'll also gain a good foundation in understanding when and why your backtraces may not be fully symbolicated and how to fix that, as we saw in the Instruments case.
Lastly, there are a number of build settings that you're in control of that influence the richness of symbolication.
We'll go through these build settings so you have a solid intuition about how they're utilized.
To that end, I'd like to introduce the two-step process for symbolication.
Step number one is going back to the file, and step number two is to consult debug information.
As we'll see, going back to the file is all about converting or translating runtime memory addresses into a more stable, usable form.
This allows us to communicate with our debug information to make a meaningful connection between a raw memory address and the source code.
Let's start off by discussing step number one, going back to the file.
The ultimate goal in this step is to translate a runtime memory address, such as those we saw in the original crash log, to a corresponding address in your binary on disk.
Just like we have runtime addresses, your apps and frameworks have an address space on disk, too! The on-disk address space differs from the address space that your app occupies at runtime, and we need a mechanism to figure out those differences.
First, we should understand exactly what the on-disk addresses are.
These addresses are assigned by the linker when you build your app.
Specifically, the linker groups your binary into segments.
Each segment contains related data and has properties such as a name, a size, and an assigned address.
For example, the `__TEXT` segment of your binary contains all of the functions and methods you've written, and the `__DATA` segment contains program-wide state such as global variables.
Each of these segments gets assigned a different address such that they don't overlap.
The linker records this information at the very beginning of your executables as part of the Mach-O header.
Mach-O is the format used for all executable binaries and libraries, and the system knows that it needs to read this header to run your apps.
Looking a little more closely, the Mach-O header contains a number of load commands that hold the segment properties.
The system uses these load commands to load the segments into memory.
Note that if your apps are Universal 2, then the app will have one header and a set of segments for each architecture.
We can see for ourselves by using the `otool -l` command, which prints the load commands for a specified file.
Here, we're looking at a segment load command identified with `LC_SEGMENT_64`.
This load command says that the `__TEXT` segment starts at the address in `vmaddr` and is `vmsize` bytes long.
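To make the `vmaddr` and `vmsize` fields concrete, here is a minimal Python sketch that decodes a 64-bit segment load command from raw bytes. The field layout follows the `segment_command_64` struct in `<mach-o/loader.h>`. The sample bytes are fabricated for illustration: the address and size are assumptions, not values taken from the MagicNumbers app.

```python
import struct

LC_SEGMENT_64 = 0x19  # load command type for a 64-bit segment

# Little-endian layout of segment_command_64:
# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")


def parse_segment_64(data: bytes) -> dict:
    """Decode one LC_SEGMENT_64 load command into a dictionary."""
    (cmd, cmdsize, segname, vmaddr, vmsize,
     fileoff, filesize, maxprot, initprot, nsects, flags) = SEGMENT_64.unpack_from(data)
    if cmd != LC_SEGMENT_64:
        raise ValueError(f"not an LC_SEGMENT_64 command: {cmd:#x}")
    return {
        "segname": segname.rstrip(b"\0").decode("ascii"),
        "vmaddr": vmaddr,   # linker-assigned start address of the segment
        "vmsize": vmsize,   # size of the segment in memory, in bytes
        "nsects": nsects,
    }


# Fabricated example: a __TEXT segment at an assumed linker address.
sample = SEGMENT_64.pack(
    LC_SEGMENT_64, SEGMENT_64.size, b"__TEXT",
    0x100000000, 0x8000, 0, 0x8000, 5, 5, 0, 0,
)
segment = parse_segment_64(sample)
print(f"{segment['segname']}: vmaddr={segment['vmaddr']:#x} vmsize={segment['vmsize']:#x}")
```

In practice `otool -l` does this decoding for you; the sketch only shows where the two numbers come from.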
So if the kernel follows these load commands to load the segments into memory, what exactly is the difference between the runtime and the linker addresses? Well, before the kernel actually loads the segments, it initializes a random value known as the ASLR slide.
The kernel then adds the ASLR slide to the addresses in the load commands.
So rather than loading the `__TEXT` segment at address A and the `__DATA` segment at address B, the kernel instead loads them to A+S and B+S, where S is the ASLR slide.
Since A+S and B+S are the true addresses that the system uses, they're also known as load addresses.
With that in mind, we now know that the difference between a runtime address and the linker address is the ASLR slide.
We can calculate the ASLR slide with the following equation: S = L-A, where S is the ASLR slide, L is the load address, and A is the linker address.
We'll see examples of this equation shortly, but the key point is that we can always go back to the file address space once we know the ASLR slide.
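As a rough illustration of the equation S = L - A, the Python sketch below computes a slide and then uses it to map a runtime address back to the file address space. All three addresses are assumptions chosen for the example (the linker address `0x100000000` is a common default for 64-bit executables, and the crash address is hypothetical); they are picked so that the slide comes out to the `0x45c000` value quoted in this session.

```python
def aslr_slide(load_address: int, linker_address: int) -> int:
    """S = L - A: the offset the kernel added when loading the segment."""
    return load_address - linker_address


def to_file_address(runtime_address: int, slide: int) -> int:
    """Go back to the file: remove the slide from a runtime address."""
    return runtime_address - slide


# Assumed values for illustration only.
LINKER_TEXT = 0x100000000    # A: __TEXT vmaddr from the load commands
LOAD_TEXT = 0x10045C000      # L: __TEXT load address from Binary Images
CRASH_ADDRESS = 0x10045FB70  # hypothetical backtrace entry

slide = aslr_slide(LOAD_TEXT, LINKER_TEXT)
file_address = to_file_address(CRASH_ADDRESS, slide)
print(f"slide={slide:#x} file address={file_address:#x}")
```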
The ASLR slide equation required two addresses — the load address and the linker address — so where do we get them from? We already saw how we can query the load commands to know the linker address using `otool`.
To know the runtime address, the system queries your app for its runtime address space, either at the point of a crash or as it's being profiled by Instruments.
This information is reflected in the Binary Images list in your crash logs.
You can also see the load addresses interactively with the `vmmap` tool, which enumerates the active memory regions in your program.
Let's compute the ASLR slide value ourselves from the original crash log.
In the Binary Images list, I have the load address of the `__TEXT` segment.
I also have the linker address for the binary on disk when I looked at the load commands.
Subtracting these two yields an ASLR slide value of 0x45c000.
This means that every address in my program's runtime `__TEXT` segment is 0x45c000 bytes away from the linker `__TEXT` segment address.
So to see what a backtrace address from the crash log corresponds to in the file, I can subtract 0x45c000 from it to get the address on disk.
Since this address is now part of the on-disk address space, I can inspect my app to see what resides there.
The crash log tells me that a thread crashed while executing whatever is at this address, so we can use `otool` again to see the problematic instruction.
This time, I specified the `-tV` flags to `otool`, which will print the disassembly.
Notice that I'm also specifying the architecture as arm64.
This is so that `otool` knows which Mach-O header and segments to consider since the app is built as Universal 2.
The output of `otool` reveals a `brk` instruction at the address.
`brk` signals an exception or a problem in the app.
Tools such as `atos` also calculate the ASLR slide using the same technique we went through together.
`atos` will read the load commands for the file indicated by the `-o` flag, and we tell it about the load address with the `-l` flag.
As I mentioned, `vmmap` can also tell us about the load addresses of a running app.
Let's try this calculation again, but this time we'll use `vmmap` instead of the Binary Images list to determine the ASLR slide.
I ran the MagicNumbers program again and got the `__TEXT` segment load address before the program crashed.
Using the previous formula, I can determine that this time the ASLR slide value was 0x104d14000.
Again, to go back to the file, I need to subtract the ASLR slide value.
If I subtract 0x104d14000 from the topmost entry in the new crash log, I get the same exact file address as before.
And this is no coincidence; the kernel just picked a different ASLR value so our load addresses changed between the crash logs.
However, we can still determine the file address that was responsible for the crash.
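The point that different slides lead back to the same file address can be checked with a small Python sketch. The two runs below use made-up load and crash addresses (they are assumptions, not the values from the crash logs in this session), along with an assumed `__TEXT` linker address.

```python
LINKER_TEXT = 0x100000000  # assumed __TEXT linker address


def file_address(runtime_address: int, text_load_address: int) -> int:
    """Compute the slide for this run, then remove it from the address."""
    slide = text_load_address - LINKER_TEXT
    return runtime_address - slide


# Two hypothetical launches of the same binary with different ASLR slides.
runs = [
    {"text_load": 0x10045C000, "crashed_at": 0x10045FB70},
    {"text_load": 0x102A10000, "crashed_at": 0x102A13B70},
]

results = [file_address(run["crashed_at"], run["text_load"]) for run in runs]
for run, result in zip(runs, results):
    print(f"load={run['text_load']:#x} -> file address {result:#x}")
```

Both runs resolve to the same on-disk address even though every runtime address differs between them.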
The important takeaway here is that we have a mechanism to understand exactly what our app was doing down to the instruction level, regardless of its runtime address.
And with that mapping, we can then consult our debug information for the source code that was compiled to those instructions.
Before we move on, I want to present a summary of what we covered and the tools we used.
App binaries and frameworks are Mach-O files.
This means that they have related content in their different segments.
These segments are created by the linker.
The Mach-O header load commands describe the properties of those segments, including an address.
We used `otool` with the `-l` flag to print out the load commands.
Next, we learned that the kernel adds a random value, known as the ASLR slide, to the linker addresses.
The sum of the ASLR slide and the linker address is known as a load address.
We can check the Binary Images list in a crash log to see the load addresses in the event of a crash, or we can use `vmmap` to see the load addresses for a running app.
Finally, we walked through some examples of calculating the ASLR slide to get back to the file address space.
Now we can discuss debug information, which contains the crucial links between the file addresses and the source code.
Xcode creates the debug information when you build your app and will either embed it directly into your app binaries or store it as a separate file, such as a dSYM.
There are a few categories or types of debug info.
Each one offers a different level of detail for a given file address.
We'll look at three different types of debug info today.
First, we'll cover the function starts, which by itself doesn't add too much value, but it is a common starting point.
Next we'll see the nlist symbol tables, which add function and method names.
Lastly, we'll look at DWARF, which comes from dSYMs and static libraries.
DWARF adds the most detail, including file names, line numbers, and optimization records.
Since DWARF offers the most detail, we really want to strive to have this type of debug info whenever possible.
We'll learn about each of these and how they can be used to build up the fully symbolicated crash log.
Let's begin with function starts.
As we saw in the table, function starts offers the least source code detail.
Also keeping true to its name, this type of debug information only tells us about the first address — or the literal start — of our functions.
For example, this would tell us that a function begins and exists at a certain address.
However, it doesn't tell us which functions begin at those addresses, only that they exist.
The function starts debug info does this by encoding the list of addresses in your app's `__LINKEDIT` segment.
Since this is embedded directly in your app, the Mach-O header also has a load command to inform us where we can find it, which is `LC_FUNCTION_STARTS`.
You can see these for yourself with the `symbols` command and the `-onlyFuncStartsData` flag.
Here we get back a list of addresses and null placeholders.
These placeholders would ideally have function and method names instead of null, but the function starts data doesn't provide names.
Again, this isn't the most descriptive data.
However, it does allow for a slight update to the crash log.
We can now view the file addresses as offsets from a function.
For example, first we go back to the file by subtracting the ASLR slide value.
Then we find the function starts value that could contain the file address.
In this case, only the first value could contain the address because all of the other values are larger than the address.
Finally we can claim that our file address is actually 264 bytes into this function.
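The lookup just described amounts to finding the largest function start that is not greater than the file address. Here is a minimal Python sketch of that search; the list of function starts and the file address are hypothetical, chosen so the offset matches the 264 bytes mentioned above.

```python
from bisect import bisect_right


def containing_function_start(function_starts, file_address):
    """Return (start, offset) for the function that could contain file_address.

    function_starts must be sorted in ascending order.
    """
    index = bisect_right(function_starts, file_address) - 1
    if index < 0:
        raise ValueError("address precedes every known function start")
    start = function_starts[index]
    return start, file_address - start


# Hypothetical function starts, as file addresses.
starts = [0x100003A68, 0x100003B80, 0x100003C10]
start, offset = containing_function_start(starts, 0x100003B70)
print(f"{start:#x} + {offset}")
```

Function starts carry no sizes, so this sketch cannot tell whether an address beyond the last start really belongs to that function.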
This is primarily useful for debuggers, since they can understand the details of how this function was set up and which registers were modified.
What this means for you, however, is that if you ever encounter a crash log that lacks function names, you're probably dealing with this lowest level of debug information.
This is good news because it means there are plenty of opportunities to enrich the crash log with better debug information.
Naturally, the next level of detail we'd like to see is function names.
This gives us our first real opportunity to take a crash log or an Instruments trace and use it to track down an issue in our source code.
This leads us to nlist symbol tables.
The symbol tables build on the idea of function starts: they also encode a list of information in the `__LINKEDIT` segment, and they have their own load command too.
However, rather than just encoding addresses, they encode C structs.
This lets us add more details for any given entry compared to function starts.
Specifically, they encode the `nlist_64` struct.
Here we have the definition of that struct.
A cursory glance shows that we have access to a name and several properties.
The values of these struct fields are determined by the nlist's `n_type`.
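For readers following along without the slide, the sketch below decodes one `nlist_64` entry in Python. The field order (`n_strx`, `n_type`, `n_sect`, `n_desc`, `n_value`) follows `<mach-o/nlist.h>`; the symbol table bytes, string table, and address are fabricated for the example.

```python
import struct

# Little-endian layout of nlist_64:
# n_strx (uint32), n_type (uint8), n_sect (uint8), n_desc (uint16), n_value (uint64)
NLIST_64 = struct.Struct("<IBBHQ")


def read_symbol(symtab: bytes, index: int, strtab: bytes) -> dict:
    """Decode the nlist_64 entry at `index` and resolve its name."""
    n_strx, n_type, n_sect, n_desc, n_value = NLIST_64.unpack_from(
        symtab, index * NLIST_64.size
    )
    # n_strx is an offset into the string table; names are NUL-terminated.
    end = strtab.index(b"\0", n_strx)
    return {
        "name": strtab[n_strx:end].decode("utf-8"),
        "n_type": n_type,
        "n_sect": n_sect,
        "n_value": n_value,  # the address, for symbols defined in a section
    }


# Fabricated tables: one entry named "_main" at a hypothetical address.
strtab = b"\0_main\0"
symtab = NLIST_64.pack(1, 0x0F, 1, 0, 0x100003A68)
symbol = read_symbol(symtab, 0, strtab)
print(f"{symbol['name']} n_type={symbol['n_type']:#04x} address={symbol['n_value']:#x}")
```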
There are three primary n_types that we're interested in, but for now we'll only focus on two.
The first is known as a direct symbol.
These are functions and methods that you have fully defined within your apps and frameworks.
Direct symbols have a name and an address in the `nlist_64` struct.
Additionally, they're represented by a particular bit pattern in the `n_type` field.
Specifically, `n_type` will have the second, third, and fourth least significant bits set.
These bits are also known as `N_SECT`.
We can see these by running `nm` with the `--defined-only` and `--numeric-sort` flags.
Here, `nm` walked through the defined symbols of the MagicNumbers program and listed them in address order.
The names we get back appear cryptic.
That's because the names that are actually stored in the symbol tables are mangled names.
These mangled names help the compiler and linker to uniquely identify a function, but they aren't easy to understand unless they're demangled.
To get a friendlier version of these names, I passed the output to `swift-demangle`.
Now we get some familiar names, such as `main` and `numberChoices`, because they're directly defined in my app.
Similarly, the `symbols` tool has an option to show the nlist data, and it also demangles the names automatically.
Now that we can associate a function name to an address, this lets us update the crash log once again.
Here we can observe that our offset expression that we got from the function starts data also matches an entry from the direct symbols, and that entry has a name.
Putting these two together, we can now say that our crash happened at 264 bytes into main.
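Combining names with the offset calculation can be sketched as follows in Python. The table is hypothetical: it mixes one named direct symbol with an unnamed entry that is known only from function starts, which is roughly the situation described here.

```python
from bisect import bisect_right


def describe(symbols, file_address):
    """Render a file address as 'name + offset'.

    `symbols` is a list of (start_address, name_or_None) sorted by address.
    """
    addresses = [address for address, _ in symbols]
    index = bisect_right(addresses, file_address) - 1
    if index < 0:
        return hex(file_address)
    start, name = symbols[index]
    # Fall back to the raw start address when no name is available.
    label = name if name is not None else hex(start)
    return f"{label} + {file_address - start}"


# Hypothetical entries: a named direct symbol and a nameless function start.
table = [
    (0x100003A68, "main"),
    (0x100003B80, None),
]
print(describe(table, 0x100003B70))
print(describe(table, 0x100003B90))
```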
This still leaves some details to be desired, since we know for a fact that main isn't the only function involved.
And it would also help to have an exact line number too.
We encountered something similar to this in the Instruments trace example — we had some function names available, but it missed others.
One reason for this is that the symbol table only has direct symbol entries for functions that are involved in linking.
These are functions that you use across modules or functions that you export from your frameworks.
This makes it useful for identifying API boundaries.
And it also means that it has the necessary data to power dynamic loading with functions such as `dlsym()` and `dladdr()`.
One downside though is that local or static functions aren't represented in the symbol table since they aren't referenced outside of their module.
This ends up omitting implementation functions where we may have a significant portion of the app logic.
Further, it's common for binaries built in Release mode to have their symbol table stripped.
This means that unnecessary entries are removed from the symbol table, which helps reduce the size of your app.
If we think about it, it's rather uncommon for our app's primary driver to export functionality anywhere, so we'd be wasting space keeping those symbol table entries around.
For our frameworks and libraries, we definitely have exported functions that clients should use, but there's no need to keep around the locally shared functions since they can't be used anywhere else.
Stripping our primary app executables will almost always leave the symbol table practically empty.
Stripping our frameworks and libraries leaves only the exported functions.
You may have come across build settings in Xcode such as Strip Linked Product, Strip Style, and Strip Swift Symbols.
These build settings control how your app is stripped during the build.
If Strip Linked Product is enabled, then the binary is stripped according to Strip Style.
For example, All Symbols will perform the most invasive removal and leave behind only the bare essentials.
Non Globals removes direct symbols that are used within different modules of your app but are not exported for use in other apps.
Debugging Symbols removes the third nlist type, which we'll discuss later when we get to DWARF.
However, this strip style does preserve the direct symbols.
For example, here I have a framework that defines two public interfaces and one internal shared implementation function.
Since all of these functions play a role in linking, they all have direct symbol entries.
If I strip non globals, then I'm only left with my interfaces.
The shared implementation function was only used within my framework, so it isn't considered global.
Similarly, stripping all symbols still leaves the interface since these are required for other apps to use the framework.
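The framework example can be mimicked with a toy Python model. This is only a sketch of the outcomes described in this session for a framework's direct symbols; it is not how the `strip` tool is implemented, and the style strings and function names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectSymbol:
    name: str
    exported: bool  # True for a public interface, False for an internal helper


def strip_framework(symbols, style):
    """Toy model of which direct symbols a framework keeps per strip style."""
    if style == "debugging-symbols":
        # Per the session, this style preserves the direct symbols.
        return list(symbols)
    if style in ("non-globals", "all-symbols"):
        # Per the session, only the exported interfaces survive in a framework.
        return [symbol for symbol in symbols if symbol.exported]
    raise ValueError(f"unknown strip style: {style}")


# Hypothetical framework: two public interfaces, one shared internal function.
framework = [
    DirectSymbol("publicInterfaceA", exported=True),
    DirectSymbol("publicInterfaceB", exported=True),
    DirectSymbol("sharedImplementation", exported=False),
]

for style in ("debugging-symbols", "non-globals", "all-symbols"):
    kept = [symbol.name for symbol in strip_framework(framework, style)]
    print(f"{style}: {kept}")
```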
You can also notice in the `symbols -onlyNListData` output that there are function starts addresses interspersed between the direct symbols.
These addresses represent functions that were either never in the direct symbols, or were stripped.
You can tailor these strip settings to your desired level of symbol table visibility.
With this information, we can determine when we're working with direct symbols.
Some telltale signs of this are having function names but no line numbers or file names; or having a mix of function names and function starts addresses, as we have here with the framework example.
The second type of nlist struct we'll analyze is known as an indirect symbol, as opposed to direct symbol.
This is when the `n_type` matches only the `N_EXT` bit pattern.
These are functions and methods that you're using from other frameworks or libraries, such as `print()`.
You can see these with `nm`, only this time we'll specify `--undefined-only` instead of `--defined-only`.
We'll also add the `-m` flag, which will show you which frameworks or libraries the functions should be found in.
For example, the MagicNumbers app depends on a variety of Swift functions that are defined in libswiftCore.
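The bit patterns for the symbol kinds can be tested with a few masks. The Python sketch below uses the mask values from `<mach-o/nlist.h>` and labels entries with this session's terms: "direct", "indirect" (an undefined external symbol), and "debugging" for the third kind that comes up with DWARF. The sample `n_type` values are illustrative, and the classification is deliberately simplified.

```python
# Masks and values from <mach-o/nlist.h>.
N_STAB = 0xE0  # any of these bits set: debugging symbol entry
N_TYPE = 0x0E  # mask for the type bits
N_EXT = 0x01   # external symbol bit

N_UNDF = 0x00  # undefined: provided by another library
N_SECT = 0x0E  # defined in a section of this binary


def classify(n_type: int) -> str:
    """Label an n_type value using the vocabulary from this session."""
    if n_type & N_STAB:
        return "debugging"
    kind = n_type & N_TYPE
    if kind == N_SECT:
        return "direct"
    if kind == N_UNDF and n_type & N_EXT:
        return "indirect"
    return "other"


for sample in (0x0F, 0x0E, 0x01, 0x24):
    print(f"{sample:#04x}: {classify(sample)}")
```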
Now that we've covered two of the three debug information categories, let's make sure we understand their properties.
Function starts are a list of addresses, so they lack names, but do allow us to determine offsets.
nlist symbol tables encode entire structs of information and can associate a name to an address.
They describe direct symbols — which are defined in your app — and indirect symbols, which are provided by dependencies.
Direct symbols are generally reserved for functions involved in linking, and the strip build settings influence which direct symbols are available.
Finally, both the function starts and the nlist symbol tables are embedded directly in your app.
What we haven't seen yet is the richer levels of detail, such as file names and line numbers.
This is provided to us by DWARF.
DWARF takes the concepts of nlist symbol tables to a completely different level.
Rather than keeping only a subset of functions, DWARF strives to describe everything.
We saw that nlist symbol tables added vastly more information than the function starts.
They accomplished that by adding a dimension.
Remember, we started with just a single dimension, which was the list of addresses, when we looked at function starts.
Then we upgraded to two dimensions by encoding a struct full of information in the nlist symbol table.
DWARF adds a third dimension, which is about relationships.
DWARF recognizes that functions aren't isolated.
They call other functions, they have parameters, return meaningful data, and are defined in a particular file.
Encoding these relationships unlocks the most powerful aspects of symbolication.
When we're analyzing DWARF, we're primarily referring to a dSYM bundle.
In addition to other metadata such as plists, the dSYM bundle contains a binary with DWARF.
What makes this binary so special? The binary contains its data in a special `__DWARF` segment.
The DWARF specification mentions three streams of data within the segment that we'll focus on.
`debug_info` contains the raw data, `debug_abbrev` assigns structure to the data, and `debug_line` contains the file names and line numbers.
DWARF also defines two vocabulary types that we'll study first: the compile unit and the subprogram.
We'll introduce a third one later.
A compile unit represents a single source file that went into building the product.
For example, we can expect to have one compile unit for each Swift file in our project.
DWARF assigns properties to a compile unit, such as the name of the file, the SDKs, the portion of the `__TEXT` segment its functions occupy, and much more.
The main.swift compile unit contains these properties in the `debug_info` stream on the left, and it has a corresponding entry in the `debug_abbrev` stream on the right which tells us what the values represent.
Here we see the file name, the language it's written in, and a low/high pair representing the `__TEXT` segment range.
A subprogram represents a defined function.
We saw defined functions in the nlist symbol table already, but the subprogram can describe static and local functions too.
A subprogram also has a name and its `__TEXT` segment address range.
One fundamental relationship between compile units and subprograms is that subprograms are defined in a compile unit.
DWARF represents this with a tree.
The compile unit is at the root of the tree and it has subprogram entries as children.
The children are searchable by following their address ranges.
We can examine these in more detail with the `dwarfdump` command.
First we'll look at a compile unit.
This matches some of the properties of a compile unit I mentioned earlier.
`dwarfdump` helpfully combines the `debug_info` and `debug_abbrev` contents to show you the structure and content of the data in your dSYMs.
And if we scroll down the output, we'll encounter one subprogram child.
The address range it occupies is within the bounds of the compile unit and we can also see the name of the function.
I mentioned that DWARF describes its data in extreme detail.
While we won't spend much time on all of these details, I think it's fun to see details such as function parameters.
They have their own vocabulary type that describes the name and type of the parameter.
Following the tree model, a parameter is a child of a subprogram.
Here we come across the entry for the choices parameter that we supply to a function.
Next, file names and line numbers come from the `debug_line` stream.
This stream doesn't have a tree structure.
Instead, it defines a line table program where the individual file addresses can be mapped back to an exact line of code.
This ends up generating a list of source code details which we can search to find the file and line.
If we parse the `debug_info` tree and generate the `debug_line` list, we end up with a structure like the following.
So if we want to match a file address, we can traverse the tree.
First, we'll start at the compile unit and follow the branches.
Then we'll pick up any of the `debug_line` entries that matched.
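The traversal can be sketched in Python as a tree walk followed by a line-table search. This is a simplified model, not a DWARF parser: the class names, address ranges, and line numbers are all hypothetical, and the real `debug_line` stream is a program that generates rows rather than a ready-made list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subprogram:
    name: str
    low_pc: int
    high_pc: int  # exclusive end of the address range


@dataclass
class CompileUnit:
    name: str
    low_pc: int
    high_pc: int
    subprograms: List[Subprogram] = field(default_factory=list)


@dataclass
class LineRow:
    address: int
    file: str
    line: int


def symbolicate(units, line_rows, address):
    """Walk compile unit -> subprogram by address range, then pick a line row."""
    for unit in units:
        if not unit.low_pc <= address < unit.high_pc:
            continue
        for sub in unit.subprograms:
            if sub.low_pc <= address < sub.high_pc:
                # Use the last row at or before the address within this function.
                rows = [r for r in line_rows if sub.low_pc <= r.address <= address]
                if not rows:
                    return sub.name
                row = max(rows, key=lambda r: r.address)
                return f"{sub.name} ({row.file}:{row.line})"
    return None


# Hypothetical data for a single source file.
units = [CompileUnit("main.swift", 0x100003A68, 0x100003C40,
                     [Subprogram("main", 0x100003A68, 0x100003B80)])]
line_rows = [LineRow(0x100003A68, "main.swift", 28),
             LineRow(0x100003B60, "main.swift", 30)]
print(symbolicate(units, line_rows, 0x100003B70))
```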
We can automate this again with `atos`, only this time I'm specifically leaving out the `-i` flag.
Notice anything slightly odd here? Yes, we have the function name and line number, so we're definitely using DWARF.
Other than that though, this isn't all too different from the nlist symbol table update.
In fact, when we compare it to the first time we used `atos`, it still looks like we're missing so many valuable functions and details! What happened here? The only thing that changed was that we didn't specify the `-i` flag to `atos` this time.
That flag stands for "inlined functions." Inlining is a routine optimization that compilers perform.
This involves substituting a function call with the body of the function directly.
One cool effect that it has is making code seemingly disappear.
We can think of it as, rather than calling `numberChoices()`, the entirety of the code for `numberChoices()` was dropped in place.
Suddenly there's no function call to `numberChoices()` anymore! DWARF represents this with an inlined subroutine.
This is the third and final vocabulary type for DWARF that we'll discuss today.
An inlined subroutine is a subprogram — so it's a function — that was inlined into another subprogram.
Since an inlined function is completely engulfed by another node in the relationship tree, the inlined subroutine is a child of that node.
This definition applies recursively too, meaning that an inlined subroutine can have other inlined children.
Again, with `dwarfdump`, we can look for inlined subroutines.
They're listed as children of other nodes and have similar properties to subprograms, such as names and addresses.
However, in DWARF, these properties are frequently accessed through a common node, known as the abstract origin.
If there are many inlined copies of a particular function, then their common, shared properties are kept in the abstract origin so that they aren't duplicated everywhere.
One unique property that inlined subroutines have is a call site.
This is the location in our source code where we wrote the actual function call, but the optimizer replaced it.
Here for instance, we made the call to `generateANumber` on line 36 of the main.swift file.
This lets us update our tree with new child nodes.
And now this is looking like a much more comprehensive view of our program.
The optimization details for inlined functions were the key details in getting us to the fully symbolicated crash log.
The `-i` flag for `atos` instructs the tool to consider them during symbolication.
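To see why considering inlined subroutines changes the result so much, here is a simplified Python model of the relationship tree. With inlining enabled, the innermost frame takes its line from the line table and each enclosing frame is reported at the call site of the function inlined into it. The node layout, address ranges, and line numbers are hypothetical, and the line reported when inlining is ignored is a simplification; a real tool may report something different there.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    low_pc: int
    high_pc: int  # exclusive
    call_line: Optional[int] = None  # for inlined subroutines: line of the call site
    children: List["Node"] = field(default_factory=list)


def frames(root, address, innermost_line, include_inlined):
    """Return [(function, line)] with the innermost frame first."""
    if not include_inlined:
        return [(root.name, innermost_line)]
    # Follow inlined children whose range contains the address.
    chain = [root]
    while True:
        child = next((c for c in chain[-1].children
                      if c.low_pc <= address < c.high_pc), None)
        if child is None:
            break
        chain.append(child)
    result = []
    line = innermost_line
    for node in reversed(chain):
        result.append((node.name, line))
        line = node.call_line  # the enclosing frame sits at this call site
    return result


# Hypothetical tree: two levels of inlining inside main.
select = Node("selectMagicNumber(choices:)", 0x100003B40, 0x100003B78, call_line=22)
generate = Node("generateMagicNumber()", 0x100003A90, 0x100003B78,
                call_line=30, children=[select])
main = Node("main", 0x100003A68, 0x100003B80, children=[generate])

print(frames(main, 0x100003B70, 14, include_inlined=False))
print(frames(main, 0x100003B70, 14, include_inlined=True))
```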
They were also the missing details from our Instruments trace.
The reason why we needed a dSYM both for Instruments and for the crash log was precisely so that we could extract all of this content.
There is another source where you'll find DWARF, and that is from static libraries and object files.
In the absence of a dSYM, you can still gather DWARF for functions that you linked from a static library or object file.
In those cases, you'll find the Debugging Symbols nlist types.
These were one of the symbol types that could be stripped.
They don't hold the DWARF themselves though.
Rather, they associate a function back to the file they came from.
If the library was built with debug information, then the nlist entry can point us to that DWARF.
These types of nlist entries can be seen verbosely with `dsymutil -dump-debug-map`.
Here we have the list of different functions and where they were pulled from.
Those locations can be scanned and processed for DWARF.
To summarize, DWARF is a vital source of in-depth symbolication data.
DWARF exposes important relationships between functions and files.
Optimizations such as function inlining have an enormous impact on the quality of symbolication, and DWARF can express it very well.
We also saw that dSYMs and static libraries contain DWARF.
However, please prefer dSYMs, as you can easily transfer them to others and they have built-in support from several tools.
Finally, I want to share different tools and tips you can use to facilitate symbolication.
For local development builds, you'll generally have a great deal of debug information if you build in debug mode.
For release mode, you can ensure that Xcode generates a dSYM by checking the Debug Information Format build setting.
Make sure Release is set to DWARF with dSYM File.
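The same setting can be checked from the command line. The following is a minimal sketch, assuming `xcodebuild -showBuildSettings` prints each setting as `NAME = value`; the helper name `dsym_enabled` is hypothetical and not part of any Apple tool.

```shell
#!/bin/sh
# Hypothetical helper: reads `xcodebuild -showBuildSettings` output on stdin
# and succeeds if any target has DEBUG_INFORMATION_FORMAT = dwarf-with-dsym.
# Assumption: each build setting is printed on its own line as "NAME = value".
dsym_enabled() {
    grep -Eq '^[[:space:]]*DEBUG_INFORMATION_FORMAT[[:space:]]*=[[:space:]]*dwarf-with-dsym[[:space:]]*$'
}

# Example use on a Mac, from the project directory:
#   xcodebuild -showBuildSettings -configuration Release | dsym_enabled \
#       && echo "Release builds produce a dSYM"
```

`dwarf-with-dsym` is the value behind the "DWARF with dSYM File" option; a plain `dwarf` value means no dSYM is produced.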
For apps that were submitted to the App Store, you can download your dSYMs through App Store Connect.
This also includes any apps with bitcode enabled.
If you want to check that a certain dSYM is already on your device, you can use the `mdfind` command.
The alphanumeric string here is your binary's UUID, which is a unique identifier defined in a load command.
You can see the UUID for your dSYMs with `symbols -uuid`.
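As a rough sketch of the `mdfind` lookup described above: Spotlight records dSYM UUIDs under the `com_apple_xcode_dsym_uuids` attribute, so the query can be built from a UUID string. The helper names `dsym_query` and `find_dsym` are hypothetical, and the uppercase normalization assumes the UUIDs are indexed in the uppercase form the tools print.

```shell
#!/bin/sh
# Hypothetical helper: builds the Spotlight query string for a dSYM UUID.
dsym_query() {
    # Assumption: UUIDs are indexed in uppercase, as the tools print them.
    uuid=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    printf 'com_apple_xcode_dsym_uuids == %s\n' "$uuid"
}

# Hypothetical helper: prints the paths of matching dSYMs (macOS only).
find_dsym() {
    mdfind "$(dsym_query "$1")"
}

# Example use on a Mac, with a UUID taken from `symbols -uuid`:
#   find_dsym "<your binary's UUID>"
```

An empty result suggests that Spotlight has not indexed a matching dSYM on this machine.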
Sometimes, a toolchain could generate invalid DWARF.
You can check this with `dwarfdump -verify`.
If you see any reported errors, please file a bug!
DWARF data also has a cap of four gigabytes per binary.
If you're running into issues with your dSYMs and see that they're exceeding four gigabytes, consider splitting the project into separate components so that each one has its own smaller dSYM.
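One rough way to spot a dSYM that is approaching this limit is to look at the size of the files under its `Contents/Resources/DWARF` directory. The sketch below is only a heuristic: file size is a proxy for the amount of DWARF data, and a universal binary holds several slices in one file. The helper name `check_dsym_size` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reports whether each DWARF file inside a dSYM bundle
# is at or above 4 GiB. File size is only a rough proxy for DWARF size.
check_dsym_size() {
    cap=4294967296   # 4 GiB in bytes
    for f in "$1"/Contents/Resources/DWARF/*; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -ge "$cap" ]; then
            echo "over: $f ($size bytes)"
        else
            echo "ok: $f ($size bytes)"
        fi
    done
}

# Example use:
#   check_dsym_size MagicNumbers.dSYM
```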
You can make sure that the dSYM you're using matches the specific build of your app that you're interested in by comparing UUIDs.
The app's UUID is in the Binary Images list section of crash reports and you can also see it with the `symbols` command.
You should ensure that both your app and your dSYM have the same UUIDs.
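The comparison can be scripted. This is a minimal sketch that pulls every UUID-shaped token out of a tool's output and compares the two sets, so it does not depend on the exact output layout of `symbols -uuid`. The helper names `extract_uuids` and `same_uuids` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extracts every UUID-shaped token from stdin,
# uppercased, de-duplicated, and sorted, one per line.
extract_uuids() {
    { grep -Eo '[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}' || true; } \
        | tr 'a-f' 'A-F' | sort -u
}

# Hypothetical helper: succeeds if both texts contain the same, non-empty
# set of UUIDs (one UUID per architecture slice).
same_uuids() {
    a=$(printf '%s\n' "$1" | extract_uuids)
    b=$(printf '%s\n' "$2" | extract_uuids)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example use on a Mac:
#   same_uuids "$(symbols -uuid MagicNumbers)" "$(symbols -uuid MagicNumbers.dSYM)" \
#       && echo "dSYM matches this build"
```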
The `symbols` tool also lets you check the types of debug information your app has available.
We've already seen examples of this, but it's a helpful reminder that these tags in square brackets tell you the information source.
It's useful if you aren't sure which debug information you might be dealing with.
If you're certain that you have dSYMs available but still aren't getting names for your functions in Instruments traces, please check your entitlements and code signing.
Specifically, with the `codesign` command, you can verify that you have a proper code signature.
You should also verify that locally built apps for development have the `get-task-allow` entitlement.
This entitlement grants permissions to tools such as Instruments to symbolicate your app.
Xcode should set this entitlement automatically with the Profile action, but it's helpful to verify.
If you don't have the `get-task-allow` entitlement enabled, you should check your Code Signing Inject Base Entitlements build setting and make sure that it's enabled as you're developing.
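The entitlement check can be automated by scanning the XML that `codesign` prints. The following is a sketch, assuming the entitlements arrive as a plain XML property list on stdin; the helper name `has_get_task_allow` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads an entitlements plist (XML) on stdin and succeeds
# if a get-task-allow key is present and set to true.
has_get_task_allow() {
    # Strip whitespace so the key and its value sit next to each other.
    # Matching on the key suffix also covers the macOS spelling,
    # com.apple.security.get-task-allow.
    tr -d ' \t\r\n' | grep -q 'get-task-allow</key><true/>'
}

# Example use on a Mac:
#   codesign --display --entitlements :- MagicApp.app 2>/dev/null \
#       | has_get_task_allow && echo "get-task-allow is enabled"
```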
Lastly, for Universal 2 apps, you should specify the architecture that you're interested in to the tools.
`symbols`, `otool`, and `dwarfdump` all have an `-arch` flag to only operate on a particular architecture slice.
This concludes "Symbolication: Beyond the basics." If nothing else, I want to really emphasize a few key points.
UUIDs and file addresses are a consistent and reliable way to identify what your app was doing since they're independent of the ASLR slide.
They're also our key to querying the debug info.
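To make the arithmetic concrete, here is a small sketch using the addresses from the `atos` example. It assumes the `__TEXT` segment's link-time address is `0x100000000`, the usual value for a 64-bit main executable; check the real value in the `LC_SEGMENT_64` load command. The helper name `file_address` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: converts a runtime address to a file address.
#   $1 = runtime address (for example, from a crash log)
#   $2 = load address of the binary (from the Binary Images list)
#   $3 = __TEXT vmaddr from the load commands
#        (assumed to be 0x100000000 when omitted)
file_address() {
    text_vmaddr=${3:-0x100000000}
    # The ASLR slide is (load address - __TEXT vmaddr); removing it from the
    # runtime address gives the address as it appears in the file.
    printf '0x%x\n' $(( $1 - $2 + $text_vmaddr ))
}

file_address 0x10045fb70 0x10045c000   # prints 0x100003b70
```

Under that assumption the slide here is `0x45c000`, and the resulting file address stays the same across launches even though the load address changes.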
You should also use dSYMs whenever possible.
dSYMs contain the richest debug information in the form of DWARF and are supported by Xcode and Instruments.
Lastly, we covered several tools.
These tools are readily available to you in Xcode and they offer powerful diagnostics and insights.
You should strive to incorporate them into your workflows for debugging and optimizing.
If you're interested in learning more, I recommend these two sessions, from WWDC16 and WWDC17 respectively, to learn how your apps spring to life upon launch: "Optimizing app startup time" and "App startup time: Past, present, and future".
Thank you all very much for joining me to learn about symbolication! Have a wonderful rest of the week.
-
-
1:11 - MagicNumbers
func selectMagicNumber(choices: [Int]) -> Int {
    return choices[MAGIC_CHOICE]
}

func randomValue() -> Int {
    return Int.random(in: 1...100)
}

func numberChoices() -> [Int] {
    var choices = [Int]()
    for _ in 1...10 {
        choices.append(randomValue())
    }
    return choices
}

func generateMagicNumber() -> Int {
    let numbers = numberChoices()
    let magic = selectMagicNumber(choices: numbers)
    return magic
}

print("The magic number is: \(generateMagicNumber())")
-
2:51 - atos symbolication
atos -o MagicNumbers.dSYM/Contents/Resources/DWARF/MagicNumbers -arch arm64 -l 0x10045c000 -i 0x10045fb70
-
7:34 - Load commands
otool -l MagicNumbers | grep LC_SEGMENT -A8
-
10:31 - Disassembly
otool -tV MagicNumbers -arch arm64
-
11:32 - vmmap
vmmap MagicNumbers | grep __TEXT
-
15:09 - Function starts
symbols -onlyFuncStartsData -arch arm64 MagicNumbers
-
17:06 - nlist_64
struct nlist_64 {
    union {
        uint32_t n_strx;
    } n_un;
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value;
};
-
17:59 - Direct symbols with nm
nm -arch arm64 --defined-only --numeric-sort MagicNumbers
-
18:30 - Demangled direct symbols with nm
nm -arch arm64 --defined-only --numeric-sort MagicNumbers | xcrun swift-demangle
-
18:43 - Demangled direct symbols with the symbols tool
symbols -arch arm64 -onlyNListData MagicNumbers
-
23:06 - Indirect symbols with nm
nm -m -arch arm64 --undefined-only --numeric-sort MagicNumbers
-
27:16 - Examining dSYMs with dwarfdump
dwarfdump -v -debug-info -arch arm64 MagicNumbers.dSYM
-
29:25 - atos symbolication without inlined functions
atos -o MagicNumbers.dSYM/Contents/Resources/DWARF/MagicNumbers -arch arm64 -l 0x10045c000 0x10045fb70
-
32:29 - Examining debugging symbols
dsymutil --dump-debug-map -arch arm64 MagicNumbers
-
33:59 - Examining dSYM UUIDs
symbols -uuid MagicNumbers.dSYM
-
34:03 - Verifying DWARF
dwarfdump --verify MagicNumbers.dSYM
-
35:09 - Verifying entitlements and codesigning
codesign --display -v --entitlements :- MagicApp.app
-