In this report we analyze mobile app crashes during interactions with cloud services. We compare the failure rates on iOS and Android, dive into which App Store categories are the most affected by networking issues, and most importantly analyze why these issues occur at such an alarming rate in the first place!
A network crash is a crash in a mobile app caused by a network call. For example, a cloud service an app communicates with may return bad data, respond with an error, take too long to complete the request, or simply fail to respond at all.
20% OF CRASHES ARE CORRELATED
with a network issue
We evaluated the network calls that occur right before a mobile application crashes. As it turned out, 20% of crashes were preceded by a network call issued just before the application quit.
This finding is, of course, a correlation; we were determined to get closer to causation. After tuning our filters, we believe the causation figure is closer to 7% of all crashes on iOS and Android.
ANDROID NOUGAT IS 2.5X MORE LIKELY TO HAVE NETWORK ISSUES VS. IOS 10;
Nougat has the highest crash rate
Overall, we found both iOS and Android have network-related crashes in the 7% range. However, this takes into account all versions of each operating system. Let’s look at this by the most recent and popular versions:
On Android 7 (Nougat), this means that 1 in every 5 crashes is related to a networking issue. Nougat adoption is still very low, so the numbers are likely to change over time. However, when Marshmallow was released it was the most stable Android OS in Apteligent’s data warehouse. The opposite has happened thus far with Nougat:
For more iOS adoption data and crash rate data:
For more Android adoption data and crash rate data:
Medical, Finance, and Shopping apps
in the Top 10 Worst Network Performance
The table below lists all of the app store categories for iOS and Android. For each one, we list the percentage of time a crash happens due to a network issue. The categories at the top of the list are the most sensitive to network issues. The most impacted category on Android — Personalization — includes live wallpaper and lockscreen widget apps, which typically consume cloud service data at regular intervals.
Most interesting is the appearance of “sensitive” apps in the top ten list: apps that are critical either to your health or to an enterprise’s revenue. Medical (#2), Finance (#5), and Shopping (#9) apps all appeared towards the top of the list.
In the table above, some categories exist on both platforms while some are unique to either Android or iOS. For more information on app store categories, check out the iOS listing and the Google Play listing.
The rest of the report aims to home in on the source of these network-related issues. In this section we examine specific cloud services and their involvement in network crashes. A cloud service can appear here either via an SDK embedded in the app (for example, the Facebook SDK) or via a developer directly accessing a service provider’s API.
TWITTER RANKS #3 WORST IN ANALYTICS
AND #5 WORST IN ADVERTISING
for Network Crashes
Looking at the table above, one provider that stands out is Twitter, which runs Fabric, its own analytics and advertising platform. The majority of the providers on the list are startups, with two exceptions: IBM Coremetrics and Flurry, which is owned by Yahoo. Contrast Twitter’s performance with Facebook, which had one of the lowest network crash rates on the list.
Apteligent has the capability to notify you when SDKs are causing problems in your apps.
Why are these issues occurring?
There are many ways a network call can contribute to a crash, besides actually failing. For example, an app can successfully communicate with a cloud service but receive unexpected data back that leads to a crash. We explored the responses from the network calls (the status codes), as well as the following three metrics: latency of the network request, the amount of data received, and the overall speed of the request.
88% OF SUSPECT NETWORK CALLS
Were Successful Before a Crash
Mobile developers are typically careful to detect and handle a failed network call. What seems counterintuitive is that the majority of crashes happen after successful network calls. So, what can cause unexpected results and in turn cause a crash? The answers are latency, the amount of data received, and even the specifics of the data returned (for example an unknown response).
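The defensive pattern this implies can be sketched briefly. A minimal Python sketch, assuming a JSON API; `parse_user` and the required `id` field are hypothetical examples, not drawn from the report:

```python
import json

def parse_user(payload: bytes) -> dict:
    """Defensively parse the body of a 200-OK response.

    A successful status code alone does not guarantee usable data:
    the body may be empty, malformed, or missing an expected field.
    """
    if not payload:
        # Empty body despite a successful status code.
        raise ValueError("empty response body")
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        # Unexpected (non-JSON) data returned by the service.
        raise ValueError(f"malformed response: {exc}") from exc
    if "id" not in data:
        # Structurally valid JSON, but missing a required field.
        raise ValueError("response missing required 'id' field")
    return data
```

Raising a descriptive error here lets the caller fail gracefully (retry, show a message) instead of crashing later on a missing key.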
Latency represents how long a network request takes to complete. For this metric, and all of the following metrics, we subtract the overall average from the average of specific network calls that led to a crash. For example, for latency:
∆ latency = Average Network Crash Latency – Average Latency
The negative skew in the graph means network calls that led to a crash took less time than expected. The resulting integral (area under the graph) shows that 72% of network-related crashes took less time than average to complete, 5% took the same time as the average, and 23% took longer than average to complete.
Based on the results of the analysis below, we believe the negative latency skew is due to less data being returned on average for network calls that lead to a crash.
For this part of the analysis, we wanted to see how the amount of data being returned impacted the frequency of network crashes. Since the vast majority of the calls were successful, we graphed the distribution only for network calls that resulted in a 200 (success).
∆ bytes diff = Average Network Crash Bytes Received – Average Bytes Received
The results show that 55% of the network crash endpoints returned less data than expected, 17% returned the usual amount of data, and 28% returned more data than expected. Although not as skewed as the latency distribution, the data-received chart also leans left, and its 28% share of higher-than-average calls is comparable to the 23% of calls with higher-than-average latency.
10% OF SUCCESSFUL NETWORK CALLS
that resulted in a crash returned no data
Successful network calls that result in a crash on average return less data than their non-crashing counterparts; 10% of the time no data is returned at all. We conclude that while developers are accounting for complete communication failures with a cloud service, they need to be better about handling cases where the network call completes successfully but doesn’t return any data.
To calculate the speed of the request, we added together the total bytes sent and received then divided by the latency. In the graph below this is referred to as “bpl” or “bytes per latency.”
bpl (speed) = (bytes out + bytes in) / latency
∆ bpl diff = Average Network Crash Speed – Average Speed
The graph, similarly skewed to the left, shows 89% of network-related crashes had speeds lower than expected, 2% were average, and 9% were faster than expected. Since latency was lower than average 72% of the time, lower latency alone would predict higher speeds. Because the majority of calls were successful, the lower speed must instead be driven by a lower-than-expected amount of bytes received.
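All three Δ metrics above follow the same pattern: subtract the overall average from the average over crash-adjacent calls. A minimal Python sketch with illustrative numbers (not data from the report):

```python
def bpl(bytes_out: int, bytes_in: int, latency: float) -> float:
    """Request speed as defined above: total bytes moved per unit of latency."""
    return (bytes_out + bytes_in) / latency

def delta(crash_values, all_values):
    """Delta metric: mean over crash-adjacent calls minus mean over all calls."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(crash_values) - mean(all_values)

# Illustrative: crash-adjacent calls finished faster, giving a negative delta,
# which corresponds to the leftward lean of the graphs.
crash_latency_ms = [80.0, 95.0]
all_latency_ms = [120.0, 130.0]
print(delta(crash_latency_ms, all_latency_ms))  # negative -> left skew
```

A negative Δ on latency, bytes received, or bpl corresponds to the leftward lean of the respective graph.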
A surprisingly high percentage of fatal errors in apps are caused by network issues. Therefore it is important to have what we call “network breadcrumbs” or the ability to see how user behavior initiates certain network calls that can ultimately lead to a crash.
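The breadcrumb idea amounts to a bounded trail of recent network events that can be attached to a crash report. A minimal Python sketch of the concept; the class and method names are illustrative, not an Apteligent API:

```python
from collections import deque

class NetworkBreadcrumbs:
    """Keep the last N network events so they can accompany a crash report."""

    def __init__(self, capacity: int = 20):
        # A deque with maxlen silently drops the oldest event when full.
        self._trail = deque(maxlen=capacity)

    def record(self, url: str, status: int, latency_ms: float, bytes_in: int):
        """Called after every network call completes (or fails)."""
        self._trail.append({"url": url, "status": status,
                            "latency_ms": latency_ms, "bytes_in": bytes_in})

    def on_crash(self) -> list:
        """Return the most recent calls, oldest first, for the crash report."""
        return list(self._trail)
```

In practice the trail would be captured by instrumenting the app's HTTP layer, so no per-call developer effort is required.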
Our analysis showed that certain external factors do influence the network crash rate, including OS version, app store category, and cloud service provider. We also found that certain external factors do not influence the likelihood of network crashes; for example, we found no meaningful difference among specific network carriers. We did, however, discover clear signals coming from latency, data received, and request speed. Mobile developers are handling the case of complete communication failures, but must expand error handling to include cases when an endpoint returns either unexpected data or no data at all.
To receive your own report like this, check out our new CUSTOM INSIGHTS™ capability. You gain access to billions of data points about the mobile ecosystem in the context of your app and utilize the same big data tools as our data science team. Existing customers can log in to the portal to access custom insights.
The Apteligent Network Insights product provides network breadcrumbs to automatically capture and show all the network calls leading up to each crash. The Apteligent Userflows product additionally provides visibility into the user behavior that triggers these fatal flows.