The App Interchange Format (AIF)


Description

The App Interchange Format (AIF) is an open exchange format for describing app information in JSON. The motivating idea is to provide an open data format that can be used not only internally within a single application, but also to exchange, process and analyze app data in an open and interoperable way.

Main Structure

Every app corresponds to a key "app" (so the AIF JSON for an app starts with {"app": data}). Note that in the MoboSearch project we currently use one file per app, but future extensions will allow a higher-level container key "apps", so that a single file can hold more than one app, and will also support a JSON Lines-style format (multiple app JSON objects concatenated within the same file). These extensions can easily be supported, for instance, by pre- and post-processors that split the data and then reassemble it.
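A minimal sketch of such pre- and post-processors, in Python. Since the multi-app layouts are future extensions, it assumes (hypothetically) that the "apps" container holds a plain list of app data objects and that the concatenated format is JSON Lines, one JSON object per line; all helper names are illustrative.

```python
import json

def wrap(app_data: dict) -> dict:
    """Wrap raw app data into a single-app AIF document."""
    return {"app": app_data}

def unwrap(doc: dict) -> dict:
    """Return the app data contained in a single-app AIF document."""
    return doc["app"]

def split_container(container: dict) -> list:
    """Split a (hypothetical) {"apps": [...]} container into single-app documents."""
    return [wrap(data) for data in container["apps"]]

def join_documents(docs: list) -> dict:
    """Reassemble single-app documents into a (hypothetical) {"apps": [...]} container."""
    return {"apps": [unwrap(doc) for doc in docs]}

def to_json_lines(docs: list) -> str:
    """Serialize single-app documents as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(doc) + "\n" for doc in docs)

def from_json_lines(text: str) -> list:
    """Parse JSON Lines text back into single-app documents, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Usage with placeholder data (the second app id is made up for illustration).
container = {"apps": [{"id": "com.amazon.kindle"}, {"id": "com.example.other"}]}
docs = split_container(container)
print(docs[0])  # {'app': {'id': 'com.amazon.kindle'}}
assert join_documents(from_json_lines(to_json_lines(docs))) == container
```

Keeping the single-app {"app": data} document as the unit of exchange means existing per-file tools keep working; only the outer packaging changes.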

The main informative structure within each app data structure (corresponding to the top-level keys of the JSON) is the following:

    name
    content_rating
    contains_ads
    downloads
    interactive_elements
    rating_average
    id
    version
    requires_os
    updated
    image
    reviews
    price
    developer
    description
    categories
    meta
    safety
    security
    uses
    scores

The first eleven fields (from name to image) contain unstructured information (plain strings or numbers), whereas the remaining ten fields (from reviews onwards) contain structured information. For clarity, we first describe the eleven unstructured fields and then the structured ones, using a real app as a running example.
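The split between unstructured and structured fields can be turned into a small presence check. This sketch (describe_keys is a hypothetical helper, not part of MoboSearch) reports which fields of each kind an app carries, which are missing, and any keys outside the list above.

```python
UNSTRUCTURED = ["name", "content_rating", "contains_ads", "downloads",
                "interactive_elements", "rating_average", "id", "version",
                "requires_os", "updated", "image"]
STRUCTURED = ["reviews", "price", "developer", "description", "categories",
              "meta", "safety", "security", "uses", "scores"]

def describe_keys(app: dict) -> dict:
    """Report present, missing and unknown top-level AIF keys of an app."""
    known = UNSTRUCTURED + STRUCTURED
    return {
        "unstructured": [k for k in UNSTRUCTURED if k in app],
        "structured": [k for k in STRUCTURED if k in app],
        "missing": [k for k in known if k not in app],
        "unknown": sorted(k for k in app if k not in known),
    }

doc = {"app": {"name": "Amazon Kindle", "id": "com.amazon.kindle",
               "price": {"amount": 0, "currency": "USD"}}}
report = describe_keys(doc["app"])
print(report["unstructured"])  # ['name', 'id']
print(report["structured"])    # ['price']
```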

Unstructured fields

As said, there are currently eleven unstructured fields in every app data structure:

name
A string with the name of the app
content_rating
A string with the content rating of the app
contains_ads
A string stating whether the app contains ads
downloads
An integer number stating how many times the app has been downloaded
interactive_elements
A string stating whether the app has interactive elements
rating_average
A float number with the average rating of the app
id
A string with the store id of the app
version
A string with the version identifier of the app
requires_os
A string with the minimum version of the operating system needed to run the app
updated
A string with the date on which the app was last updated
image
A string with the web address (URL) of the representative app image/icon

So, for instance, the corresponding part of the Amazon Kindle app is:

    "name": "Amazon Kindle",
    "content_rating": "Teen",
    "contains_ads": "Contains ads",
    "downloads": 409007651,
    "interactive_elements": "Users Interact",
    "rating_average": 4.596948,
    "id": "com.amazon.kindle",
    "updated": "Sep 9, 2022",
    "image": "https://play-lh.googleusercontent.com/48wwD4kfFSStoxwuwCIu6RdM2IeZmZKfb1ZeQkga0qEf1JKsiD-hK3Qf8qvxHL09lQ"

Note that two fields, version and requires_os, are missing from the example above: depending on the developer's choices, some information fields can be absent.
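Since any of these fields may be absent, a consumer should treat them as optional. The following sketch (check_unstructured is a hypothetical helper) checks the types of the unstructured fields that are present, using the types given in the definitions above, and simply skips missing ones.

```python
# Expected types of the unstructured fields, following the definitions above.
# Accepting an int for rating_average (e.g. a flat 4) is our assumption.
EXPECTED_TYPES = {
    "name": str, "content_rating": str, "contains_ads": str,
    "downloads": int, "interactive_elements": str,
    "rating_average": (int, float), "id": str, "version": str,
    "requires_os": str, "updated": str, "image": str,
}

def check_unstructured(app: dict) -> list:
    """Return type problems in the unstructured fields; absent fields are skipped."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in app:
            continue  # Missing fields are allowed (e.g. version, requires_os).
        value = app[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{key}: unexpected type {type(value).__name__}")
    return problems

kindle = {"name": "Amazon Kindle", "downloads": 409007651, "rating_average": 4.596948}
print(check_unstructured(kindle))                        # []
print(check_unstructured({"downloads": "409,007,651"}))  # ['downloads: unexpected type str']
```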

Structured fields

reviews

The reviews property is composed of subitems with detailed information about review ratings, namely how many reviewers gave each rating and the total number of reviewers who gave a rating. The corresponding properties, all integers, are "1", "2", "3", "4" and "5" (how many reviewers gave a rating of 1, 2, 3, 4 or 5 stars) and "total" (the total number of reviewers who gave a rating).

For example, the corresponding part of the Amazon Kindle app is:

    "reviews": {
      "1": 123894,
      "2": 39824,
      "3": 76149,
      "4": 304732,
      "5": 2115353,
      "total": 2659952
    }

Note that we chose not to use a list structure, but explicit property-value pairs instead, in order to keep maximum flexibility about the possible scoring systems (not necessarily the 1-5 star one).
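Because the rating values are keys rather than list positions, a consumer can derive statistics without assuming a 1-5 scale. This sketch (review_stats is a hypothetical helper) treats every key except "total" as a numeric rating and computes the number of ratings and their weighted average. For the Kindle data it gives about 4.597, close to (though not identical with) the rating_average of 4.596948 shown earlier.

```python
def review_stats(reviews: dict) -> tuple:
    """Return (number of ratings, average rating) from an AIF reviews object.

    Every key other than "total" is assumed to be a numeric rating value,
    so the same code works for scales other than 1-5 stars.
    """
    counts = {float(k): v for k, v in reviews.items() if k != "total"}
    n = sum(counts.values())
    if n == 0:
        return 0, 0.0
    average = sum(rating * count for rating, count in counts.items()) / n
    return n, average

kindle_reviews = {"1": 123894, "2": 39824, "3": 76149, "4": 304732,
                  "5": 2115353, "total": 2659952}
n, avg = review_stats(kindle_reviews)
print(n == kindle_reviews["total"])  # True
print(round(avg, 3))                 # 4.597
```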

price

The price property describes the price of the app and has two subitems, the amount (numeric value) and the currency (string value).

For example, the current price of the Amazon Kindle app (which is free) is:

    "price": {
      "amount": 0,
      "currency": "USD"
    }

developer

The developer property contains structured information about the developer of the app and has five subproperties, all strings: name (the name of the developer), id (the store path of the page listing the developer's apps), website (the web address of the developer), email (the email address of the developer) and address (the physical address of the developer).

For instance, the corresponding developer information of the Amazon Kindle app is:

    "developer": {
      "name": "Amazon Mobile LLC",
      "id": "/store/apps/developer?id=Amazon+Mobile+LLC",
      "website": "http://www.amazon.com/kindleapps",
      "email": "kindle-cs-support@amazon.com",
      "address": "AMZN Mobile LLC\n410 Terry Ave N\nSeattle, WA 98109"
    }
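Two details of the developer object are easy to miss: id is a relative store path, and address encodes its lines with newline characters. The sketch below assumes, as an illustration, that for Google Play data (meta store "google") the id is relative to https://play.google.com; the helper names are hypothetical.

```python
from urllib.parse import urljoin

# Assumption: for Google Play data the developer id is a path relative to this
# base URL; data from other stores would need a different base.
PLAY_BASE = "https://play.google.com"

def developer_page(developer: dict) -> str:
    """Build an absolute URL for the developer's page from its relative id."""
    return urljoin(PLAY_BASE, developer["id"])

def address_lines(developer: dict) -> list:
    """Split the physical address, whose lines are separated by newline characters."""
    return developer.get("address", "").splitlines()

dev = {"name": "Amazon Mobile LLC",
       "id": "/store/apps/developer?id=Amazon+Mobile+LLC",
       "address": "AMZN Mobile LLC\n410 Terry Ave N\nSeattle, WA 98109"}
print(developer_page(dev))  # https://play.google.com/store/apps/developer?id=Amazon+Mobile+LLC
print(address_lines(dev))   # ['AMZN Mobile LLC', '410 Terry Ave N', 'Seattle, WA 98109']
```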
  

description

This property contains textual information describing the app, with two string subitems: short (the short description of the app) and full (the longer, full description of the app).

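This is the description of the Amazon Kindle app (with the full part truncated here for readability; its line breaks are shown literally, whereas in valid JSON they must be escaped as \n, as in the developer address above):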

    "description": {
      "short": "Your library in your pocket. Anytime, anywhere.",
      "full": "READ ANYTIME, ANYWHERE
               On the bus, on your break, in your bed
               (...)
               Switch seamlessly from reading your Kindle book to listening to the Audible book, all within the Kindle app.
               - Get notified when authors you love have new releases."
    }
  

categories

This property contains information about the categories of the app. Each app can have a main category and also additional categories: correspondingly, the two subproperties of categories are main and more. main describes the main category with three string subproperties: name (the name of the category), path (the web address of the category in the store) and label (the store id of the category). more instead contains a list of all the additional categories, described with the same properties (although, as shown below, label can be absent).

So for instance, the category information of the Amazon Kindle app is:

    "categories": {
      "main": {
        "name": "Books & Reference",
        "path": "/store/apps/category/BOOKS_AND_REFERENCE",
        "label": "BOOKS_AND_REFERENCE"
      }
    }
  

Note how the app above has one main category and no additional secondary categories. The following example, taken from another app (Train Station: Railroad Tycoon), also contains additional categories:

    "categories": {
      "main": {
        "name": "Simulation",
        "path": "/store/apps/category/GAME_SIMULATION",
        "label": "GAME_SIMULATION"
      },
      "more": [
        {
          "name": "Simulation",
          "path": "/store/apps/category/GAME_SIMULATION",
          "label": "GAME_SIMULATION"
        },
        {
          "name": "Management",
          "path": "/store/search?q=Management+games&c=apps"
        },
        {
          "name": "Tycoon",
          "path": "/store/search?q=Tycoon+games&c=apps"
        }
      ]
    }
  

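Note that, as seen above, the secondary categories chosen by the developer may include the main category as well, and that some secondary categories (here Management and Tycoon, whose path is a store search) have no label.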
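Because more may repeat the main category and may contain entries without a label, a consumer listing an app's categories should deduplicate and rely only on name. A minimal sketch (category_names is a hypothetical helper):

```python
def category_names(categories: dict) -> list:
    """Return the main category name followed by the distinct secondary ones.

    Secondary categories may repeat the main category, so duplicate names are
    dropped while keeping their original order. Entries without a label
    (such as store searches) are kept, since only the name is used.
    """
    names = []
    main = categories.get("main")
    if main:
        names.append(main["name"])
    for category in categories.get("more", []):
        if category["name"] not in names:
            names.append(category["name"])
    return names

train_station = {
    "main": {"name": "Simulation", "path": "/store/apps/category/GAME_SIMULATION",
             "label": "GAME_SIMULATION"},
    "more": [
        {"name": "Simulation", "path": "/store/apps/category/GAME_SIMULATION",
         "label": "GAME_SIMULATION"},
        {"name": "Management", "path": "/store/search?q=Management+games&c=apps"},
        {"name": "Tycoon", "path": "/store/search?q=Tycoon+games&c=apps"},
    ],
}
print(category_names(train_station))  # ['Simulation', 'Management', 'Tycoon']
```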

meta

This property includes optional meta information about the app: currently, the app store from which the information was extracted (store) and the date of extraction (date).

For instance, this is the corresponding JSON snippet:

    "meta": {
        "store": "google",
        "date": "2022-09-11"
    }
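The extraction date makes it possible to relate the other fields to the moment the data was captured, for instance to compute how recently the app had been updated. This sketch (days_since_update is a hypothetical helper) assumes updated uses the "Sep 9, 2022" style seen in the example and meta.date is an ISO 8601 date (YYYY-MM-DD).

```python
from datetime import date, datetime
from typing import Optional

def days_since_update(app: dict) -> Optional[int]:
    """Days between the app's last update and the extraction date in meta.

    Returns None if either date is missing.
    """
    updated = app.get("updated")
    extracted = app.get("meta", {}).get("date")
    if not updated or not extracted:
        return None
    # %b matches English abbreviated month names in Python's default C locale.
    updated_day = datetime.strptime(updated, "%b %d, %Y").date()
    return (date.fromisoformat(extracted) - updated_day).days

kindle = {"updated": "Sep 9, 2022", "meta": {"store": "google", "date": "2022-09-11"}}
print(days_since_update(kindle))  # 2
```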
  

safety

This property includes safety information dealing with privacy and security, and is composed of three subproperties: privacy_policy (the address of the app's full-text privacy policy), shared (the data that is shared with other parties) and collected (the data that is collected by the developer).

The shared and collected properties have the same structure, composed of three properties: action (a short string stating whether the data is shared or collected), description (a longer string describing the action) and data (a structured field describing the data and its uses).

In turn, the data property is a list of items, one for each category of data. Each item has the properties main (a string with the main category the data belongs to, such as "Personal info"), overall (a string summarizing the data items in that category) and details (describing the data and its uses).

Finally, the details property is a list of items with three subproperties: name (a string with the name of the data), optional (a boolean stating whether collecting or sharing the data is optional) and usage (a string with a comma-separated list of uses of the data).
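Finding every use of a given piece of data requires walking all four levels of this nesting (shared/collected, data, details, usage). As an illustration, this sketch flattens the structure into one record per data item; the helper names and the record layout are hypothetical, and usage strings are split on commas, which is a simplification (see the comment in the code).

```python
def split_usages(usage: str) -> list:
    """Split a comma-separated usage string, dropping a leading "and " from items."""
    # Note: this also splits Google's single use "Fraud prevention, security,
    # and compliance" into three items.
    parts = [part.strip() for part in usage.split(",")]
    return [part[4:] if part.startswith("and ") else part for part in parts if part]

def flatten_safety(safety: dict) -> list:
    """Flatten the nested safety structure into one record per data item."""
    records = []
    for action in ("shared", "collected"):
        for group in safety.get(action, {}).get("data", []):
            for item in group.get("details", []):
                records.append({
                    "action": action,           # "shared" or "collected"
                    "category": group["main"],  # e.g. "Personal info"
                    "data": item["name"],
                    "optional": item["optional"],
                    "usages": split_usages(item["usage"]),
                })
    return records

safety = {"shared": {"data": [{"main": "Financial info", "overall": "User payment info",
          "details": [{"name": "User payment info", "optional": False,
                       "usage": "App functionality, Account management"}]}]}}
for r in flatten_safety(safety):
    print(r["data"], r["usages"])  # User payment info ['App functionality', 'Account management']
```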

So for example, the corresponding safety information of the Amazon Kindle app is the following:

    "safety": {
      "privacy_policy": "https://www.amazon.com/gp/help/customer/display.html?nodeId=468496",
      "shared": {
        "action": "Data shared",
        "description": "Data that may be shared with other companies or organizations",
        "data": [
          {
            "main": "Personal info",
            "overall": "User IDs",
            "details": [
              {
                "name": "User IDs",
                "optional": false,
                "usage": "App functionality"
              }
            ]
          },
          {
            "main": "Financial info",
            "overall": "User payment info",
            "details": [
              {
                "name": "User payment info",
                "optional": false,
                "usage": "App functionality, Account management"
              }
            ]
          },
          {
            "main": "Messages",
            "overall": "Other in-app messages",
            "details": [
              {
                "name": "Other in-app messages",
                "optional": false,
                "usage": "App functionality"
              }
            ]
          },
          {
            "main": "App info and performance",
            "overall": "Crash logs",
            "details": [
              {
                "name": "Crash logs",
                "optional": false,
                "usage": "Analytics"
              }
            ]
          }
        ]
      },
      "collected": {
        "action": "Data collected",
        "description": "Data this app may collect",
        "data": [
          {
            "main": "Location",
            "overall": "Approximate location and Precise location",
            "details": [
              {
                "name": "Approximate location",
                "optional": true,
                "usage": "App functionality"
              },
              {
                "name": "Precise location",
                "optional": true,
                "usage": "App functionality"
              }
            ]
          },
          {
            "main": "Personal info",
            "overall": "Name, Email address, User IDs, Address, and Phone number",
            "details": [
              {
                "name": "Name",
                "optional": true,
                "usage": "App functionality, Analytics, Advertising or marketing, Fraud prevention, security, and compliance, Personalization, Account management"
              },
              {
                "name": "Email address",
                "optional": true,
                "usage": "App functionality, Analytics, Developer communications, Advertising or marketing, Account management"
              },
              {
                "name": "User IDs",
                "optional": true,
                "usage": "App functionality, Analytics, Developer communications, Advertising or marketing, Fraud prevention, security, and compliance, Personalization, Account management"
              },
              {
                "name": "Address",
                "optional": true,
                "usage": "App functionality"
              },
              {
                "name": "Phone number",
                "optional": true,
                "usage": "App functionality, Analytics, Advertising or marketing, Fraud prevention, security, and compliance, Account management"
              }
            ]
          },
          {
            "main": "Financial info",
            "overall": "User payment info and Purchase history",
            "details": [
              {
                "name": "User payment info",
                "optional": true,
                "usage": "App functionality"
              },
              {
                "name": "Purchase history",
                "optional": true,
                "usage": "App functionality, Analytics, Personalization"
              }
            ]
          },
          {
            "main": "Messages",
            "overall": "Other in-app messages",
            "details": [
              {
                "name": "Other in-app messages",
                "optional": true,
                "usage": "App functionality"
              }
            ]
          },
          {
            "main": "Photos and videos",
            "overall": "Photos and Videos",
            "details": [
              {
                "name": "Photos",
                "optional": true,
                "usage": "App functionality, Personalization"
              },
              {
                "name": "Videos",
                "optional": true,
                "usage": "App functionality"
              }
            ]
          },
          {
            "main": "Files and docs",
            "overall": "Files and docs",
            "details": [
              {
                "name": "Files and docs",
                "optional": true,
                "usage": "App functionality, Analytics"
              }
            ]
          },
          {
            "main": "App activity",
            "overall": "App interactions, In-app search history, Installed apps, Other user-generated content, and Other actions",
            "details": [
              {
                "name": "App interactions",
                "optional": false,
                "usage": "App functionality, Analytics, Advertising or marketing, Fraud prevention, security, and compliance, Personalization"
              },
              {
                "name": "In-app search history",
                "optional": false,
                "usage": "App functionality, Analytics, Advertising or marketing, Personalization"
              },
              {
                "name": "Installed apps",
                "optional": true,
                "usage": "App functionality"
              },
              {
                "name": "Other user-generated content",
                "optional": true,
                "usage": "App functionality, Analytics, Personalization"
              },
              {
                "name": "Other actions",
                "optional": true,
                "usage": "Analytics, Personalization"
              }
            ]
          },
          {
            "main": "App info and performance",
            "overall": "Crash logs, Diagnostics, and Other app performance data",
            "details": [
              {
                "name": "Crash logs",
                "optional": false,
                "usage": "App functionality, Analytics"
              },
              {
                "name": "Diagnostics",
                "optional": false,
                "usage": "App functionality, Analytics"
              },
              {
                "name": "Other app performance data",
                "optional": false,
                "usage": "App functionality, Analytics"
              }
            ]
          },
          {
            "main": "Device or other IDs",
            "overall": "Device or other IDs",
            "details": [
              {
                "name": "Device or other IDs",
                "optional": false,
                "usage": "App functionality, Analytics, Advertising or marketing, Fraud prevention, security, and compliance, Personalization"
              }
            ]
          }
        ]
      }
    }
 

The above information mirrors what the app stores (Google and Apple) currently offer: the main key is the specific piece of data, followed by its associated uses. This structure (perhaps intentionally) causes major usability problems, essentially because uses that are dangerous for privacy (like selling user data) are split across the various data items and buried among a myriad of uses that are of no concern (like using the data for the app's own functionality). The uses structure, described below, tackles this problem by offering a different view of the data.

security

The security property contains information about security aspects of the data, such as transmission (for instance, whether the data is transmitted securely over the internet) and storage (for instance, whether the user can ask for the data to be deleted). It is a list of items, each with two properties: main (a string stating the main security property) and description (a textual description of the property).

Following our usual example, this is the security information of the Amazon Kindle app:

    "security": [
      {
        "main": "Data is encrypted in transit",
        "description": "Your data is transferred over a secure connection"
      },
      {
        "main": "You can request that data be deleted",
        "description": "The developer provides a way for you to request that your data be deleted"
      }
    ]
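Since the security entries form a flat list, checking for a given practice amounts to a lookup on the main strings. A minimal sketch (helper names hypothetical) that matches the exact wording used in the Google Play example; data from other stores may phrase these practices differently.

```python
def security_practices(security: list) -> set:
    """Return the set of declared security practices (the main strings)."""
    return {entry["main"] for entry in security}

def encrypted_in_transit(security: list) -> bool:
    """Check for the wording used in the example above (assumed, store-specific)."""
    return "Data is encrypted in transit" in security_practices(security)

kindle_security = [
    {"main": "Data is encrypted in transit",
     "description": "Your data is transferred over a secure connection"},
    {"main": "You can request that data be deleted",
     "description": "The developer provides a way for you to request that your data be deleted"},
]
print(encrypted_in_transit(kindle_security))  # True
```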
  

uses

When discussing the safety data structure, we emphasized that its approach is far from user-friendly: the user can certainly read that information, but it is hard and time-consuming to extract the most relevant part. Is my data used for purposes that go beyond what the app needs? Is my privacy at risk because my data is sold, or used to track me and target advertising? Intentionally or not, the current data structure obfuscates this information by splitting it and burying it. For this reason, we also offer a different data structure: a view of the data that completely reverses the current state of affairs and is based on the uses of the data instead. In other words, whereas the current Google and Apple store structure starts from the data and shows its uses, this new structure starts from each use and shows the data involved. This makes it possible to extract right away the risk factors relevant to the user, such as: what parts of my data are sold? What parts of my data are used to track me and build an advertising profile?

In practice, the subproperties of uses are simply the uses of the data (for example "App functionality" or "Analytics"). Each use then has two subproperties, mandatory and optional, stating which data is mandatory and which is optional (and therefore subject to the user's approval); as the example below shows, a subproperty is omitted when it would be empty.

The mandatory and optional properties are structured in the same way: they contain the two properties shared and collected, corresponding to the data shared with third parties and the data collected by the app developer. In turn, shared and collected each contain a list of the data items involved.
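The inversion from the safety structure to the uses view can be sketched as follows. This is a minimal Python illustration, not the MoboSearch implementation, and the helper names are hypothetical. It splits each usage string on commas and drops a leading "and ", which turns Google's single use "Fraud prevention, security, and compliance" into the three keys "Fraud prevention", "security" and "compliance", as in the example below. Key order may differ from the example, which does not change the meaning of the JSON.

```python
def split_usages(usage: str) -> list:
    """Split a comma-separated usage string, dropping a leading "and " from items."""
    parts = [part.strip() for part in usage.split(",")]
    return [part[4:] if part.startswith("and ") else part for part in parts if part]

def build_uses(safety: dict) -> dict:
    """Invert safety (data -> uses) into the uses view (use -> level -> action -> data)."""
    uses = {}
    for action in ("shared", "collected"):
        for group in safety.get(action, {}).get("data", []):
            for item in group.get("details", []):
                level = "optional" if item["optional"] else "mandatory"
                for use in split_usages(item["usage"]):
                    data_list = (uses.setdefault(use, {})
                                     .setdefault(level, {})
                                     .setdefault(action, []))
                    if item["name"] not in data_list:  # avoid duplicate entries
                        data_list.append(item["name"])
    return uses

# A reduced, simplified excerpt of the Kindle safety data shown earlier.
safety = {
    "shared": {"data": [{"main": "App info and performance", "overall": "Crash logs",
        "details": [{"name": "Crash logs", "optional": False, "usage": "Analytics"}]}]},
    "collected": {"data": [{"main": "Personal info", "overall": "Name",
        "details": [{"name": "Name", "optional": True,
                     "usage": "App functionality, Fraud prevention, security, and compliance"}]}]},
}
kindle_uses = build_uses(safety)
print(kindle_uses["Analytics"])  # {'mandatory': {'shared': ['Crash logs']}}
print(sorted(kindle_uses))  # ['Analytics', 'App functionality', 'Fraud prevention', 'compliance', 'security']
```

If the combined Google use should stay a single key, the comma split would need a list of known use names instead.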

Going back to our example of the Amazon Kindle app, this is the corresponding data structure of the uses:

    "uses": {
      "App functionality": {
        "mandatory": {
          "shared": [
            "User IDs",
            "User payment info",
            "Other in-app messages"
          ],
          "collected": [
            "App interactions",
            "In-app search history",
            "Crash logs",
            "Diagnostics",
            "Other app performance data",
            "Device or other IDs"
          ]
        },
        "optional": {
          "collected": [
            "Approximate location",
            "Precise location",
            "Name",
            "Email address",
            "User IDs",
            "Address",
            "Phone number",
            "User payment info",
            "Purchase history",
            "Other in-app messages",
            "Photos",
            "Videos",
            "Files and docs",
            "Installed apps",
            "Other user-generated content"
          ]
        }
      },
      "Account management": {
        "mandatory": {
          "shared": [
            "User payment info"
          ]
        },
        "optional": {
          "collected": [
            "Name",
            "Email address",
            "User IDs",
            "Phone number"
          ]
        }
      },
      "Analytics": {
        "mandatory": {
          "shared": [
            "Crash logs"
          ],
          "collected": [
            "App interactions",
            "In-app search history",
            "Crash logs",
            "Diagnostics",
            "Other app performance data",
            "Device or other IDs"
          ]
        },
        "optional": {
          "collected": [
            "Name",
            "Email address",
            "User IDs",
            "Phone number",
            "Purchase history",
            "Files and docs",
            "Other user-generated content",
            "Other actions"
          ]
        }
      },
      "Advertising or marketing": {
        "optional": {
          "collected": [
            "Name",
            "Email address",
            "User IDs",
            "Phone number"
          ]
        },
        "mandatory": {
          "collected": [
            "App interactions",
            "In-app search history",
            "Device or other IDs"
          ]
        }
      },
      "Fraud prevention": {
        "optional": {
          "collected": [
            "Name",
            "User IDs",
            "Phone number"
          ]
        },
        "mandatory": {
          "collected": [
            "App interactions",
            "Device or other IDs"
          ]
        }
      },
      "security": {
        "optional": {
          "collected": [
            "Name",
            "User IDs",
            "Phone number"
          ]
        },
        "mandatory": {
          "collected": [
            "App interactions",
            "Device or other IDs"
          ]
        }
      },
      "compliance": {
        "optional": {
          "collected": [
            "Name",
            "User IDs",
            "Phone number"
          ]
        },
        "mandatory": {
          "collected": [
            "App interactions",
            "Device or other IDs"
          ]
        }
      },
      "Personalization": {
        "optional": {
          "collected": [
            "Name",
            "User IDs",
            "Purchase history",
            "Photos",
            "Other user-generated content",
            "Other actions"
          ]
        },
        "mandatory": {
          "collected": [
            "App interactions",
            "In-app search history",
            "Device or other IDs"
          ]
        }
      },
      "Developer communications": {
        "optional": {
          "collected": [
            "Email address",
            "User IDs"
          ]
        }
      }
    }
  

Despite its length (it contains the same information as the safety property), we can appreciate the big difference: the user can, for example, see right away which data is used beyond the needs of the app itself just by looking at the Advertising or marketing part:

    "Advertising or marketing": {
        "optional": {
          "collected": [
            "Name",
            "Email address",
            "User IDs",
            "Phone number"
          ]
        },
        "mandatory": {
          "collected": [
            "App interactions",
            "In-app search history",
            "Device or other IDs"
          ]
        }
      }
  

As you can see, this crucial information is no longer split and buried: it has been taken out of the mud and distilled into primary information. The user can now see right away what data the developer collects for uses that are not pertinent to the app itself, and distinguish between data that is abused in any case (the mandatory part) and data that is abused only if the user consents (the optional part).
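With the uses view, this kind of question becomes a direct lookup. The sketch below (concerning_data and CONCERNING_USES are hypothetical names) merges the shared and collected data for a chosen set of uses, keeping mandatory and optional apart; which uses count as going beyond the app's needs is a choice left to the caller, here only "Advertising or marketing".

```python
# Hypothetical choice of uses that go beyond the app's own needs; adjust as needed.
CONCERNING_USES = ("Advertising or marketing",)

def concerning_data(uses: dict, concerning=CONCERNING_USES) -> dict:
    """Collect the data involved in the given uses, split into mandatory and optional."""
    result = {"mandatory": set(), "optional": set()}
    for use in concerning:
        for level in ("mandatory", "optional"):
            # Merge the shared and collected lists for this use and level.
            for data_list in uses.get(use, {}).get(level, {}).values():
                result[level].update(data_list)
    return {level: sorted(items) for level, items in result.items()}

kindle_uses = {
    "Advertising or marketing": {
        "optional": {"collected": ["Name", "Email address", "User IDs", "Phone number"]},
        "mandatory": {"collected": ["App interactions", "In-app search history",
                                    "Device or other IDs"]},
    },
}
print(concerning_data(kindle_uses)["mandatory"])
# ['App interactions', 'Device or other IDs', 'In-app search history']
```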

scores

The scores property contains information coming from scoring measures. It allows modules to add scores for any particular measure of interest, and this information can be reused in various ways (for instance, to find the most dangerous or the least risky apps).

In particular, scores contains a list of items, each with a property measure holding the information about one scoring measure. measure has three subproperties: id (a string with the unique identifier of the measure), score (a float with the numerical value of the measure) and rank (the rank of the app according to the measure). rank, in turn, has two integer subproperties, global and category: the rank of the app at global scale (in the whole dataset) and at local scale (among the apps in the same main category), respectively.

Using the Amazon Kindle app and two measures from the MoboSearch project we have for instance the following piece of information:

    "scores": [
      {
        "measure": {
          "id": "org.mobosearch.scores.champion",
          "score": 0.0001068115234375,
          "rank": {
            "global": 66,
            "category": 2
          }
        }
      },
      {
        "measure": {
          "id": "org.mobosearch.scores.aware",
          "score": 0.0000457763671875,
          "rank": {
            "global": 39,
            "category": 2
          }
        }
      }
    ]
  }
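Since scores is a list keyed by measure id, consumers typically look up one measure and order apps by its rank. A minimal sketch with hypothetical helper names; it sorts in ascending rank order (rank 1 first) and places apps lacking the measure last. The second and third apps in the usage example are made-up placeholders.

```python
from typing import Optional

def find_measure(app: dict, measure_id: str) -> Optional[dict]:
    """Return the measure object with the given id, or None if the app lacks it."""
    for entry in app.get("scores", []):
        measure = entry.get("measure", {})
        if measure.get("id") == measure_id:
            return measure
    return None

def sort_by_rank(apps: list, measure_id: str, scope: str = "global") -> list:
    """Sort apps by ascending rank for a measure; scope is "global" or "category"."""
    def key(app):
        measure = find_measure(app, measure_id)
        if measure is None:
            return (1, 0)  # apps without this measure go last
        return (0, measure["rank"][scope])
    return sorted(apps, key=key)

AWARE = "org.mobosearch.scores.aware"
kindle = {"id": "com.amazon.kindle", "scores": [
    {"measure": {"id": AWARE, "score": 0.0000457763671875,
                 "rank": {"global": 39, "category": 2}}}]}
other = {"id": "com.example.other", "scores": [  # placeholder app
    {"measure": {"id": AWARE, "score": 0.5, "rank": {"global": 5, "category": 1}}}]}
unscored = {"id": "com.example.unscored"}  # placeholder app without scores
print(find_measure(kindle, AWARE)["rank"])  # {'global': 39, 'category': 2}
print([a["id"] for a in sort_by_rank([kindle, unscored, other], AWARE)])
# ['com.example.other', 'com.amazon.kindle', 'com.example.unscored']
```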