
Syncing Data from Github

Overview

In this example, we look at how to use the REST API to keep Components and References in sync with data derived from Github.

  1. We will create a workspace in Ardoq that models Github users and repositories as components and contributions (commits) as references.

  2. We will write a Python script that takes the name of a Github organization, finds all of its listed (public) repositories (excluding forks) and the Github users who have contributed to them, and then creates or updates the corresponding components and references in Ardoq.

Important

To keep this example as simple as possible, we only use Github API calls that do not require authentication. Depending on the number of repositories that an organization has (or the number of times you run the script), you may see a 403: rate limit exceeded error when making requests to the Github API. The script has an optional --github-token flag that lets you pass a valid Github token to the script. Authenticated requests are subject to a much higher rate limit (at the time of writing, 5,000 requests per hour instead of 60 per hour for unauthenticated requests). For more information about the Github API, consult the official documentation.
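If you do run into the limit, Github's response headers tell you how much of your quota is left. The following is a minimal sketch, not part of the example script: it reads the X-RateLimit-Remaining and X-RateLimit-Reset headers (the reset value is a Unix timestamp in seconds) and works out how long to wait. The helper names seconds_until_reset and check_rate_limit are hypothetical.

```python
import time
import urllib.request


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next request, based on
    Github's rate-limit headers. Returns 0 while requests remain."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))  # Unix epoch seconds
    now = time.time() if now is None else now
    return max(0, reset_at - int(now))


def check_rate_limit(github_token=None):
    """Hypothetical helper: ask Github's /rate_limit endpoint how long to wait."""
    req = urllib.request.Request("https://api.github.com/rate_limit")
    if github_token:
        req.add_header("Authorization", "Bearer {}".format(github_token))
    with urllib.request.urlopen(req) as resp:
        return seconds_until_reset(resp.headers)
```

When urlopen raises urllib.error.HTTPError for a 403, the exception's headers attribute can be passed to seconds_until_reset in the same way, so the script could sleep for that long instead of failing.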

Creating the Workspace

We create a new workspace using the Blank Workspace template and add two component types: GithubRepo and GithubUser.

Model

Once the workspace has been created, we add a new reference type named Contributed to (you can style it however you like). We use this reference type to link GithubUser components to GithubRepo components. Because we want to track the number of contributions that a user has made to a repository, we also create a custom field called contributions, of type Number, on the Contributed to reference type.

Writing the Script

The following Python code does not require any external dependencies, but it does rely on the example client, so make sure you have downloaded it. We will create a new file called github_example.py in the same directory as the downloaded example_ardoq_client.py file. You can download a copy of the github_example.py script here.

Setup

The initial setup imports the modules we need to fetch data from Github and parses the command-line arguments. We use the argparse library to keep the script maintainable and cross-platform.

import example_ardoq_client
import argparse
import json
import urllib.request
import urllib.parse
import pprint
import time

parser = argparse.ArgumentParser()

parser.add_argument(
    "--host",
    metavar="ARDOQ_API_HOST",
    type=str,
    help="Required if you are using a custom domain",
)
parser.add_argument(
    "--org",
    metavar="ARDOQ_ORG_LABEL",
    type=str,
    help="The label associated with your org",
)
parser.add_argument(
    "--token", metavar="ARDOQ_API_TOKEN", type=str, help="Your secret API token"
)
parser.add_argument(
    "--github-token", metavar="GITHUB_TOKEN", type=str, help="Optional Github API token"
)
parser.add_argument(
    "github_org", metavar="GITHUB_ORG", type=str, help="Name of your Github org"
)
parser.add_argument(
    "workspace",
    metavar="WORKSPACE",
    type=str,
    help="Ardoq identifier of your workspace",
)

args = parser.parse_args()

print("Using Workspace: {}".format(args.workspace))
print("Using Github Org: {}".format(args.github_org))
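Passing --token on the command line leaves the secret in your shell history. One option, sketched below, is to use environment variables as argparse defaults so the flags can be omitted. The variable names ARDOQ_API_TOKEN and GITHUB_TOKEN and the build_parser function are our own choices for this sketch; the example client is not assumed to read these variables itself.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # Fall back to environment variables when the flags are omitted.
    # ARDOQ_API_TOKEN / GITHUB_TOKEN are hypothetical names chosen for this sketch.
    parser.add_argument(
        "--token",
        metavar="ARDOQ_API_TOKEN",
        type=str,
        default=os.environ.get("ARDOQ_API_TOKEN"),
        help="Your secret API token (defaults to $ARDOQ_API_TOKEN)",
    )
    parser.add_argument(
        "--github-token",
        metavar="GITHUB_TOKEN",
        type=str,
        default=os.environ.get("GITHUB_TOKEN"),
        help="Optional Github API token (defaults to $GITHUB_TOKEN)",
    )
    parser.add_argument(
        "github_org", metavar="GITHUB_ORG", type=str, help="Name of your Github org"
    )
    parser.add_argument(
        "workspace", metavar="WORKSPACE", type=str, help="Ardoq identifier of your workspace"
    )
    return parser
```

An explicit flag still takes precedence, because argparse only uses the default when the option is absent.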

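We write two helper functions for making requests to the Github API: repos() returns the organization's repositories, skipping forks, and contributors() returns the contributors to a single repository.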

def repos():
    # List the organization's repositories. Note that this returns only the
    # first page of results (Github's default page size is 30).
    req = urllib.request.Request(
        "https://api.github.com/orgs/{}/repos".format(args.github_org)
    )
    if args.github_token:
        req.add_header("Authorization", "Bearer {}".format(args.github_token))
    with urllib.request.urlopen(req) as resp:
        repos = json.loads(resp.read().decode("utf-8"))
        # Skip forked repositories
        return list(filter(lambda repo: not repo["fork"], repos))


def contributors(repo_name):
    req = urllib.request.Request(
        "https://api.github.com/repos/{}/{}/contributors".format(
            args.github_org, repo_name
        )
    )
    if args.github_token:
        req.add_header("Authorization", "Bearer {}".format(args.github_token))
    with urllib.request.urlopen(req) as resp:
        # Github answers 204 No Content (no JSON body) for an empty repository
        if resp.getcode() == 200:
            return json.loads(resp.read().decode("utf-8"))
        return []
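As written, repos() only sees the first page of results, so an organization with more repositories than fit on one page is only partially synced. Below is a minimal sketch of page-by-page fetching. It assumes Github's per_page and page query parameters (per_page can be at most 100); fetch_all_pages and github_page_fetcher are hypothetical names. The page fetcher is passed in as a function so that the paging loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request


def fetch_all_pages(fetch_page, per_page=100):
    """Call fetch_page(page, per_page) for pages 1, 2, ... and concatenate the
    results, stopping at the first page that is shorter than per_page."""
    items = []
    page = 1
    while True:
        chunk = fetch_page(page, per_page)
        items.extend(chunk)
        if len(chunk) < per_page:
            return items
        page += 1


def github_page_fetcher(url, github_token=None):
    """Return a fetch_page function for a paginated Github list endpoint."""

    def fetch_page(page, per_page):
        query = urllib.parse.urlencode({"per_page": per_page, "page": page})
        req = urllib.request.Request("{}?{}".format(url, query))
        if github_token:
            req.add_header("Authorization", "Bearer {}".format(github_token))
        with urllib.request.urlopen(req) as resp:
            if resp.getcode() != 200:
                return []  # e.g. 204 No Content for an empty repository
            return json.loads(resp.read().decode("utf-8"))

    return fetch_page
```

In the script, repos() could then pass github_page_fetcher for the orgs/{org}/repos URL to fetch_all_pages and filter out forks from the combined list. Keep in mind that every extra page is an extra request against the rate limit.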

Fetching Data from Ardoq

When we created the Workspace, we used "GithubRepo" and "GithubUser" as the component type names and "Contributed to" as the reference type name. Internally, each of these types has an immutable id, and the API usually expects these ids rather than the names. Fortunately, we can use the workspace context to look up the id for each component and reference type name.

api = example_ardoq_client.API(
    ardoq_api_host=args.host, ardoq_org_label=args.org, ardoq_api_token=args.token
)

github_workspace = args.workspace

ctx = api.read_workspace_context(github_workspace)

comp_types = {t["name"]: t["typeId"] for t in ctx["componentTypes"]}

ref_types = {t["name"]: t["type"] for t in ctx["referenceTypes"]}
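If a type in the workspace is named slightly differently (for example "Contributed To"), the lookups later in the script fail with a bare KeyError. A small check, sketched here with a hypothetical require_types helper, reports every missing name at once. The sample context only mimics the keys the script reads (componentTypes with name and typeId, referenceTypes with name and type); the id values are made up.

```python
def require_types(lookup, expected, kind):
    """Exit with a readable message if any expected type name is missing."""
    missing = [name for name in expected if name not in lookup]
    if missing:
        found = ", ".join(sorted(lookup)) or "none"
        raise SystemExit(
            "Workspace is missing {} type(s): {} (found: {})".format(
                kind, ", ".join(missing), found
            )
        )


# Made-up context that mimics only the fields the script reads
ctx = {
    "componentTypes": [
        {"name": "GithubRepo", "typeId": "p1"},
        {"name": "GithubUser", "typeId": "p2"},
    ],
    "referenceTypes": [{"name": "Contributed to", "type": 2}],
}
comp_types = {t["name"]: t["typeId"] for t in ctx["componentTypes"]}
ref_types = {t["name"]: t["type"] for t in ctx["referenceTypes"]}

# Both calls pass silently here; a misspelled type name would stop the script
require_types(comp_types, ["GithubRepo", "GithubUser"], "component")
require_types(ref_types, ["Contributed to"], "reference")
```

In the script itself, only the two require_types calls would be needed, placed right after comp_types and ref_types are built.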

Because we want to be able to run our script multiple times, we need to know which repositories and users have already been stored in Ardoq. The simplest approach is to fetch all of the components and references that belong to our github_workspace. Since we are only interested in components of type GithubRepo or GithubUser and references of type Contributed to, we make three separate, filtered requests, which ensures that we don't fetch Ardoq data that we will never need.

We create a Python dictionary mapping qualified Github repository names (in the form org/repo) to Ardoq components, using the results of a list components request. We filter for only those components that belong to the github_workspace and whose typeId is the id of the "GithubRepo" type.

repo_components = api.list_components(
    {"rootWorkspace": github_workspace, "typeId": comp_types["GithubRepo"]}
)

repo_lookup = {c["name"]: c for c in repo_components}

Next, we create a Python dictionary mapping Github user names (logins) to Ardoq ids, again using the results of a list components request. We filter for only those components that belong to the github_workspace and whose typeId is the id of the "GithubUser" type.

user_components = api.list_components(
    {"rootWorkspace": github_workspace, "typeId": comp_types["GithubUser"]}
)

user_id_lookup = {c["name"]: c["_id"] for c in user_components}

Finally, we create a Python dictionary that maps each (source, target) pair of Ardoq ids, where the source is a GithubUser and the target is a GithubRepo, to its Ardoq reference. We use the results of a list references request, filtered to references that belong to the github_workspace and whose type is the id of the "Contributed to" reference type. Note that we assume each source and target pair is unique.

contrib_references = api.list_references(
    {"rootWorkspace": github_workspace, "type": ref_types["Contributed to"]}
)

contrib_lookup = {(r["source"], r["target"]): r for r in contrib_references}
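Because the dictionary comprehension keeps only the last reference for each (source, target) pair, duplicate Contributed to references would be silently ignored. A hedged sketch for spotting them before building the batch follows; find_duplicate_pairs is a hypothetical helper, and the sample references (with made-up ids) carry only the _id, source and target fields that the script already relies on.

```python
from collections import defaultdict


def find_duplicate_pairs(references):
    """Map each (source, target) pair that occurs more than once to its reference ids."""
    by_pair = defaultdict(list)
    for ref in references:
        by_pair[(ref["source"], ref["target"])].append(ref["_id"])
    return {pair: ids for pair, ids in by_pair.items() if len(ids) > 1}


# Sample data with made-up ids: two references link the same user and repo
sample = [
    {"_id": "r1", "source": "u1", "target": "g1"},
    {"_id": "r2", "source": "u1", "target": "g1"},
    {"_id": "r3", "source": "u2", "target": "g1"},
]
for (source, target), ids in find_duplicate_pairs(sample).items():
    print("Duplicate Contributed to references {} -> {}: {}".format(source, target, ids))
```

Running find_duplicate_pairs(contrib_references) in the script and reviewing any output in Ardoq keeps the uniqueness assumption honest.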

Computing the (Batch) Payload

Now that we know what is currently in Ardoq, we can traverse the Github data and build up a single Ardoq Batch request.

batch = example_ardoq_client.Batch()

org_repos = repos()
total_org_repos = len(org_repos)

for index, repo in enumerate(org_repos, 1):
    if not args.github_token:
        time.sleep(1)
    repo_name = repo["name"]
    qualified_repo_name = args.github_org + "/" + repo_name
    print("{}/{} : {}".format(index, total_org_repos, qualified_repo_name))
    repo_component = repo_lookup.get(qualified_repo_name)
    if repo_component:
        repo_id = repo_component["_id"]
        if repo["description"] != repo_component.get("description"):
            batch.update_component(repo_id, {"description": repo["description"]})
    else:
        repo_id = qualified_repo_name
        batch.create_component(
            {
                "rootWorkspace": github_workspace,
                "typeId": comp_types["GithubRepo"],
                "name": qualified_repo_name,
                "description": repo["description"],
            },
            batchId=repo_id,
        )

    for user in contributors(repo_name):
        user_name = user["login"]
        user_id = user_id_lookup.get(user_name)
        if user_id is None:
            user_id = "user-" + user_name
            user_id_lookup[user_name] = user_id
            batch.create_component(
                {
                    "rootWorkspace": github_workspace,
                    "typeId": comp_types["GithubUser"],
                    "name": user_name,
                },
                batchId=user_id,
            )

        contrib_reference = contrib_lookup.get((user_id, repo_id))

        n = user["contributions"]
        displayText = "{} commit{}".format(n, "s" if n > 1 else "")

        if contrib_reference:
            if contrib_reference.get("customFields", {}).get("contributions") != n:
                batch.update_reference(
                    contrib_reference["_id"],
                    {
                        "displayText": displayText,
                        "customFields": {"contributions": n},
                    },
                )
        else:
            batch.create_reference(
                {
                    "source": user_id,
                    "target": repo_id,
                    "type": ref_types["Contributed to"],
                    "displayText": displayText,
                    "customFields": {"contributions": n},
                }
            )

We are now ready to send the batch to the Ardoq API. We print the payload first so that you can inspect it, and only make the request if the batch contains changes.

print("Payload -------------------------------------")
print(json.dumps(batch.body))
print("---------------------------------------------")

if batch.is_empty():
    print("No change detected... nothing to do")
else:
    resp = api.batch(batch.body)
    pprint.pprint(resp)
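Stripped of the Ardoq client, the batch loop above makes the same create-or-update decision for every contribution: create a reference when none exists, update it only when the stored count differs, and otherwise leave it alone, which is why re-running the script with unchanged data produces an empty batch. The sketch below isolates that decision for a single reference. The names display_text and plan_contribution and the returned tuples are hypothetical; they stand in for calls on example_ardoq_client.Batch.

```python
def display_text(n):
    """Label used on the reference, e.g. "1 commit" or "3 commits"."""
    return "{} commit{}".format(n, "s" if n > 1 else "")


def plan_contribution(existing, source, target, ref_type, n):
    """Return the batch operation needed for one user/repo pair, or None when
    the existing reference already records n contributions."""
    fields = {"displayText": display_text(n), "customFields": {"contributions": n}}
    if existing is None:
        # No reference yet: create one linking the user to the repository
        return ("create", dict(fields, source=source, target=target, type=ref_type))
    if existing.get("customFields", {}).get("contributions") != n:
        # Reference exists but the count changed: update it in place
        return ("update", existing["_id"], fields)
    return None  # Up to date: nothing to add to the batch
```

Keeping this decision free of side effects makes it easy to test without an Ardoq instance or Github access.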

Running the script

Make sure that you have the id of the workspace you created earlier. The id is the Ardoq identifier in the workspace URL; in our case, the workspace id is 95a2a80b31ec6ce5c06d629e.

https://app.ardoq.com/app/.../workspace/95a2a80b31ec6ce5c06d629e?...
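If you would rather not copy the id by hand, a small helper can extract it from a pasted workspace URL. This is a sketch under two assumptions: the id is the path segment immediately after /workspace/, and, like the example id above, it consists of 24 hexadecimal characters. workspace_id_from_url is a hypothetical name.

```python
import re
import urllib.parse

# Assumption: workspace ids look like the example, 24 hexadecimal characters
WORKSPACE_ID = re.compile(r"/workspace/([0-9a-f]{24})(?:/|$)", re.IGNORECASE)


def workspace_id_from_url(url):
    """Return the workspace id from an Ardoq workspace URL, or None if absent."""
    path = urllib.parse.urlsplit(url).path  # drops the query string
    match = WORKSPACE_ID.search(path)
    return match.group(1) if match else None
```

Because only the path after /workspace/ is inspected, the helper does not depend on the rest of the URL.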

Assuming that you have an API token <token> that grants you access to the org myorg, hosted on the custom domain myorg.ardoq.com, run:

python3 github_example.py --host https://myorg.ardoq.com --token <token> <github-org> 95a2a80b31ec6ce5c06d629e

This should display something similar to:

Using Workspace: 95a2a80b31ec6ce5c06d629e
Using Github Org: <github-org>
----------------------------------------------------
Using Ardoq host: https://myorg.ardoq.com
Using API token ending: ...XXX
Using Org Label: <NOT PROVIDED>
----------------------------------------------------
...

If you are not using a custom domain (i.e. your Ardoq instance is hosted on app.ardoq.com), pass your org label with --org instead:

python3 github_example.py --org myorg --token <token> <github-org> 95a2a80b31ec6ce5c06d629e

This should display:

Using Workspace: 95a2a80b31ec6ce5c06d629e
Using Github Org: <github-org>
----------------------------------------------------
Using Ardoq host: https://myorg.ardoq.com
Using API token ending: ...XXX
Using Org Label: myorg
----------------------------------------------------
...

If you have the Workspace open in Ardoq, then you will see your components appear!