Overview
Integrating git repos in a data engineering platform can help speed up development cycles and CICD workflows dramatically. In this walkthrough, I’ll show how to setup a repo hosted on github to interact with your snowflake environment.
Setup
Github
Let’s assume we have a git repo in mind with the following URL https://github.com/bstuddard/snowflake_demo_repo.git. We’ve already cloned this locally and are able to make commits/pushes to the github repo.
Snowflake workspaces
To test this in Snowflake that we are able to see the same files, the first thing we want to do is clone the repo to our user workspace. This will be somewhat akin to a development environment, where we can make changes to files, make commits, test things out, etc. This resource has some good examples. At a high level:
-
Setup an api integration, more information here:
CREATE OR REPLACE API INTEGRATION interactive_git_api_integration API_PROVIDER = git_https_api API_ALLOWED_PREFIXES = ('https://github.com/') API_USER_AUTHENTICATION = ( TYPE = SNOWFLAKE_GITHUB_APP ) ENABLED = TRUE; -
Confirm integration created. These are account level objects, but you can still confirm creation:
SHOW INTEGRATIONS; - Create a new workspace and choose the git repo option.
- Click the changes tab and set your name/email for commit purposes if needed.
- Select the correct branch and make sure you can see your files.
Snowflake repo clone
In order to run scripts against these files, we also need to register the repo as a git object:
-
Generate a GitHub Personal Access Token:
- Go to GitHub → Settings → Developer Settings → Personal Access Tokens
- Create token with ‘repo’ permissions (or ‘Contents: Read/Write’ for fine-grained tokens)
- Copy the token (starts with
ghp_)
- Login as an admin
-
Create a secret with the github token for auth purposes:
--Select db and schema first... CREATE OR REPLACE SECRET git_clone_token_secret TYPE = password USERNAME = 'username_example' PASSWORD = 'ghp_token'; -
Validate secret was created:
-- Show all secrets in current schema SHOW SECRETS;-- Describe the specific secret (won't show actual credentials for security) DESCRIBE SECRET git_clone_token_secret; -
Create an API integration to use:
CREATE OR REPLACE API INTEGRATION git_clone_api_integration API_PROVIDER = git_https_api API_ALLOWED_PREFIXES = ('https://github.com/your-username') -- Replace with your actual GitHub username ALLOWED_AUTHENTICATION_SECRETS = (git_clone_token_secret) ENABLED = TRUE; -
Create the git repository clone:
CREATE OR REPLACE GIT REPOSITORY git_clone_repo API_INTEGRATION = git_clone_api_integration GIT_CREDENTIALS = git_clone_token_secret ORIGIN = 'https://github.com/your-username/your-repo-name.git'; -- Replace with your actual repo URL
Testing
Basic File Changes
To test things out, first let’s make sure we can see the files:
SHOW GIT REPOSITORIES;
DESCRIBE GIT REPOSITORY git_clone_repo;
LS @git_clone_repo/branches/main/;Next, make a change to one of the files and let’s see if it comes through the workspace and actual repo. In order to refresh the workspace, you’ll need to go to the workspace view -> Changes -> Pull -> Fetch All.
In order to fetch the recent changes in the repo clone, execute the following:
ALTER GIT REPOSITORY git_clone_repo FETCH;You should now see your updated files after re-running the LS command.
Job Runs
To test out job runs, create a simple SQL file and execute the following:
EXECUTE IMMEDIATE FROM @git_clone_repo/branches/main/admin_scripts/test_repos.sqlConclusion
You now have both:
- Git Workspace: Interactive development environment for editing and committing
- Git Repository Clone: Programmatic access for data pipelines and automation
This dual setup enables a complete DevOps workflow - develop in the workspace, deploy via the repository clone!
All examples and files available on Github.