
Google BigQuery integration

Analyze survey data in your BigQuery data warehouse

Written by Agnieszka

Integrating Google BigQuery and Survicate allows you to collect and automatically send survey responses directly to your BigQuery data warehouse to store, query, and analyze user insights.

By integrating Survicate with BigQuery, you can:

  • Gain in-depth insights: Uncover patterns, identify trends, and get actionable insights from your survey data. For example, you can evaluate the effectiveness of your marketing campaigns by marrying survey responses with campaign performance data.

  • Simplify data management: Survicate structures and formats survey data automatically, making it ready for immediate use in BigQuery. The integration saves you time and effort in data preparation and lets you focus on more important tasks, like prioritizing feature requests by consolidating and analyzing the feedback received from your users.

  • Access fresh data: Your survey data is synced automatically every hour, so you always have up-to-date insights for timely analysis and decision-making. With regularly refreshed data, you can dynamically create targeted customer segments based on the most recent survey responses and other relevant data.

  • Benefit from improved security and efficiency: As a native integration, it ensures a fast connection and robust security, and guarantees no missed survey responses. Better efficiency means you can run thorough market research and competitive analysis by integrating survey responses with market trends and competitor insights.


📌 Please note that you must have a billing (paid) account in BigQuery to connect it to Survicate.

How Survicate & BigQuery Work Together

Survicate collects and sends survey data directly to your BigQuery data warehouse. This data includes all responses to your surveys, respondent details, and other metadata, which can then be analyzed using BigQuery's powerful SQL-like queries.

Setting Up Survicate & BigQuery Integration

To store your Survicate data in BigQuery, complete the following steps:

  1. Create a project and enable BigQuery

  2. Create a service account for Survicate

  3. Connect BigQuery integration in Survicate

📌 BigQuery integration is available on some of our paid plans. If you can't access it on your current plan and would like to use it, reach out to our team.

Step 1. Create a Project and Enable BigQuery (if You Don't Have One Yet)

To create a project and enable BigQuery:

  1. Navigate to the Google Developers Console.

  2. Configure Google Cloud Platform:

    1. If you don’t have a project already, create one.

    2. If you have an existing project, enable the BigQuery API. Once you’ve done so, you should see BigQuery in the “Resources” section of Cloud Platform.

  3. Copy the project ID. You’ll need it when connecting BigQuery in the Survicate panel.

When you create your project, you must enable billing so Survicate can successfully write data.


Step 2. Create a service account for Survicate

To create a service account for Survicate:

  1. In Google BigQuery, from the navigation panel on the left, select IAM & Admin ➡️ Service accounts.

  2. Click Create Service Account.

  3. Enter a name for the service account (for example, survicate-warehouse) and click Create.

  4. Assign the service account the following roles:

    • BigQuery Data Owner (to grant this role on a single dataset only, rather than the whole project, first create a dataset named “survicate”, or a custom name of your choice, and assign the role there)

    • BigQuery Job User

  5. Create a JSON key. The downloaded file will be used to connect the integration in the Survicate panel.

If you have trouble creating a new service account, refer to Google Cloud’s documentation about service accounts for more information.

Step 3. Connect BigQuery in Survicate

Here's how to connect BigQuery in Survicate.

  1. Find Google BigQuery on the Integrations list and click Connect.

  2. Fill out the two required fields, Project ID and Credentials. Optionally, use Customize dataset name (if your dataset name is different from "survicate") or provide a Location.

💡 Project ID: follow the instructions above to create a dedicated project for Survicate data; you can copy its ID from the project selector in the Google Cloud console.

💡 Credentials: follow the instructions above to generate a dedicated .json key file; this downloaded file provides the credentials for the integration.

Seeing survey data in BigQuery

Once you have Survicate and BigQuery integrated, Survicate collects survey data and sends it periodically (every hour) to your BigQuery data warehouse.

Survey data is structured and formatted to be compatible with BigQuery's analysis tools, ensuring immediate usability. The integration will consistently sync all survey data, both old and new, maintaining an up-to-date data set in Google BigQuery.

Survicate will create 6 tables in your BigQuery project:

  • raw_respondents

  • raw_response

  • response

  • respondents

  • survicate_sync_history

  • survicate_sync_info

Tables with the raw_ prefix are used to sync data between Survicate and BigQuery. They can contain duplicates, as we send all created and updated data there.

After each sync, we run a query job that merges data from the raw_ tables into the response and respondents tables. ❗ All of your analysis should be done on the response and respondents tables (without the raw_ prefix).
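Conceptually, the merge deduplicates raw rows by response UUID, keeping the most recent version. Here's a minimal Python sketch of that idea (the field names and sample data are illustrative, following the schema documented below; this is not Survicate's actual merge job):

```python
from datetime import datetime, timezone

def merge_raw_rows(raw_rows):
    """Deduplicate raw response rows, keeping the newest version of each uuid.

    The raw_ tables may contain several versions of the same response
    (one per create/update event); the deduplicated table keeps only
    the row with the latest updated_at per uuid.
    """
    latest = {}
    for row in raw_rows:
        uuid = row["uuid"]
        if uuid not in latest or row["updated_at"] > latest[uuid]["updated_at"]:
            latest[uuid] = row
    return list(latest.values())

raw = [
    {"uuid": "r-1", "updated_at": datetime(2024, 5, 1, 10, tzinfo=timezone.utc), "answer": "Good"},
    {"uuid": "r-1", "updated_at": datetime(2024, 5, 1, 11, tzinfo=timezone.utc), "answer": "Great"},
    {"uuid": "r-2", "updated_at": datetime(2024, 5, 1, 10, tzinfo=timezone.utc), "answer": "OK"},
]

merged = merge_raw_rows(raw)
print(sorted((r["uuid"], r["answer"]) for r in merged))
# → [('r-1', 'Great'), ('r-2', 'OK')]
```

This is why raw_ tables can show duplicates while the response and respondents tables stay clean.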

The survicate_sync_history and survicate_sync_info tables are system-managed tables that record the status and details of each sync run. They are not intended for analysis; they exist to help you monitor the integration and, if needed, trigger automated pipelines only after a sync has fully completed.

💡 If you run automated data pipelines that depend on Survicate data (for example, an Airflow DAG), you can query survicate_sync_history to check whether the latest sync has completed before triggering your pipeline. A simple query filtering for status = 'succeeded' and the most recent finished_at will tell you whether fresh data is ready.
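That freshness check can be sketched in Python. The SQL string below is what you would run against survicate_sync_history in BigQuery (the `your_project.survicate` path is a placeholder to adapt); the helper function applies the same "latest run succeeded" logic to rows you have already fetched, so the sketch runs standalone:

```python
# SQL you might run in BigQuery to fetch the most recent sync run
# (project and dataset names are placeholders):
FRESHNESS_SQL = """
SELECT status, finished_at
FROM `your_project.survicate.survicate_sync_history`
ORDER BY started_at DESC
LIMIT 1
"""

def fresh_data_ready(sync_history_rows):
    """Return True if the most recently started sync run succeeded."""
    if not sync_history_rows:
        return False
    latest = max(sync_history_rows, key=lambda r: r["started_at"])
    return latest["status"] == "succeeded"

rows = [
    {"started_at": "2024-05-01T10:00:00Z", "status": "succeeded"},
    {"started_at": "2024-05-01T11:00:00Z", "status": "running"},
]
print(fresh_data_ready(rows))  # → False: the latest run is still in progress
```

In an orchestrator like Airflow, a check like this could gate the rest of the DAG until the hourly sync has finished.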

Data structure in the response table:

  • workspace_id - identifier of the workspace to which the survey belongs

  • survey_id - survey identifier

  • survey_name - the name of your survey as in the Survicate panel

  • uuid - response UUID

  • respondent_uuid - UUID of the visitor who submitted the response

  • collected_at - timestamp of when a response was collected

  • updated_at - timestamp of when a response was updated in the BigQuery table

  • url - URL of the page where the survey was displayed (empty string if null)

  • user_agent - browser or system details of where the response was collected

  • language - filled only if the survey has translations

  • questions - an array of the questions with answers and the question number from your survey (see below)

Each question contains:

  • question_id - Question identifier

  • question_order_number - the ordinal number of the question in your survey, e.g., if it’s the first question of the survey, the second one, etc.

  • body - the question text (empty string if null)

  • type - the question type

  • answers - the array of answers (see below)

Each answer contains:

  • id - answer/visit point ID

  • insight - filled for answers that contain insights

  • values - the array of answer values with:

    • label - the answer label

    • value - the answer value
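In BigQuery you would typically flatten this nested structure with UNNEST on `questions` and `answers`; the same traversal can be shown in pure Python. The sample record below is hypothetical but follows the structure documented above:

```python
def flatten_response(response):
    """Flatten one `response` row into (order, question, label, value) tuples,
    walking the nested questions -> answers -> values arrays."""
    rows = []
    for q in response["questions"]:
        for a in q["answers"]:
            for v in a["values"]:
                rows.append((q["question_order_number"], q["body"], v["label"], v["value"]))
    return rows

sample = {
    "uuid": "resp-1",
    "questions": [
        {
            "question_order_number": 1,
            "body": "How satisfied are you?",
            "answers": [{"id": "a-1", "values": [{"label": "Very satisfied", "value": "5"}]}],
        }
    ],
}
print(flatten_response(sample))
# → [(1, 'How satisfied are you?', 'Very satisfied', '5')]
```

A flattened shape like this is usually easier to aggregate (for example, counting answers per question) than the nested arrays.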


Data structure in the respondents table:

  • workspace_id - the identifier of the workspace to which the survey belongs

  • respondent_uuid - visitor UUID from Survicate

  • updated_at - timestamp of when the respondent was updated in the BigQuery table

  • custom_attributes - the array of custom attributes (see below)

Each custom attribute contains:

  • name - the custom attribute name

  • value - the custom attribute value
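Because custom attributes arrive as an array of name/value pairs, a common first step after export is pivoting them into a flat mapping. A small sketch, using a hypothetical respondent record shaped like the schema above:

```python
def attributes_to_dict(respondent):
    """Turn the respondent's custom_attributes array into a flat dict,
    which is easier to filter on or join with other tables."""
    return {attr["name"]: attr["value"] for attr in respondent["custom_attributes"]}

respondent = {
    "respondent_uuid": "v-123",
    "custom_attributes": [
        {"name": "plan", "value": "pro"},
        {"name": "company", "value": "Acme"},
    ],
}
print(attributes_to_dict(respondent))
# → {'plan': 'pro', 'company': 'Acme'}
```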


Data structure in the survicate_sync_history table:

survicate_sync_history records one row per sync run. You can use it to check whether the latest sync completed successfully and when.

  • sync_run_id - unique identifier for one Survicate sync attempt

  • started_at - timestamp when the sync run started

  • finished_at - timestamp when the sync run ended (empty if the sync is still running)

  • status - current status of the sync run: running, succeeded, or failed

  • rows - total number of rows written to raw tables during this sync run


Data structure in the survicate_sync_info table:

survicate_sync_info records one row per entity (responses and respondents) per sync run. It gives you a more granular view of each sync run, broken down by entity.

  • sync_info_id - unique identifier for this sync-info record

  • sync_run_id - links this record to the corresponding row in survicate_sync_history

  • entity_name - the entity processed in this run: responses or respondents

  • rows - number of rows written to the raw table for this entity

  • started_at - timestamp when processing of this entity started

  • finished_at - timestamp when processing of this entity finished
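Since sync_run_id links survicate_sync_info back to survicate_sync_history, you can break one run down by entity and sanity-check it against the run's total. A small Python sketch over sample rows (the data is illustrative; in BigQuery this would be a JOIN on sync_run_id):

```python
def rows_per_entity(sync_info_rows, sync_run_id):
    """Break down one sync run by entity, matching rows on sync_run_id."""
    return {
        r["entity_name"]: r["rows"]
        for r in sync_info_rows
        if r["sync_run_id"] == sync_run_id
    }

info = [
    {"sync_run_id": "run-1", "entity_name": "responses", "rows": 120},
    {"sync_run_id": "run-1", "entity_name": "respondents", "rows": 45},
    {"sync_run_id": "run-2", "entity_name": "responses", "rows": 10},
]

breakdown = rows_per_entity(info, "run-1")
print(breakdown)                 # → {'responses': 120, 'respondents': 45}
print(sum(breakdown.values()))   # → 165; should match `rows` in survicate_sync_history for run-1
```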

FAQ

How often is data synchronized for the Survicate - BigQuery integration?

We sync data in batches, once per hour.

📞 If you have any questions or need assistance - feel free to reach out to our team via chat or email: support@survicate.com.
