Performing Bulk Data API Calls¶
The Khoros Communities Bulk Data API provides analytical data for your community, and this SDK can be leveraged to query the Bulk Data API using Python.
See also
For additional information on how to leverage the Bulk Data API, refer to the Khoros Developer Documentation.
Note
This guide assumes that the khoros.core.Khoros object has been instantiated with the khoros variable name, as illustrated in the snippet below.
>>> from khoros import Khoros
>>> khoros = Khoros(helper='helper.yml')
Connecting to the API¶
To connect to the Bulk Data API, sign in to the Community Analytics (formerly called Lithium Social Intelligence or LSI) user interface, click your username in the top-right corner, and retrieve your connection information. This information includes the following:
Community ID
Client ID
Access Token
You can then supply this information to the Python SDK in one of two ways, which are covered in the sections below.
During Instantiation¶
When instantiating the core object, you can supply the Bulk Data API connection information using the bulk_data_settings parameter, as shown below.
>>> bulk_data_settings = {
...     'community_id': 'example.prod',
...     'client_id': 'ay0CXXXXXXXXXX/XXXX+XXXXXXXXXXXXX/XXXXX4KhQ=',
...     'token': '2f25XXXXXXXXXXXXXXXXXXXXXXXXXa10dec04068',
... }
>>> # The settings variable is assumed to hold your core connection settings, defined beforehand
>>> khoros = Khoros(defined_settings=settings, bulk_data_settings=bulk_data_settings, auto_connect=False)
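To keep the client ID and token out of source code, the same dictionary could be assembled from environment variables. The following is a minimal sketch rather than part of the SDK: the bulk_data_settings_from_env function and the KHOROS_BULK_* variable names are hypothetical, and only the three dictionary keys come from the example above.

```python
import os

# ASSUMPTION: these environment variable names are illustrative only;
# the dictionary keys match the bulk_data_settings example above.
ENV_VARS = {
    'community_id': 'KHOROS_BULK_COMMUNITY_ID',
    'client_id': 'KHOROS_BULK_CLIENT_ID',
    'token': 'KHOROS_BULK_TOKEN',
}


def bulk_data_settings_from_env(environ=None):
    """Build a bulk_data_settings dictionary from environment variables."""
    environ = os.environ if environ is None else environ
    # Collect every variable that is absent or empty so the error names them all
    missing = [var for var in ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: environ[var] for key, var in ENV_VARS.items()}
```

The returned dictionary can then be passed as the bulk_data_settings argument in the same way as the hard-coded dictionary.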
Using a Helper File¶
Just as a helper file can be used to connect to the standard Community APIs, a helper file can be used to supply the connection information for the Bulk Data API. If the helper file is supplied in YAML format, it will appear similar to the example below.
# Define how to obtain the connection information
connection:
    # Bulk Data API connection information
    bulk_data:
        community_id: example.prod
        client_id: ay0CXXXXXXXXXX/XXXX+XXXXXXXXXXXXX/XXXXX4KhQ=
        token: 2f25XXXXXXXXXXXXXXXXXXXXXXXXXa10dec04068
        europe: no
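Before handing a helper file to the SDK, it can be useful to confirm that the Bulk Data API section is complete. The sketch below assumes the YAML file has already been parsed into a dictionary (for instance with PyYAML's yaml.safe_load) and that bulk_data sits beneath connection as in the example; the missing_bulk_data_keys function is hypothetical, and it checks only the three values listed under Connecting to the API.

```python
# ASSUMPTION: only the three values named earlier in this guide are treated
# as required; the europe field is not checked here.
REQUIRED_KEYS = ('community_id', 'client_id', 'token')


def missing_bulk_data_keys(helper):
    """Return the required Bulk Data API keys that are absent or empty."""
    # Tolerate a missing or empty connection / bulk_data section
    bulk_data = (helper.get('connection') or {}).get('bulk_data') or {}
    return [key for key in REQUIRED_KEYS if not bulk_data.get(key)]
```

An empty list means all three values are present.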
Querying the Bulk Data API¶
When performing queries against the Bulk Data API, you must provide a “From Date” and a “To Date”, and you can query up to seven days' worth of data at a time. You can export the data in either JSON or CSV format.
As such, assuming you are using a helper file to authenticate (as explained in the previous section), you will be leveraging the from_date, to_date, and export_type parameters with the khoros.core.Khoros.BulkData.query() method.
For example, to capture data between October 25, 2022, and November 1, 2022, and export it in JSON format, you would use syntax similar to what is shown below, with the dates passed as strings in YYYYMMDD format. The example below also demonstrates how you would export the results to a JSON file.
import json
from khoros import Khoros

# Instantiate the khoros object
khoros = Khoros(helper='helper.yml')

# Perform the Bulk Data API query
results = khoros.bulk_data.query(from_date='20221025', to_date='20221101', export_type='json')

# Export to a JSON file
with open('path/to/bulk_data_export.json', 'w') as file:
    json.dump(results, file, indent=2)
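Because a single query can cover at most seven days, a longer reporting period has to be split into several queries. The sketch below shows one way to compute the date pairs with the standard library. The seven_day_windows helper is hypothetical (it is not part of the SDK), and it assumes YYYYMMDD date strings like those in the example above. Consecutive windows share a boundary date, so depending on how the API treats the “To Date”, the combined results may need de-duplication.

```python
from datetime import datetime, timedelta


def seven_day_windows(from_date, to_date, max_days=7):
    """Split a YYYYMMDD date range into consecutive windows of at most max_days days."""
    start = datetime.strptime(from_date, '%Y%m%d').date()
    end = datetime.strptime(to_date, '%Y%m%d').date()
    if end < start:
        raise ValueError('to_date must not be earlier than from_date')
    if start == end:
        return [(from_date, to_date)]
    windows = []
    while start < end:
        # Each window ends seven days later or at the overall end date
        stop = min(start + timedelta(days=max_days), end)
        windows.append((start.strftime('%Y%m%d'), stop.strftime('%Y%m%d')))
        start = stop
    return windows


# Each pair could then be passed to the query method shown above, e.g.:
# for window_start, window_end in seven_day_windows('20221025', '20221115'):
#     results = khoros.bulk_data.query(from_date=window_start, to_date=window_end, export_type='json')
```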
Manipulating Retrieved Data¶
After querying the Bulk Data API, there are several ways to manipulate the retrieved data, provided that you exported the results in JSON format. These options are explained below. The examples assume that the query results are stored in a variable named bulk_data.
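If the results were written to a JSON file in an earlier session, they can be read back into Python before being manipulated. This is a minimal sketch using only the standard library; the load_bulk_data function name is hypothetical and the path in the usage comment is a placeholder.

```python
import json


def load_bulk_data(path):
    """Read a previously exported Bulk Data API JSON file into Python objects."""
    with open(path, 'r', encoding='utf-8') as file:
        return json.load(file)


# Example usage (the path is a placeholder):
# bulk_data = load_bulk_data('path/to/bulk_data_export.json')
```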
Filtering by User Type¶
When viewing your data, you may wish to pare the data down to only logged-in users, or perhaps only anonymous users. This can be done using the khoros.core.Khoros.BulkData.filter_anonymous() method.
By default, the method will remove all anonymous users and retain only data for logged-in users. However, you can leverage the remove_registered Boolean parameter to filter out logged-in users instead and keep only the anonymous user data.
# The default parameters will remove anonymous user data
filtered_data = khoros.bulk_data.filter_anonymous(bulk_data)
# This will also remove anonymous user data
filtered_data = khoros.bulk_data.filter_anonymous(bulk_data, remove_anonymous=True)
# This will remove all logged-in user data
filtered_data = khoros.bulk_data.filter_anonymous(bulk_data, remove_registered=True)
Filtering by Action¶
If you are familiar with the action.key events, then you can filter the data for only entries with a specific action using the khoros.core.Khoros.BulkData.filter_by_action() method, as demonstrated below.
# This will filter for only events relating to creating posts
filtered_data = khoros.bulk_data.filter_by_action('messages.publish', bulk_data)
# This will filter for only events relating to messages marked as an accepted solution
filtered_data = khoros.bulk_data.filter_by_action('solutions.accept', bulk_data)
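The examples above filter on one action at a time. To keep events for several actions at once, the records can also be filtered directly in Python. The sketch below is not an SDK method: the filter_by_actions function is hypothetical, and it assumes that the exported data is either a list of dictionaries or a dictionary holding that list under a records key, with each entry carrying an action.key field. Verify this against your own export before relying on it.

```python
def filter_by_actions(bulk_data, action_keys):
    """Keep only the records whose 'action.key' value is one of action_keys."""
    # ASSUMPTION: records are a list of dicts, optionally wrapped under 'records'
    records = bulk_data.get('records', []) if isinstance(bulk_data, dict) else bulk_data
    wanted = set(action_keys)
    return [record for record in records if record.get('action.key') in wanted]
```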
Counting Actions¶
If you just wish to count the number of times a specific event is found within your data and do not need the raw data, then you can use the khoros.core.Khoros.BulkData.count_actions() method and supply the action.key value that you wish to count.
accepted_solution_count = khoros.bulk_data.count_actions(bulk_data, 'solutions.accept')
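When you do not yet know which action.key values appear in an export, a tally of every action can help you decide what to count. The following sketch uses collections.Counter from the standard library. The tally_actions function is hypothetical, and the record layout (a list of dictionaries, optionally wrapped under a records key, each with an action.key field) is an assumption to check against your own data.

```python
from collections import Counter


def tally_actions(bulk_data):
    """Count how many times each 'action.key' value appears in the records."""
    # ASSUMPTION: records are a list of dicts, optionally wrapped under 'records'
    records = bulk_data.get('records', []) if isinstance(bulk_data, dict) else bulk_data
    return Counter(record['action.key'] for record in records if 'action.key' in record)
```

Calling most_common() on the returned counter lists the actions from most to least frequent.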
Counting Logins and Views¶
Two of the more common events (logins and views) have their own methods, which means you won’t need to remember their action.key values. These methods are khoros.core.Khoros.BulkData.count_logins() and khoros.core.Khoros.BulkData.count_views(), respectively, and they are demonstrated below.
# This returns the number of logins as an integer
num_logins = khoros.bulk_data.count_logins(bulk_data)
# This returns the number of views as an integer
num_views = khoros.bulk_data.count_views(bulk_data)