Last Updated: February 25, 2016 · exallium

Querying the Google Sites API can return partial results.

The Google Sites API documentation for the content feed states that retrieving entries with a base search (no parent, no path, only kinds) returns the "latest" content on the site. Nothing else documents what "latest" means or how the API decides which content counts as latest.
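To make the difference between a base search and a scoped search concrete, here is a minimal sketch that only builds query URLs and sends no requests. It assumes the classic Sites Data API content-feed URL form (`https://sites.google.com/feeds/content/<domain>/<site>`, with `site` as the domain for non-Workspace sites) and its `kind`, `parent` and `path` query parameters; verify both against the official docs. The helper name `content_feed_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed classic Sites Data API content-feed URL; check the official docs.
FEED_BASE = "https://sites.google.com/feeds/content/{domain}/{site}"


def content_feed_url(domain, site, kinds, parent=None, path=None):
    """Build a content-feed query URL (hypothetical helper).

    With only `kinds` set, this is the "base search" that the docs say
    returns the site's "latest" content. Adding `parent` or `path`
    scopes the query to part of the site.
    """
    params = {"kind": ",".join(kinds)}  # comma-separated list of entry kinds
    if parent is not None:
        params["parent"] = parent  # entry id of the parent entry
    if path is not None:
        params["path"] = path  # page path, e.g. "/home"
    # Keep commas and slashes readable instead of percent-encoding them.
    query = urlencode(params, safe=",/")
    return FEED_BASE.format(domain=domain, site=site) + "?" + query


if __name__ == "__main__":
    # Base search: kinds only, no parent, no path.
    print(content_feed_url("site", "example-site",
                           ["announcement", "announcementspage",
                            "listpage", "webpage"]))
```

Authentication, paging and response parsing are deliberately left out; the point is only that a base search carries nothing but `kind`.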

This caused a significant delay on one of our projects, because our searches were not returning all of the results we needed. A small change to the application architecture fixed it.

The problem was that a few hundred "announcement" pages were not being returned by a general search filtered only on kinds (in this case, "announcement", "announcementspage", "listpage", and "webpage"). Removing "announcement" from this initial list, and then querying for announcements separately with an entity from the first result set as the parent, forced Google's side to bypass this undocumented heuristic and return the full results we had expected in the first place.
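A minimal sketch of that two-phase query, under stated assumptions: entries are modelled as plain dicts with `id` and `kind`, the actual HTTP call is hidden behind a hypothetical `fetch(kinds, parent)` function you would implement against the feed, and "an entity as the parent" is taken to mean each announcements page found by the first query. None of these names come from the Sites API itself.

```python
from typing import Callable, Dict, List, Optional

# Assumed entry shape for illustration, not the API's real entry format.
Entry = Dict[str, str]
# Hypothetical fetch function: (kinds, parent_id) -> list of entries.
Fetch = Callable[[List[str], Optional[str]], List[Entry]]


def fetch_all(fetch: Fetch, kinds: List[str]) -> List[Entry]:
    """Collect entries without relying on the base search for announcements.

    Assumes "announcementspage" is among `kinds`, so the first query
    returns the pages that announcements hang off.
    """
    # Phase 1: base search on kinds, with "announcement" removed.
    top_kinds = [k for k in kinds if k != "announcement"]
    entries = list(fetch(top_kinds, None))
    if "announcement" not in kinds:
        return entries

    # Phase 2: query announcements explicitly, one announcements page at a
    # time, with that page as the parent. The list of pages is built first
    # so extending `entries` does not affect the loop.
    pages = [e for e in entries if e["kind"] == "announcementspage"]
    for page in pages:
        entries.extend(fetch(["announcement"], page["id"]))
    return entries
```

The trade-off is one extra request per announcements page, in exchange for not depending on whatever "latest" means for an unscoped query.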

Hopefully this note saves someone several hours of frustration.