Optimizely CAB Portal
33 votes
Status: Future consideration
Created by Guest
Created on May 18, 2021

Scheduled re-import of indexed content

As a site owner using Content Recommendations,

I would like to be able to schedule a re-import of already indexed content in Content Recs,

so that:

a) there would be no need to trigger a re-scan manually, e.g. when we add or change meta tags or data attributes on pages.

b) content that has been unpublished/deleted from the site would be purged from the Content Recs index.

c) images would be re-scraped from the source pages, which is very useful if editors have changed or unpublished the original image since the page was last scraped.
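The three goals above could be served by a single scheduled pass over the already indexed URLs. Below is a minimal Python sketch of how such a pass might triage pages, assuming the job has already re-fetched each URL and recorded its HTTP status code. The function name `plan_reimport` and the rules (404/410 means unpublished or deleted, 2xx means refresh, anything else is retried on the next run) are hypothetical choices for illustration, not documented Content Recs behavior.

```python
from typing import Dict, List, Tuple

# Assumption: unpublished/deleted pages answer with 404 Not Found or 410 Gone.
GONE = {404, 410}


def plan_reimport(fetch_results: Dict[str, int]) -> Tuple[List[str], List[str], List[str]]:
    """Split indexed URLs into (refresh, purge, retry) by the HTTP status of a re-fetch."""
    refresh, purge, retry = [], [], []
    for url, status in sorted(fetch_results.items()):
        if status in GONE:
            purge.append(url)  # (b) drop the page from the index
        elif 200 <= status < 300:
            refresh.append(url)  # (a)/(c) re-read meta tags, data attributes and image
        else:
            retry.append(url)  # redirects, 5xx errors: keep indexed, try again next run
    return refresh, purge, retry


refresh, purge, retry = plan_reimport({
    "https://example.com/a": 200,
    "https://example.com/b": 410,
    "https://example.com/c": 503,
})
print(refresh, purge, retry)
```

Treating non-2xx, non-gone responses as "retry" keeps a temporary outage from wiping pages out of the index.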

  • Guest
    May 8, 2023

    Perhaps a better approach would be to pass a 'modified' time meta attribute.

    ...

    <meta property="article:published_time" content="2019-10-18T15:19:00.000-04:00" />

    <meta property="article:modified_time" content="2023-01-18T15:19:00.000-04:00" />

    ...

    The scheduled re-import process would then only re-index pages that have been modified since they were last scanned.
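    A minimal Python sketch of this check, using only the standard library. It assumes the scheduler keeps a timezone-aware "last scanned" timestamp per page and that a timestamp without an offset means UTC; the helper names `page_modified_time` and `needs_reindex` are hypothetical, not part of any Optimizely API. Pages without either time meta tag are re-imported unconditionally as a conservative fallback.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Optional

# Properties from the example above; modified_time wins, published_time is the fallback.
TIME_PROPERTIES = ("article:modified_time", "article:published_time")


class MetaTimeParser(HTMLParser):
    """Collects the content of the article:*_time <meta> tags on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.times = {}  # meta property -> raw ISO 8601 string

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop, content = attr.get("property"), attr.get("content")
        if prop in TIME_PROPERTIES and content:
            self.times[prop] = content


def page_modified_time(html: str) -> Optional[datetime]:
    """Return the page's last-modified time, or None if no time meta tag is present."""
    parser = MetaTimeParser()
    parser.feed(html)
    parser.close()
    for prop in TIME_PROPERTIES:
        raw = parser.times.get(prop)
        if raw:
            parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
            # Assumption: a timestamp without an offset is treated as UTC.
            return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    return None


def needs_reindex(html: str, last_scanned: datetime) -> bool:
    """Re-index only pages modified since the last scan (last_scanned must be timezone-aware)."""
    modified = page_modified_time(html)
    if modified is None:
        return True  # nothing to compare against: fall back to re-importing
    return modified > last_scanned


page = (
    '<meta property="article:published_time" content="2019-10-18T15:19:00.000-04:00" />'
    '<meta property="article:modified_time" content="2023-01-18T15:19:00.000-04:00" />'
)
print(needs_reindex(page, datetime(2022, 6, 1, tzinfo=timezone.utc)))  # True
print(needs_reindex(page, datetime(2023, 2, 1, tzinfo=timezone.utc)))  # False
```

    Note that the page still has to be fetched to read the tag; the saving is that unchanged pages skip the expensive processing step.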


  • Guest
    Dec 7, 2021

    A note on this: I acknowledge that the initial scraping is currently resource-intensive, since it performs a full NLP analysis of the page content. However, an NLP re-scan would NOT be necessary for the suggested re-import, just a superficial re-import of factual properties such as title, (canonical) URL, meta tags and image.
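    To illustrate how light such a superficial re-import could be, here is a Python sketch (standard library only) that reads the title, the canonical URL, every `<meta>` name/property value and a representative image, with no NLP step at all. Taking the image from `og:image` is an assumption, as are the names `FactualPropertyParser` and `superficial_reimport`; Content Recs may source these properties differently.

```python
from html.parser import HTMLParser


class FactualPropertyParser(HTMLParser):
    """Extracts title, canonical URL and meta tags from a page; no content analysis."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.canonical_url = None
        self.meta = {}  # meta name/property -> content
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical_url = attr.get("href")
        elif tag == "meta":
            key = attr.get("property") or attr.get("name")
            if key and attr.get("content") is not None:
                self.meta[key] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def superficial_reimport(html: str) -> dict:
    """Return the factual properties of a page (hypothetical record shape)."""
    parser = FactualPropertyParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "canonical_url": parser.canonical_url,
        "image": parser.meta.get("og:image"),  # assumption: og:image is the page image
        "meta": parser.meta,
    }


html = (
    "<html><head><title> Example page </title>"
    '<link rel="canonical" href="https://example.com/a" />'
    '<meta property="og:image" content="https://example.com/a.jpg" />'
    "</head><body>...</body></html>"
)
print(superficial_reimport(html)["canonical_url"])  # https://example.com/a
```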

  • Guest
    Jun 4, 2021

    Thank you for your input. We will investigate the idea and see whether it's something we can address in a timely manner.