Content save options
Video tutorial: Agent content save options (01:49)
Agent directory
By default, the agent saves content in the <Agent Directory>/content folder (<Agent Directory> is the folder where the agent jar file is saved); for example: C:\Agent\content.
To save files in a different folder:
- On the Manage Content Profile page, click the Content Save Options tab.
- Click Custom directory and specify the path to the folder where you want to save content.
Important
- If you plan to move the content downloaded by your agent to a remote or network drive, we recommend downloading the content to a local folder first and then using a script to move the items to the remote or network drive. This helps prevent write-permission issues in the agent.
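The download-locally-then-move approach can be scripted. The following is a minimal sketch and not part of the agent: the function name `move_settled_files`, the `min_age_seconds` guard (which skips files modified very recently, on the assumption that they may still be being written), and any paths passed to it are hypothetical.

```python
import shutil
import time
from pathlib import Path


def move_settled_files(local_dir, remote_dir, min_age_seconds=60, now=None):
    """Move files from local_dir to remote_dir, keeping the subfolder layout.

    Hypothetical helper: files modified less than min_age_seconds ago are
    skipped, on the assumption that they may still be being written.
    """
    now = time.time() if now is None else now
    local = Path(local_dir)
    remote = Path(remote_dir)
    moved = []
    # sorted() materializes the listing before any file is moved
    for path in sorted(local.rglob("*")):
        if not path.is_file():
            continue
        if now - path.stat().st_mtime < min_age_seconds:
            continue  # too recent; pick it up on the next run
        target = remote / path.relative_to(local)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

A script like this could be run on a schedule, with the agent's local content folder as `local_dir` and the network share as `remote_dir`.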
Content file naming format
- <ItemID>--<Version>--<ContentID>.<ext> can be used to correlate downloaded content item components with their specific metadata in an item metadata file.
  - <ItemID> is a unique ID that remains the same throughout all revisions of the content item; for example, all stories that have the same item ID are part of the same 'story chain'. <ItemID> corresponds to the "altids.itemid" property in the JSON feed.
  - <Version> is the content item version number: typically 0 for the initial version, 1 for the first revision, 2 for the second revision and so on. The higher the number, the more recent the version. <Version> corresponds to the "version" property in the JSON feed. Together with <ItemID>, <Version> may be useful for placing a story at the correct position in the 'story chain' if you are tracking all versions of the story for news management.
  - <ContentID> is a code that identifies a media rendition and corresponds to the "contentid" property in the JSON feed.
  - <ext> is the file name extension that designates the file format and typically consists of three alphanumeric characters (for example, .xml, .jpg, .pdf and .mpg).
- <OriginalFileName>--<PartialContentID>.<ext> can be used to save files with their original file names plus the last five characters of the <ContentID>; for example, EXAMPLE.JPEG--e68c0.jpg. This option helps prevent overwriting files that share the same original file name. Files that have no original file name are saved with a file name in the format <ContentID>.<ext>.
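To correlate a downloaded file with its metadata, the <ItemID>--<Version>--<ContentID>.<ext> name can be split back into its parts. The sketch below is illustrative only: `parse_content_file_name` and `ContentFileName` are hypothetical names, the IDs in the usage note are made up, and the code assumes that none of the IDs contains a double hyphen.

```python
from typing import NamedTuple, Optional


class ContentFileName(NamedTuple):
    item_id: str      # compare with "altids.itemid" in the JSON feed
    version: int      # compare with "version" in the JSON feed
    content_id: str   # compare with "contentid" in the JSON feed
    ext: str


def parse_content_file_name(file_name: str) -> Optional[ContentFileName]:
    """Split '<ItemID>--<Version>--<ContentID>.<ext>' into its parts.

    Returns None if the name does not follow that layout (for example,
    names saved in the <OriginalFileName>--<PartialContentID>.<ext> format).
    """
    stem, dot, ext = file_name.rpartition(".")
    if not dot:
        return None  # no extension at all
    parts = stem.split("--")
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    item_id, version, content_id = parts
    return ContentFileName(item_id, int(version), content_id, ext)
```

For example, `parse_content_file_name("abc123--2--def456.jpg")` yields item ID `abc123`, version `2` and content ID `def456`, which can then be compared with the corresponding properties in the item metadata.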
Tip
Select the <OriginalFileName>--<PartialContentID>.<ext> option along with the JSON: Item Files and/or XML: Item Files (NewsML-G2) options on the Content Options tab to save item metadata files for video using the original file name of the associated video rendition if available.
If there is an original file name associated with the video rendition, the original video file name is applied to the saved JSON and/or NewsML-G2 item metadata files. For example, if the originalfilename value for the video rendition is 4329116_Portugal Virus UK Travel_0_Preview.mp4, the JSON item metadata file name would be 4329116_Portugal Virus UK Travel_0-item.json, and the NewsML-G2 item metadata file name would be 4329116_Portugal Virus UK Travel_0-item.xml.
Content folder structure
One folder for all content
All content is saved to the folder used for saving downloaded content (for example, to C:), and no subfolders are created.

One folder per entitlement
One folder is created per product, package or Followed Topic, but no subfolders are created for individual content item versions.

One folder per entitlement and one subfolder per version
Tip
This folder structure option is useful if you are downloading AP Top Headlines or if you would like to save each story version with its linked media (optionally, including JSON metadata files for this story version and each linked media item) in a separate subfolder.
The agent creates one folder per entitlement below the folder used for saving downloaded content (for example, under C:) and subfolders for individual versions in each entitlement folder.

Note
The format of the version folder name is <ItemID>-<Version>.
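The three folder structure options can be summarized as a path computation. This is a sketch under stated assumptions, not the agent's implementation: the function name and the option labels `single`, `per_entitlement` and `per_version` are hypothetical, and the entitlement folder is assumed to be named after the entitlement.

```python
from pathlib import PureWindowsPath


def destination_folder(base, structure, entitlement=None, item_id=None, version=None):
    """Return the folder a file would be saved in for a given structure option.

    structure is one of the hypothetical labels 'single', 'per_entitlement'
    or 'per_version'.
    """
    base = PureWindowsPath(base)
    if structure == "single":
        return base  # one folder for all content, no subfolders
    if structure == "per_entitlement":
        return base / entitlement
    if structure == "per_version":
        # version subfolder name follows <ItemID>-<Version>
        return base / entitlement / f"{item_id}-{version}"
    raise ValueError(f"unknown structure: {structure}")
```

With a base folder of `C:\Agent\content`, an entitlement named `AP Online Top General Headlines` and a made-up item ID `abc123` at version 0, the `per_version` option resolves to `C:\Agent\content\AP Online Top General Headlines\abc123-0`.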
For AP Top Headlines, the agent creates one folder per entitlement (for example, AP Online Top General Headlines) below the folder used for saving downloaded content (for example, under C:) and subfolders for individual versions of each Top Headline story in each entitlement folder.

Tips
- If you are interested in saving JSON metadata files for AP Top Headline stories, make sure to select the JSON: Associated Item Files check box under Metadata on the Content Options tab (since AP Top Headline stories are considered associated items of the Top Headline package, the agent treats them the same as linked media when downloading JSON metadata).
- Make sure to select Save all duplicates under Duplicate Settings on the Content Options tab to save linked media in each subfolder even if the linked media file has already been downloaded with the previous story version.
Duplicate settings
About duplicates
Duplicate content is content that has been ingested more than once within a 24-hour period (the standard news cycle).
Duplicate content may be delivered for a variety of reasons; for example:
- AP Top Headlines are filed multiple times throughout the day, often with the same stories.
- AP editors may file the same story for print and online use.
- The same story or media may appear in multiple entitlements (products, packages or saved searches).
- Stories may share linked media; for example, the same picture may be linked to two different stories about the same news event.
Filtering out duplicates
By default, the agent does not ingest duplicate content.
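The 24-hour duplicate window can be illustrated with a small filter. This is only a sketch of the idea: the source does not say how the agent identifies duplicates, so keying on a content ID and measuring the window from the first ingestion are assumptions, and `DuplicateFilter` is a hypothetical name.

```python
from datetime import datetime, timedelta


class DuplicateFilter:
    """Flag content IDs that reappear within a time window (24 hours by default)."""

    def __init__(self, window=timedelta(hours=24)):
        self.window = window
        self._first_seen = {}  # content ID -> time it was first ingested

    def is_duplicate(self, content_id, seen_at):
        first = self._first_seen.get(content_id)
        if first is not None and seen_at - first < self.window:
            return True  # already ingested inside the window
        # new content, or the window has expired: start a new window
        self._first_seen[content_id] = seen_at
        return False
```

With filtering enabled, an item for which `is_duplicate` returns True would be skipped; saving duplicates corresponds to bypassing the check.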
Saving duplicates
On the Content Save Options tab, scroll down to the Duplicate Settings section and select one of these options:
Tip
This option is useful if you are saving content for each entitlement in a different folder using the One folder per entitlement option on the Content Save Options tab.
Content deletion
Important
- We strongly recommend that you do not save content in any system folders.
- We do not recommend saving content to your Desktop.
Automatically deleting downloaded content files from the directory where they are saved after a set period cleans up older files and prevents them from exhausting disk space. When content file deletion is enabled, the agent deletes content, log, item metadata and JSON feed files according to the specified settings.
Note
By default, the agent does not delete older files automatically.
Enabling content file deletion
On the Content Save Options tab, select one of these options under Content Deletion:
- Delete after 24 hours
- Delete after 48 hours
- Delete after 7 days
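The effect of these retention options can be approximated outside the agent with a short script. The sketch below is not how the agent itself deletes files: `delete_older_than` is a hypothetical helper, and using the file modification time to measure age is an assumption.

```python
import time
from pathlib import Path


def delete_older_than(folder, max_age_hours, now=None):
    """Delete files under folder whose modification time is older than max_age_hours.

    Hypothetical helper; returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    deleted = []
    # sorted() materializes the listing before any file is removed
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

Here `max_age_hours` would be 24, 48 or 168 to mirror the three options. As with the agent's own deletion, point it only at a dedicated content folder, never at a system folder.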