Recent Releases
Daemon 1.52.3
- Features
- Content can be collected significantly faster without overloading publishers, by collecting static and dynamic content at different rates. Plugins may specify rates according to MIME type or URL pattern:
- au_mime_rate_limiter_map: a map from MIME-type (or comma-separated list of MIME-types) to rate string.
- au_url_rate_limiter_map: a map from URL regexp to rate string.
- The number of simultaneous crawls per plugin/publisher/site can be set with org.lockss.crawler.concurrentCrawlLimitMap, a map from concurrentCrawlPoolKey (fetchRateLimiterKey) to allowed number of simultaneous crawls.
- The interval at which the daemon checks for new content can be set per AU, by adding the param nc_interval to the AU's config in the title DB. The value should be a time interval (e.g., 2w to check every two weeks). This must be set when the AU is first configured; the next release will provide a way to change the recrawl frequency for existing AUs.
- For multi-homed machines, the source address of crawls can be set with org.lockss.crawler.crawlFromAddr, or to the daemon's configured local address by setting org.lockss.crawler.crawlFromLocalAddr = true.
- If the LCAP socket is bound to the local address (with org.lockss.scomm.bindToLocalIpOnly = true), messages are by default sent from that address. Set org.lockss.scomm.sendFromBindAddr = false to override.
- ServeContent has support for rewriting Javascript.
- DaemonStatus query arg "columns=All" or "columns=*" selects all table columns.
- Poll status table includes individual voter agreement, actual poll duration and end times.
- The config backup file is now served with a name derived from the host name.
- Bug fixes
- Message counts in Comm Peer Data are now correct.
- ViewContent returns 404 status if AU or URL not found.
- Linux scripts correctly update /etc/updatedb.conf to exclude content directories from indexing in the locate database.
- Rapid daemon death detection has been disabled until false triggers are diagnosed.
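The per-MIME-type and per-URL rate maps introduced in this release can be pictured with a small lookup sketch. This is an illustration only, not the daemon's code: the map contents, the "events/interval" rate-string syntax, the first-match rule and the function names are all assumptions made for the example.

```python
import re

# Hypothetical plugin values. The "events/interval" rate-string syntax is an
# assumption made for this sketch.
MIME_RATES = {
    "text/html,application/pdf": "1/3s",
    "image/gif,image/png,text/css": "5/1s",
}
URL_RATES = {
    r"\.(gif|png|css)$": "5/1s",
    r"/toc/": "1/3s",
}


def rate_for_mime(content_type, mime_map, default=None):
    """Return the rate whose key lists the response's MIME type."""
    # Drop parameters such as "; charset=UTF-8" before comparing.
    base = content_type.split(";")[0].strip().lower()
    for key, rate in mime_map.items():
        if base in (m.strip().lower() for m in key.split(",")):
            return rate
    return default


def rate_for_url(url, url_map, default=None):
    """Return the rate of the first regexp that matches the URL (assumed rule)."""
    for pattern, rate in url_map.items():
        if re.search(pattern, url):
            return rate
    return default
```

With maps like these, static files such as images and stylesheets would be fetched at the faster rate while HTML and PDF requests stay at the slower one.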
Daemon 1.51.6
- Features
- Adding AUs in Journal Configuration is much faster.
- URLs with path components (chars between /) longer than 255 characters can now be preserved. (Thanks to CMU practicum.)
- IPv6 addresses can be used as LCAP identities (not yet supported by hostconfig).
- AUID DaemonStatus table has optional Publisher and Year columns, to support CLOCKSS article counts.
- New module provides a range of analysis tools to KbartConverter, allowing more accurate coverage gap reporting in KBART exports, even when publisher metadata exhibits inconsistent volume identifier formats or unusual year ranges.
- CSV KBART output is now customizable.
- Bug fixes
- Link rewriter (ServeContent) more completely rewrites inline CSS in HTML files.
- Link rewriter (ServeContent) host-relative mode works (org.lockss.serveContent.absoluteLinks = false).
- PeerIds are normalized; referring to the same peer by two equivalent forms of the same IP_ADDR:PORT is now harmless. (Matters for IPv6, where alternate forms of address are common.)
- ViewContent returns 404 status if AU or URL not found.
- KbartConverter processes volume sequences and Roman numbers.
- Several fixes to startup scripts to eliminate corner cases, ensure only one daemon instance, pass correct release number to runssl.
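To see why PeerId normalization matters for IPv6, the sketch below reduces two spellings of the same address to one canonical form. It uses Python's standard ipaddress module purely as an illustration; the function name and the (address, port) tuple are assumptions and say nothing about how the daemon represents PeerIds internally.

```python
import ipaddress


def normalize_peer(addr, port):
    """Return a canonical (address, port) pair for a peer identity.

    Hypothetical helper: equivalent textual forms of one IP address
    collapse to the same compressed string.
    """
    ip = ipaddress.ip_address(addr)
    return (ip.compressed, int(port))


# Two equivalent spellings of one IPv6 address (example values).
long_form = normalize_peer("2001:0db8:0000:0000:0000:0000:0000:0001", 9729)
short_form = normalize_peer("2001:db8::1", "9729")
```

Both calls yield the same pair, so a table keyed on the normalized value holds a single entry for the peer.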
Daemon 1.50.2
- Features
- Requests to content server for pages not belonging to any preserved AU will be redirected to publisher if org.lockss.serveContent.missingFileAction = Redirect.
- All content accesses are logged at the log level specified by org.lockss.proxy.accessLogLevel (for proxy) and org.lockss.serveContent.accessLogLevel (for content server).
- IPv6 connections to servers (UI, proxy, content) are now allowed, and the corresponding access lists allow IPv6 addresses & masks.
- IP access lists may contain comment lines (beginning with #).
- If org.lockss.daemon.bindAddrs is set to a list of local IP addresses, content and admin servers will listen at just those addresses. Allows multiple daemons on hosts with multiple IP addresses to run all servers on standard ports.
- If org.lockss.scomm.bindToLocalIpOnly = true, LCAP listener binds only to the IP address specified by its LCAP identity (or internal NATted address, if applicable).
- If org.lockss.poll.discardSavedPolls = true, saved polls will not be resumed at startup.
- To improve the results returned by the Open URL resolver, plugins can now specify the location (URL) of various bibliographic features such as volume TOC or title TOC.
- A new SAX-based XML metadata extractor is available to plugin writers.
- Substance checking is enabled by default (for plugins that supply substance patterns).
- Plugins can now specify separate implementation or compatibility versions for individual features (such as polling, substance checking, metadata extraction), allowing daemon to detect when various databases need to be updated.
- Bug fixes
- Blanks in AU configuration parameter values will now match the various URL-encodings of blank ("+", "%20") when used in crawl rules.
- Crawl start pages (start URLs) need not all contain a permission statement, as long as at least one per host does.
- Linux scripts did not always restart the daemon after an unexpected exit.
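The crawl-rule fix for blanks in AU parameter values can be sketched as follows. This is a rough illustration, not the daemon's implementation: the helper name is hypothetical, and accepting a literal blank alongside "+" and "%20" is an assumption.

```python
import re


def param_to_regex(value):
    """Turn an AU parameter value into a regexp fragment for a crawl rule.

    Each blank in the value matches a literal blank, "+" or "%20".
    """
    parts = [re.escape(part) for part in value.split(" ")]
    return r"(?: |\+|%20)".join(parts)


# Hypothetical crawl rule built from a journal-name parameter.
rule = re.compile(
    r"^http://www\.example\.com/" + param_to_regex("Journal of Tests") + r"/"
)
```

The resulting rule accepts both http://www.example.com/Journal+of+Tests/ and http://www.example.com/Journal%20of%20Tests/ URLs.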
Daemon 1.49.3
- Features
- The Holdings List now includes the option of limiting output to only configured titles and AUs. It is also available on the content server port without admin login.
- AUs whose plugins supply metadata extractors have an additional "List Metadata" link on their status detail page, which invokes metadata extraction on the AU and displays all extracted metadata. This is intended to facilitate testing and debugging article iterators and metadata extractors. A more general query facility for previously extracted and stored metadata will be available soon.
- Plugins can now declare an implementation version for individual features, by including a map as the value of plugin_feature_version_map.
- The value of the Poll key determines polling compatibility between peers. This allows the primary plugin version number to be incremented without causing polling incompatibility.
- The value of the Metadata key should be changed whenever the characteristics of the plugin's metadata extractors change. This informs the daemon that metadata extraction should be run again.
- Crawl error detail includes Severity (Warning, Error or Fatal), so you can tell which URL(s) caused the crawl to fail.
- "Crawl Plugins" button in DebugPanel requests crawl for all plugin registries.
- Plugin status links to status summary of its AUs.
- Can now explode WARC files on ingest like ARC, ZIP, TAR. (Code from Felix @ LuKII.)
- Plugin may specify a custom OaiHandler. (Patch from Leonid @ Harvard.)
- Bug fixes
- XML and HTML Metadata extractors unencode XML and HTML entities in extracted strings.
- ARC and WARC export files are served with the correct content-type.
- ServeContent sends the proper referer when requesting files from the publisher.
- Plugin-specific URL normalizer may add or remove the default port to/from the URL.
- Unsetting crawlPriorityAuidMap param properly removes crawl priorities.
- OaiCrawlSpec determines whether failure to collect a start URL is an error, as originally intended.
- Link rewriter used by ServeContent now correctly re-encodes the output with the proper charset after rewriting.
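The role of the Poll and Metadata keys in plugin_feature_version_map can be summarized in a few lines. The sketch below is only a model of the behavior described in the feature list: the version strings, function names and the idea of a stored "last extracted" version are assumptions.

```python
def polling_compatible(local_features, peer_features):
    """Peers are treated as poll-compatible when their Poll versions match."""
    return local_features.get("Poll") == peer_features.get("Poll")


def needs_metadata_reextraction(last_extracted_version, features):
    """Re-run extraction when the plugin's Metadata version has changed."""
    return features.get("Metadata") != last_extracted_version


# Hypothetical feature maps for two releases of the same plugin.
old_plugin = {"Poll": "3", "Metadata": "1"}
new_plugin = {"Poll": "3", "Metadata": "2"}
```

Here the newer plugin can still poll with peers running the older one, but its changed Metadata version signals that stored metadata is stale.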
Daemon 1.48.7
Release 1.48.7 has taken much longer than previous releases, and this list of features and bug fixes reflects that. This release has some major new components and some significant bugs were discovered during pre-release testing, which produced an unfortunate delay in the eventual release. We plan to return to our normal smaller, more frequent releases.
- Features
- List Holdings servlet produces bibliographic listing of all titles available for preservation, in industry-standard KBART format.
- CSV listing can be imported directly into other library systems, such as link resolvers.
- HTML listing is convenient for viewing and searching for titles, ISSNs, etc. Fields displayed and their ordering can be customized.
- ServeContent servlet provides direct (non-proxied) access to preserved content.
- Content can be accessed by URL (url=...), DOI (doi=...) or through OpenURL parameters.
- If publisher is up and has newer content it will be served; otherwise the locally preserved copy is served.
- Includes many improvements to HTML and CSS link rewriters.
- OpenURL Resolver supports bibliographic queries of preserved content.
- Available through ServeContent using OpenURL 1.0 or earlier OpenURL 0.1 query parameter keys.
- Returns article specified if Metadata Manager is enabled or DOI is given; otherwise returns table of contents where user can navigate to specified article.
- Returns content from publisher if not currently preserved in local LOCKSS box.
- Metadata Manager uses metadata extraction framework to build database of article-level bibliographic information from preserved content.
- Metadata extraction not enabled by default in this release, but can be enabled through Expert Config parameters.
- Feature is available for field testing in this release; sites are encouraged to enable it and provide feedback.
- Contact LOCKSS support for information on enabling and helping to field test this feature.
- Metadata Extraction framework
- Metadata extractors can emit multiple ArticleMetadata.
- SubTreeArticleIterator visitor can emit multiple ArticleFiles.
- Several new base classes simplify ArticleIterators and extractors.
- ArticleMetadata holds separate raw and cooked metadata maps. Cooked map is accessed via MetadataField descriptors which specify cardinality and optional validator, normalizer and splitter.
- Added ArticleMetadata.get/setLocale(). Default locale specified by org.lockss.metadata.defaultLocale, default is Locale.US.
- Configuration information can be loaded from the config server securely. In hostconfig, enter an https: configuration URL, and optionally the name of a keystore to be used to authenticate the server.
- Subsidiary config URLs (titledbs, etc.) in config files may now be relative, so that they will be fetched via https if the parent was. This also removes the major obstacle to supporting redundant prop servers.
- If no titlesets are explicitly defined, the basic ones (All Titles, Active AUs, Inactive AUs) will automatically be used. Avoids the "No titlesets are defined" problem that frequently occurs in testing.
- Crawler's cookie processing is more permissive by default (HttpClient's COMPATIBILITY mode). To enforce strict adherence to spec, set org.lockss.urlconn.cookiePolicy = RFC2109, or set per plugin with au_crawl_cookie_policy.
- Crawls can be disabled for specific AUs by giving them a priority less than -10000 in org.lockss.crawler.crawlPriorityAuidMap.
- AUs are automatically restarted when a new version of a plugin is collected and installed. (org.lockss.plugin.restartAusWithNewPlugin is now true by default.)
- Plugins may include conditional sections in order to override, e.g., crawl windows and rates in testing. If org.lockss.daemon.testingMode = FOO, values in plugin's FOO_override map are copied to definition map.
- Alerts can be sent on AU create/delete.
- The title db xml files formerly in test/frameworks/title_db_files are now generated automatically, into test/frameworks/tdbxml, from the tdb source files in tdb/*. These files are built by the new ant target ant tdb-pln -Dplnname=PLN.
- UI Changes
- Ranges of AUs can be selected in Add/Remove Titles form by selecting one end and shift selecting the other end.
- New DaemonStatus table displays AU configuration, with instantiated crawl rules, start URLs, etc. Linked from AU detail page.
- DaemonStatus ...&output=xml&outputVersion=2 outputs raw values into xml, no user-friendly display formatting.
- DaemonStatus requests can specify the list of columns to be included in status tables, and their order, by setting the columns query arg to a semicolon-separated list of column names.
- ViewContent serves filtered file if filter query arg non-null.
- ViewContent links to version table if multiple versions.
- API to add AUs by specifying AUID to look up in title DB: AuConfig?lockssAction=AddByAuid&auid=auid
- Bug fixes
- Deleted AUs no longer cause daemon restarts or interfere with crawl status display.
- Protect against simultaneous Add Titles operations. Fixes "Internal inconsistency: AU exists but is not in config file".
- Plugin-specified custom handling of network errors now takes effect for errors thrown while data is being read (e.g., IOExceptions caused by chunking errors).
- Failure to collect any start URL now causes crawl to fail. Can be disabled for OAI crawls with OaiCrawlSpec.setFailOnStartUrlError(false).
- Crawl status is correctly updated if crawler throws OutOfMemoryError.
- HTTP status code 400 now treated like 404 (NoRetryDeadLinkException) by default. Previously caused crawl to fail.
- Restored config load failure explanation message to UI.
- UI now displays configured hostname instead of reverse DNS of IP.
- Workaround for Double.parseDouble() bug.
- Max heap on Linux increased to 1536MB.
- Fixed concurrency bug in Content-Type caches, and avoided excessive churn caused by inclusion of highly variable attributes (e.g., url).
- Correctly serve export and other files larger than 2GB. Default is now to create unlimited size files (one per AU).
- ServeContent fixes
- Restored configurable missing file behavior (one of 404, HostAuIndex, AuIndex).
- Made host down behavior match ProxyHandler.
- Added org.lockss.serveContent.neverProxy, prevents forwarding or redirecting to publisher. "noproxy=1" query arg does the same thing.
- Handle compressed publisher response.
- Pass correct charset to link rewriter.
- Fixed invalid Last-Modified: -1 response header.
- Send Content-Length: when possible to allow connection reuse.
- Get connection pools from ProxyManager.
- Buffer rewritten files up to org.lockss.serveContent.maxBufferedRewrite bytes (default 64K).
- Fixed file descriptor leak in WARCWriter and ARCWriter.
- Exporter handles full disk gracefully.
- Namespace no longer hardwired into OAI queries. (Patch from Leonid @ Harvard).
- For Linux & Solaris, /etc/init.d/lockss and the scripts that it invokes fixed to prevent accidental multiple invocations starting multiple daemon instances.
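The testing-mode override listed among this release's features (org.lockss.daemon.testingMode = FOO copying the plugin's FOO_override map into its definition map) amounts to a conditional dictionary merge. A minimal sketch, assuming the definition map is a flat dictionary; the function name and the sample values are hypothetical.

```python
def apply_testing_override(definition, testing_mode):
    """Return the definition with the <mode>_override map applied, if any.

    The original definition is left untouched.
    """
    merged = dict(definition)
    if testing_mode:
        override = definition.get(testing_mode + "_override", {})
        merged.update(override)
    return merged


# Hypothetical plugin definition with a faster pause time for testing.
plugin = {
    "au_def_pause_time": 6000,
    "au_crawl_window": "hypothetical nightly window",
    "FOO_override": {"au_def_pause_time": 100},
}
```

With testing mode set to FOO the effective pause time becomes 100; with any other mode, or none, the production value of 6000 is kept.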
Daemon 1.47.7
1.47.7 is a patch release to fix two time-critical bugs. It contains no new features.
- Bug fixes
- The daemon failed to promptly close files in some situations, leading to "Too many open files" errors.
- A combination of bugs in the poller caused occasional spikes in network traffic. In addition to fixing the bugs, message rate limiters have been added.
Daemon 1.47.5
1.47.5 is a patch release to fix two time-critical bugs. It contains no new features.
- Bug fixes
- Workaround for a bug at MetaPress (which currently hosts the Springer titles), which frequently causes malformed HTTP responses.
- Removed an inadvertent limit of 10000 AUs on each disk.
Daemon 1.47.3
- Features
- The priority with which AUs are scheduled to be crawled can now be explicitly controlled. This may be used, e.g., to force earlier collection of content that is known to be in danger of disappearing. org.lockss.crawler.crawlPriorityAuidMap should be set to a list of comma-separated pairs; each element consists of a regular expression to be matched against AUIDs, followed by an integer priority. AUs will be given the crawl priority corresponding to the first regexp their AUID matches, if any. Normally this would be set on a PLN-wide basis by the PLN admin, but it can also be set on an individual box using Expert Config.
- The priority of crawls started by debug panel is now set by org.lockss.debugPanel.crawlPriority (default 10).
- When a new plugin (or new version of an existing plugin) is loaded, the daemon will attempt to start any configured but not running AUs belonging to that plugin. (This condition could normally only occur as a result of a bad plugin, which could not be loaded.)
- If org.lockss.plugin.crawlRulesIncludeStartUrl = true (default false), start URLs and permission pages are implicitly included in the crawl rules.
- Plugin registry AUs now poll by default. To prevent this set org.lockss.plugin.registries.enablePolls = false.
- When files are repaired from a peer, the source of the repair (PeerId) and time is now recorded in the file's properties.
- Files with the MIME-type application/xhtml+xml are treated the same as text/html, for purposes of link extraction and filtering.
- The daemon can now detect situations where, due to an error in a plugin or a major site redesign, crawls of an AU collect no files containing substantial content (i.e., only the manifest and possibly minor images, CSS, etc.). Plugins may supply a list of regular expressions of either substantial URLs (au_substance_url_pattern) or insubstantial URLs (au_non_substance_url_pattern). If an AU is found to contain no substance URLs (or only non-substance URLs) it will be marked as having no substance and will not be voted on. To enable this, set org.lockss.substanceChecker.detectNoSubstanceMode = All (or to Crawl or Vote to enable detection only when the AU is crawled or invited into a poll, respectively).
- Bug fixes
- The default crawl order has been restored to true breadth-first.
- ListArticles no longer needlessly invokes metadata extractors just to list articles.
- SubTreeArticleIterator was omitting URLs that are an initial substring of another URL.
- HTML filters should be able to handle charset changes in <meta> tags without having to artificially raise org.lockss.filter.html.mark.
- CLOCKSS now accepts Creative Commons V3 license as a valid permission statement.
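The first-match rule for org.lockss.crawler.crawlPriorityAuidMap described in the feature list can be sketched as below. The sketch starts from already-parsed (regexp, priority) pairs rather than guessing at the parameter's exact string syntax; the AUIDs, the default priority of 0 and the use of an unanchored search are assumptions.

```python
import re


def crawl_priority(auid, priority_pairs, default=0):
    """Return the priority of the first regexp that matches the AUID."""
    for pattern, priority in priority_pairs:
        if re.search(pattern, auid):
            return priority
    return default


# Hypothetical pairs: endangered content first, then a general rule.
pairs = [
    (r"EndangeredPress", 50),
    (r"ExamplePlugin", 5),
]
```

Because the first matching regexp wins, more specific patterns should be listed ahead of broader ones.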
Daemon 1.46.2
- Features
- Improved CSS parsing. The crawler now uses a simple regular expression-based parser to find URLs in CSS files and <style> sections of HTML files. The old parser was too strict and failed on some CSS files that browsers accept. The new parser is much more robust in the presence of CSS syntax errors. To revert to the old parser (though there should be no need to do so), set org.lockss.mimeInfo.defaultCssExtractorFactory = org.lockss.extractor.FluteCssLinkExtractor$Factory, or select it in an individual plugin by including:
<entry>
  <string>text/css_link_extractor_factory</string>
  <string>org.lockss.extractor.FluteCssLinkExtractor$Factory</string>
</entry>
- The metadata extraction framework can now handle complex and irregular article structures. Metadata query facilities and plugin-specific metadata extractors for content in the public LOCKSS network will be added in the next several releases.
- Optimized title database loading to reduce startup times.
- Bug fixes
- The link rewriter framework now operates on binary streams instead of character streams, facilitating link rewriting in binary file formats.
- ServeContent serves all binary files correctly.
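A regular-expression approach to finding URLs in CSS, as adopted for the new parser in this release, might look roughly like the sketch below. The patterns here are illustrative assumptions and are not the expressions the daemon uses; the point is that a regexp scan keeps working on input a strict CSS grammar would reject.

```python
import re

# url(...) with optional quotes, and @import "..." with required quotes.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
CSS_IMPORT = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)


def extract_css_urls(css):
    """Return URLs found in url(...) references, then in quoted @imports."""
    urls = [m.group(2) for m in CSS_URL.finditer(css)]
    urls += [m.group(2) for m in CSS_IMPORT.finditer(css)]
    return urls


# Sample input with a deliberate syntax error at the end.
sample = (
    'body { background: url("img/bg.png") } '
    '@import "print.css"; '
    ".x { background: URL( a.gif ) ; color: red; broken {{"
)
```

The unbalanced braces at the end of the sample do not prevent the three references before them from being found.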
Daemon 1.45.2
(Daemon 1.45.2 also incorporates the changes in daemon 1.44.2)
- Features
- Setup for secure LCAP communication has been simplified. See LCAP Over SSL.
- Options have been added to the HashCUS servlet to make it easier to use from external scripts. See HashCUS.
- ServeContent accepts an auid query argument, to allow deterministically retrieving a file from a specific AU.
- The representation of title info has been revamped to save a substantial amount of memory.
- When the Linux RPM is updated, the daemon is automatically stopped and restarted with the new version. It is no longer necessary to run /etc/init.d/lockss stop manually before the update and /etc/init.d/lockss start after it. If configuration options have been added that require hostconfig to be rerun, a message to that effect will be displayed and the daemon will not be restarted.
- Bug fixes
- The end of the output from ListObjects and HashCUS is now marked with a comment line (# end) so scripts can detect incomplete output.
Daemon 1.44.2
(Daemon 1.44 was not released.)
- Features
- The keys used for secure LCAP communication may now be split between a shared keystore containing the public keys for all boxes (org.lockss.scomm.sslPublicKeystoreName), and a separate private keystore for each box containing only its private key (org.lockss.scomm.sslPrivateKeystoreName). This should simplify the process of adding boxes to a secure PLN.
- Keystores may be loaded from a file, resource (jar) or URL, by setting the appropriate one of org.lockss.keyMgr.keystore.id.file, org.lockss.keyMgr.keystore.id.resource or org.lockss.keyMgr.keystore.id.url. See LOCKSS Network Administration for more details.
- A more flexible script to create plugin jars is available in test/scripts/jarplugin. This script allows multiple plugins to be packaged in a single jar, to support plugin inheritance. The resulting jar must be signed manually using, e.g., jarsigner.
- Config parameter changes
- org.lockss.plugin.keystore.password is now optional. An alternate plugin keystore may be used by setting just org.lockss.plugin.keystore.location.
Daemon 1.43.3
- Features
- For sites requiring all outgoing connections to be proxied (not just port 80), a proxy may now be specified (using the platform configuration dialogue) for the props (configuration) file fetch.
- In PLNs only, preserved content may now be exported as an ARC, WARC or ZIP file. Feature is enabled by setting org.lockss.export.enabled = true.
- If org.lockss.blockHasher.ignoreFilesOutsideCrawlSpec = true, polls will ignore files whose URL doesn't match the current crawl rules. This avoids disagreement caused by files erroneously collected on some machines (before the plugin was fixed) but not others. Polls also now ignore globally excluded files (see org.lockss.crawler.globallyExcludedUrlPattern).
- Plugins may now supply a handler to be run to determine the disposition of HTTP responses, e.g., to determine, based on context, whether certain errors should cause a crawl to fail.
- Bug fixes
- The status of polls running when an AU was restarted after a plugin update was sometimes incorrect.
- Content files with huge numbers of unclosed table row tags (<TR>) could cause HTML filters to throw a stack overflow, which could cause a daemon restart.
- The crawler now properly handles SocketTimeoutExceptions.
- A number of O/S dependencies have been fixed, so that the code now compiles and passes unit tests on Windows and MacOS. Actually running the daemon on Windows or MacOS is still not supported due to filesystem inadequacies.
- Config parameter changes
- org.lockss.plugin.keystore.password is now optional. An alternate plugin keystore may be used by setting just org.lockss.plugin.keystore.location.
Daemon 1.42.1
- Features
- The daemon has always loaded and dynamically installed new versions of plugins as they become available, but existing AUs continued to use the old plugin definition until they were manually deactivated and reactivated, or the daemon restarted. Now, AUs may be automatically updated whenever their plugin is updated. For now you must set org.lockss.plugin.restartAusWithNewPlugin = true to enable this; it will become the default in a couple releases.
Daemon 1.41.2
- Features
- In addition to using crawl rules to specify which links the crawler should follow, content can be filtered before it's passed to a link extractor during a crawl. This allows links to be excluded based on the context in which they appear on the page (e.g., in a "recent news" section), rather than matching the URL itself. Plugins should set mime-type_crawl_filter_factory to the name of a FilterFactory.
- The admin UI now makes it easy to distribute newly added AUs among the available disks.
- Old versions of content files can be accessed. In the AU detail page, the version number of files that have more than one version is a link to a FileVersions table, which lists the size and collection date of each version, and provides a link to the content.
- If you set org.lockss.accounts.enabled = true to enable user accounts, the following user actions generate auditable events that can be logged. The events can be sent to, for example, the local syslog if you set org.lockss.alert.action.syslog.host = 127.0.0.1 and org.lockss.alert.action.syslog.enabled = true.
- User account:
- created
- disabled
- logged in
- logged out
- password changed
- User actions:
- Change to Content Access IP list
- Change to Admin UI Access IP list
- Change to Content Access Options
- Use of Debug Panel
- Use of Expert Config
- Bug fixes
- When using SSL, network errors could cause LCAP connections to hang in a state that prevented the originating peer from initiating any further connections to the receiving peer. In addition, SO_KEEPALIVE is now turned on by default; it can be disabled by setting org.lockss.scomm.socketKeepAlive = false.
- XML status tables are served with the correct charset (UTF-8).
Daemon 1.40.2
- Features
- The daemon can now collect content from sites that use Akamai to cache content. The embedded source URL is extracted from Akamai URLs if org.lockss.UrlUtil.normalizeAkamaiUrl is true, so that the files are collected from and stored under the source URL. Additional work is needed on the link rewriter to allow such content to be easily browsable.
- Individual AUs can be collected via different proxies by setting the AU config param crawl_proxy (e.g., in the title DB) to host:port. Set to DIRECT to cancel the effect of a global proxy.
- Plugins can control whether more than one of their AUs may crawl simultaneously, by specifying how fetch rate limiters are shared between AUs. (AUs sharing a rate limiter will not crawl at the same time.) By default all AUs belonging to a plugin share a rate limiter. Plugins may set plugin_fetch_rate_limiter_source to one of:
- au - each AU gets its own rate limiter and multiple AUs may crawl simultaneously
- plugin - all AUs belonging to the plugin share the same rate limiter and only one may crawl at a time
- key:key - all AUs belonging to plugins that use the same key share a rate limiter
- host:param - param should be one of the base URL AU config parameters of the plugin. The host part of the parameter value for the AU is extracted and used as the rate limiter key. (I.e., all AUs crawling from the same host will share a rate limiter.)
- title_attr:attr - the value of the attribute attr in the AU's title DB entry is used as the rate limiter key
- The crawler was previously hardwired to fetch no more than 10 files per minute, no matter how low a plugin set its au_def_pause_time. The minimum delay can now be changed by setting org.lockss.baseau.minFetchDelay (default 6000ms).
- ListObjects servlet with arg type=files produces a tab-separated list of url, mime-type, size.
- The maximum size of filtered streams recorded by HashCUS can be controlled by setting org.lockss.hashcus.truncateFilteredStream (default 100K). -1 means no limit.
- Transmission speed of LCAP messages longer than org.lockss.scomm.minMeasuredMessageSize bytes (default 5MB) is reported in the log at debug level.
- The ExplodedPlugin used by CLOCKSS boxes to ingest Elsevier and Springer source content has been restructured to make it definable and more like other plugins.
- The Elsevier and Springer plugins for CLOCKSS now have initial support for metadata extraction, including DOIs.
- Plugin jars generated by genplugin now include all .xml files in plugin dir, to allow for inheritance.
- genkey accepts command line args to set certificate distinguished name (DN) values (from Monika @ MetaArchive).
- Bug fixes
- Added RandomManager to coordinate use of SecureRandom and ensure the desired algorithm is always used.
- Unit tests seed SecureRandom to avoid exhausting kernel's entropy.
- Exploder creates one AU per journal per year.
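The plugin_fetch_rate_limiter_source options listed in this release's features all reduce to choosing a key: AUs that end up with the same key share one rate limiter and so do not crawl at the same time. The sketch below models that choice. The key string format, the function signature and the sample AU values are assumptions made for illustration.

```python
from urllib.parse import urlparse


def rate_limiter_key(source, plugin_id, auid, au_config, title_attrs):
    """Return the rate limiter key implied by plugin_fetch_rate_limiter_source."""
    if source == "au":
        return "au:" + auid
    if source == "plugin":
        return "plugin:" + plugin_id
    kind, _, arg = source.partition(":")
    if kind == "key":
        return "key:" + arg
    if kind == "host":
        # arg names a base URL parameter; only its host part is used.
        return "host:" + urlparse(au_config[arg]).hostname
    if kind == "title_attr":
        return "title_attr:" + title_attrs[arg]
    raise ValueError("unrecognized rate limiter source: " + source)


# Two hypothetical AUs on the same host, from different plugins.
key_a = rate_limiter_key(
    "host:base_url", "PluginA", "auid-a",
    {"base_url": "http://www.example.com/journal-a/"}, {},
)
key_b = rate_limiter_key(
    "host:base_url", "PluginB", "auid-b",
    {"base_url": "http://www.example.com/journal-b/"}, {},
)
```

Since both AUs resolve to the same host key, they would be crawled one at a time even though they belong to different plugins.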
Daemon 1.39.2
- Features
- Plugins may now control the order in which URLs are fetched during a crawl. The load on servers that prepare and cache presentations for multiple pages (e.g., an issue or a multi-page article) in a batch may be significantly reduced by fetching all pages in a single article or issue in a depth-first fashion, rather than the default breadth-first. Plugins may supply a comparator to order URLs by setting plugin_crawl_url_comparator_factory to the name of a CrawlUrlComparatorFactory.
- The user interface allows admin users to set arbitrary daemon configuration parameters on the Expert Config page. (For example, to tailor user account settings to local policies.)
- In a network where each peer's identity is confirmed using SSL and cryptographic certificates, the poller may be configured to serve repairs to trusted peers without prior agreement, by setting org.lockss.poll.v3.repairAnyTrustedPeer to true.
- ServeContent and the audit proxy can be configured to generate a browsable index of close-match AUs along with a 404 response for a non-preserved URL.
- Bug fixes
- Failed plugin registry crawls were retried too often when no regular AUs needed crawling.
- Linux hostconfig script erroneously changed owner of /etc and /etc/lockss.
Daemon 1.38.4
- Features
- The daemon's administrative web user interface now supports:
- SSL (https).
- Multiple user accounts.
- Current users status.
- User-settable passwords.
- Strict password quality and rotation requirements.
- Finer-grained permissions.
- Customizable logo displayed on each page.
- Customizable login page banner.
- Instructions for enabling these features are in beta test and will be posted soon. Contact us if you need them.
- The daemon now includes a framework for extracting bibliographic metadata from the content being preserved and displaying it. The details of how this is accomplished are publisher-dependent, thus metadata is available only for those publishers whose plugins have been enhanced to support it. In this release the only plugins to have been enhanced are those for HighWire Press and BePress. The AU status page for AUs with these plugins will have links to generate:
- A list of all the DOIs in the AU.
- A tab-separated table of the URL for each article in the AU and its DOI.
- Plugin Inheritance. If a plugin's plugin_parent attribute is set, the plugin's definition is the merge of the parent's and child's definitions, with attributes set in the child taking precedence.
- Keystore management has been centralized, so multiple daemon components (e.g., LCAP SSL and admin UI) may share keystores.
- Crawl-end report (and HashCUS) now report hash digest in hex (was base64).
- Size of login page checker buffer is settable.
- AU status displays existence and status of crawl window.
- SSL startup script (/etc/lockss/runssl) is passed daemon release name arg (e.g., --release 1.38.4).
- Added a framework for PluginUtil to display various attributes of a hypothetical AU.
- Config parameter changes:
- org.lockss.scomm.SslClientAuth renamed to org.lockss.scomm.sslClientAuth.
- org.lockss.scomm.SslProtocol renamed to org.lockss.scomm.sslProtocol.
- org.lockss.scomm.SslKeyStore and org.lockss.scomm.SslPrivateKeyPasswordFile replaced by org.lockss.scomm.sslKeystoreName.
- org.lockss.scomm.SslTempKeystore no longer used.
- Many new parameters to configure keystores, admin UI, account management, etc. See org.lockss.keyMgr.keystore.
- Bug fixes
- Hashed byte-count statistics were kept in an int.
- ServeContent failed to rewrite links in several cases.
- Record of which peers don't have which AUs was being reset too often, causing peers to be invited needlessly.
- Added missing log level mappings to syslog logger.
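The plugin inheritance rule introduced in this release (the definition is the merge of parent and child, with the child's attributes taking precedence) can be sketched as a recursive dictionary merge. Treating definitions as flat dictionaries, following more than one level of parents, and the plugin names used here are assumptions; the sketch also does no cycle detection.

```python
def resolve_definition(plugin_id, plugins):
    """Merge a plugin's definition with its ancestors'; the child wins."""
    definition = plugins[plugin_id]
    parent_id = definition.get("plugin_parent")
    if parent_id is None:
        return dict(definition)
    merged = resolve_definition(parent_id, plugins)
    merged.update(definition)
    return merged


# Hypothetical parent and child plugin definitions.
plugins = {
    "BasePlugin": {
        "plugin_name": "Base",
        "au_def_pause_time": 6000,
    },
    "ChildPlugin": {
        "plugin_parent": "BasePlugin",
        "plugin_name": "Child",
    },
}
resolved = resolve_definition("ChildPlugin", plugins)
```

The child keeps its own plugin_name and inherits au_def_pause_time from the parent.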
Daemon 1.37.2
- Features
- Highest agreement with consensus is reported, as well as most recent.
- Agreement history may be transferred to a replacement PeerId. (E.g., when a peer changes IP address.)
- Select box in daemon status pages is now usable from lynx and other browsers without javascript.
- Bug fixes
- Eliminated unnecessary hashing of older content versions when a repair is received.
- Proxy error messages include request hostname.
- Files served by ServeContent are now cacheable.
- HTTP servers (proxy, ServeContent, etc.) should now restart correctly when port is changed.
- Crawler no longer double-fetches pages from sites requiring authentication.
- Unknown host errors during crawl are reported correctly.
- Crawl end report hashes unfiltered content.