Hidden Gems
Getting More Out Of Apache Solr
ApacheCon 2014 NA - 2014-04-08
https://people.apache.org/~hossman/ac2014na
https://twitter.com/_hossman
http://www.lucidworks.com/
Hidden Gems: Getting More Out Of Apache Solr 1 of 36
https://people.apache.org/~hossman/ac2014na/hidden-gems-apache-solr.html
Monitoring
Admin UI
Any information you see in the Admin UI is also available programmatically to remote clients via Request Handlers or JMX.
Stefan Matheis gave a great overview of how the Admin UI works at LuceneRevolution 2013 (with video).
JMX
More details about using JMX in the Solr Reference Guide.
HTTP Admin APIs
{
  "filterCache":{
    "stats":{
      "lookups":27,
      "hits":23,
      "hitratio":0.85,
      "inserts":4,
      "evictions":0,
      "size":4,
      "warmupTime":0,
      "cumulative_lookups":31,
      "cumulative_hits":25,
      "cumulative_hitratio":0.81,
      "cumulative_inserts":6,
      "cumulative_evictions":0}},
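Cache statistics like the filterCache sample under "HTTP Admin APIs" can be pulled by any HTTP client. The sketch below is only an illustration: it assumes the stats come from the `/admin/mbeans` handler with `stats=true`, and the host and collection name are placeholders. It builds such a URL and recomputes the hit ratio from the sample counters.

```python
from urllib.parse import urlencode


def mbeans_url(base, collection, category="CACHE"):
    """Build a stats URL (the handler path and params are assumptions)."""
    params = {"stats": "true", "cat": category, "wt": "json"}
    return f"{base}/{collection}/admin/mbeans?{urlencode(params)}"


def hit_ratio(stats):
    """Recompute the hit ratio from raw counters; 0.0 if nothing was looked up."""
    lookups = stats["lookups"]
    return stats["hits"] / lookups if lookups else 0.0


# Counters copied from the filterCache sample in the slide.
sample = {"lookups": 27, "hits": 23}
url = mbeans_url("http://server:8983/solr", "collection_name")
ratio = round(hit_ratio(sample), 2)
```

Recomputing the ratio from `hits` and `lookups` is a cheap sanity check when graphing these numbers over time.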
facet.method
More details about using facet.method in the Solr Reference Guide.
fc vs. fcs
Both iterate over the matching documents and increment counters per-term
fc (Default)
Single FieldCache (or UnInvertedField) over entire index
Typically faster look-ups than fcs once un-inverted structure is built
fcs
FieldCache per index segment
Typically faster to rebuild than fc in NRT situations -- only modified segments need to be rebuilt
Per-segment FieldCache is also used in sorting -- re-use in faceting may reduce total heap usage.
As with most things related to performance -- your experience will almost certainly vary from the observations of others. Always do some
comparisons yourself using real data and realistic update/query patterns.
The most important thing folks should remember regarding the performance of fc & fcs is that using DocValues is probably a better
choice than either of them.
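When comparing fc and fcs on your own data, it helps to generate the competing requests from one place. The sketch below builds the faceting parameters, using the per-field override form `f.<field>.facet.method`. The field names `category` and `author` are hypothetical.

```python
from urllib.parse import urlencode


def facet_params(query, fields, method=None, per_field=None):
    """Return (name, value) pairs for a field-faceting request."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", f) for f in fields]
    if method:
        # Global choice applied to every faceted field.
        params.append(("facet.method", method))
    for field, field_method in (per_field or {}).items():
        # Per-field override of the global choice.
        params.append((f"f.{field}.facet.method", field_method))
    return params


pairs = facet_params("*:*", ["category", "author"],
                     method="fc", per_field={"author": "fcs"})
query_string = urlencode(pairs)
```

Swapping the `method` argument while replaying the same realistic query log is one simple way to run the comparison suggested above.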
enum
Enumerates all terms in the field and computes a set intersection with the matching documents
Leverages the filterCache
Small Cardinality Fields
Cached document sets may use less RAM than FieldCache
fq constraints on the same field will re-use the cached document sets
High Cardinality Fields
FieldCache / UnInvertedField may not fit in RAM
Slower enum can still be used to get counts
Use facet.enum.cache.minDf to minimize filterCache churn
Can be used for faceting on full-text fields to build tag clouds
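To make the effect of `facet.enum.cache.minDf` concrete, here is a small model of which terms would go through the filterCache. It assumes, as I understand the parameter, that a term uses the cache only when its document frequency is at or above the threshold. The term counts are invented for illustration.

```python
def split_by_min_df(doc_freqs, min_df):
    """Partition terms into those that would use the filterCache and those that would not."""
    cached = sorted(t for t, df in doc_freqs.items() if df >= min_df)
    uncached = sorted(t for t, df in doc_freqs.items() if df < min_df)
    return cached, uncached


def enum_facet_params(field, min_df):
    """Request parameters for enum faceting with a cache threshold."""
    return {
        "facet": "true",
        "facet.field": field,
        "facet.method": "enum",
        "facet.enum.cache.minDf": str(min_df),
    }


# Hypothetical document frequencies for terms in a full-text field.
doc_freqs = {"solr": 1200, "lucene": 800, "voronoi": 3, "treemap": 1}
cached, uncached = split_by_min_df(doc_freqs, min_df=25)
params = enum_facet_params("description", 25)
```

Rare terms dominate high-cardinality fields, so keeping them out of the cache is what limits the churn.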
Result Clustering
aka: Dynamic Faceting
Screenshot taken from Carrot2's online demo of FoamTree's Voronoi treemap visualization tool.
Clustering Component
{
"clusters":[{
"labels":["Environmental"],
"score":6.393107732455205,
"docs":["9781901362930", "9781841130903",
"9781841130897", "9781841133607"]},
{
"labels":["Human Rights"],
"score":15.667620783438327,
"docs":["9781841130354", "9781841134574",
"9781841136530"]},
{
"labels":["Anatomy of Tort Law"],
"score":11.181459329239996,
"docs":["9781901362091", "9781901362084"]},
{
"labels":["Litigation"],
"score":8.560711128059928,
"docs":["9781841132983", "9781841134574"]},
...
More details about configuring and using Result Clustering in the Solr Reference Guide.
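A client usually needs to turn the cluster list into something it can render, such as facet-like labels per document. The sketch below works on the four clusters shown in the sample response. The helper names are made up for illustration.

```python
from collections import defaultdict

# The four clusters from the sample response in the slide.
clusters = [
    {"labels": ["Environmental"], "score": 6.393107732455205,
     "docs": ["9781901362930", "9781841130903", "9781841130897", "9781841133607"]},
    {"labels": ["Human Rights"], "score": 15.667620783438327,
     "docs": ["9781841130354", "9781841134574", "9781841136530"]},
    {"labels": ["Anatomy of Tort Law"], "score": 11.181459329239996,
     "docs": ["9781901362091", "9781901362084"]},
    {"labels": ["Litigation"], "score": 8.560711128059928,
     "docs": ["9781841132983", "9781841134574"]},
]


def labels_by_doc(clusters):
    """Invert the response: document id -> list of cluster labels."""
    index = defaultdict(list)
    for cluster in clusters:
        for doc_id in cluster["docs"]:
            index[doc_id].extend(cluster["labels"])
    return dict(index)


def top_labels(clusters, n):
    """First label of the n highest-scoring clusters."""
    ranked = sorted(clusters, key=lambda c: c["score"], reverse=True)
    return [c["labels"][0] for c in ranked[:n]]
```

Note that a document can belong to more than one cluster, which is why the inverted index stores a list.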
Function Boosting & Personalized Scoring
At ApacheCon 2012 EU, I talked in depth about "Boosting & Biasing" using domain knowledge & user analytics (Video).
Basic Function Boosting
q = Nightfall Isaac Asimov
defType = edismax
boost = div( popularity, add(1,price) )
q = {!boost b=$my_func v=$qq}
qq = +title:Nightfall author:"Isaac Asimov"
my_func = div( popularity, add(1,price) )
The edismax QParser has explicit support for a boost param that applies a multiplicative function boost, but the boost QParser can also be used to wrap any query type you can imagine.
If you aren't familiar with the "{!qparser_name param=$variable}..." syntax, take a look at "Local Params" in the Solr Reference Guide.
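The two request styles on this slide can be generated side by side. The sketch below builds both parameter sets and models the multiplicative effect on a score. `boosted_score` is only an arithmetic illustration of "base score times function value", not Solr's actual scoring code.

```python
FUNC = "div(popularity,add(1,price))"


def edismax_boost(user_query, func=FUNC):
    """Style 1: let edismax apply the boost function itself."""
    return {"q": user_query, "defType": "edismax", "boost": func}


def wrapped_boost(inner_query, func=FUNC):
    """Style 2: wrap an arbitrary query with the boost QParser via local params."""
    return {"q": "{!boost b=$my_func v=$qq}", "qq": inner_query, "my_func": func}


def boosted_score(base_score, popularity, price):
    """Illustrative arithmetic: base score multiplied by popularity / (1 + price)."""
    return base_score * (popularity / (1 + price))


style1 = edismax_boost("Nightfall Isaac Asimov")
style2 = wrapped_boost('+title:Nightfall author:"Isaac Asimov"')
```

Adding 1 to the price in the denominator keeps the function defined for free items.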
Custom Category Boosts Per User
Accumulate data on how much each of your users likes or dislikes various categories
Batch process for every user:
Compute a normalized "Z-Score" preference for each category
Record the 3 most significant (i.e. greatest absolute value Z-Score) categories
At query time:
Look-up the user's 3 most significant category scores
Use the Z-Scores as exponents in a boost function over those category queries
qq = ...search terms...
q = {!boost b=$b v=$qq}
b = prod(pow(query($cat1), $z_cat1),
         pow(query($cat2), $z_cat2),
         pow(query($cat3), $z_cat3))
cat1 = category:action   # The user's 3 most significant categories,
z_cat1 = 1.48            # ... and their Z-scores
cat2 = category:comedy
z_cat2 = 1.33
cat3 = category:kids
z_cat3 = -1.7
This specific example of personalized scores using categorical preferences comes from longtime Solr user Amit Nithian, via the solr-user mailing list.
Amit is giving a talk tomorrow that I suspect will go into a lot more detail on this sort of thing.
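The batch step and the query-time step can be sketched together. This is one possible reading, not Amit's actual implementation: it assumes the Z-Score is computed across a single user's raw per-category preference numbers, and the input values are invented.

```python
from statistics import mean, pstdev


def z_scores(prefs):
    """Normalize one user's raw category preferences (assumed per-user normalization)."""
    values = list(prefs.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {cat: 0.0 for cat in prefs}
    return {cat: (v - mu) / sigma for cat, v in prefs.items()}


def top_categories(scores, n=3):
    """The n categories with the greatest absolute Z-Score."""
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]


def boost_params(search_terms, top, field="category"):
    """Build the request params used with the boost QParser on the slide."""
    params = {"qq": search_terms, "q": "{!boost b=$b v=$qq}"}
    terms = []
    for i, (cat, z) in enumerate(top, start=1):
        params[f"cat{i}"] = f"{field}:{cat}"
        params[f"z_cat{i}"] = f"{z:.2f}"
        terms.append(f"pow(query($cat{i}),$z_cat{i})")
    params["b"] = "prod(" + ",".join(terms) + ")"
    return params


# Hypothetical raw preference data for one user.
prefs = {"action": 9, "comedy": 8, "kids": 1, "drama": 5, "horror": 5}
top = top_categories(z_scores(prefs))
params = boost_params("nightfall", top)
```

Only the three stored categories and their scores need to be looked up at query time, so the per-request cost stays small.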
Defaults, Appends, Invariants, ... Oh My!
Lots of Options = Long URLs ?
http://server:8983/solr/collection_name/select?defType=edismax
&qf=title^4+authors^3+description&pf2=title,author
&boost=div(popularity,add(1,price))&sort=score+desc,+price+desc
&fl=id,title,description,price&fq=instock:true
&rows=100&start=0&q=Nightfall+Isaac+Asimov
Lots of Options ≠ Long URLs !
http://server:8983/solr/collection_name/select?q=Nightfall+Isaac+Asimov

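<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^4 authors^3 description</str>
    <str name="pf2">title, author</str>
    <str name="boost">div(popularity,add(1,price))</str>
    <str name="sort">score desc, price desc</str>
    <str name="fl">id,title,description,price</str>
    <str name="fq">instock:true</str>
    <int name="rows">100</int>
    <int name="start">0</int>
  </lst>
</requestHandler>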

Configuring request parameter defaults in your solrconfig.xml, instead of in your clients, centralizes the business logic about how you want your searches to work in a single place, so it's easier to change without needing to modify all of your potential search client code. It also reduces the size of the HTTP requests, which may noticeably reduce your network load and improve query throughput.
More details about using Default request parameters in the Solr Reference Guide.
Prevent Client Mistakes
http://server:8983/solr/collection_name/select?q=Nightfall+Isaac+Asimov

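<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^4 authors^3 description</str>
    <str name="pf2">title, author</str>
    <str name="boost">div(popularity,add(1,price))</str>
    <str name="sort">score desc, price desc</str>
    <int name="rows">100</int>
  </lst>
  <lst name="appends">
    <str name="fq">instock:true</str>
  </lst>
  <lst name="invariants">
    <str name="fl">id,title,description,price</str>
    <int name="start">0</int>
  </lst>
</requestHandler>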

Besides the benefits of parameter defaults mentioned above, using appends and invariants lets you enforce rules that client developers might "forget" to enforce themselves, or might not realize are important from a business or performance standpoint.
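The interaction between the three kinds of configured parameters can be modeled in a few lines. This is a simplified sketch of the precedence rules, not Solr code: defaults apply only when the client sends nothing, appends are added to whatever the client sent, and invariants override the client.

```python
def effective_params(request, defaults=None, appends=None, invariants=None):
    """Model how handler config combines with request params.

    Every argument maps a parameter name to a list of values.
    """
    merged = {name: list(values) for name, values in (defaults or {}).items()}
    # Values sent by the client replace defaults for the same parameter.
    for name, values in request.items():
        merged[name] = list(values)
    # Appends are added alongside whatever is already present.
    for name, values in (appends or {}).items():
        merged.setdefault(name, []).extend(values)
    # Invariants always win, regardless of what the client sent.
    for name, values in (invariants or {}).items():
        merged[name] = list(values)
    return merged


# A client that tries to override rows, fl, and fq.
request = {"q": ["Nightfall Isaac Asimov"], "rows": ["10"],
           "fl": ["*"], "fq": ["price:[* TO 10]"]}
result = effective_params(
    request,
    defaults={"defType": ["edismax"], "rows": ["100"]},
    appends={"fq": ["instock:true"]},
    invariants={"fl": ["id,title,description,price"], "start": ["0"]},
)
```

In this model the client's `rows=10` is honored, its `fl=*` is discarded, and its own `fq` is kept but joined by the enforced `instock:true` filter.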
Hide Implementation Details
/select?shipping=free_to_members&cat=books&q=Nightfall+Isaac+Asimov

{!term f=category v=$cat}


instock:true
{!switch case.any='*:*'
default=$cat_filter
v=$cat}

{!switch case.any='*:*'
case.free_to_members='member_shipping:0.0'
case.free='shipping_cost:0.0'
v=$shipping}



any
any

More details about using switch QParser in the Solr Reference Guide.
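The selection logic of the switch QParser is easy to model outside Solr. The sketch below reflects my understanding of its behavior: a blank input matches a bare `case` param, a non-blank input matches `case.<value>`, and anything else falls back to `default` or fails. The function name is made up for illustration.

```python
def resolve_switch(local_params, value):
    """Pick the sub-query a switch-style parser would select.

    `local_params` holds keys such as "case", "case.free", and "default".
    """
    key = (value or "").strip()
    if not key:
        # Blank or missing input is matched by a bare "case" param.
        if "case" in local_params:
            return local_params["case"]
    elif f"case.{key}" in local_params:
        return local_params[f"case.{key}"]
    if "default" in local_params:
        return local_params["default"]
    raise ValueError(f"no case matches {value!r} and no default is set")


# Local params copied from the two switch filters on the slide.
shipping = {"case.any": "*:*",
            "case.free_to_members": "member_shipping:0.0",
            "case.free": "shipping_cost:0.0"}
category = {"case.any": "*:*", "default": "$cat_filter"}
```

Because the shipping filter has no `default`, an unknown value such as `shipping=overnight` is rejected instead of silently matching everything.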
Hierarchical Documents
(aka: Block Join)
A classic IR problem that Block Joining solves involves very large text documents (e.g. textbooks) divided up into hierarchical chunks (e.g. volume, part, section, chapter): you want to be able to find books that contain a chapter matching some specific criteria and another chapter matching some different specific criteria.
Nested Documents


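<add>
  <doc>
    <field name="id">100</field>
    <field name="doctype">album</field>
    <field name="album_name">Wayne's World (soundtrack)</field>
    <doc>
      <field name="id">101</field>
      <field name="doctype">song</field>
      <field name="song_name">Bohemian Rhapsody</field>
      <field name="artist_name">Queen</field>
    </doc>
    <doc>
      <field name="id">102</field>
      <field name="doctype">song</field>
      <field name="song_name">Hot and Bothered</field>
      <field name="artist_name">Cinderella</field>
    </doc>
    ...
  </doc>
  ...
</add>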

JSON Syntax for indexing document blocks was added by SOLR-5183 after Solr 4.7 was released, and will be included in Solr 4.8.
More details about using Block Joins in the Solr Reference Guide
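For the JSON syntax, a block can be assembled as an ordinary nested structure. The sketch below assumes that children are listed under a `_childDocuments_` key, which is my understanding of the SOLR-5183 syntax. The field names are taken from the search responses shown in this deck.

```python
import json


def album_block(album_id, name, songs):
    """Build one parent document with its songs nested as children."""
    return {
        "id": album_id,
        "doctype": "album",
        "album_name": name,
        # Assumed key for nested children in the JSON update syntax.
        "_childDocuments_": [
            {"id": song_id, "doctype": "song",
             "song_name": title, "artist_name": artist}
            for song_id, title, artist in songs
        ],
    }


block = album_block("100", "Wayne's World (soundtrack)", [
    ("101", "Bohemian Rhapsody", "Queen"),
    ("102", "Hot and Bothered", "Cinderella"),
])
payload = json.dumps([block], indent=2)
```

Sending the parent and its children in one block is what lets the block join parsers relate them at query time.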
"Soundtrack" Albums
/select?q=soundtrack&fq=doctype:album
{ "response":{"numFound":3,"start":0,"docs":[
{
"id":"100",
"album_name":"Wayne's World (soundtrack)"},
{
"id":"200",
"album_name":"Empire Records (Soundtrack)"},
{
"id":"300",
"album_name":"Reality Bites (Soundtrack)"}]
}}
"Love" Songs
/select?q=love&fq=doctype:song
{ "response":{"numFound":7,"start":0,"docs":[
{ "id":"114",
"song_name":"Loud Love",
"artist_name":"Soundgarden"},
{ "id":"112",
"song_name":"Loving Your Lovin'",
"artist_name":"Eric Clapton"},
{ "id":"406",
"song_name":"One Year of Love",
"artist_name":"Queen"},
{ "id":"503",
"song_name":"Ready For Love",
"artist_name":"Bad Company"},
{ "id":"532",
"song_name":"Hammer of Love",
"artist_name":"Bad Company"},
...,
Soundtracks containing "Love" Songs
/select?q=soundtrack&fq={!parent which="doctype:album"}love
{ "response":{"numFound":2,"start":0,"docs":[
{
"id":"100",
"album_name":"Wayne's World (soundtrack)"},
{
"id":"300",
"album_name":"Reality Bites (Soundtrack)"}]
}}
"Love" Songs on Soundtracks
/select?q=love&fq={!child of="doctype:album"}soundtrack
{ "response":{"numFound":3,"start":0,"docs":[
{
"id":"114",
"song_name":"Loud Love",
"artist_name":"Soundgarden"},
{
"id":"112",
"song_name":"Loving Your Lovin'",
"artist_name":"Eric Clapton"},
{
"id":"314",
"song_name":"Baby, I Love Your Way",
"artist_name":"Big Mountain"}]
}}
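The two block join requests differ only in which parser wraps the filter. The sketch below builds both, using the same `doctype:album` parent filter as the slides, and URL-encodes them, since the braces and quotes in the local params need escaping in a real request. The helper names are made up for illustration.

```python
from urllib.parse import urlencode

PARENT_FILTER = "doctype:album"  # identifies all parent documents


def parents_with_matching_children(parent_query, child_query):
    """Albums matching parent_query that contain a song matching child_query."""
    return {"q": parent_query,
            "fq": f'{{!parent which="{PARENT_FILTER}"}}{child_query}'}


def children_of_matching_parents(child_query, parent_query):
    """Songs matching child_query that belong to an album matching parent_query."""
    return {"q": child_query,
            "fq": f'{{!child of="{PARENT_FILTER}"}}{parent_query}'}


albums = parents_with_matching_children("soundtrack", "love")
songs = children_of_matching_parents("love", "soundtrack")
albums_url = "/select?" + urlencode(albums)
```

In both parsers the `which` or `of` filter describes the full set of parents, not the subset being searched for.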
Hiding the Details
/songs
/songs?song=love
/songs?album=soundtrack
/songs?song=love&album=soundtrack


doctype:album
{!child of=$album_filter v=$album_query}
{!df=song_name_t v=$song}
{!df=album_name_t v=$album}
{!switch case='*:*'
default=$song_query
v=$song}



{!switch case='doctype:song'
default=$songs_by_album
v=$album}



Remember what I was saying about simplifying requests using defaults and the switch QParser?
All Songs
"Love" Songs
Songs on "Soundtrack" Albums
"Love" Songs on "Soundtrack" Albums
A similar /albums handler could be configured with the appropriate (reversed) rules.
Block Join Caveats
Fairly new feature, still evolving (SOLR-5142)
Currently only supported as constant score queries
Special _root_ field needed to handle deletes when updating a block:
Currently has a bug if you "update" a parent document to have no children (SOLR-5211)
Doesn't play nicely with deleting by id -- need to use delete by query to ensure all children are removed
Coming Soon: Option to include nested child documents in search results (SOLR-5285)
The constant score limitation should be easy to fix -- I think it's just a hard-coded simplification that needs a new local param option.
Deleting a child doc (that has no children of its own) by ID should work fine -- the specific problem comes when you try to delete a parent doc by ID: it will orphan the children and lead to non-deterministic behavior.
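One way to delete a whole block safely is a delete by query against the `_root_` field. The sketch below builds such a JSON update command. It assumes that every document in a block, parent included, carries the parent's id in `_root_`, so check that against your schema and Solr version.

```python
import json


def delete_block_command(root_id):
    """Delete-by-query on _root_ so the parent and its children go together."""
    return json.dumps({"delete": {"query": f"_root_:{root_id}"}})


# Remove album 100 and all of its songs in one command.
command = delete_block_command("100")
```

This avoids the orphaned children that a plain delete by id of the parent would leave behind.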
Update Processors
Pipeline Of Reusable Tools


authors
editors

contributors


;
contributors



authors
primary_author


primary_author

In this example, imagine that our input documents contain only the (multi-valued) authors and editors fields. We build up the (single
valued) contributors string field by combining the two, and we populate the primary_author field using the first value from the authors
field.
There are a lot more Update Processors like these. More info about these specific Update Processors:
CloneFieldUpdateProcessorFactory
ConcatFieldUpdateProcessorFactory
FirstFieldValueUpdateProcessorFactory
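The effect of this chain on a single document can be simulated with plain dictionaries. The functions below only imitate what the three processors are described as doing here and are not the Solr classes. The "; " delimiter and the sample names are assumptions.

```python
def clone_field(doc, sources, dest):
    """Copy every value of the source fields into dest."""
    for src in sources:
        doc.setdefault(dest, []).extend(doc.get(src, []))


def concat_field(doc, field, delimiter):
    """Collapse a multi-valued field into one delimited string."""
    if field in doc:
        doc[field] = [delimiter.join(doc[field])]


def first_value(doc, field):
    """Keep only the first value of a field."""
    if doc.get(field):
        doc[field] = doc[field][:1]


def run_pipeline(doc):
    """Apply the steps in the order the chain is described above."""
    doc = {name: list(values) for name, values in doc.items()}
    clone_field(doc, ["authors", "editors"], "contributors")
    concat_field(doc, "contributors", "; ")
    clone_field(doc, ["authors"], "primary_author")
    first_value(doc, "primary_author")
    return doc


incoming = {"authors": ["Isaac Asimov", "Robert Silverberg"],
            "editors": ["Jane Doe"]}
indexed = run_pipeline(incoming)
```

Order matters: the clone must run before the concat, or there is nothing in `contributors` to join.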
Script Your Own

update-script.js

42


// in update-script.js
function processAdd(cmd) {
  var doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
  // Flag documents whose popularity exceeds the configured min_popularity
  if (params.get("min_popularity") < doc.getFieldValue("popularity")) {
    doc.addField("is_hot", "true");
  }
}
If the existing Update Processors don't do what you need, you can write your own in Java -- or in any scripting language supported by your
JVM.
Q & A
Me https://twitter.com/_hossman
My Company
http://www.lucidworks.com/
These Slides
https://people.apache.org/~hossman/ac2014na
Solr Docs
https://lucene.apache.org/solr/documentation.html
Mailing Lists & IRC
https://lucene.apache.org/solr/discussion.html
Join The Revolution in DC, November 11-14
http://www.lucenerevolution.org/2014/call-for-speakers