Scala week 3
Object-Oriented Sets:
package objsets
import common._
import TweetReader._
/**
* A class to represent tweets.
*/
class Tweet(val user: String, val text: String, val retweets: Int) {
override def toString: String =
"User: " + user + "\n" +
"Text: " + text + " [" + retweets + "]"
}
/**
* This represents a set of objects of type `Tweet` in the form of a binary search
* tree. Every branch in the tree has two children (two `TweetSet`s). There is an
* invariant which always holds: for every branch `b`, all elements in the left
* subtree are smaller than the tweet at `b`. The eleemnts in the right subtree are
* larger.
*
* Note that the above structure requires us to be able to compare two tweets (we
* need to be able to say which of two tweets is larger, or if they are equal). In
* this implementation, the equality / order of tweets is based on the tweet's text
* (see `def incl`). Hence, a `TweetSet` could not contain two tweets with the same
* text from different users.
*
*
* The advantage of representing sets as binary search trees is that the elements
* of the set can be found quickly. If you want to learn more you can take a look
* at the Wikipedia page [1], but this is not necessary in order to solve this
* assignment.
*
* [1] http://en.wikipedia.org/wiki/Binary_search_tree
*/
abstract class TweetSet {
/**
* This method takes a predicate and returns a subset of all the elements
* in the original set for which the predicate is true.
*
* Question: Can we implment this method here, or should it remain abstract
* and be implemented in the subclasses?
*/
def filter(p: Tweet => Boolean): TweetSet = filterAcc(p, new Empty)
/**
* This is a helper method for `filter` that propagetes the accumulated tweets.
*/
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet
/**
* Returns a new `TweetSet` that is the union of `TweetSet`s `this` and `that`.
*
* Question: Should we implment this method here, or should it remain abstract
* and be implemented in the subclasses?
*/
def union(that: TweetSet): TweetSet = that.filterAcc(twit => true,this)
/**
* Returns the tweet from this set which has the greatest retweet count.
*
* Calling `mostRetweeted` on an empty set should throw an exception of
* type `java.util.NoSuchElementException`.
*
* Question: Should we implment this method here, or should it remain abstract
* and be implemented in the subclasses?
*/
def mostRetweeted: Tweet
def isEmpty: Boolean
/**
* Returns a list containing all tweets of this set, sorted by retweet count
* in descending order. In other words, the head of the resulting list should
* have the highest retweet count.
*
* Hint: the method `remove` on TweetSet will be very useful.
* Question: Should we implment this method here, or should it remain abstract
* and be implemented in the subclasses?
*/
def descendingByRetweet: TweetList = {
if(isEmpty) Nil
else {
val mostret = mostRetweeted
new Cons(mostret,remove(mostret).descendingByRetweet)
}
}
/**
* The following methods are already implemented
*/
/**
* Returns a new `TweetSet` which contains all elements of this set, and the
* the new element `tweet` in case it does not already exist in this set.
*
* If `this.contains(tweet)`, the current set is returned.
*/
def incl(tweet: Tweet): TweetSet
/**
* Returns a new `TweetSet` which excludes `tweet`.
*/
def remove(tweet: Tweet): TweetSet
/**
* Tests if `tweet` exists in this `TweetSet`.
*/
def contains(tweet: Tweet): Boolean
/**
* This method takes a function and applies it to every element in the set.
*/
def foreach(f: Tweet => Unit): Unit
}
class Empty extends TweetSet {
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = acc
/**
* The following methods are already implemented
*/
def contains(tweet: Tweet): Boolean = false
def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)
def remove(tweet: Tweet): TweetSet = this
def foreach(f: Tweet => Unit): Unit = ()
def isEmpty: Boolean = true
def mostRetweeted: Tweet = {
throw new java.util.NoSuchElementException
}
}
class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = {
if (p(elem)) left.filterAcc(p, right.filterAcc(p, acc.incl(elem)))
else left.filterAcc(p, right.filterAcc(p, acc))
}
def isEmpty: Boolean = false
def mostRetweeted: Tweet = {
def MaxRet(Twit1: Tweet, Twit2: Tweet): Tweet = {
if (Twit1.retweets > Twit2.retweets) Twit1 else Twit2
}
if (left.isEmpty && right.isEmpty) elem
else if (right.isEmpty) MaxRet(left.mostRetweeted,elem)
else if (left.isEmpty) MaxRet(right.mostRetweeted,elem)
else MaxRet(left.mostRetweeted,MaxRet(left.mostRetweeted,elem))
}
/**
* The following methods are already implemented
*/
def contains(x: Tweet): Boolean =
if (x.text < elem.text) left.contains(x)
else if (elem.text < x.text) right.contains(x)
else true
def incl(x: Tweet): TweetSet = {
if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
else this
}
def remove(tw: Tweet): TweetSet =
if (tw.text < elem.text) new NonEmpty(elem, left.remove(tw), right)
else if (elem.text < tw.text) new NonEmpty(elem, left, right.remove(tw))
else left.union(right)
def foreach(f: Tweet => Unit): Unit = {
f(elem)
left.foreach(f)
right.foreach(f)
}
}
trait TweetList {
def head: Tweet
def tail: TweetList
def isEmpty: Boolean
def foreach(f: Tweet => Unit): Unit =
if (!isEmpty) {
f(head)
tail.foreach(f)
}
}
object Nil extends TweetList {
def head = throw new java.util.NoSuchElementException("head of EmptyList")
def tail = throw new java.util.NoSuchElementException("tail of EmptyList")
def isEmpty = true
}
class Cons(val head: Tweet, val tail: TweetList) extends TweetList {
def isEmpty = false
}
object GoogleVsApple {
val google = List("android", "Android", "galaxy", "Galaxy", "nexus", "Nexus")
val apple = List("ios", "iOS", "iphone", "iPhone", "ipad", "iPad")
lazy val googleTweets: TweetSet = TweetReader.allTweets.filter(t => google.exists(s => t.text.contains(s)))
lazy val appleTweets: TweetSet = TweetReader.allTweets.filter(t => apple.exists(s => t.text.contains(s)))
/**
* A list of all tweets mentioning a keyword from either apple or google,
* sorted by the number of retweets.
*/
lazy val trending: TweetList = googleTweets.union(appleTweets).descendingByRetweet
}
object Main extends App {
// Print the trending tweets
GoogleVsApple.trending foreach println
}
Written by Imad BOUHAMIDI
Related protips
8 Responses
i didn'tunderstand this assignment. Could you please tell me about this solution.?
My email id : rkulhari.live@gmail.com
Hey dude. interesting solution. I think you are pretty close there. How are you liking scala. i wonder about the mostRetweeted solution. You added isEmpty to the classes that wasn't there, which seems to kind of ruin the polymorphism with so many if else statements.
I struggled very long on this one (couple of days, maybe) and with a tip from another student, modeled it after the filterAcc, which I really like the way that you did that.
I wonder though, shouldn't you maybe remove this because of the honor code in this class?
I love your union function. It is so much faster than the one defined in the video:
def union(other: TweetSet): TweetSet=
((left union right) union other) incl elem
Is this because of the tail recursiveness?
The union is indeed uttermost elegance and it should be faster because it appends to the existing tree, whereas Odersky-demonstrated decomposition-based method starts appending to the left or right part of the tree, which is smaller.
Other procedures, howvever could be de-bulkanized, by using variable accumulators + foreach
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = {
var res = acc
foreach(t => if (p(t)) res = res incl t)
res
}
def mostRetweeted: Tweet = {
var most: Tweet = elem
(left union right).foreach(t => if (t.retweets > most.retweets) most = t)
most
}
This is definitely neater since Oderski, despite discourages the variables, prefers this approach in TweetTest.asSet
Google/Apple tweets can also be refactored
def companyTweets(l: List[String]) = TweetReader.allTweets filter (t => l.exists(k => t.text.contains(k)))
lazy val googleTweets: TweetSet = companyTweets(google)
lazy val appleTweets: TweetSet = companyTweets(apple)
Thank you for posting your solution, especially your Union method, it was extremely helpful.
I was almost there, despite by the serial CPU killer akka the Union method from the course. Thanks for your post.
Well, my experience is that changing the implementation of union round a bit made little difference, the performance was still appalling. I couldn't even construct the allTweets value it just took forever. The whole union thing maxed out with more than about 70 tweets.
However, this solution went like the wind:
def asList: List[Tweet] =
left.asList:::right.asList:::List(elem)
def union(that: TweetSet): TweetSet = {
TweetReader.toTweetSet(this.asList ::: that.asList ::: List(elem): List[Tweet])}
I.e. bung everything in Lists, concatenate the Lists and then reconstruct the resultant TweetSet from the concatenated List. Crude maybe, but effective. (the definition of asList for Empty is left to the reader)
I think there is a bug in
MaxRet(left.mostRetweeted,MaxRet(left.mostRetweeted,elem))
shouldn't it be?:
MaxRet(left.mostRetweeted,MaxRet(right.mostRetweeted,elem))
and instead of
if (left.isEmpty && right.isEmpty) elem
else if (right.isEmpty) MaxRet(left.mostRetweeted,elem)
else if (left.isEmpty) MaxRet(right.mostRetweeted,elem)
MaxRet(left.mostRetweeted,MaxRet(right.mostRetweeted,elem))
you can also write
MaxRet(
if (left.isEmpty) elem else left.mostRetweeted,
MaxRet(if (right.isEmpty) elem else right.mostRetweeted, elem)
)