Preface
When it rains there is neither sun nor moon, yet people do not really mind.
Perhaps it is because, in a mild season, the rain is not cold, and the sun and the moon can be left aside for a while.
A rainy night has a flavor that a moonlit night lacks.
Sometimes it brings to mind Li Shangyin's famous lines of hope for a reunion among friends: "When shall we trim the candle together by the west window, and talk of the night rain on Mount Ba?"
In the rain, silk-like notes fall from the sky; here, all is quiet and comfortable.
Presentation
This page presents the homework of the database technology course at CMU. All of the queries were tested in PostgreSQL.
This section mainly practices SQL query operations, including views, indexes, etc. We also expect all of the SQL statements to run in MySQL, SQLite, and so on. A detailed Chinese version of this article is provided as well.
Chinese version
Question details
In this homework you will have to write SQL queries to answer questions on a movie dataset, about movies and actors. The database contains two tables:
• movies(mid, title, year, num_ratings, rating), primary key: mid
• play_in(mid, name, cast_position), primary key: mid, name
The tables contain the obvious information: which actor played in what movie, at what position; for each movie, we have the title (e.g., 'Gone with the Wind'), year of production, count of rating reviews it received, and the average score of those ratings (a float in the range 0 to 10, with '10' meaning 'excellent').
We will use Postgres, which is installed on your own machine.
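To make the schema concrete, here is a minimal sketch that builds the two tables in an in-memory SQLite database from Python. This is only an assumption-laden toy: the homework itself uses Postgres, the column types are simplified, and the rows (Movie A, Smith, Jones, ...) are hypothetical and not taken from the real dataset.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory SQLite database with the two homework tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE movies (
            mid INTEGER PRIMARY KEY,
            title TEXT, year INTEGER,
            num_ratings INTEGER, rating REAL
        );
        CREATE TABLE play_in (
            mid INTEGER REFERENCES movies(mid),
            name TEXT, cast_position INTEGER,
            PRIMARY KEY (mid, name)
        );
    """)
    # Hypothetical rows, invented for illustration only.
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?, ?)", [
        (1, "Movie A", 2002, 10, 8.5),
        (2, "Movie B", 2002, 1, 9.0),
        (3, "Movie C", 2003, 20, 6.0),
    ])
    conn.executemany("INSERT INTO play_in VALUES (?, ?, ?)", [
        (1, "Smith", 1), (1, "Jones", 2), (2, "Smith", 2), (3, "Jones", 1),
    ])
    return conn


conn = build_toy_db()
# Which actor played in which movie, and at which cast position.
rows = conn.execute("""
    SELECT m.title, p.name, p.cast_position
    FROM movies m JOIN play_in p ON p.mid = m.mid
    ORDER BY m.title, p.cast_position
""").fetchall()
for row in rows:
    print(row)
```

Every question below is some combination of this join between movies and play_in, followed by filtering, grouping or set difference.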
Question 1: Warm-up queries [5 points]
(a) [2 points] Print all actors in the movie Quantum of Solace, sorted by cast position. Print only their names.
(b) [3 points] Print all movie titles that were released in 2002, with rating larger than 8 and with more than one rating (num_ratings > 1).
Question 2: Find the star's movies [5 points]
(a) [5 points] Print movie titles where Sean Connery was the star (i.e., he had position 1 in the cast). Sort the movie titles alphabetically.
Question 3: Popular actors [15 points]
(a) [8 points] We want to find the actors of the highest quality. We define their quality as the weighted average of the ratings of the movies they have played in (regardless of cast position), using the number of ratings for each movie as the weight. In other words, we define the quality for a particular actor as
$$\text{quality} = \frac{\sum_{m} \text{rating}_m \cdot \text{num\_ratings}_m}{\sum_{m} \text{num\_ratings}_m}$$
where both sums run over all movies m the actor has played in.
Print the names of the top 5 actors, according to the above metric. Break ties alphabetically.
(b) [7 points] Now we want to find the 5 most popular actors, in terms of number of ratings (regardless of positive or negative popularity). I.e., if actor 'Smith' played in 2 movies, with num_ratings 10 and 15, then Smith's popularity is 25 (=10+15). Print the top 5 actor names according to popularity. Again, break ties alphabetically.
Question 4: Most controversial actor [10 points]
(a) [10 points] We want to find the most controversial actor. As a measure of controversy, we define the maximum difference between the ratings of two movies that an actor has played in (regardless of cast position). That is, if actor 'Smith' played in a movie that got rating=1.2, and another that got rating=9.5, and all the other movies he played in obtained scores within that range, then Smith's controversy score is 9.5-1.2=8.3. Print the name of the top-most controversial actor. Again, if there is a tie in first place, break it alphabetically.
Question 5: The minions [20 points]
(a) [20 points] Find the "minions" of Annette Nicole: print the names of actors who only played in movies with her and never without her. The answer should not contain the name of Annette Nicole. Order the names alphabetically.
Question 6: High productivity [5 points]
(a) [5 points] Find the top 2 most productive years (by number of movies produced). Solve ties by preferring chronologically older years, and print only the years.
Question 7: Movies with similar cast [15 points]
(a) [8 points] Print the count of distinct pairs of movies that have at least one actor in common (ignoring cast position). Exclude self-pairs and mirror-pairs.
(b) [7 points] Print the count of distinct pairs of movies that have at least two actors in common (again, ignoring cast position). Again, exclude self-pairs and mirror-pairs.
Question 8: Skyline query [25 points]
(a) [25 points] We want to find a set of movies that have both high popularity (i.e., high num_ratings) as well as high quality (rating). No single movie may achieve both, in which case we want the so-called skyline query. More specifically, we want all movies that are not "dominated" by any other movie:
Definition of domination: movie "a" dominates movie "b" if movie "a" wins over movie "b" on both criteria, or wins on one and ties on the rest.
Figure 1 gives a pictorial example: the solid dots ('a', 'd', 'f') are not dominated by any other dot, and thus form the skyline. All other dots are dominated by at least one other dot: e.g., dot 'b' is dominated by dot 'a', being inside the shaded rectangle that has 'a' as the upper-right corner.
Figure 1: Illustration of skyline and domination: 'a' dominates all points in the shaded rectangle; 'a', 'd' and 'f' form the skyline of this cloud of points.
Given the above description, print the title of all the movies on the skyline, along with the rating and the number of ratings.
Answer
We give the Postgres version in detail; as you will see, it can be transferred easily to MySQL or SQLite.
Initialization:
-- drop the tables if they exist
drop table if exists movies cascade;
drop table if exists play_in cascade;
-- create tables movies and play_in
create table movies (
  mid integer primary key,
  title varchar(200),
  year integer,
  num_ratings integer,
  rating real
);
create table play_in (
  mid integer references movies,
  name varchar(100),
  cast_position integer,
  primary key(mid, name)
);
create index mid on movies(mid);
Insert values
Insert some values into the tables movies and play_in.
You will find the data at the following link in my 360 Yun files:
https://yunpan.cn/csflzxqaprxsi password: f3ab
-- use \copy in Postgres
\copy movies from '~/data/movie_processed.dat';
\copy play_in from '~/data/movie_actor_processed.dat';
-- if you use another database (MySQL, SQLite), you can use the SQL statement: insert into ... values()
The following image shows the test output on my Ubuntu machine:
Solution 1
(a)
select name from play_in p, movies m
where p.mid = m.mid and m.title = 'quantum of solace'
order by p.cast_position;
-- (a) the result looks like this:
name
------------------------------
daniel craig
olga kurylenko
mathieu amalric
judi dench
giancarlo giannini
gemma arterton
jeffrey wright
david harbour
jesper christensen
anatole taubman
rory kinnear
tim pigott-smith
fernando guillen-cuervo
jesus ochoa
glenn foster
paul ritter
simon kassianides
stana katic
lucrezia lante della rove...
neil jackson
oona chaplin
(21 rows)
(b)
select title from movies
where year = 2002 and rating > 8 and num_ratings > 1;
-- (b) the result looks like this:
title
---------------------------------------
the lord of the rings: the two towers
cidade de deus
mou gaan dou
(3 rows)
The following image shows the test output of solution 1 on my Ubuntu machine:
Solution 2
select title from movies m, play_in p
where m.mid = p.mid and name = 'sean connery' and cast_position = 1
order by title;
-- the result looks like this:
title
---------------------------------------
der name der rose
diamonds are forever
dr. no
entrapment
finding forrester
first knight
from russia with love
goldfinger
never say never again
the hunt for red october
the league of extraordinary gentlemen
thunderball
you only live twice
(13 rows)
The following image shows the test output of solution 2 on my Ubuntu machine:
Solution 3
(a)
drop view if exists weightedratings;
-- weighted average rating per actor, weighted by num_ratings
create view weightedratings as
select name, sum(rating*num_ratings)/sum(num_ratings) as weightedrating
from movies m, play_in p
where m.mid = p.mid
group by name;
select name from weightedratings
order by weightedrating desc, name asc limit 5;
-- (a) the result looks like this:
name
-----------------------
adam kalesperis
aidan feore
aleksandr kajdanovsky
alexander kaidanovsky
alisa frejndlikh
(5 rows)
(b)
drop view if exists actorsumratings;
-- total number of ratings over all movies of each actor
create view actorsumratings as
select name, sum(num_ratings) as popularity
from play_in p, movies m
where p.mid = m.mid
group by name;
select name from actorsumratings
order by popularity desc, name asc limit 5;
-- (b) the result looks like this:
name
----------------------
johnny depp
alan rickman
orlando bloom
helena bonham carter
matt damon
(5 rows)
The following images show the test output of solution 3 on my Ubuntu machine:
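As a sanity check of the two metrics, here is a small sketch that computes quality and popularity on toy data. It assumes an in-memory SQLite database instead of Postgres, and the three movies and two actors are hypothetical. For 'smith', quality should be (9.0*10 + 5.0*30) / (10 + 30) = 6.0 and popularity 10 + 30 = 40.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (mid INTEGER PRIMARY KEY, title TEXT, year INTEGER,
                         num_ratings INTEGER, rating REAL);
    CREATE TABLE play_in (mid INTEGER, name TEXT, cast_position INTEGER,
                          PRIMARY KEY (mid, name));
    -- hypothetical rows, invented for illustration only
    INSERT INTO movies VALUES (1, 'A', 2000, 10, 9.0),
                              (2, 'B', 2001, 30, 5.0),
                              (3, 'C', 2002, 15, 8.0);
    INSERT INTO play_in VALUES (1, 'smith', 1), (2, 'smith', 1),
                               (3, 'jones', 1);
""")

# Quality: ratings weighted by num_ratings. Popularity: total num_ratings.
ranking = conn.execute("""
    SELECT p.name,
           SUM(m.rating * m.num_ratings) / SUM(m.num_ratings) AS quality,
           SUM(m.num_ratings) AS popularity
    FROM movies m JOIN play_in p ON p.mid = m.mid
    GROUP BY p.name
    ORDER BY quality DESC, p.name ASC
""").fetchall()

for name, quality, popularity in ranking:
    print(name, quality, popularity)
# jones 8.0 15
# smith 6.0 40
```

Note that the heavily rated movie pulls smith's quality toward 5.0, which is exactly what the weighting is meant to do.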
Solution 4
drop view if exists ratinggap;
-- largest rating difference between any two movies of the same actor
create view ratinggap as
select p1.name, max(abs(m1.rating-m2.rating)) as gap
from play_in p1, play_in p2, movies m1, movies m2
where p1.mid = m1.mid and p2.mid = m2.mid and p1.name = p2.name
group by p1.name;
-- break ties in first place alphabetically, as the question asks
select name from ratinggap order by gap desc, name asc limit 1;
-- the result looks like this:
name
---------------
john travolta
(1 row)
The following image shows the test output of solution 4 on my Ubuntu machine:
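The largest pairwise difference between an actor's movie ratings is simply the maximum rating minus the minimum rating, so the self-join is not strictly needed. The sketch below shows this alternative formulation on toy data; it assumes SQLite rather than Postgres, and the actors and ratings are hypothetical (smith reuses the 1.2 and 9.5 from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (mid INTEGER PRIMARY KEY, rating REAL);
    CREATE TABLE play_in (mid INTEGER, name TEXT,
                          PRIMARY KEY (mid, name));
    -- hypothetical rows, invented for illustration only
    INSERT INTO movies VALUES (1, 1.2), (2, 9.5), (3, 5.0), (4, 3.0), (5, 8.0);
    INSERT INTO play_in VALUES (1, 'smith'), (2, 'smith'), (3, 'smith'),
                               (4, 'jones'), (5, 'jones');
""")

# Controversy = max(rating) - min(rating) per actor; ties broken by name.
top = conn.execute("""
    SELECT p.name, ROUND(MAX(m.rating) - MIN(m.rating), 1) AS gap
    FROM play_in p JOIN movies m ON m.mid = p.mid
    GROUP BY p.name
    ORDER BY gap DESC, p.name ASC
    LIMIT 1
""").fetchone()

print(top)  # smith, with a gap of about 8.3
```

This touches each (actor, movie) row once instead of comparing every pair of movies per actor, which should matter on the full dataset.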
Solution 5
drop view if exists mastersmovies cascade;
-- movies in which annette nicole played
create view mastersmovies as
select m.mid, m.title from movies m, play_in p
where m.mid = p.mid and p.name = 'annette nicole';
drop view if exists coactors;
-- everyone who appears in at least one of her movies
create view coactors as
select distinct name from mastersmovies m, play_in p
where p.mid = m.mid;
drop view if exists combinations;
-- every (co-actor, her movie) combination
create view combinations as
select name, mid from mastersmovies, coactors;
drop view if exists nonexistent;
-- combinations that do not occur in play_in
create view nonexistent as
select * from combinations
except (select name, mid from play_in);
drop view if exists potentialresults;
-- co-actors who appear in all of her movies
create view potentialresults as
select * from coactors
except (select distinct(name) from nonexistent);
drop view if exists notmastersmovies;
-- movies without her
create view notmastersmovies as
select m.mid from movies m
except (select mid from mastersmovies);
-- drop anyone who also played in a movie without her, and herself
select * from potentialresults
where name not in (
  select name from play_in p, notmastersmovies m
  where m.mid = p.mid
  union select 'annette nicole')
order by name;
-- the result looks like this:
name
-----------------
christian perry
(1 row)
The following image shows the test output of solution 5 on my Ubuntu machine:
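The "never without her" condition can also be written as a double NOT EXISTS. The sketch below is a different formulation from the views above, under a weaker reading of the question: it keeps every actor whose movies all include the star, without also requiring that the actor appears in all of her movies. It assumes SQLite, and the co-actor names (perry, quinn, reed, stone) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE play_in (mid INTEGER, name TEXT,
                          PRIMARY KEY (mid, name));
    -- hypothetical rows: the star plays in movies 1 and 2 only
    INSERT INTO play_in VALUES
        (1, 'annette nicole'), (2, 'annette nicole'),
        (1, 'perry'),                 -- only with her
        (1, 'quinn'), (2, 'quinn'),   -- only with her
        (1, 'reed'),  (3, 'reed'),    -- movie 3 is without her
        (3, 'stone');                 -- never with her
""")

# Keep actors for whom no movie exists that lacks the star.
sql = """
    SELECT DISTINCT p.name
    FROM play_in p
    WHERE p.name <> :star
      AND NOT EXISTS (
          SELECT 1 FROM play_in q
          WHERE q.name = p.name
            AND NOT EXISTS (
                SELECT 1 FROM play_in s
                WHERE s.mid = q.mid AND s.name = :star))
    ORDER BY p.name
"""
minions = [row[0] for row in conn.execute(sql, {"star": "annette nicole"})]
print(minions)  # ['perry', 'quinn']
```

Under the stricter reading used by the views, perry would be dropped here because he is missing from movie 2.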
Solution 6
drop view if exists moviesperyear;
-- number of movies produced in each year
create view moviesperyear as
select year, count(title) num_movies
from movies
group by year;
-- on ties, prefer the chronologically older year
select year from moviesperyear order by num_movies desc, year asc limit 2;
-- the result looks like this:
year
------
2006
2007
(2 rows)
The following image shows the test output of solution 6 on my Ubuntu machine:
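The tie-breaking rule of the question (prefer older years) only shows up when two years have the same count. Here is a small sketch on toy data, assuming SQLite and hypothetical years: 2006 has three movies, while 2004 and 2007 have two each, so the second place should go to 2004.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (mid INTEGER PRIMARY KEY, year INTEGER)")
# Hypothetical years, invented for illustration only.
years = [2006, 2006, 2006, 2004, 2004, 2007, 2007]
conn.executemany("INSERT INTO movies (year) VALUES (?)",
                 [(y,) for y in years])

# Most movies first; on equal counts the older year wins.
top_years = [row[0] for row in conn.execute("""
    SELECT year
    FROM movies
    GROUP BY year
    ORDER BY COUNT(*) DESC, year ASC
    LIMIT 2
""")]
print(top_years)  # [2006, 2004]
```

Without the second sort key, the database is free to return either 2004 or 2007 in second place.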
Solution 7
(a)
-- m1.mid > m2.mid excludes self-pairs and mirror-pairs
select count(*) from (
  select distinct m1.mid, m2.mid
  from movies m1, movies m2, play_in p1, play_in p2
  where m1.mid > m2.mid
    and m1.mid = p1.mid and m2.mid = p2.mid
    and p1.name = p2.name) as count;
-- (a) the result looks like this:
count
--------
104846
(1 row)
(b)
-- p1/p2 match one shared actor, p3/p4 a second, different one
select count(*) from (
  select distinct m1.mid, m2.mid
  from movies m1, movies m2, play_in p1, play_in p2, play_in p3, play_in p4
  where m1.mid > m2.mid
    and m1.mid = p1.mid and m2.mid = p2.mid
    and m1.mid = p3.mid and m2.mid = p4.mid
    and p2.name <> p4.name
    and p1.name = p2.name and p3.name = p4.name) as count;
-- (b) the result looks like this:
count
-------
6845
(1 row)
The following image shows the test output of solution 7 on my Ubuntu machine:
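Both parts can be seen as one query with a threshold: join play_in with itself on the actor name, group by the movie pair, and keep pairs with at least k shared actors. The sketch below is an alternative formulation, not the query used above; it assumes SQLite, and `count_pairs`, the parameter `k`, and the cast data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE play_in (mid INTEGER, name TEXT,
                          PRIMARY KEY (mid, name));
    -- hypothetical casts: movies 1 and 2 share a and b, 1 and 3 share c
    INSERT INTO play_in VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'),
        (3, 'c'),
        (4, 'd');
""")


def count_pairs(conn, k):
    """Count movie pairs that have at least k actors in common."""
    sql = """
        SELECT COUNT(*) FROM (
            SELECT p1.mid AS first_mid, p2.mid AS second_mid
            FROM play_in p1
            JOIN play_in p2
              ON p1.name = p2.name    -- same actor in both movies
             AND p1.mid < p2.mid      -- no self-pairs, no mirror-pairs
            GROUP BY p1.mid, p2.mid
            HAVING COUNT(*) >= :k     -- one joined row per shared actor
        )
    """
    return conn.execute(sql, {"k": k}).fetchone()[0]


print(count_pairs(conn, 1))  # 2
print(count_pairs(conn, 2))  # 1
```

The count per group equals the number of shared actors because (mid, name) is the primary key of play_in, so each shared actor contributes exactly one joined row per pair.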
Solution 8
drop view if exists dominated;
-- m2 is dominated by m1: no better on either criterion, and not equal on both
create view dominated as
select distinct m2.mid, m2.title, m2.num_ratings, m2.rating
from movies m1, movies m2
where m2.rating <= m1.rating and m2.num_ratings <= m1.num_ratings
  and not (m2.rating = m1.rating and m2.num_ratings = m1.num_ratings);
-- the skyline is everything that is not dominated
select title, num_ratings, rating from movies
except (select title, num_ratings, rating from dominated);
The following image shows the test output of solution 8 on my Ubuntu machine:
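The definition of domination translates almost literally into a NOT EXISTS query: a movie is on the skyline if no other movie is at least as good on both criteria and strictly better on one. The sketch below assumes SQLite and six hypothetical points whose names loosely follow the dots of Figure 1; the coordinates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (mid INTEGER PRIMARY KEY, title TEXT,
                         num_ratings INTEGER, rating REAL);
    -- hypothetical points, invented for illustration only
    INSERT INTO movies (title, num_ratings, rating) VALUES
        ('a', 100, 9.0),
        ('b',  50, 8.0),   -- dominated by a
        ('d', 200, 7.0),
        ('e', 150, 6.0),   -- dominated by d
        ('f', 300, 5.0),
        ('g', 300, 4.0);   -- ties f on num_ratings, loses on rating
""")

# Keep a movie only if no other movie dominates it.
skyline = conn.execute("""
    SELECT m.title, m.rating, m.num_ratings
    FROM movies m
    WHERE NOT EXISTS (
        SELECT 1 FROM movies o
        WHERE o.rating >= m.rating
          AND o.num_ratings >= m.num_ratings
          AND (o.rating > m.rating OR o.num_ratings > m.num_ratings))
    ORDER BY m.rating DESC
""").fetchall()

for row in skyline:
    print(row)
# ('a', 9.0, 100)
# ('d', 7.0, 200)
# ('f', 5.0, 300)
```

Because this version filters rows rather than subtracting (title, num_ratings, rating) tuples, two different movies that happen to share a title and both values are both kept on the skyline.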
Reference
[1] http://www.ruanyifeng.com/blog/2013/12/getting_started_with_postgresql.html
[2] http://www.postgresql.org/docs/
[3] http://www.cs.cmu.edu/~epapalex/15415s14/postgresqlreadme.htm
[4] http://www.cs.cmu.edu/~christos/courses/