One important question about LZ compression is "Is it good?" And the answer is "yes", but it isn't necessarily immediately obvious why.
Join me below the fold for some bit-counting and fun with variable length coding.
KT Companion
Tuesday, 19 August 2014
Bonus Workshop Questions - Week 2
Getting caught up! Sorry about the delay.
6. Write a regular expression to solve the following problem:
(e) Find long words whose letters are in alphabetical order.
7. Practice using awk, or alternatively, bash-scripting with grep and sed. For example, write a program that finds all of the email addresses in enron-headers.txt. (You might contrast your observations with enron-emails.txt.) How many words are there in Fathers and Sons by Ivan Turgenev (turgenev.txt)? How many instances are there of the same word repeated twice? How many words in the dictionary (words.txt; it’s actually an inflectional lexicon) have their letters in alphabetical order? How many of the nine letter words (nines.txt)? [All of these are available on the CIS servers, i.e. nutmeg and dimefox, connection instructions on the LMS.]
Solutions below the fold.
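If you would like something to check your own attempt against before reading the solutions, here is a minimal sketch in Python rather than awk or grep. It is not the official solution. The helper names, the `min_length` cut-off for what counts as a "long" word, and the definition of a "word" as a run of letters and apostrophes are all my own assumptions, so your counts may differ if you tokenise differently.

```python
import re
import string

# One optional run per letter, in order: a*b*c* ... z*
# A word matches in full only if its letters never go "backwards".
ALPHABETICAL = re.compile("".join(f"{c}*" for c in string.ascii_lowercase))


def is_alphabetical(word, min_length=6):
    """True if word is at least min_length letters, all in alphabetical order.

    min_length is a hypothetical threshold for "long"; pick your own.
    """
    w = word.lower()
    return len(w) >= min_length and ALPHABETICAL.fullmatch(w) is not None


def count_words(text):
    """Count words, assuming a word is a run of letters and apostrophes."""
    return len(re.findall(r"[A-Za-z']+", text))


def count_repeated_words(text):
    """Count places where a word is immediately followed by itself."""
    return len(re.findall(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE))
```

To try it on the workshop files, read the file contents into a string (for example `open("turgenev.txt").read()`) and pass that to `count_words` or `count_repeated_words`. For words.txt or nines.txt, call `is_alphabetical` on each line after stripping whitespace, and set `min_length=9` for the nine letter words.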
Monday, 18 August 2014
Tuesday, 29 July 2014
Bonus workshop questions: Week 1
Reasoning about data is crucial for Knowledge Technologies.
Consider the bar chart about the projected growth of structured and unstructured data in the article "Data Science and Prediction" by Vasant Dhar. (The article is linked in the readings on the LMS; also the graph was reproduced on the slide "Some Data about Data" from Lecture 1 (under the Fair Use provision of the Australian Copyright Act).)
a) The heights of the bars indicate the total capacity of archived data for the years 2008-2015, and the increase appears to be exponential. Do you expect this to continue? Why or why not?
b) The share of the total archived data belonging to databases has decreased over the years in question: in 2008, this was about 13% of the total data; in 2015, it is projected to be less than 11%. Why do you think this is happening?
(Answers below the fold.)
Friday, 25 July 2014
Nothing means anything, or "What's knowledge all about?"
Knowledge technologies is a different kind of Computer Science subject. In my experience, many students come to this subject with a technical background, where a subject might typically have goal structures like:
- learn some definitions
- learn some algorithms
- apply the algorithms for solving certain kinds of problems
Tuesday, 22 July 2014
What's this blog all about?
This blog is designed to be a companion for the subject Knowledge Technologies (COMP30018/90049) at the University of Melbourne.
[Edit: moved content below the fold because jump breaks are awesome.]