1c. Smalltalk Collections
1 Intro
1.1 Purpose of Collections
-
Collections hold other
objects and provide protocol for enumerating through the held objects.
-
Collections are organized
in a number of ways:
-
Indexed or not indexed
-
Indexed by explicit or
implicit keys
-
One kind of object or
varied kinds of objects
-
Ordering to the elements
exists (such as an index), or no ordering.
-
Fixed in size or expandable.
1.2 Kinds of Collections
-
Exact list of collections
is implementation dependent.
-
Collections common to
most implementations:
| Collection Name |
Description |
| Collection |
Abstract parent of all collection
classes |
| Bag |
Unordered, unstructured; holds anything
but nil |
| Dictionary |
Holds key/value pairs (Associations);
keys are arbitrary |
| IdentityDictionary |
Holds key/value pairs (Associations);
keys are symbols; based on ==. |
| Set |
Like Bag but holds just one of any
object but nil |
| IndexedCollection |
Abstract parent of indexed classes;
name varies |
| FixedSizeCollection |
Abstract parent of fixed sized classes;
name varies |
| Array |
Holds anything |
| ByteArray |
Holds integers in range 0 to 255. |
| String |
Holds characters |
|
Symbol |
Holds characters; symbols are identity
objects |
| Interval |
Pseudo-collection representing range
of
numbers |
| VariableSizeCollection |
Abstract parent of variable sized
collections;
name varies |
| OrderedCollection |
Add anything anywhere; acts like
array, queue,
stack |
| SortedCollection |
Always stays sorted; holds anything |
1.3 Bags, Sets
-
Neither provides any ordering.
-
Accumulations of stuff.
-
Adding duplicate objects
makes the difference:
-
Bags hold as many of one
object as are added.
-
Sets always hold just
one.
1.4 Dictionaries
-
Collections that hold
key/value pairs called associations.
-
Dictionary keys can be
anything but are usually strings or symbols.
-
Values are added with
an associated key and are retrieved by using the key.
-
Keys of Identity dictionaries
must be identity objects, usually symbols.
-
Examples
" Create a dictionary and add elements
"
| dict |
dict := Dictionary new.
dict at: 'Fido' put: 'Mutt'.
dict at: 'Spot' put: 'Dalmation'.
dict at: 'Blackie' put: 'Labrador'.
^ dict at: 'Fido'
" Answers: 'Mutt' "
1.5 Indexed Collections
-
Collections with integer
indexes.
-
Elements are accessed
using indexes.
-
IndexedCollection
(name differs) is the abstract parent class of all indexed collections.
1.6 Arrays
-
Indexed collections with
fixed size.
-
Arrays are created with
a given number of elements, and cannot grow.
-
Each element is initialized
to nil.
-
Examples
| a |
a := Array new: 8.
1 to: a size do: [ :n | a at: n put:
n ].
^ a
" Answers: (1 2 3 4 5 6 7 8)
1.7 OrderedCollections
-
Indexed collections of
variable size.
-
Ordered collections can
grow when new elements are added.
-
Examples:
| oc s1 s2 |
oc := OrderedCollection new: 10.
1 to: 5 do: [ :n | oc add: n ].
^ oc
" Answers: OrderedCollection(1 2 3
4 5 ) "
1.8 ByteArrays, Strings,
Symbols
-
Each is a fixed size,
indexed collection of one kind of object.
-
Byte arrays hold integers
in the range 0 to 255.
-
Symbols and strings hold
characters.
1.9 Intervals
-
Objects acting as read-only
collections.
-
Their contents are computed
as needed.
-
Created by specifying
a range of numeric values, a start and end value, and an increment.
-
Example
| collection oc |
collection := 1 to: 20 by: 2.
oc := OrderedCollection new.
oc addAll: collection.
^ oc
" Answers: OrderedCollection(1 3 5
7 9 11 13 15 17 19) "
2 Collection Protocol
2.1 Basic Collection Protocol
-
Basic collection protocol,
which includes messages for creation, iteration, testing, and conversion.
2.1.1 Creation Messages
-
aCollectionClass new
-
Creates new instance of
the named class.
-
Allocates some default
amount of space to hold elements.
-
Generally useful only
for collections that grow.
Set new
OrderedCollection new
aCollectionClass new: size
-
Creates a new instance
of the named class
-
Allocates specified amount
of space to hold elements.
Array new: 20
OrderedCollection new: 100
2.1.2 Iteration Messages
-
Pass successive elements
of aCollection
to a one parameter block for some kind of processing.
-
If aCollection
has an ordering, pass the elements in the collection order.
-
Return value is sometimes
expected from the block.
-
aCollection do: aBlock
-
Passes elements to the
block, ignoring return values.
#( 1 2 3 8 9 )
do: [ :element |
Transcript show: element printString ]
aCollection collect: aBlock
-
Returns a new collection
of the same kind as aCollection
and holding the values answered by aBlock.
#( 1 2 3 8 9 )
collect: [ :element | element + 1 ]
" Answers: (2 3 4 9 10) "
aCollection detect: aBlock
-
Answers first element
causing aBlock
to answer true.
#( -2 0 2 )
detect: [ :element | element > 0 ]
" Answers: 2 "
aCollection detect: aBlock
ifNone: errorBlock
-
Answers first element
causing aBlock
to answer true.
-
If no such value, invokes
errorBlock
and answers its result.
#( -2 0 2 )
detect: [ :element | element > 0 ]
ifNone: [ nil ].
" Answers: 2 "
aCollection inject: aValue
into: aBlock
-
Answers a value formed
from an initial value, aValue,
and the elements of aCollection.
-
Two-parameter block aBlock
is invoked on each element of the receiver passing:
-
First, aValue
and the first collection element
-
For each successive invocation,
the prior block result and the next collection element.
" Sum the elements in an array "
#( 1 2 3 )
inject: 0
into: [ :sum :element | sum + element ].
" Answers: 6 "
aCollection reject: aBlock
-
Answer a new collection
of the same kind as the original holding each element for which aBlock
answers false.
#( -2 0 2 )
reject: [ :element | element >= 0 ].
" Answers: ( -2 ) "
aCollection select: aBlock
-
Answer a new collection
of the same kind as the original holding each element for which aBlock
answers true.
#( -2 0 2 )
select: [ :element | element >= 0 ].
" Answers: ( 0 2 ) "
2.1.3 Conversion Messages
-
The conversion messages
are: asArray,
asBag,
asByteArray,
asOrderedCollection,
asSet,
and asSortedCollection:.
-
Each kind of collection
responds to the messages by converting itself to an instance of the appropriate
class.
2.1.4 Testing Messages
-
aCollection occurrencesOf:
anObject
-
Answers the number of
times anObject
occurs in aCollection.
-
aCollection size
-
Answers the number of
elements in the collection.
-
aCollection isEmpty
-
Answers true
if the size is zero.
2.2 Modifying while Iterating
-
Bad practice - can cause
big trouble.
-
So-called unordered collections
actually have some internal ordering, which may not be obvious.
" Wrongly duplicating elements in a Bag "
| a |
a := #(1 2 3 4 5) asBag.
a do: [ :elem | a add: elem ].
^ a
" Answers: Bag(1 1 2 2 3 3 4 4 5 5 ) "
" Wrongly duplicating elements in an OrderedCollection "
| a |
a := #(1 2 3 4 5) asOrderedCollection.
a do: [ :elem | a add: elem ].
^ a
" Answers:
OrderedCollection(1 2 3 4 5 1 nil nil nil 1) "
Even more bizare things
happen when doing something similar to dictionaries.
" WRONG: Add elements to a dictionary while iterating thru it "
| d |
d := Dictionary new.
#(1 2 3 4 5) do: [ :elem |
d at: elem printString put: elem ].
d keysDo: [ :key |
d at: key,'x' put: (d at: key)*10 ].
d keysDo: [ :key |
Transcript show: key, ':', (d at: key) printString, ' ' ]
" Transcript contains:
1x:10 2x:20 3x:30 4x:40 5x:50 1xx:100 1:1 2:2 3:3 4:4 5:5
2xx:200 3xx:300 4xx:400 "
14 elements in the dictionary,
not 10 as expected; some added elements were themselves processed?
-
Some collections grow
or shrink when elements are added or removed, potentially moving
elements around.
-
This is why some elements
may be skipped or be processed more than once.
-
Dictionaries are hashed
and new elements go wherever they happen to go.
-
When iterating through
the dictionary some internal representation of the keys is used which does
have some ordering, meaningful only to other parts of the innards, but
potentially causing bugs when updating at the same time.
3 Using add:
3.1 How to use the add: message?
" Being burnt by add: "
| items |
items := Set new add: 1.
items add: 2.
^ items
add:
message answers its parameter, rather than self.
Use yourself
message.
" Using the yourself message
"
| items |
items := Set new
add: 1;
yourself.
items add: 2.
^items
" Using the yourself message "
^Set new
add: 1;
add: 2;
yourself
3.2 Why does add: return its parameter?
-
Answering the added object
allows that object to be further manipulated without having to assign it
to a variable.
-
Examples
" Add, then initialize - unclean"
| coll |
coll := OrderedCollection new.
(coll add: (String new: 20))
atAllPut: $*
" Initialize, then add - cleaner"
| coll |
coll := OrderedCollection new.
coll add: ((String new: 20) atAllPut:
$*)
" Add, save, then initialize "
| coll str |
coll := OrderedCollection new.
str := coll add: (String new: 100).
str atAllPut: $*
" Initialize, then add - cleaner"
| coll str |
coll := OrderedCollection new.
str := (String new: 100) atAllPut:
$*.
coll add: str
-
None of the alternatives
above that use the 'feature' are as good as those that don’t.
4 Using inject:into:
-
This message is rarely
used, mostly by experienced smalltalkers.
4.1 Using inject:into:
for Summing
" Simple inject:into: to sum elements in an array "
| array sum |
array := #( 1 2 3 4 5 ).
sum := array
inject: 0
into: [ :inj :ele | inj + ele ]
" Answers: 15 "
Result of each step becomes
the injected value for the next step.
| Iteraton |
Injected value |
Element value |
Result |
| 1 |
0 |
1 |
1 |
| 2 |
1 |
2 |
3 |
| 3 |
3 |
3 |
6 |
| 4 |
6 |
4 |
10 |
| 5 |
10 |
5 |
15 |
-
We can sum a property
of the collection.
" Compute the total length of a collection
of strings "
| arrayOfStrings |
arrayOfStrings := #('asdf' 'qwer'
'wert' '1234' ).
arrayOfStrings
inject: 0
into: [ :len :str | len + str
size].
" Answers: 16 "
4.2 Using inject:into:
for forming products
" n Factorial using inject:into:
"
| n |
n := 10.
(1 to: n)
inject: 1
into: [ :inj :ele | inj * ele
]
" Answer: 3628800 "
4.3 Injecting Collections
-
Any object can be injected.
-
Fibonacci series requires
memory of the previous two elements to compute the current element
" Fibonacci using inject:into: "
| nterms |
nterms := 10.
^(3 to: nterms)
inject: (OrderedCollection with: 1 with: 1)
into: [ :inj :n |
inj
addLast: ((inj at: n - 2) + (inj at: n - 1) );
yourself ].
" Result: 1 1 2 3 5 8 13 21 34 55 "
Accumulating acollection
of words into a string, separating words with a blank.
" Given an array of words, make a sentence fragment "
| words |
words := #( 'Gone' 'With' 'the' 'Wind' ).
words
inject: ''
into: [ :inj :ele | inj, ele, ' ' ]
" Answers: 'Gone with the wind ' "
4.4
Simulating collect: and select: with inject:into:
-
Several standard collection
iterators can be simulated with inject:into:.
" collect: "
stuff collect: [ :element | element
+ 1]
"collect: simulated with inject:into:
"
| stuff |
stuff := #( 1 2 3 4 5 ) asOrderedCollection.
stuff
inject: (stuff class
new: stuff size)
into: [ :result :element
| result add: element + 1; yourself ]
" Answers: OrderedCollection(2 3 4
5 6 ) "
" Select: "
stuff select: [ :element | element
> 0 ]
" select: simulated with inject:into:
"
| stuff |
stuff := #( 1 2 3 4 5 ) asOrderedCollection.
stuff
inject: stuff class new
into: [ :result :element
|
element
> 0
ifTrue: [ result add: element; yourself ] ]
" Answers: OrderedCollection(1 2 3
4 5 ) "
-
collect:
and select:
have defaults for the class of the collection they return.
-
inject:into:
versions simulate this by making the result the same kind as that of the
original collection.
-
Simulation won't work
if stuff is an Array.
4.5 Using
inject:into: to find a maximum value
" Find the largest value in a collection
using inject:into: "
| c |
c := #( 1 4 3 6 9 8 2 ).
^c
inject: 0
into: [ :max :elem |
max max: elem ]
" Answers: 9 "
" Find the largest value in a collection
using do: "
| c max |
c := #( 1 4 3 6 9 8 2 ).
max := 0.
c do: [ :elem | max := max max: elem
].
^ max
" Answers: 9 "
5 Hashed Collections
5.1 Intro
-
Sets, bags, and dictionaries
are hashed collections.
-
User defined instances
must implement equal (=)
and hash
methods so that the values work properly in hashed collections.
5.2 How do hashed collections
work?
-
Collections that use hashing
to find elements include sets, bags, and dictionaries.
-
Value answered by the
hash method is used in an indexing operation to locate elements.
-
Hash values are integers
derived from the object.
-
Hash values are used for
two things:
-
(Quick) Comparison for
(possible) equality
-
Basis for the value used
as the index into a collection.
-
Example
-
String stored in a bag
is stored in an indexed collection, such as an array, at a position that
is calculated similar to this:
position := (aString hash \\ bagIndexedCollection size) + 1
Position is used as an
index into bagIndexedCollection
for the place where the value resides.
Since hash values are
usually not unique, they are used to check whether a given element is already
in the collection.
-
If equal values do not
produce equal hashes, hashed collections will not work.
-
Objects in Smalltalk must
thus always obey the rule:
-
Equal objects must
have equal hash values.
5.3 (Re)Implement hash
method
-
Reimplementing the =
method implies reimplementing hash.
-
Example: =
and hash
for a US-style phone number
-
Components: area code,
exchange, and number.
= aPhoneNumber
^(self areaCode = aPhoneNumber areaCode
and: [self exchange = aPhoneNumber exchange])
and: [self number = aPhoneNumber number]
hash
method must answer a hash that uses all components compared by equal (=).
hash
^ areaCode hash + exchange
hash + number hash
or:
hash
^ (areaCode hash xor:
exchange hash) xor: number hash
6 Bags and Sets
6.1 Concatenate
Message (#,) for Bags and Sets
-
Implemented for collections
that have an ordering.
-
Forms a new collection
which has the two component collections placed 'end-to-end'.
-
Not meaninful for sets/bags
since they have no ordering.
-
Better to use addAll:
than adding #,
yourself.
-
addAll:
changes the collection and #,
answers a new one.
-
Example - comma-equivalent
(self copy)
addAll: aCollection;
yourself
7 OrderedCollections
7.1 Size of an OrderedCollection
-
Number of elements actually
in the collection, not the potential size of the collection.
-
Example - size of a collection
" Examples of size of an OrderedCollection
"
| oc s1 s2 |
oc := OrderedCollection new: 10.
oc add: oc size.
1 to: 5 do: [ :n | oc add: n*10 ].
oc add: oc size.
^ oc
" Answers: OrderedCollection(0 10
20 30 40 50 6 ) "
7.2 Capacity of an OrderedCollection
-
Number of elements that
can be inserted before it needs to grow.
-
Provided with new: when
a new ordered collection is created.
-
capacity
answers the number of potential elements the collection can hold (some
dialects)
oc capacity
IBM Smalltalk:
-
Data in an ordered collection
is held within another collection.
-
elements
answers this other collection.
oc elements size
8 SortedCollection
-
Adding to a SortedCollection
vs. sorting a built collection?
-
Which is better:
inProperOrder: aCollection
| sc |
sc := SortedCollection new.
sc addAll: aCollection
or:
inProperOrder: aCollection
| sc |
sc := aCollection asSortedCollection
Tests
" Add to sorted collection "
| sc aCollection |
aCollection :=
(1 to: 500 by: 3),
(1 to: 500 by: 5),
(1 to: 500 by: 7).
Time millisecondsToRun: [
10 timesRepeat: [
sc
:= SortedCollection new.
sc
addAll: aCollection]].
" Results: Squeak 2.6: 170 171 (NT)
"
" Convert to sorted collection "
| sc aCollection |
aCollection :=
(1 to: 500 by: 3),
(1 to: 500 by: 5),
(1 to: 500 by: 7).
Time millisecondsToRun: [
10 timesRepeat: [sc :=
aCollection asSortedCollection]].
" Results: Squeak 2.6: 170 170 (NT)
"
9 Strings
9.1 Streams vs. Concatenation
" Build a string holding 100 ’x’ characters
using concatenation "
Time millisecondsToRun: [
10000 timesRepeat: [
| s
|
s :=
'x'.
99
timesRepeat: [ s := s , 'x' ] ] ].
" Results: 19508 19619 "
" Build a string holding 100 ’x’ characters
using a stream "
Time millisecondsToRun: [
10000 timesRepeat: [
| s
|
s :=
WriteStream on: (String new: 100).
100
timesRepeat: [ s nextPutAll: 'x' ] ] ].
" Results: 8412 8312 "
-
Concatenation
-
Each concatenation causes
a new string to be allocated.
-
The old parts may become
garbage.
-
This is repeated 99 times:
-
Streams
-
Values are appended to
the string that stores stream contents.
-
If stream contents string
is too short, it will be expanded and the old one thrown away
-
This does not happen often.
9.2 Regular Expressions
-
No builtin support for
regular expressions in Smalltalk.
10 Symbols
10.1 Comparing
Symbols vs. Comparing Strings
-
Symbols require only a
comparison of object pointers.
-
Strings require a comparison
of characters in the strings.
-
Often faster to use ==
to compare symbols since ==
is often implemented directly by the compiler.
-
Making (converting to)
symbols is an expensive operation. This is bad:
aString asSymbol == #Cancel ifTrue: [ ... ]
Symbols are identity objects.
Two symbols that have
equal values are always the same object.
If two symbols compare
equal with =
then they will always compare equal with ==.
-
Magic which makes symbols
work:
-
Basically depends on a
table which holds the symbols.
-
Hash of a symbol (guaranteed
to be unique among all symbols) is the index into the table.
-
Object pointer to a symbol
is pointer to the table entry.
-
Timing differences (Time
millisecondsToRun: [100000 timesRepeat: [ ...]])
| Expression |
Squeak 2.6 (NT) |
| symbolOne == symbolTwo |
170 |
| stringOne = stringTwo |
561 |
| symbolOne == stringOne asSymbol |
2123 |
| stringOne asSymbol == stringTwo asSymbol |
4066 |
-
Comparing strings 5 times
slower than comparing symbols
-
Converting string to symbols
100 times more expensive.