I understand that classes are like molds from which you can create objects, and that a class defines a number of methods and variables (class, instance, local...) inside of it.
Let's say we have a class like this:
class Person
  def initialize(name, age)
    @name = name
    @age = age
  end

  def greeting
    "#{@name} says hi to you!"
  end
end

me = Person.new "John", 34
puts me.greeting
As far as I understand, when we call Person.new we are creating an object of class Person and initializing some internal attributes for that object, which will be stored in the instance variables @name and @age. The variable me will then be a reference to this newly created object.
When we call me.greeting, what happens is that the greeting method is called on the object referenced by me, and that method will use the instance variable @name that is directly tied/attached to that object.
Hence, when calling a method on an object you are actually "talking" to that object, inspecting and using its attributes that are stored in its instance variables. All good for now.
Let's say now that we have the string "hello". We created it using a string literal, just like: string = "hello".
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
My doubt arises because I can't understand what happens when we call something like string.upcase: how does the #upcase method "work" on string? I guess that in order to return the string in uppercase, the string object previously declared has some instance variables attached to it, and the instance methods work on those variables?
Sorry to disillusion you, but Ruby is not written in Ruby. It's written in C. A string may present itself as an object to you, but under the hood it is some C storage, and C has no instance variables and no classes; it is not an object-oriented language at all. The upcase method is mapped to a C function and simply produces a new sequence of characters. See https://github.com/ruby/ruby/blob/6d742c9412d444650d705b65bc2d5c850054c226/string.c#L7407 for the source code.
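One way to check this from inside Ruby is to ask an object for its instance variables. The following is only a small sketch: the empty result for the string is what CRuby (the C implementation described above) reports, and other implementations may behave differently. The Person class is the one from the question.

```ruby
class Person
  def initialize(name, age)
    @name = name
    @age = age
  end
end

me = Person.new("John", 34)
string = String.new("hello")

# A user-defined object exposes the instance variables set in initialize.
p me.instance_variables      # prints [:@name, :@age]

# On CRuby, the characters of a String live in C-level storage,
# so no Ruby-level instance variables show up here.
p string.instance_variables

# upcase hands back a new String and leaves the receiver alone.
shouted = string.upcase
p shouted                    # prints "HELLO"
p string                     # prints "hello"
```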
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
Yes, we are. Basically, string = "hello" is shorthand for string = String.new("hello"). Take a look at the following:
https://ruby-doc.org/core-3.1.2/String.html#method-c-new (ruby 3)
https://ruby-doc.org/core-2.3.0/String.html#method-c-new (ruby 2)
What's the difference between String.new and a string literal in Ruby?
You can also check the following (to extend the functionalities of the class):
Extend Ruby String class with method to change the contents
So the short answer is:
Dealing with built in classes (String, Array, Integer, ...etc) is almost the same thing as we do in any other class we create
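As a small illustration of that point, a built-in class can be reopened and given a new method just like a class you wrote yourself. The method name shout is made up for this sketch.

```ruby
# Reopen the built-in String class and add a hypothetical method.
class String
  def shout
    upcase + "!"
  end
end

puts "hello".shout   # prints HELLO!
```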
Hence, when calling a method on an object you are actually "talking" to that object, inspecting and using its attributes that are stored in its instance variables. All good for now.
No, that is very much not what you are doing in an Object-Oriented Program. (Or really any well-designed program.)
What you are describing is a breach of encapsulation, abstraction, and information hiding. You should never inspect and/or use another object's instance variables or any of its other private implementation details.
In Object-Orientation, all computation is performed by sending messages between objects. The only thing you can do is send messages to objects, and the only thing you can observe about an object is its responses to those messages.
Only the object itself can inspect and use its attributes and instance variables. No other object can, not even objects of the same type.
If you send an object a message and you get a response, the only thing you know is what is in that response. You don't know how the object created that response: did the object compute the answer on the fly? Was the answer already stored in an instance variable and the object just responded with that? Did the object delegate the problem to a different object? Did it print out the request, fax it to a temp agency in the Philippines, and have a worker compute the answer by hand with pen and paper? You don't know. You can't know. You mustn't know. That is at the very heart of Object-Orientation.
This is, BTW, exactly how messaging works in real-life. If you send someone a message asking "what is π²" and they answer with "9.8696044011", then you have no idea whether they computed this by hand, used a calculator, used their smart phone, looked it up, asked a friend, or hired someone to answer the question for them.
You can imagine objects as being little computers themselves: they have internal storage, RAM, HDD, SSD, etc. (instance variables), they have code running on them, the OS, the basic system libraries, etc. (methods), but one computer cannot read another computer's RAM (access its instance variables) or run its code (execute its methods). It can only send it a request over the network and look at the response.
So, in some sense, your question is meaningless: from the point of view of Object-Oriented Abstraction, it should be impossible to answer your question, because it should be impossible to know how an object is implemented internally.
It could use instance variables, or it could not. It could be implemented in Ruby, or it could be implemented in another programming language. It could be implemented as a standard Ruby object, or it could be implemented as some secret internal private part of the Ruby implementation.
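A minimal sketch of this idea, using two hypothetical classes that respond to the same message. One stores the finished answer, the other computes it on demand, and the caller has no way to tell them apart from the responses alone.

```ruby
# Keeps the finished greeting in an instance variable.
class StoredGreeter
  def initialize(name)
    @greeting = "#{name} says hi to you!"
  end

  def greeting
    @greeting
  end
end

# Keeps only the name and builds the greeting on every call.
class ComputedGreeter
  def initialize(name)
    @name = name
  end

  def greeting
    "#{@name} says hi to you!"
  end
end

greeters = [StoredGreeter.new("John"), ComputedGreeter.new("John")]

# Both lines printed here read: John says hi to you!
greeters.each { |greeter| puts greeter.greeting }
```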
In fact, it could even not exist at all. (For example, in many Ruby implementations small integers do not actually exist as objects at all. The Ruby implementation will just make it look like they do.)
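The effect of this can be observed from Ruby code, even though the mechanism stays hidden. A short sketch, assuming a reasonably recent Ruby version:

```ruby
a = 42
b = 42

# Two separately written small integers are the very same "object".
p a.equal?(b)                                 # true

# Two separately built strings have equal content but are distinct objects.
p String.new("hi").equal?(String.new("hi"))   # false

# Integers are frozen, so there is nowhere to attach per-object state.
p 42.frozen?                                  # true
```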
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
[…] [W]hat happens when we call something like string.upcase, how does the #upcase method "work" on string? I guess that in order to return the string in uppercase, the string object previously declared has some instance variables attached to, and the instances methods work on those variables?
There is nothing in the Ruby Language Specification that says how the String#upcase method is implemented. The Ruby Language Specification only says what the result is, but it doesn't say anything about how the result is computed.
Note that this is not specific to Ruby. This is how pretty much every programming language works. The Specification says what the results should be, but the details of how to compute those results are left to the implementor. Leaving the internal implementation details up to the implementor frees them to choose the most efficient, most performant approach that makes sense for their particular implementation.
For example, in the Java platform, there are existing methods available for converting a string to upper case. Therefore, in an implementation like TruffleRuby, JRuby, or XRuby, which sits on top of the Java platform, it makes sense to just call the existing Java methods for converting strings to upper case. Why waste time implementing an algorithm for converting strings to upper case when somebody else has already done that for you? Likewise, in an implementation like IronRuby or Ruby.NET, which sit on top of the .NET platform, you can just use .NET's builtin methods for converting strings to upper case. In an implementation like Opal, you can just use ECMAScript's methods for converting strings to upper case. And so on.
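If you want to know which implementation is running your own code, the RUBY_ENGINE constant tells you. A tiny sketch; the engine name printed depends on where you run it.

```ruby
# RUBY_ENGINE names the implementation that is running this script,
# for example "ruby" for CRuby/YARV, "jruby", or "truffleruby".
puts "Running on #{RUBY_ENGINE} #{RUBY_VERSION}"

# The observable result is the same regardless of the engine.
puts "hello".upcase   # prints HELLO
```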
Unfortunately, unlike for many other programming languages, the Ruby Language Specification does not exist as a single document in a single place. Ruby does not have a single formal specification that defines what certain language constructs mean.
There are several resources, the sum of which can be considered kind of a specification for the Ruby programming language.
Some of these resources are:
- The ISO Ruby Specification – […] Strings very broadly (since they have changed significantly between Ruby 1.8 and Ruby 1.9). It obviously also does not specify features which were added after the ISO Ruby Specification was written, such as Ractors or Pattern Matching.
- The ruby/spec – Note that the ruby/spec is unfortunately far from complete. However, I quite like it because it is written in Ruby instead of "ISO-standardese", which is much easier to read for a Rubyist, and it doubles as an executable conformance test suite.
For example, this is what the ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification has to say about String#upcase:
15.2.10.5.42 String#upcase
upcase
- Visibility: public
- Behavior: The method returns a new direct instance of the class String which contains all the characters of the receiver, with all the lower-case characters replaced with the corresponding upper-case characters.
As you can see, there is no mention of instance variables or really any details at all about how the method is implemented. It only specifies the result.
If a Ruby implementor wants to use instance variables, they are allowed to use instance variables. If a Ruby implementor doesn't want to use instance variables, they are allowed to do that, too.
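The behavior described in that clause can be checked purely from the outside, without knowing anything about the implementation. A short sketch:

```ruby
receiver = String.new("hello")
result = receiver.upcase

p result                        # prints "HELLO"
p result.instance_of?(String)   # true: a direct instance of String
p result.equal?(receiver)       # false: a new object, not the receiver
p receiver                      # prints "hello": the receiver is unchanged
```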
If you check the Ruby Spec Suite for String#upcase, you will find specifications like these (this is just an example, there are quite a few more):
describe "String#upcase" do
  it "returns a copy of self with all lowercase letters upcased" do
    "Hello".upcase.should == "HELLO"
    "hello".upcase.should == "HELLO"
  end

  describe "full Unicode case mapping" do
    it "works for all of Unicode with no option" do
      "äöü".upcase.should == "ÄÖÜ"
    end

    it "updates string metadata" do
      upcased = "aßet".upcase

      upcased.should == "ASSET"
      upcased.size.should == 5
      upcased.bytesize.should == 5
      upcased.ascii_only?.should be_true
    end
  end
end
Again, as you can see, the Spec only describes results but not mechanisms. And this is very much intentional.
The same is true for the Ruby-Doc documentation of String#upcase:
upcase(*options) → string
Returns a string containing the upcased characters in self:
s = 'Hello World!' # => "Hello World!"
s.upcase # => "HELLO WORLD!"
The casing may be affected by the given options; see Case Mapping.
There is no mention of any particular mechanism here, nor in the linked documentation about Unicode Case Mapping.
All of this only tells us how String#upcase is specified and documented, though. But how is it actually implemented? Well, lucky for us, the majority of Ruby implementations are Free and Open Source Software, or at the very least make their source code available for study.
In Rubinius, you can find the implementation of String#upcase in core/string.rb lines 819–822 and it looks like this:
def upcase
  str = dup
  str.upcase! || str
end
It just delegates the work to String#upcase!, so let's look at that next. It is implemented right next to String#upcase in core/string.rb lines 824–843 and looks something like this (simplified and abridged):
def upcase!
  return if @num_bytes == 0

  ctype = Rubinius::CType

  i = 0
  while i < @num_bytes
    c = @data[i]
    if ctype.islower(c)
      @data[i] = ctype.toupper!(c)
    end
    i += 1
  end
end
So, as you can see, this is indeed just standard Ruby code using instance variables like @num_bytes, which holds the length of the String in platform bytes, and @data, which is an Array of platform bytes holding the actual content of the String. It uses two helper methods from the Rubinius::CType library (a library for manipulating individual characters as byte-sized integers). The "actual" conversion to upper case is done by Rubinius::CType::toupper!, which is implemented in core/ctype.rb and is extremely simple (to the point of being simplistic):
def self.toupper!(num)
  num - 32
end
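To make the idea concrete, here is a self-contained toy that follows the same shape. It is only a sketch and not Rubinius's code: the class name TinyString, its to_s method, and the inlined range check are assumptions made for this example, and it only handles ASCII letters.

```ruby
# A toy string class that keeps its content in instance variables.
class TinyString
  def initialize(text)
    @data = text.bytes          # Array of Integer byte values
    @num_bytes = @data.size
  end

  # dup copies instance variables shallowly, so give the copy its own Array.
  def initialize_copy(source)
    super
    @data = @data.dup
  end

  # Upcases ASCII letters in place; returns nil when nothing changed.
  def upcase!
    return if @num_bytes == 0

    modified = false
    i = 0
    while i < @num_bytes
      c = @data[i]
      if c >= 97 && c <= 122    # ASCII 'a'..'z'
        @data[i] = c - 32       # same arithmetic as toupper! above
        modified = true
      end
      i += 1
    end
    modified ? self : nil
  end

  # Same delegation pattern as the upcase method shown earlier.
  def upcase
    copy = dup
    copy.upcase! || copy
  end

  def to_s
    @data.pack("C*").force_encoding(Encoding::UTF_8)
  end
end

tiny = TinyString.new("hello")
puts tiny.upcase.to_s   # prints HELLO
puts tiny.to_s          # prints hello
```

Subtracting 32 is only correct for bytes that are already known to be lowercase ASCII letters, which is why the range check comes first.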
Another very simple example is the implementation of String#upcase in Opal, which you can find in opal/corelib/string.rb and looks like this:
def upcase
  `self.toUpperCase()`
end
Opal is an implementation of Ruby for the ECMAScript platform. Opal cleverly overloads the Kernel#` method, which is normally used to spawn a subshell (which doesn't exist in ECMAScript) and execute commands in the platform's native command language (which on the ECMAScript platform arguably is ECMAScript). In Opal, Kernel#` is instead used to inject arbitrary ECMAScript code into Ruby.
So, all that `self.toUpperCase()` does is call the String.prototype.toUpperCase method on self, which works because of how the String class is defined in Opal:
class ::String < `String`
In other words, Opal implements Ruby's String class by simply inheriting from ECMAScript's String "class" (really the String Constructor function) and is therefore able to very easily and elegantly reuse all the work that has been done implementing Strings in ECMAScript.
Another very simple example is TruffleRuby. Its implementation of String#upcase can be found in src/main/ruby/truffleruby/core/string.rb and looks like this:
def upcase(*options)
  s = Primitive.dup_as_string_instance(self)
  s.upcase!(*options)
  s
end
Similar to Rubinius, String#upcase just delegates to String#upcase!, which is not surprising since TruffleRuby's core library was originally forked from Rubinius's. This is what String#upcase! looks like:
def upcase!(*options)
  mapped_options = Truffle::StringOperations.validate_case_mapping_options(options, false)
  Primitive.string_upcase! self, mapped_options
end
The Truffle::StringOperations::validate_case_mapping_options helper method is not terribly interesting. It is just used to implement the rather complex rules for what the Case Mapping Options that you can pass to the various String methods are allowed to look like. The actual "meat" of TruffleRuby's implementation of String#upcase! is just this: Primitive.string_upcase! self, mapped_options.
The syntax Primitive.some_name was agreed upon between the developers of multiple Ruby implementations as "magic" syntax within the core of the implementation itself to be able to call out from Ruby code into "primitives" or "intrinsics" that are provided by the runtime system, but are not necessarily implemented in Ruby.
In other words, all that Primitive.string_upcase! self, mapped_options tells us is "there is a magic function called string_upcase! defined somewhere deep in the bowels of TruffleRuby itself, which knows how to convert a string to upper case, but we are not supposed to know how it works".
If you are really curious, you can find the implementation of Primitive.string_upcase! in src/main/java/org/truffleruby/core/string/StringNodes.java. The code looks dauntingly long and complex, but all you really need to know is that the Truffle Language Implementation Framework is based on constructing Nodes for an AST-walking interpreter. Once you ignore all the machinery related to constructing the AST nodes, the code itself is actually fairly simple.
Once again, the implementors are relying on the fact that the Truffle Language Implementation Framework already comes with a powerful implementation of strings, which the TruffleRuby developers can simply reuse for their own strings.
By the way, this idea of "primitives" or "intrinsics" is used in many programming language implementations. It is especially popular in the Smalltalk world. It allows you to write the definition of your methods in the language itself, which in turn allows features like reflection and tools like documentation generators and IDEs (e.g. for automatic code completion) to work without them having to understand a second language, but still have an efficient implementation in a separate language with privileged access to the internals of the implementation.
For example, because large parts of YARV are implemented in C instead of Ruby, but YARV is the implementation that the documentation on Ruby-Doc and Ruby-Lang is generated from, the RDoc Ruby Documentation Generator actually needs to understand both Ruby and C. And you will notice that sometimes documentation for methods implemented in C is missing, incomplete, or corrupted. Similarly, trying to get information about methods implemented in C using Method#parameters sometimes returns nonsensical or useless results. This would not happen if YARV used something like Intrinsics instead of directly writing the methods in C.
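A quick way to see the difference for yourself is sketched below. The Greeter class is hypothetical; the output for String#upcase is deliberately not shown because it depends on the implementation and version you run this on.

```ruby
class Greeter
  # A method written in Ruby: parameter names are available via reflection.
  def greet(name, punctuation = "!")
    "Hi, #{name}#{punctuation}"
  end
end

p Greeter.instance_method(:greet).parameters
# prints [[:req, :name], [:opt, :punctuation]]

# For a core method that is implemented in C, reflection has much less to offer.
p String.instance_method(:upcase).parameters
p String.instance_method(:upcase).source_location
```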
JRuby implements String#upcase in several overloads of org.jruby.RubyString.upcase and String#upcase! in several overloads of org.jruby.RubyString.upcase_bang.
However, in the end, they all delegate to one specific overload of org.jruby.RubyString.upcase_bang defined in core/src/main/java/org/jruby/RubyString.java like this:
private IRubyObject upcase_bang(ThreadContext context, int flags) {
    modifyAndKeepCodeRange();
    Encoding enc = checkDummyEncoding();
    if (((flags & Config.CASE_ASCII_ONLY) != 0 && (enc.isUTF8() || enc.maxLength() == 1)) ||
            (flags & Config.CASE_FOLD_TURKISH_AZERI) == 0 && getCodeRange() == CR_7BIT) {
        int s = value.getBegin();
        int end = s + value.getRealSize();
        byte[] bytes = value.getUnsafeBytes();
        while (s < end) {
            int c = bytes[s] & 0xff;
            if (Encoding.isAscii(c) && 'a' <= c && c <= 'z') {
                bytes[s] = (byte)('A' + (c - 'a'));
                flags |= Config.CASE_MODIFIED;
            }
            s++;
        }
    } else {
        flags = caseMap(context.runtime, flags, enc);
        if ((flags & Config.CASE_MODIFIED) != 0) clearCodeRange();
    }
    return ((flags & Config.CASE_MODIFIED) != 0) ? this : context.nil;
}
As you can see, this is a very low-level way of implementing it.
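For readers who don't read Java, the byte loop in the first branch can be sketched in Ruby roughly as follows. This is an illustration only, not JRuby's code: the method name ascii_upcase! is made up, and the general caseMap branch is not covered. Bytes outside a..z are simply left untouched.

```ruby
# Upcases ASCII letters in place, byte by byte.
# Returns the string if anything changed and nil otherwise,
# mirroring the return convention of String#upcase!.
def ascii_upcase!(str)
  modified = false
  i = 0
  while i < str.bytesize
    c = str.getbyte(i)
    if c >= 0x61 && c <= 0x7a             # 'a'..'z'
      str.setbyte(i, 0x41 + (c - 0x61))   # 'A' + (c - 'a')
      modified = true
    end
    i += 1
  end
  modified ? str : nil
end

s = String.new("hello")
ascii_upcase!(s)
puts s                                    # prints HELLO
p ascii_upcase!(String.new("ABC"))        # prints nil
```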
In MRuby, the implementation again looks very different. MRuby is designed to be light-weight, small, and easy to embed into a larger application. It is also designed to be used in small embedded systems such as robots, sensors, and IoT devices. Because of that, it is designed to be very modular: a lot of the parts of MRuby are optional and are distributed as "MGems". Even parts of the core language are optional and can be left out, such as support for the catch and throw methods, big numbers, the Dir class, meta programming, eval, the Math module, IO and File, and so on.
If we want to find out where String#upcase is implemented, we have to follow a trail of breadcrumbs. We start with the mrb_str_upcase function in src/string.c which looks like this:
static mrb_value
mrb_str_upcase(mrb_state *mrb, mrb_value self)
{
  mrb_value str;

  str = mrb_str_dup(mrb, self);
  mrb_str_upcase_bang(mrb, str);
  return str;
}
This is a pattern we have already seen a couple of times: String#upcase just duplicates the String and then delegates to String#upcase!, which is implemented just above in mrb_str_upcase_bang:
static mrb_value
mrb_str_upcase_bang(mrb_state *mrb, mrb_value str)
{
  struct RString *s = mrb_str_ptr(str);
  char *p, *pend;
  mrb_bool modify = FALSE;

  mrb_str_modify_keep_ascii(mrb, s);
  p = RSTRING_PTR(str);
  pend = RSTRING_END(str);
  while (p < pend) {
    if (ISLOWER(*p)) {
      *p = TOUPPER(*p);
      modify = TRUE;
    }
    p++;
  }

  if (modify) return str;
  return mrb_nil_value();
}
As you can see, there is a lot of machinery in there to extract the underlying data structure from the Ruby String object, iterate over that data structure while making sure not to run past the end, etc., but the real work of converting to uppercase is performed by the TOUPPER macro defined in include/mruby.h:
#define TOUPPER(c) (ISLOWER(c) ? ((c) & 0x5f) : (c))
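The bit mask works because an ASCII lowercase letter differs from its uppercase counterpart only in the bit with value 0x20, and 0x5f clears that bit. The following sketch replays the arithmetic in Ruby; lower? and to_upper are hypothetical stand-ins for the C macros, written for this example only.

```ruby
# Hypothetical Ruby counterparts of the C macros, for illustration only.
def lower?(byte)
  byte >= 0x61 && byte <= 0x7a     # 'a'..'z'
end

def to_upper(byte)
  lower?(byte) ? (byte & 0x5f) : byte
end

p to_upper("a".ord).chr   # prints "A"
p to_upper("z".ord).chr   # prints "Z"
p to_upper("1".ord).chr   # prints "1": left alone thanks to the guard

# Without the guard, the mask would mangle non-letters: "1" is 0x31,
# and 0x31 & 0x5f is 0x11, a control character.
p("1".ord & 0x5f)         # prints 17
```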
There you have it! That's how String#upcase works "under the hood" in five different Ruby implementations: Rubinius, Opal, TruffleRuby, JRuby, and MRuby. And it will again be different in IronRuby, YARV, RubyMotion, Ruby.NET, XRuby, MagLev, MacRuby, tinyrb, MRI, IoRuby, or any of the other Ruby implementations of present, future, and past.
This shows you that there are many different ways of approaching how to implement something like String#upcase in a Ruby implementation. There are almost as many different approaches as there are implementations!