
What happens when a method is used on an object created from a built in class?

I understand that classes are like molds from which you can create objects, and that a class defines a number of methods and variables (class, instance, local...) inside of it.

Let's say we have a class like this:

class Person
  def initialize(name, age)
    @name = name
    @age = age
  end

  def greeting
    "#{@name} says hi to you!"
  end
end

me = Person.new "John", 34
puts me.greeting

As far as I understand, when we call Person.new we are creating an object of class Person and initializing some internal attributes for that object, which will be stored in the instance variables @name and @age. The variable me will then be a reference to this newly created object.

When we call me.greeting, the greeting method is called on the object referenced by me, and that method will use the instance variable @name that is directly tied to that object.

Hence, when calling a method on an object you are actually "talking" to that object, inspecting and using its attributes that are stored in its instance variables. All good for now.

Let's say now that we have the string "hello". We created it using a string literal, like this: string = "hello".

My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?

My doubt arises because I can't understand what happens when we call something like string.upcase: how does the #upcase method "work" on string? I guess that, in order to return the string in uppercase, the string object previously declared has some instance variables attached to it, and the instance methods work on those variables?

Sorry to disillusion you, but Ruby is not written in Ruby. It's written in C. A string may present itself as an object to you, but under the hood it is some C storage — and C has no instance variables, and no classes; it is not an object oriented language at all. The upcase method is mapped to a C function and simply produces a new sequence of characters. See https://github.com/ruby/ruby/blob/6d742c9412d444650d705b65bc2d5c850054c226/string.c#L7407 for the source code.
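You can check this from Ruby itself with reflection. The following is a minimal sketch, assuming CRuby (YARV); the Person class is the one from the question, and other Ruby implementations may report something different for String:

```ruby
class Person
  def initialize(name, age)
    @name = name
    @age = age
  end
end

# Reflection lists the instance variables that initialize assigned.
p Person.new("John", 34).instance_variables   # => [:@name, :@age]

# A String reports none on CRuby: its characters live in internal storage
# that is not exposed as Ruby instance variables.
p "hello".instance_variables                  # => []

# upcase still works, because it operates on that internal storage.
p "hello".upcase                              # => "HELLO"
```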

My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?

Yes, we are, basically:

string = "hello" is roughly shorthand for string = String.new("hello").

Take a look at the following:

https://ruby-doc.org/core-3.1.2/String.html#method-c-new (ruby 3)

https://ruby-doc.org/core-2.3.0/String.html#method-c-new (ruby 2)

What's the difference between String.new and a string literal in Ruby?

You can also check the following (to extend the functionalities of the class):

Extend Ruby String class with method to change the contents

So the short answer is:

Dealing with built-in classes (String, Array, Integer, etc.) is almost the same as dealing with any other class we create.
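As a small sketch of that similarity: a literal and String.new both produce instances of String, and String can be reopened like any class you wrote yourself. The shout method below is a hypothetical name used only for illustration, not part of Ruby's String API:

```ruby
literal = "hello"
built   = String.new("hello")

p literal.class          # => String
p built.class            # => String
p literal == built       # => true  (same contents)
p literal.equal?(built)  # => false (two distinct objects)

# Reopen the built-in class and add a method, as you could with Person.
class String
  # Hypothetical helper: upcases the receiver and appends "!".
  def shout
    "#{upcase}!"
  end
end

p "hello".shout          # => "HELLO!"
```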

Hence, when calling a method on an object you are actually "talking" to that object, inspecting and using its attributes that are stored in its instance variables. All good for now.

No, that is very much not what you are doing in an Object-Oriented Program. (Or really any well-designed program.)

What you are describing is a break of encapsulation, abstraction, and information hiding. You should never inspect and/or use another object's instance variables or any of its other private implementation details.

In Object-Orientation, all computation is performed by sending messages between objects. The only thing you can do is send messages to objects, and the only thing you can observe about an object is its responses to those messages.

Only the object itself can inspect and use its attributes and instance variables. No other object can, not even objects of the same type.

If you send an object a message and you get a response, the only thing you know is what is in that response. You don't know how the object created that response: did the object compute the answer on the fly? Was the answer already stored in an instance variable and the object just responded with that? Did the object delegate the problem to a different object? Did it print out the request, fax it to a temp agency in the Philippines, and have a worker compute the answer by hand with pen and paper? You don't know. You can't know. You mustn't know. That is at the very heart of Object-Orientation.

This is, BTW, exactly how messaging works in real life. If you send someone a message asking "what is π²" and they answer with "9.8696044011", then you have no idea whether they computed this by hand, used a calculator, used their smartphone, looked it up, asked a friend, or hired someone to answer the question for them.
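To make this concrete, here is a small sketch with three hypothetical classes (the names are made up for illustration). Each one answers the same pi_squared message in a different way, and the sender cannot tell them apart from the response:

```ruby
# Answers from a value stored in an instance variable.
class StoredAnswer
  def initialize
    @answer = "9.8696044011"
  end

  def pi_squared
    @answer
  end
end

# Answers by computing the value on the fly.
class ComputedAnswer
  def pi_squared
    format("%.10f", Math::PI**2)
  end
end

# Answers by passing the question on to another object.
class DelegatedAnswer
  def initialize(helper)
    @helper = helper
  end

  def pi_squared
    @helper.pi_squared
  end
end

responders = [StoredAnswer.new, ComputedAnswer.new, DelegatedAnswer.new(ComputedAnswer.new)]
responders.each { |r| puts r.pi_squared }   # prints 9.8696044011 three times
```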

You can imagine objects as being little computers themselves: they have internal storage, RAM, HDD, SSD, etc. (instance variables), they have code running on them, the OS, the basic system libraries, etc. (methods), but one computer cannot read another computer's RAM (access its instance variables) or run its code (execute its methods). It can only send it a request over the network and look at the response.

So, in some sense, your question is meaningless: from the point of view of Object-Oriented Abstraction, it should be impossible to answer your question, because it should be impossible to know how an object is implemented internally.

It could use instance variables, or it could not. It could be implemented in Ruby, or it could be implemented in another programming language. It could be implemented as a standard Ruby object, or it could be implemented as some secret internal private part of the Ruby implementation.

In fact, it could even not exist at all. (For example, in many Ruby implementations small integers do not actually exist as objects at all. The Ruby implementation will just make it look like they do.)
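You can see hints of this from inside Ruby. The sketch below assumes CRuby, where small integers are encoded directly in the reference rather than allocated as separate objects; the exact behavior is an implementation detail and may differ elsewhere:

```ruby
# Two separately created strings are two distinct objects...
p String.new("a").equal?(String.new("a"))   # => false

# ...but every 1 is "the same object", and it is always frozen.
p 1.equal?(1)    # => true
p 1.frozen?      # => true

# Such values cannot carry singleton methods of their own.
begin
  1.define_singleton_method(:foo) { :bar }
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```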

My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?

[…] [W]hat happens when we call something like string.upcase: how does the #upcase method "work" on string? I guess that, in order to return the string in uppercase, the string object previously declared has some instance variables attached to it, and the instance methods work on those variables?

There is nothing in the Ruby Language Specification that says how the String#upcase method is implemented. The Ruby Language Specification only says what the result is, but it doesn't say anything about how the result is computed.

Note that this is not specific to Ruby. This is how pretty much every programming language works. The Specification says what the results should be, but the details of how to compute those results are left to the implementor. Leaving the decision about the internal implementation details up to the implementor frees them to choose the most efficient, most performant approach that makes sense for their particular implementation.

For example, in the Java platform, there are existing methods available for converting a string to upper case. Therefore, in an implementation like TruffleRuby, JRuby, or XRuby, which sits on top of the Java platform, it makes sense to just call the existing Java methods for converting strings to upper case. Why waste time implementing an algorithm for converting strings to upper case when somebody else has already done that for you? Likewise, in an implementation like IronRuby or Ruby.NET, which sit on top of the .NET platform, you can just use .NET's builtin methods for converting strings to upper case. In an implementation like Opal, you can just use ECMAScript's methods for converting strings to upper case. And so on.

Unfortunately, unlike many other programming languages, the Ruby Language Specification does not exist as a single document in a single place. Ruby does not have a single formal specification that defines what certain language constructs mean.

There are several resources, the sum of which can be considered kind of a specification for the Ruby programming language.

Some of these resources are:

  • The ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification – Note that the ISO Ruby Specification was written around 2009–2010 with the specific goal that all existing Ruby implementations at the time would easily be compliant. Since YARV only implements Ruby 1.9+ and MRI only implements Ruby 1.8 and lower, this means that the ISO Ruby Specification only contains features that are common to both Ruby 1.8 and Ruby 1.9. Also, the ISO Ruby Specification was specifically intended to be minimal and only contain the features that are absolutely required for writing Ruby programs. Because of that, it does for example only specify String s very broadly (since they have changed significantly between Ruby 1.8 and Ruby 1.9). It obviously also does not specify features which were added after the ISO Ruby Specification was written, such as Ractors or Pattern Matching.
  • The Ruby Spec Suite aka ruby/spec – Note that the ruby/spec is unfortunately far from complete. However, I quite like it because it is written in Ruby instead of "ISO-standardese", which is much easier to read for a Rubyist, and it doubles as an executable conformance test suite.
  • The Ruby Programming Language by David Flanagan and Yukihiro 'matz' Matsumoto – This book was written by David Flanagan together with Ruby's creator matz to serve as a Language Reference for Ruby.
  • Programming Ruby by Dave Thomas, Andy Hunt, and Chad Fowler – This book was the first English book about Ruby and served as the standard introduction and description of Ruby for a long time. This book also first documented the Ruby core library and standard library, and the authors donated that documentation back to the community.
  • The Ruby Issue Tracking System, specifically the Feature sub-tracker – However, please note that unfortunately, the community is really, really bad at distinguishing between Tickets about the Ruby Programming Language and Tickets about the YARV Ruby Implementation: they both get intermingled in the tracker.
  • The Meeting Logs of the Ruby Developer Meetings.
  • New features are often discussed on the mailing lists, in particular the ruby-core (English) and ruby-dev (Japanese) mailing lists.
  • The Ruby documentation – Again, be aware that this documentation is generated from the source code of YARV and does not distinguish between features of Ruby and features of YARV.
  • In the past, there were a couple of attempts of formalizing changes to the Ruby Specification, such as the Ruby Change Request (RCR) and Ruby Enhancement Proposal (REP) processes, both of which were unsuccessful.
  • If all else fails, you need to check the source code of the popular Ruby implementations to see what they actually do.

For example, this is what the ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification has to say about String#upcase :

15.2.10.5.42 String#upcase

upcase

  • Visibility: public
  • Behavior: The method returns a new direct instance of the class String which contains all the characters of the receiver, with all the lower-case characters replaced with the corresponding upper-case characters.

As you can see, there is no mention of instance variables or really any details at all about how the method is implemented. It only specifies the result.

If a Ruby implementor wants to use instance variables, they are allowed to use instance variables. If a Ruby implementor doesn't want to use instance variables, they are allowed to do that, too.

If you check the Ruby Spec Suite for String#upcase, you will find specifications like these (this is just an example, there are quite a few more):

describe "String#upcase" do
  it "returns a copy of self with all lowercase letters upcased" do
    "Hello".upcase.should == "HELLO"
    "hello".upcase.should == "HELLO"
  end

  describe "full Unicode case mapping" do
    it "works for all of Unicode with no option" do
      "äöü".upcase.should == "ÄÖÜ"
    end

    it "updates string metadata" do
      upcased = "aßet".upcase

      upcased.should == "ASSET"
      upcased.size.should == 5
      upcased.bytesize.should == 5
      upcased.ascii_only?.should be_true
    end
  end
end

Again, as you can see, the Spec only describes results but not mechanisms. And this is very much intentional.

The same is true for the Ruby-Doc documentation of String#upcase:
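The expectations above use the should syntax of the spec runner. As a rough sketch, the same observable results can be checked in plain Ruby without any test framework, assuming a Ruby version with full Unicode case mapping (2.4 or later) and a UTF-8 source file:

```ruby
# Each check only looks at the response, never at how it was produced.
upcased = "aßet".upcase

p "hello".upcase        # => "HELLO"
p "äöü".upcase          # => "ÄÖÜ"
p upcased               # => "ASSET"
p upcased.size          # => 5
p upcased.bytesize      # => 5
p upcased.ascii_only?   # => true
```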

upcase(*options) → string

Returns a string containing the upcased characters in self:

s = 'Hello World!' # => "Hello World!"
s.upcase           # => "HELLO WORLD!"

The casing may be affected by the given options; see Case Mapping.

There is no mention of any particular mechanism here, nor in the linked documentation about Unicode Case Mapping.
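For illustration, here is a short sketch of how the options change the result. It uses the documented :ascii and :turkic options and assumes Ruby 2.4 or later with a UTF-8 source file:

```ruby
p "hello".upcase           # => "HELLO"

# Default: full Unicode case mapping.
p "äöü".upcase             # => "ÄÖÜ"

# :ascii restricts the mapping to a-z, so the umlauts are left untouched.
p "äöü".upcase(:ascii)     # => "äöü"

# :turkic maps "i" to the dotted capital I (U+0130) instead of "I".
p "i".upcase               # => "I"
p "i".upcase(:turkic)      # => "İ"
```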

All of this only tells us how String#upcase is specified and documented, though. But how is it actually implemented? Well, lucky for us, the majority of Ruby implementations are Free and Open Source Software, or at the very least make their source code available for study.

In Rubinius, you can find the implementation of String#upcase in core/string.rb lines 819–822 and it looks like this:

def upcase
  str = dup
  str.upcase! || str
end

It just delegates the work to String#upcase!, so let's look at that next. It is implemented right next to String#upcase in core/string.rb lines 824–843 and looks something like this (simplified and abridged):

def upcase!
  return if @num_bytes == 0

  ctype = Rubinius::CType

  i = 0
  while i < @num_bytes
    c = @data[i]
    if ctype.islower(c)
      @data[i] = ctype.toupper!(c)
    end
    i += 1
  end
end

So, as you can see, this is indeed just standard Ruby code using instance variables like @num_bytes, which holds the length of the String in platform bytes, and @data, which is an Array of platform bytes holding the actual content of the String. It uses two helper methods from the Rubinius::CType library (a library for manipulating individual characters as byte-sized integers). The "actual" conversion to upper case is done by Rubinius::CType::toupper!, which is implemented in core/ctype.rb and is extremely simple (to the point of being simplistic):

def self.toupper!(num)
  num - 32
end
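To see why subtracting 32 works, and why it is simplistic, here is a rough re-creation of the same loop using only public String methods. bytewise_upcase is a hypothetical helper written for this illustration, not Rubinius code; it relies on the fact that the ASCII codes of "a".."z" (97..122) sit exactly 32 above those of "A".."Z":

```ruby
# Hypothetical helper: upcases ASCII letters byte by byte, like the loop above.
def bytewise_upcase(string)
  result = string.dup
  i = 0
  while i < result.bytesize
    c = result.getbyte(i)
    # Only bytes for "a".."z" are shifted down by 32; all others stay as they are.
    result.setbyte(i, c - 32) if c.between?(97, 122)
    i += 1
  end
  result
end

p bytewise_upcase("hello, world")   # => "HELLO, WORLD"
p bytewise_upcase("äöü")            # => "äöü" (non-ASCII bytes are left alone)
p "äöü".upcase                      # => "ÄÖÜ" (full Unicode case mapping)
```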

Another very simple example is the implementation of String#upcase in Opal, which you can find in opal/corelib/string.rb and looks like this:

def upcase
  `self.toUpperCase()`
end

Opal is an implementation of Ruby for the ECMAScript platform. Opal cleverly overloads the Kernel#` method, which is normally used to spawn a sub shell (which doesn't exist in ECMAScript) and execute commands in the platform's native command language (which on the ECMAScript platform arguably is ECMAScript). In Opal, Kernel#` is instead used to inject arbitrary ECMAScript code into Ruby.
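The reason this trick is possible at all is that a backtick literal is just a call to a method named ` on self, and that method can be redefined. A minimal sketch in ordinary Ruby, where FakeShell is a hypothetical class that overrides it so nothing is actually executed:

```ruby
class FakeShell
  # Inside this class, backtick literals call this method instead of Kernel#`.
  def `(command)
    "would run: #{command}"
  end

  def list_files
    `ls -l`
  end
end

puts FakeShell.new.list_files   # prints "would run: ls -l"
```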

So, all that `self.toUpperCase()` does is call the String.prototype.toUpperCase method on self, which works because of how the String class is defined in Opal:

class ::String < `String`

In other words, Opal implements Ruby's String class by simply inheriting from ECMAScript's String "class" (really the String constructor function) and is therefore able to very easily and elegantly reuse all the work that has been done implementing Strings in ECMAScript.

Another very simple example is TruffleRuby. Its implementation of String#upcase can be found in src/main/ruby/truffleruby/core/string.rb and looks like this:

def upcase(*options)
  s = Primitive.dup_as_string_instance(self)
  s.upcase!(*options)
  s
end

Similar to Rubinius, String#upcase just delegates to String#upcase!, which is not surprising since TruffleRuby's core library was originally forked from Rubinius's. This is what String#upcase! looks like:

def upcase!(*options)
  mapped_options = Truffle::StringOperations.validate_case_mapping_options(options, false)
  Primitive.string_upcase! self, mapped_options
end

The Truffle::StringOperations.validate_case_mapping_options helper method is not terribly interesting; it is just used to implement the rather complex rules for what the Case Mapping Options that you can pass to the various String methods are allowed to look like. The actual "meat" of TruffleRuby's implementation of String#upcase! is just this: Primitive.string_upcase! self, mapped_options.

The syntax Primitive.some_name was agreed upon between the developers of multiple Ruby implementations as "magic" syntax within the core of the implementation itself to be able to call out from Ruby code into "primitives" or "intrinsics" that are provided by the runtime system, but are not necessarily implemented in Ruby.

In other words, all that Primitive.string_upcase! self, mapped_options tells us is "there is a magic function called string_upcase! defined somewhere deep in the bowels of TruffleRuby itself, which knows how to convert a string to upper case, but we are not supposed to know how it works".

If you are really curious, you can find the implementation of Primitive.string_upcase! in src/main/java/org/truffleruby/core/string/StringNodes.java. The code looks dauntingly long and complex, but all you really need to know is that the Truffle Language Implementation Framework is based on constructing Nodes for an AST-walking interpreter. Once you ignore all the machinery related to constructing the AST nodes, the code itself is actually fairly simple.

Once again, the implementors are relying on the fact that the Truffle Language Implementation Framework already comes with a powerful implementation of strings, which the TruffleRuby developers can simply reuse for their own strings.

By the way, this idea of "primitives" or "intrinsics" is an idea that is used in many programming language implementations. It is especially popular in the Smalltalk world. It allows you to write the definition of your methods in the language itself, which in turn allows features like reflection and tools like documentation generators and IDEs (eg for automatic code completion) to work without them having to understand a second language, but still have an efficient implementation in a separate language with privileged access to the internals of the implementation.

For example, because large parts of YARV are implemented in C instead of Ruby, but YARV is the implementation that the documentation on Ruby-Doc and Ruby-Lang is generated from, the RDoc Ruby Documentation Generator actually needs to understand both Ruby and C. And you will notice that sometimes documentation for methods implemented in C is missing, incomplete, or corrupted. Similarly, trying to get information about methods implemented in C using Method#parameters sometimes returns nonsensical or useless results. This would not happen if YARV used something like Intrinsics instead of directly writing the methods in C.

JRuby implements String#upcase in several overloads of org.jruby.RubyString.upcase and String#upcase! in several overloads of org.jruby.RubyString.upcase_bang.
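A quick way to see the reflection gap is to compare a method written in Ruby with one written in C. This is a sketch assuming CRuby; greet is a hypothetical method defined only for the comparison, and the results for String#upcase depend on the implementation and version:

```ruby
# A method defined in Ruby: reflection knows its parameters and its source.
def greet(name, greeting = "hi")
  "#{greeting}, #{name}"
end

ruby_method = method(:greet)
c_method    = "hello".method(:upcase)

p ruby_method.parameters        # => [[:req, :name], [:opt, :greeting]]
p ruby_method.source_location   # file name and line number of the definition

# On CRuby, String#upcase is written in C, so reflection has little to offer:
p c_method.parameters           # typically [[:rest]], without a parameter name
p c_method.source_location      # typically nil, since there is no Ruby source
```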

However, in the end, they all delegate to one specific overload of org.jruby.RubyString.upcase_bang defined in core/src/main/java/org/jruby/RubyString.java like this:

private IRubyObject upcase_bang(ThreadContext context, int flags) {
    modifyAndKeepCodeRange();
    Encoding enc = checkDummyEncoding();
    if (((flags & Config.CASE_ASCII_ONLY) != 0 && (enc.isUTF8() || enc.maxLength() == 1)) ||
            (flags & Config.CASE_FOLD_TURKISH_AZERI) == 0 && getCodeRange() == CR_7BIT) {
        int s = value.getBegin();
        int end = s + value.getRealSize();
        byte[]bytes = value.getUnsafeBytes();
        while (s < end) {
            int c = bytes[s] & 0xff;
            if (Encoding.isAscii(c) && 'a' <= c && c <= 'z') {
                bytes[s] = (byte)('A' + (c - 'a'));
                flags |= Config.CASE_MODIFIED;
            }
            s++;
        }
    } else {
        flags = caseMap(context.runtime, flags, enc);
        if ((flags & Config.CASE_MODIFIED) != 0) clearCodeRange();
    }

    return ((flags & Config.CASE_MODIFIED) != 0) ? this : context.nil;
}

As you can see, this is a very low-level way of implementing it.

In MRuby, the implementation again looks very different. MRuby is designed to be light-weight, small, and easy to embed into a larger application. It is also designed to be used in small embedded systems such as robots, sensors, and IoT devices. Because of that, it is designed to be very modular: a lot of the parts of MRuby are optional and are distributed as "MGems". Even parts of the core language are optional and can be left out, such as support for the catch and throw methods, big numbers, the Dir class, meta programming, eval, the Math module, IO and File, and so on.

If we want to find out where String#upcase is implemented, we have to follow a trail of breadcrumbs. We start with the mrb_str_upcase function in src/string.c which looks like this:

static mrb_value
mrb_str_upcase(mrb_state *mrb, mrb_value self)
{
  mrb_value str;

  str = mrb_str_dup(mrb, self);
  mrb_str_upcase_bang(mrb, str);
  return str;
}

This is a pattern we have already seen a couple of times: String#upcase just duplicates the String and then delegates to String#upcase!, which is implemented just above in mrb_str_upcase_bang:

static mrb_value
mrb_str_upcase_bang(mrb_state *mrb, mrb_value str)
{
  struct RString *s = mrb_str_ptr(str);
  char *p, *pend;
  mrb_bool modify = FALSE;

  mrb_str_modify_keep_ascii(mrb, s);
  p = RSTRING_PTR(str);
  pend = RSTRING_END(str);
  while (p < pend) {
    if (ISLOWER(*p)) {
      *p = TOUPPER(*p);
      modify = TRUE;
    }
    p++;
  }

  if (modify) return str;
  return mrb_nil_value();
}

As you can see, there is a lot of machinery in there to extract the underlying data structure from the Ruby String object, iterate over that data structure without running past the end, and so on, but the real work of converting to uppercase is performed by the TOUPPER macro defined in include/mruby.h:

#define TOUPPER(c) (ISLOWER(c) ? ((c) & 0x5f) : (c))
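The bit mask does the same job as the subtraction seen earlier: lowercase ASCII letters differ from their uppercase forms only in the 0x20 bit, and masking with 0x5f clears it. A small Ruby rendition of the macro, where toupper_mask is a hypothetical name chosen for this sketch:

```ruby
# Hypothetical Ruby version of the macro: clear bit 0x20 for "a".."z" only.
def toupper_mask(byte)
  byte.between?(97, 122) ? (byte & 0x5f) : byte
end

p toupper_mask("a".ord).chr   # => "A"
p toupper_mask("z".ord).chr   # => "Z"
p toupper_mask("!".ord).chr   # => "!"

# For every lowercase ASCII letter, masking equals subtracting 32.
same_as_subtraction = (97..122).all? { |b| toupper_mask(b) == b - 32 }
p same_as_subtraction         # => true
```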

There you have it! That's how String#upcase works "under the hood" in five different Ruby implementations: Rubinius, Opal, TruffleRuby, JRuby, and MRuby. And it will again be different in IronRuby, YARV, RubyMotion, Ruby.NET, XRuby, MagLev, MacRuby, tinyrb, MRI, IoRuby, or any of the other Ruby implementations of present, future, and past.

This shows you that there are many different ways of approaching how to implement something like String#upcase in a Ruby implementation. There are almost as many different approaches as there are implementations!

