Format of a given argument

Question

I am trying to use the textcat package for n-gram analysis; it has the following function:

textcat(x, p = TC_char_profiles, method = "CT", ..., options = list())

The function's documentation states:

The argument x can be a character vector of texts, or an R object which can be coerced to this using as.character.

I do not understand what "R object which can be coerced to this using as.character" means. In other words, I am not sure what the correct input format for x is according to the description above. Suppose I have 100 documents. How do I convert them into the format expected for x?

Answer 1

You really have two questions here.

(1) What does "R object which can be coerced to this using as.character" mean?

It means that objects of other R classes can be passed in place of a plain character vector. One example is a factor: as.character(x) drops the factor-specific structure (its integer codes, levels and class) and returns a simple character vector of the labels.

 as.character(1:2)  ## gives the character vector c("1", "2")
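A minimal base-R illustration of the factor case described above (the example texts are made up): the factor carries integer codes internally, and as.character() turns it back into plain text.

```r
## A factor stores integer codes plus a "levels" attribute
f <- factor(c("the cat sat", "le chat", "the cat sat"))
class(f)          ## "factor"
as.integer(f)     ## underlying codes: 2 1 2 (levels are sorted)

## Coercion drops the factor structure and keeps the labels as text
y <- as.character(f)
class(y)          ## "character"
y                 ## "the cat sat" "le chat" "the cat sat"
```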

This extends to other derived classes: it is a standard R idiom for a class to provide a method for common generic functions such as as.character, defining the coercion from that class to character.
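As a sketch of that idiom, here is a hypothetical S3 class "doc" (not part of textcat or base R) that defines its own as.character method, so any function that coerces its input with as.character can accept it.

```r
## Hypothetical class "doc": a list holding a title and body text
new_doc <- function(title, body) {
  structure(list(title = title, body = body), class = "doc")
}

## S3 method: tells as.character() how to turn a "doc" into text
as.character.doc <- function(x, ...) {
  paste(x$title, x$body, sep = "\n")
}

d <- new_doc("Greeting", "Hello world")
as.character(d)   ## "Greeting\nHello world"
```

Because as.character is an internal generic, it dispatches to as.character.doc whenever the object's class attribute is "doc".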

(2) In what format must my data be to pass it to textcat?

In short, it must be a character vector or something that can be coerced to one. You are asking about documents, so presumably you have text files. The function readLines reads a text file into a character vector with one element per line of the file. Going further depends on details you haven't given about what the analysis is supposed to do: should the text be broken into lines from each file? Into words? Should the lines or words from different files be kept as separate sets? And so on.
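If each document should be treated as one text, one possible approach is to read every file and collapse its lines into a single string, giving a character vector with one element per document. This is a sketch under that assumption; read_docs and the folder name my_docs are hypothetical, and two temporary files stand in for the 100 documents.

```r
## Hypothetical helper: read each file and collapse its lines into one string,
## so that the result has one element per document rather than one per line
read_docs <- function(paths) {
  vapply(paths,
         function(p) paste(readLines(p, warn = FALSE), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

## Demo with two temporary files standing in for real documents
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("First document,", "second line."), paths[1])
writeLines("Second document.", paths[2])

x <- read_docs(paths)
length(x)   ## 2: one element per file
x

## For a real folder (hypothetical name) you might build paths with:
## paths <- list.files("my_docs", pattern = "\\.txt$", full.names = TRUE)
```

The resulting x is a plain character vector, which matches the documented input format for textcat's x argument.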

In very simple terms, adapting the example from the readLines help page, you could do something like this, though going further needs more information about your question:

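 ## write a small example file (from the readLines help page)
 cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = "ex.data",
     sep = "\n")
 x <- readLines("ex.data", n = -1)  ## character vector, one element per line
 x

 library(textcat)
 textcat(x)          ## one category per element of x
 unlink("ex.data")   ## remove the example file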

(Answer 1 posted 2012-04-01 03:54:24)