
Issues with the format of a given argument

I am trying to use the textcat package for n-gram analysis. It provides the following function:

textcat(x, p = TC_char_profiles, method = "CT", ..., options = list())

The function documentation states:

The argument x can be a character vector of texts, or an R object which can be coerced to this using as.character.

I do not understand what "R object which can be coerced to this using as.character" means. In other words, I am not sure what the correct input format for x is according to this description. Suppose I have 100 documents. How do I convert them into the format required for x?

You really have two questions here.

(1). What does "R object which can be coerced to this using as.character" mean?

It means that objects of other classes can be passed in instead of a plain character vector. An example is a factor: as.character(x) drops the extra features the factor provides and returns a simple character vector. Other types are coerced in the same way, for example integers:

as.character(1:2)  ## gives the character vector c("1", "2")

This extends to other derived classes: it is a standard R idiom to provide a method for common generic functions such as as.character, defining the coercion from a given class to character.
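As a minimal sketch of the coercion described above: the first part uses a factor, and the second part defines a purely hypothetical S3 class named "doc" (the class, its fields, and the constructor new_doc are invented for illustration and are not part of textcat) with its own as.character method.

```r
# A factor stores integer codes plus labels; as.character() returns the labels.
f <- factor(c("en", "de", "en"))
f_chr <- as.character(f)
is.character(f_chr)   # TRUE: a plain character vector, no factor levels

# Hypothetical S3 class "doc" (an assumption for illustration only).
new_doc <- function(title, body) {
  structure(list(title = title, body = body), class = "doc")
}

# The method defines what coercing a "doc" to character means.
as.character.doc <- function(x, ...) paste(x$title, x$body)

d <- new_doc("Greeting", "Hello world")
d_chr <- as.character(d)   # dispatches to as.character.doc
```

Whether a function accepts such an object depends on that function actually calling as.character on its input, which is what the textcat documentation quoted in the question says it does for x.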

(2). In what format must my data be to pass it to textcat?

In short, it must be a character vector or something that can be coerced to one. You are asking about documents, so presumably you have text files. The function readLines returns a character vector from a text file, with one element per line of the file. Anything beyond that needs a lot more detail from you about what the analysis is supposed to do. Does the text need to be broken into lines from a file? Broken into words? Should sets of lines or words from different files be kept as separate sets? And so on.

In really simplistic terms, using the example from the readLines help page, you could do something like the following; anything more specific requires more information in your question:

 ## write a small example file (data from the readLines example)
 cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file="ex.data",
     sep="\n")
 ## read it back: one character string per line of the file
 x <- readLines("ex.data", n=-1)

 library(textcat)
 textcat(x)
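For the case of many documents, one possible approach (an assumption, since the question does not say how the documents are stored) is to treat each plain-text file as one document and collapse its lines into a single string, so that x has one element per document. The directory name "docs" and the two sample files below are made up so that the sketch runs on its own.

```r
# Assumption: each document is a plain-text file in one directory.
# Two tiny sample files are created here only to keep the sketch self-contained.
doc_dir <- file.path(tempdir(), "docs")
dir.create(doc_dir, showWarnings = FALSE)
writeLines(c("This is the first document.", "It has two lines."),
           file.path(doc_dir, "a.txt"))
writeLines("Dies ist das zweite Dokument.", file.path(doc_dir, "b.txt"))

files <- list.files(doc_dir, pattern = "\\.txt$", full.names = TRUE)

# One element per document: read the lines, then join them into one string.
x <- vapply(files,
            function(f) paste(readLines(f), collapse = " "),
            character(1))

length(x)   # one entry per file

# Run textcat only if the package is installed.
if (requireNamespace("textcat", quietly = TRUE)) {
  print(textcat::textcat(x))
}
```

If the analysis should instead work per line or per word, skip the paste step and keep the output of readLines as it is, or split it further.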
