
How to start a for loop in R programming

I'm new to programming and I wrote some code that finds spam words for the first email, but I would like to write a for loop that would do this for all of the emails. Any help would be appreciated. Thank you.

words = grepl("viagra", spamdata[[ 1 ]]$header[ "Subject"])

I presume that you want to loop over the elements of spamdata and build up an indicator of whether the string "viagra" is found in the subject lines of your emails.

Let's set up some dummy data for illustration purposes:

subjects <- c("Buy my viagra", "Buy my Sildenafil citrate",
              "UK Lottery Win!!!!!")
names(subjects) <- rep("Subject", 3)
spamdata <- list(list(Header = subjects[1]), list(Header = subjects[2]),
                 list(Header = subjects[3]))

Next we create a vector words to hold the result of each iteration of the loop. You do not want to be growing words or any other object at each iteration: that will force copying and will slow your loop down. Instead, allocate storage before you begin, here using the length of the list over which we want to loop:

words <- logical(length = length(spamdata))
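As a rough sketch of the difference between growing and preallocating (the names n, grow_result and prealloc_result are made up for this illustration and are not part of the original answer), both loops below produce the same values, but only the second one creates its storage once up front:

```r
n <- 5  # hypothetical number of iterations

## Growing: the vector is extended (and copied) on every iteration
grow_result <- logical(0)
for (i in seq_len(n)) {
    grow_result <- c(grow_result, i %% 2 == 0)
}

## Preallocating: storage is created once, then filled in place
prealloc_result <- logical(length = n)
for (i in seq_len(n)) {
    prealloc_result[i] <- i %% 2 == 0
}

## Both approaches give the same values
identical(grow_result, prealloc_result)
```

For a handful of emails the difference is unlikely to be noticeable; it tends to matter as the number of iterations grows.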

You can set up a loop like so:

## seq_along() creates a sequence of 1:length(spamdata) 
for(i in seq_along(spamdata)) {
    words[ i ] <- grepl("viagra", spamdata[[ i ]]$Header["Subject"])
}

We can then look at words:

> words
[1]  TRUE FALSE FALSE

This matches what we know from the made-up subjects.

Notice how we used i as a placeholder for 1, 2, and 3: at each iteration of the loop, i takes on the next value in the sequence 1, 2, 3, so we can i) access the ith component of spamdata to get the next subject line, and ii) access the ith element of words to store the result of the grepl() call.
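To see this happening, here is a small sketch that repeats the loop on the same dummy data and prints what i refers to at each step. The intermediate variable subject_i and the cat() call are additions for illustration only:

```r
## Same dummy data as above
subjects <- c("Buy my viagra", "Buy my Sildenafil citrate",
              "UK Lottery Win!!!!!")
names(subjects) <- rep("Subject", 3)
spamdata <- list(list(Header = subjects[1]), list(Header = subjects[2]),
                 list(Header = subjects[3]))

words <- logical(length = length(spamdata))
for (i in seq_along(spamdata)) {
    ## subject_i is a made-up helper name: the subject line of email i
    subject_i <- spamdata[[i]]$Header["Subject"]
    words[i] <- grepl("viagra", subject_i)
    ## Report the index, the subject examined and the result stored
    cat("i =", i, "| subject:", subject_i, "| match:", words[i], "\n")
}
```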

Note that instead of an explicit loop we could also use the sapply() or lapply() functions, which create the loop for you but might need a bit of work to write a custom function. Instead of using grepl() directly, we can write a wrapper:

foo <- function(x) {
    grepl("viagra", x$Header["Subject"])
}

In the above function we use x instead of the list name spamdata because, when lapply() and sapply() loop over the spamdata list, the individual components (referenced by spamdata[[i]] in the for() loop) get passed to our function as argument x, so we only need to refer to x in the grepl() call.

This is how we could use our wrapper function foo() in lapply() or sapply(); first lapply():

> lapply(spamdata, foo)
[[1]]
[1] TRUE

[[2]]
[1] FALSE

[[3]]
[1] FALSE

sapply() will simplify the returned object where possible, as follows:

> sapply(spamdata, foo)
[1]  TRUE FALSE FALSE

Other than that, they work similarly.
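A quick way to check the difference in return type is sketched below, using the same dummy data and the one-argument foo() from above; res_list and res_vec are names invented for this example:

```r
subjects <- c("Buy my viagra", "Buy my Sildenafil citrate",
              "UK Lottery Win!!!!!")
names(subjects) <- rep("Subject", 3)
spamdata <- list(list(Header = subjects[1]), list(Header = subjects[2]),
                 list(Header = subjects[3]))

foo <- function(x) {
    grepl("viagra", x$Header["Subject"])
}

res_list <- lapply(spamdata, foo)  # always a list, one component per email
res_vec  <- sapply(spamdata, foo)  # simplified to a vector where possible

is.list(res_list)
is.logical(res_vec)

## unlist() flattens the lapply() result into a plain vector
unlist(res_list)
```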

Note we can make our wrapper function foo() more useful by allowing it to take an argument defining the spam word you wish to search for:

foo <- function(x, string) {
    grepl(string, x$Header["Subject"])
}

We can pass extra arguments to our functions with lapply() and sapply() like this:

> sapply(spamdata, foo, string = "viagra")
[1]  TRUE FALSE FALSE
> sapply(spamdata, foo, string = "Lottery")
[1] FALSE FALSE  TRUE
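If you want to check several spam words at once, one possible approach (a sketch only; the vector spam_words and the object hits are hypothetical names, not from the original answer) is to loop over the words with an outer sapply() and reuse foo() inside it:

```r
subjects <- c("Buy my viagra", "Buy my Sildenafil citrate",
              "UK Lottery Win!!!!!")
names(subjects) <- rep("Subject", 3)
spamdata <- list(list(Header = subjects[1]), list(Header = subjects[2]),
                 list(Header = subjects[3]))

foo <- function(x, string) {
    grepl(string, x$Header["Subject"])
}

## Hypothetical list of words to search for
spam_words <- c("viagra", "Lottery")

## One column per word, one row per email
hits <- sapply(spam_words, function(w) sapply(spamdata, foo, string = w))
hits

## TRUE for every email that matched at least one of the words
rowSums(hits) > 0
```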

Which you will find most useful (the for() loop or the lapply() and sapply() versions) will depend on your programming background and which you find most familiar. Sometimes for() is easier and simpler to use, but perhaps more verbose (which isn't always a bad thing!), whilst lapply() and sapply() are quite succinct and useful where you don't need to jump through hoops to create a workable wrapper function.

In R, a for loop takes this form, where variable is the name of your iteration variable, and sequence is a vector or list of values:

for (variable in sequence) expression

The expression can be a single R command, or several lines of commands wrapped in curly brackets:

for (variable in sequence) { 
    expression
    expression
    expression
}
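As a concrete, runnable illustration of both forms (the variable names k, value and total are arbitrary choices for this sketch):

```r
## Single-expression form: no curly brackets needed
for (k in 1:3) print(k^2)

## Multi-line form: the commands are wrapped in curly brackets
total <- 0
for (value in c(2, 4, 6)) {
    total <- total + value
    cat("running total:", total, "\n")
}
```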

In this case it would be something like for (i in seq_along(spamdata)) { do whatever you want to do }

Also

Basic loop theory

The basic structure for loop commands is: for(i in 1:n){stuff to do}, where n is the number of times the loop will execute.

listname[[1]] refers to the first element in the list "listname".

In a for loop, listname[[i]] refers to the list element corresponding to the ith iteration of the for loop.

The code for(i in 1:length(yesnovars)) tells the loop to execute once for each variable in the list.
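One caveat with the 1:length(...) idiom is worth a small sketch: when the list is empty, 1:0 counts down from 1 to 0, so the loop body runs twice rather than not at all, whereas seq_along() gives an empty sequence. The names empty, runs_colon and runs_seq are invented for this example:

```r
empty <- list()   # hypothetical list with no elements

1:length(empty)   # the sequence 1, 0
seq_along(empty)  # an empty integer vector

## Count how many times each loop body executes
runs_colon <- 0
for (i in 1:length(empty)) runs_colon <- runs_colon + 1

runs_seq <- 0
for (i in seq_along(empty)) runs_seq <- runs_seq + 1

c(colon = runs_colon, seq_along = runs_seq)
```

For lists that are known to be non-empty the two idioms behave the same.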

Answer taken from the following sources:
Loops in R
Programming in R
