
How can I extend a function in R to run in parallel on different datasets?

I have written a function, find.string(), that accepts a string and returns the patterns found in it.

For example, find.string("abcabcabc") returns "abc".

I have two large datasets, each containing many rows of character strings like the one above, and I want to run this function on both of them in parallel. The first dataset has the form:

1 2 "abcabcabc"
2 3 "adcadcadc"
3 4 "yufyufyuf"
4 5 "xyzxyzxyz"
..............

The second dataset has the same first two columns; only the third column differs:

1 2 "fbfbfbfbfb"
2 3 "bbfbfbfbbf"
3 4 "fbffffbfbf"
4 5 "fbfbbbbbbb"
...............

So, after merging these two datasets, I have:

1 2 "abcabcabc" "fbfbfbfbfb"
2 3 "adcadcadc" "bbfbfbfbbf"
3 4 "yufyufyuf" "fbffffbfbf"
4 5 "xyzxyzxyz" "fbfbbbbbbb"
...........................

Now I want to run the function in parallel on the character vectors in both the third and fourth columns and store the output. How can I do this in R?

Perhaps a data.table approach would be faster than trying to parallelize your code, but I would need a sample of your data to make sure this answer addresses your question:

library(data.table)

cols <- c("colstring1", "colstring2")

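# Pass the columns themselves (.SD), not their names, and call find.string() on each element
setDT(data)[, (cols) := lapply(.SD, function(x) sapply(x, find.string, USE.NAMES = FALSE)), .SDcols = cols]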
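
If you do want to run the two columns in parallel, here is a minimal sketch using the base parallel package. Since no sample data or function body was posted, the find.string() below is only a hypothetical stand-in (it returns the shortest repeating unit of a string), and the column names id1, id2, colstring1 and colstring2 are assumptions as well; swap in your own function and names.

```r
library(parallel)

# Hypothetical stand-in for the asker's find.string(): returns the shortest
# substring that, when repeated, reproduces the input (else the input itself).
find.string <- function(s) {
  n <- nchar(s)
  for (k in seq_len(n)) {
    if (n %% k == 0) {
      unit <- substr(s, 1, k)
      if (strrep(unit, n / k) == s) return(unit)
    }
  }
  s
}

# Toy data mirroring the merged layout in the question (column names assumed).
data <- data.frame(
  id1 = 1:4,
  id2 = 2:5,
  colstring1 = c("abcabcabc", "adcadcadc", "yufyufyuf", "xyzxyzxyz"),
  colstring2 = c("fbfbfbfbfb", "bbfbfbfbbf", "fbffffbfbf", "fbfbbbbbbb"),
  stringsAsFactors = FALSE
)
cols <- c("colstring1", "colstring2")

# One worker per column; each worker applies the function to every string
# in the column it receives. The function is passed explicitly as `f` so
# the workers do not need it in their own global environment.
cl <- makeCluster(2)
res <- parLapply(
  cl,
  as.list(data[cols]),
  function(column, f) vapply(column, f, character(1), USE.NAMES = FALSE),
  f = find.string
)
stopCluster(cl)

# Store the output next to the original columns.
data[paste0(cols, "_pattern")] <- res
```

With only two columns this can use at most two workers, and starting a cluster has its own overhead, so it is worth timing against the data.table version. If find.string() can return more than one pattern per string, replace vapply() with lapply() and keep the result as a list.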


 