R: extracting a percentage of the entries in a text file with readLines

Question

Hi, I have a large txt file (character data) and I would like to extract 10% of its entries and save them to another txt file.

con1 <- file("ABC.txt", "rb")   # 2,36 mio DS
dfc1<-readLines(con1, ??? ,skipNul = TRUE)#

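Instead of ??? I would like something like <10% of all data>.

So, if my ABC.txt is

“BBC Worldwide is a principal commercial arm and a wholly owned subsidiary of the British Broadcasting Corporation (BBC). The business exists to support the BBC public service mission and to maximise profits on its behalf...”

my new file should contain only a random 10% of that, for example:

“Worldwide business behalf...”

Is there a way to do this in R?

Thanks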

Answer 1

Once you have read in the text file, you can use the stringr package to draw a random 10% sample of the words with the following code:

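library(stringr)

text <- c("BBC Worldwide is a principal commercial arm and a wholly owned subsidiary of the British Broadcasting Corporation (BBC). The business exists to support the BBC public service mission and to maximise profits on its behalf...")
set.seed(9999)  # fix the seed so the sample is reproducible
# Number of words = number of spaces + 1
n_words <- str_count(text, " ") + 1
# Draw 10% of the word positions at random, without replacement
selection <- sample.int(n_words, round(0.1 * n_words))
# Extract the words at the sampled positions
subset <- word(text, selection)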
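The snippet above samples words from a string that is already in memory. Since the question also asks to read from one file and save to another, here is a minimal base-R sketch of that round trip. The function name `sample_entries`, its arguments, and the temporary files standing in for ABC.txt are all hypothetical, and it assumes the whole file fits in memory, which may or may not hold for a very large file.

```r
# Sketch only: sample_entries() and its arguments are hypothetical names.
# Reads a text file, keeps a random fraction of its lines (or words),
# and writes the result to another file.
sample_entries <- function(in_path, out_path, frac = 0.1,
                           unit = c("line", "word"), seed = NULL) {
  unit <- match.arg(unit)
  if (!is.null(seed)) set.seed(seed)  # optional, for reproducibility
  entries <- readLines(in_path, skipNul = TRUE, warn = FALSE)
  if (unit == "word") {
    # Split every line on runs of whitespace and drop empty strings
    entries <- unlist(strsplit(entries, "\\s+"))
    entries <- entries[nzchar(entries)]
  }
  n_keep <- round(frac * length(entries))
  # sort() keeps the sampled entries in their original order
  keep <- sort(sample.int(length(entries), n_keep))
  picked <- entries[keep]
  if (unit == "word") {
    writeLines(paste(picked, collapse = " "), out_path)
  } else {
    writeLines(picked, out_path)
  }
  invisible(picked)
}

# Usage with a temporary 50-line file standing in for ABC.txt
in_file <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")
writeLines(sprintf("entry %d", 1:50), in_file)
sampled <- sample_entries(in_file, out_file, frac = 0.1, seed = 9999)
length(sampled)  # 5 lines, i.e. 10% of 50
```

With `unit = "word"` the same function samples individual words instead of lines, which is closer to the example output in the question.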

Answer 1, accepted 2018-03-03 17:20:10
