
Extracting variables from file names in R

I have files that contain multiple rows. I want to add two new columns that I create by extracting variables from the file name and combining them with the existing columns. For example, I have a bunch of files that are named something like this:

file1[1000,1001].txt

file1[2000,1001].txt

Between the [] there are always 2 numbers separated by a comma.

The file itself has multiple columns, for example column1 and column2.

For each file, I want to extract the 2 values in the file name and then use them as variables to make 2 new columns that use those values to modify the existing ones.

For example:

file1[1000,2000]

The file contains two columns:

column1    column2
1             2
2             4

At the end, I want to add the first file name value to column1 to create column3, and add the second file name value to column2 to create column4, ending up with something like this:

column1  column2 column3 column4
1            2     1001     2002
2            4     1002     2004

Thanks for the help. I am almost there, with just a few more issues. The original files have 2 columns, "X_Parameter" and "Y_Parameter", and the file name is "test(64084,4224).txt". Your code works great at extracting the two values V1 "64084" and V2 "4224" from the file name. I then add these values to the original data set, which yields 4 columns: "X_Parameter", "Y_Parameter", "V1", "V2".

setwd("~/Desktop/txt/")
txt_names = list.files(pattern = ".txt")
for (i in 1:length(txt_names)){
  assign(txt_names[i], read.delim(txt_names[i]))
  DS1 <- read.delim(file = txt_names[i], header = TRUE, stringsAsFactors = TRUE)
  require(stringr)
  remove_text <- str_extract(txt_names, pattern = "\\[[0-9,0-9]+\\]")
  step1 <- gsub("(\\[)", "", remove_text)
  step2 <- gsub("(\\])", "", step1)
  DS2<-as.data.frame(do.call("rbind", (str_split(step2, ","))))
  DS1$V1<-DS2$V1
  DS1$V2<-DS2$V2
}

My issue arises when trying to sum "X_Parameter" and "V1" to make "absoluteX", and to sum "Y_Parameter" and "V2" to make "absoluteY", for each row.

Below are the two ways I have tried, with the errors:

DS1$absoluteX<-DS1$X_Parameter+DS1$V1

Error: In Ops.factor(DS1$X_Parameter, DS1$V1) : '+' not meaningful for factors

The other try was:

DS1$absoluteX<-rowSums(DS1[,c("X_Parameter","V1")])

Error in rowSums(DS1[, c("X_Parameter", "V1")]) : 'x' must be numeric

I have also tried using

as.numeric(DS1$V1) 

That causes all values to become 1.

Any thoughts? Thanks.
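
A likely cause of the "all values become 1" result, assuming V1 is stored as a factor (which the Ops.factor message suggests): as.numeric() applied to a factor returns the internal level codes rather than the numbers written in the labels. The sketch below uses made-up values (v1 and x_parameter are illustrative names, not taken from the real data) to show the difference and the usual as.character() workaround.

```r
# V1 as it would look if the split strings were stored as a factor
v1 <- factor(c("64084", "64084", "64084"))

# as.numeric() on a factor returns the internal level codes, not the labels;
# with a single level, every code is 1
codes <- as.numeric(v1)

# converting to character first recovers the numbers written in the labels
values <- as.numeric(as.character(v1))

# illustrative stand-in for the X_Parameter column
x_parameter <- c(1, 2, 3)
absolute_x <- x_parameter + values
```

Here codes is all 1s while values holds 64084 in every position, so only the second form gives a meaningful sum.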

You can extract the numbers from a vector of file names as follows (not sure it is the shortest possible code, but it seems to work):

fnams<-c("file1[1000,2000].txt","file1[1500,2500].txt")
opsqbr<-regexpr("\\[",fnams)
comm<-regexpr(",",fnams)
clsqbr<-regexpr("\\]",fnams)
reslt<-data.frame(col1=as.numeric(substring(fnams,opsqbr+1,comm-1)),
                  col2=as.numeric(substring(fnams,comm+1,clsqbr-1)))
reslt

Which yields

  col1 col2
1 1000 2000
2 1500 2500

Once you have this data frame, it is easy to sequentially read the files and do the addition.
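
The read-and-add step is not spelled out above, so here is a minimal sketch of one way it could look. It assumes tab-separated files with columns named column1 and column2, as in the question's example; the helper extract_pair, the temporary demo folder, and the results list are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: pull the two bracketed numbers out of file names,
# using the same regexpr/substring approach as above
extract_pair <- function(fnams) {
  opsqbr <- regexpr("\\[", fnams)
  comm   <- regexpr(",", fnams)
  clsqbr <- regexpr("\\]", fnams)
  data.frame(col1 = as.numeric(substring(fnams, opsqbr + 1, comm - 1)),
             col2 = as.numeric(substring(fnams, comm + 1, clsqbr - 1)))
}

# Demo setup (assumption): write two small tab-separated files to a temp folder
demo_dir <- file.path(tempdir(), "demo_txt")
dir.create(demo_dir, showWarnings = FALSE)
demo <- data.frame(column1 = c(1, 2), column2 = c(2, 4))
for (nm in c("file1[1000,2000].txt", "file1[1500,2500].txt")) {
  write.table(demo, file.path(demo_dir, nm), sep = "\t",
              row.names = FALSE, quote = FALSE)
}

fnams <- list.files(demo_dir, pattern = "\\.txt$")
reslt <- extract_pair(fnams)

# Read each file and add the values taken from its own name
results <- list()
for (i in seq_along(fnams)) {
  d <- read.delim(file.path(demo_dir, fnams[i]))
  d$column3 <- d$column1 + reslt$col1[i]
  d$column4 <- d$column2 + reslt$col2[i]
  results[[fnams[i]]] <- d
}
```

Storing each modified data frame in a list keyed by file name avoids creating one global variable per file; results[["file1[1000,2000].txt"]] then holds the four-column table for that file.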

## set path to wherever your files are
setwd("path")

## make a vector with names of your files
txt_names <- list.files(pattern = "\\.txt$") # use this to make a complete list of names (the pattern is a regex matching names that end in .txt)



## read your files in
for (i in seq_along(txt_names)) assign(txt_names[i], read.csv(txt_names[i], sep = "whatever your separator is"))


## for now I'm making a dummy vector and data frame
txt_names <- c("[1000,2000]")
ds1 <- data.frame(column1 = c(1,2), column2 = c(2,4))


## grab the text you require from the file names
require(stringr)

remove_text <- str_extract(txt_names, pattern = "\\[[0-9,]+\\]") # digits and commas between [ and ]
step1 <- gsub("(\\[)", "", remove_text)
step2 <- gsub("(\\])", "", step1)

## step2 should look like this
## > step2
## [1] "1000,2000"

## split each string and convert to data frame with two columns 
ds2 <- as.data.frame(do.call("rbind", (str_split(step2, ","))))

## cbind with the file
df <- cbind(ds1, ds2)

## coerce factor columns to numeric
df$V1 <- as.numeric(as.character(df$V1))
df$V2 <- as.numeric(as.character(df$V2))

## perform the operation to change the columns

df$V1 <- df$column1 + df$V1
df$V2 <- df$column2 + df$V2

Now you have a data.frame with two columns, each containing the file name parts you need. Just rep them to the number of rows of each of your data.frames and cbind.
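
The rep-and-cbind step could be sketched as follows. This is only an illustration under the assumption that ds2 holds a single row of already-numeric file-name values; ds2_rep, column3, and column4 are names chosen here to match the question's desired output.

```r
# dummy data matching the question's example
ds1 <- data.frame(column1 = c(1, 2), column2 = c(2, 4))
ds2 <- data.frame(V1 = 1000, V2 = 2000)  # one row of file-name values

# repeat the single row once per row of ds1, then bind column-wise
ds2_rep <- ds2[rep(1, nrow(ds1)), , drop = FALSE]
rownames(ds2_rep) <- NULL
df <- cbind(ds1, ds2_rep)

# add the file-name values to the original columns
df$column3 <- df$column1 + df$V1
df$column4 <- df$column2 + df$V2
```

With the dummy data, df ends up with column3 equal to 1001 and 1002 and column4 equal to 2002 and 2004, which is the layout asked for in the question.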


 