
R - extract data which changes position from file to file (txt)

I have a folder with tons of txt files from which I have to extract specific data. The problem is that the file format changed once, and the position of the data I need changed with it. So I need to deal with files in two different formats.

To make it clearer: column 4 holds the name of the variable and column 5 holds its value, but the row they appear in differs between files. Is there a way to find the row containing the variable name and then extract its value?

Thanks in advance

EDIT

In some files the data looks like this:

Column 1-------Column 2.

Device ID------A.

Voltage------- 500.

Current--------28

But at some point the software was changed to add another variable, and the new files look like this:

Column 1-------Column 2.

Device ID------A.

Voltage------- 500.

Error------------5.

Current--------28

So I need to deal with these 2 types of files, extracting the same variables even though they sit in different rows.

If these files can't be read with read.table, use readLines and then find the lines that start with the keyword you need.

For example:

Sample file 1 (with the dashes included and extra line breaks):

Column 1-------Column 2.

Device ID------A.

Voltage------- 500.

Error------------5.

Current--------28

Sample file 2 (with a comma as separator):

Column 1,Column 2.
Device ID,A.
Current,555
Voltage, 500.
Error,5.

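For both cases do:

# readLines accepts a file path directly, so no explicit file() connection is needed
text <- readLines("your filename here")
# keep only the line(s) that start with "Current", whatever row they are in
curr <- text[grepl("^Current", text, ignore.case = TRUE)]

Which returns:

for file 1: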

[1] "Current--------28"

for file 2:

[1] "Current,555"
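
To try the lookup without creating any files, the two samples can be held as character vectors, which is the shape readLines returns. This is only a sketch: the names file1, file2 and find_line are made up for illustration, and the keyword is treated as a regular expression.

```r
# Sample contents as readLines() would return them
# (file 1 keeps its blank lines and dashes, file 2 uses commas)
file1 <- c("Column 1-------Column 2.", "", "Device ID------A.", "",
           "Voltage------- 500.", "", "Error------------5.", "",
           "Current--------28")
file2 <- c("Column 1,Column 2.", "Device ID,A.", "Current,555",
           "Voltage, 500.", "Error,5.")

# Hypothetical helper: return the line(s) starting with `keyword`,
# regardless of which row they sit in
find_line <- function(lines, keyword) {
  lines[grepl(paste0("^", keyword), lines, ignore.case = TRUE)]
}

find_line(file1, "Current")  # "Current--------28"
find_line(file2, "Current")  # "Current,555"
find_line(file1, "Voltage")  # "Voltage------- 500."
```

Because the match is anchored on the keyword rather than on a row number, the extra "Error" row in the newer format does not matter.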

Then use gsub to remove anything that is not a number.
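
The gsub step could look roughly like the sketch below. The helper names extract_number and read_value are hypothetical, and the pattern assumes the values are non-negative integers: stripping every non-digit also drops decimal points and minus signs, and any digit in the variable name itself would end up in the result.

```r
# Hypothetical helper: drop every non-digit character, then convert to numeric
extract_number <- function(line) {
  as.numeric(gsub("[^0-9]", "", line))
}

extract_number("Current--------28")    # 28
extract_number("Current,555")          # 555
extract_number("Voltage------- 500.")  # 500

# Hypothetical wrapper: read one file and return the value for a keyword,
# or NA when the keyword does not appear in that file
read_value <- function(path, keyword) {
  lines <- readLines(path)
  hit <- lines[grepl(paste0("^", keyword), lines, ignore.case = TRUE)]
  if (length(hit) == 0) return(NA_real_)
  extract_number(hit[1])  # use the first match if there are several
}
```

For a whole folder, something like files <- list.files("your folder here", pattern = "\\.txt$", full.names = TRUE) followed by sapply(files, read_value, keyword = "Current") would collect one value per file, with NA for files that lack the variable (for example "Error" in the older format).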
