Extracting text between single quotes in MATLAB

Question

I have multiple lines in some text files, such as

.model sdata1 s tstonefile='../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p' passive=2

I want to extract the text between the single quotes in MATLAB.

Any help would be appreciated.

Answer 1

If you plan to use textscan:

fid = fopen('data.txt','r');
rawdata = textscan(fid,'%s','delimiter','''');  % split the file at every single quote
fclose(fid);

output = rawdata{1}{2}  % 2nd field = text between the first pair of quotes

As in the other answers, a literal single quote inside a quoted MATLAB string is written as two single quotes (''), so the delimiter argument '''' is a one-character string containing '.
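A quick check of the doubled-quote rule, as a minimal sketch; the variable names and the sample string file='a.txt' are made up for illustration:

```matlab
% Inside a single-quoted character vector, '' stands for one literal quote.
q = '''';                 % 1-character vector holding a single quote
disp(length(q))           % displays 1
s = 'file=''a.txt''';     % made-up sample: the characters file='a.txt'
disp(s)
pos = strfind(s, q)       % indices of the two quote characters: [6 12]
```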

Following up on a comment, this version reads the file line by line and extracts the quoted text from each line:

fid = fopen('data.txt','r');
rawdata = textscan(fid,'%s','delimiter','\n');  % one cell per line of the file
fclose(fid);

lines = rawdata{1,1};
L = size(lines,1);
output = cell(L,1);
for ii=1:L
    % split this line at single quotes; the 2nd field is the quoted text
    temp = textscan(lines{ii},'%s','delimiter','''');
    output{ii,1} = temp{1}{2};
end

Answer 2

To get all of the text inside multiple single-quoted blocks, regexp can be used as follows:

regexp(txt,'''(.[^'']*)''','tokens')

This matches text surrounded by ' characters and captures what lies between them; the [^''] class keeps the captured text from running past the next '. For example, consider this text with two lines (I made up a different file name for the second line):

txt = ['.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2 ', char(10), ...
'.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'' passive=2']
>> stringCell = regexp(txt,'''(.[^'']*)''','tokens');
>> stringCell{:}
ans = 
    '../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'
ans = 
    '../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'
>>

Trivia:

char(10) gives a newline character because 10 is the ASCII code for newline.
In most regex engines (what MATLAB calls regexp is usually called regex elsewhere), the . character does not match a newline by default, which would make this a safer pattern. In MATLAB, however, a dot in regexp does match a newline, so to disable this we could add 'dotexceptnewline' as the last input argument to regexp. This is a convenient way to ensure we don't get text from outside the quotes instead, but it is not needed here, since the first match sets the precedent.
Instead of excluding ' from the match with [^''], the match can be made non-greedy with ?, as follows: regexp(txt,'''(.*?)''','tokens').
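A minimal sketch comparing the two patterns on a made-up two-line string (the file names x.s2p and y.s4p are hypothetical). For this kind of input, the [^''] pattern and the non-greedy pattern return the same tokens, and adding 'dotexceptnewline' changes nothing:

```matlab
% Made-up two-line text, each line holding one quoted file name.
txt = ['a=''x.s2p'' b=1', char(10), 'c=''y.s4p'' d=2'];

% Pattern from the answer: quoted body whose remaining characters are not quotes.
byClass = regexp(txt, '''(.[^'']*)''', 'tokens');

% Non-greedy alternative: .*? stops at the first closing quote.
lazy = regexp(txt, '''(.*?)''', 'tokens');

% 'dotexceptnewline' stops . from matching char(10); no effect on this input.
lazyLine = regexp(txt, '''(.*?)''', 'tokens', 'dotexceptnewline');

% Unwrap the nested 1x1 cells returned by 'tokens' into a flat cell array.
names = cellfun(@(c) c{1}, lazy, 'UniformOutput', false)
```

Here names holds {'x.s2p', 'y.s4p'}.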

Answer 3

One easy way is to split the string with a single-quote delimiter and take the even-numbered strings in the output:

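str = fileread('test.txt');          % whole file as one char vector
out = regexp(str, '''', 'split');    % split at every single quote
out = out(2:2:end);                  % even-numbered pieces lie between quote pairs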
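To see why the even-numbered pieces are the quoted ones, here is a minimal sketch on a made-up line with two quoted names; it assumes the quotes are balanced (every opening quote has a closing one):

```matlab
% Made-up sample line with two quoted pieces.
str = 'x=''first'' y=''second'' z=3';
parts = regexp(str, '''', 'split');   % pieces alternate: outside, inside, outside, ...
inside = parts(2:2:end);              % even-numbered pieces were between quotes
```

If a quote were unbalanced, the alternation would shift and the last even-numbered piece would be text after an unmatched quote.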

Answer 4

You can do this using regular expressions. Assuming there is only one quoted substring in the text:

% select all chars between single quotation marks.
out = regexp(inputString,'''(.*)''','tokens','once');
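The "only one occurrence" assumption matters because .* is greedy. A minimal sketch with made-up strings (the names a.s2p and b.s2p are hypothetical):

```matlab
% One quoted name (made-up line): the greedy pattern with 'once' works as intended.
one = '.model m1 s tstonefile=''a.s2p'' passive=2';
t1 = regexp(one, '''(.*)''', 'tokens', 'once');    % 1x1 cell holding a.s2p

% Two quoted names (made up): greedy .* runs to the LAST quote on the line.
two = 'f1=''a.s2p'' f2=''b.s2p''';
t2 = regexp(two, '''(.*)''', 'tokens', 'once');    % t2{1} is: a.s2p' f2='b.s2p

% The non-greedy form keeps only the first quoted name.
t3 = regexp(two, '''(.*?)''', 'tokens', 'once');   % 1x1 cell holding a.s2p
```

With 'once', regexp returns a cell holding the tokens of the first match, so t1{1} gives the character vector itself.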

Answer 5

After identifying which lines you want to extract info from, you could tokenize them or, if they all have the same form, do something like this:

test='.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2';
a=strfind(test,'''')          % positions of the single quotes
test=test(a(1)+1:a(2)-1)      % text strictly between the first two quotes
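The strfind idea extends to every quoted pair on a line. A minimal sketch, assuming the quotes are balanced; the sample line and its names (first, second) are made up:

```matlab
s = 'x=''first'' y=''second'' z=3';   % made-up line with two quoted names
q = strfind(s, '''');                 % positions of every single quote
opens  = q(1:2:end);                  % 1st, 3rd, ... quotes open a pair
closes = q(2:2:end);                  % 2nd, 4th, ... quotes close it
names = cell(1, numel(closes));
for k = 1:numel(closes)
    names{k} = s(opens(k)+1 : closes(k)-1);   % text strictly between the pair
end
```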

5 answers

Answer 1: 2 votes, 2013-10-17 16:48:28
Answer 2: 2 votes, accepted, 2013-10-17 18:39:16
Answer 3: 2 votes, 2013-10-18 14:54:36
Answer 4: 1 vote, 2013-10-17 16:43:49
Answer 5: 1 vote, 2013-10-17 16:48:12