How can I extract text that has a specific pattern from a column of cells using an Excel formula?
Help please. I have a dataset with a column of text that contains tweets and the users mentioned in them. I would like to extract all the users mentioned in the tweets in Excel. In other words: for each row in the column, if the cell contains text that starts with @ and ends with a space, put that string in another column. Each cell might contain more than one occurrence of such a string (more than one user mentioned in a tweet). Is this possible with Excel formulas rather than with coding? If yes, would you please tell me which formula I should use? If not, do you know a good method to accomplish this task? Please do not just send me links to documentation; well-documented code for this task, or a tool that can do it, would be great. Thanks in advance for your help.
This is only a partial solution. It retrieves the first instance of text bounded by "@" and a single space. With data in A1, in B1 enter:
=LEFT(MID(A1,FIND("@",A1)+1,9999),FIND(" ",MID(A1,FIND("@",A1)+1,9999)))
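For example, if A1 contains "Hello @alice how are you", the formula returns "alice " (without the @, and with the trailing space).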
I suggest that you break this task down into multiple columns to understand how the formulas work together to get to your answer.
Column Headers
Column A = Your Data
Column B = First Start = Find the first occurrence of @
Column C = First End = Find the end of the first occurrence with a space
Column D = Second Start = Find the Second occurrence of @
Column E = Second End = Find the end of the second occurrence with a space
Column F = First Twitter Account = MID the First Start/End
Column G = Second Twitter Account = Mid the Second Start/End
Formulas
Column A = "An Example @Tweet with @two mentions"
Column B = FIND("@",A2)
Column C = FIND(" ",A2,B2)
Column D = FIND("@",A2,C2)
Column E = FIND(" ",A2,D2)
Column F = MID(A2,B2,C2-B2)
Column G = MID(A2,D2,E2-D2)
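For readers who are open to code, the helper-column logic can be mirrored step by step in Python. This is only a sketch: the function name `first_two_mentions` is made up for illustration, and it assumes, as the formulas do, that the text contains at least two mentions, each followed by a space.

```python
def first_two_mentions(text):
    """Mirror helper columns B..G with str.find (0-based positions)."""
    first_start = text.find("@")                # column B: first "@"
    first_end = text.find(" ", first_start)     # column C: space after it
    second_start = text.find("@", first_end)    # column D: second "@"
    second_end = text.find(" ", second_start)   # column E: space after it
    first = text[first_start:first_end]         # column F: first mention
    second = text[second_start:second_end]      # column G: second mention
    return first, second


print(first_two_mentions("An Example @Tweet with @two mentions"))
```

Note that `str.find` returns -1 when nothing is found, whereas Excel's FIND returns a #VALUE! error, so inputs that break the assumptions fail differently in the two versions.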
You can repeat the above pattern for as many "mentions" as needed. You can find out how many are needed by counting the @ signs in each string, and then writing enough formulas to accommodate that number.
You could mash all of the above into one formula, but it would be a beast to read.
Keep in mind as well that if a "mention" is made at the END of a string, the above formulas will not count it. That is, in your question you say that mentions end in a space, which may not be the case when one falls at the end of the tweet.
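Since the question also asks about code, here is a minimal Python sketch that handles any number of mentions and the end-of-string case in one pass. The function name `extract_mentions` is hypothetical, and the pattern simply treats "@" followed by non-space characters as a mention, which is an assumption about the data: it would also pick up the tail of an e-mail address or trailing punctuation such as a comma.

```python
import re


def extract_mentions(tweet):
    """Return every run of non-space characters that starts with '@'."""
    # \S+ stops at whitespace or at the end of the string, so a mention
    # at the very end of the tweet is still captured.
    return re.findall(r"@\S+", tweet)


print(extract_mentions("An Example @Tweet with @two"))
```

Applied to each cell of the exported column, this gives one list of mentions per tweet, which can then be written back to as many columns as needed.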
Although tagged with [excel-vba], you offer no code and do have formula in your title, so I suggest:
Replace @ with, say, |@ where the first character is distinctive (working on a copy of your data: select all, HOME > Editing - Find & Select, Replace..., Find what: @ Replace with: |@ , Replace All).
Select the relevant column, then DATA, Text to Columns, Delimited, Next, Delimiters Other: (only) | , Finish.
In the first completely empty column:
=IF(LEFT(A1)="@",LEFT(A1,FIND(" ",A1&" ")),"")
copied across as many columns as were previously occupied, with all formulae then copied down to suit.
This should cope with an indeterminate number of @ instances in any one cell, and also where the last instance is not followed by a space.
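The same replace-then-split idea can be sketched in Python for anyone who prefers a script to the Text to Columns dialog. The function name `mentions_by_split` and the default marker are illustrative choices, and the sketch assumes the marker character does not already occur in the tweets.

```python
def mentions_by_split(tweet, marker="|"):
    """Prefix each '@' with a marker, split on it, keep the leading word."""
    # Step 1: "@" becomes "|@", as in the Find & Replace step.
    # Step 2: splitting on the marker plays the role of Text to Columns.
    pieces = tweet.replace("@", marker + "@").split(marker)
    mentions = []
    for piece in pieces:
        if piece.startswith("@"):
            # Keep everything up to the first space; if there is no
            # space, the whole piece is the mention.
            mentions.append(piece.split(" ", 1)[0])
    return mentions


print(mentions_by_split("An Example @Tweet with @two"))
```

Unlike the worksheet formula, this version drops the trailing space from each mention.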