
Is there an Excel formula to extract numbers from the end of a string in a cell, where the length is not always constant?

I am trying to separate information copied from a PDF table. I'd usually use Text to Columns, but the only delimiter is spaces, and that splits the data into multiple unusable columns.

The data comes like this:

Raw Data
A1 Company 0
Company2 40000
name a 1
name b 15
name c 184
Big 17 Company 1887

I need the output to be:

Company           Units
A1 Company        0
Company2          40000
name a            1
name b            15
name c            184
Big 17 Company    1887

So the company name (which may contain numbers) needs to be separated from the unit number (which can be 1 to 5 digits long).
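To pin down the rule being asked for, here is a minimal Python sketch (not an Excel formula) of the desired split. It assumes, based only on the examples above, that the units are always the last space-separated token and consist of 1 to 5 digits; `split_company_units` is a hypothetical name used for illustration.

```python
import re
from typing import Tuple

# Assumption: units are the final token, 1-5 digits, preceded by whitespace.
# The greedy (.*\S) keeps any digits inside the company name ("A1", "17").
TRAILING_UNITS = re.compile(r"^(.*\S)\s+(\d{1,5})$")


def split_company_units(cell: str) -> Tuple[str, int]:
    """Hypothetical helper: split 'name ... units' into (name, units)."""
    match = TRAILING_UNITS.match(cell.strip())
    if match is None:
        raise ValueError(f"no trailing 1-5 digit number in {cell!r}")
    name, units = match.groups()
    return name, int(units)


rows = ["A1 Company 0", "Company2 40000", "name a 1",
        "name b 15", "name c 184", "Big 17 Company 1887"]
for row in rows:
    print(split_company_units(row))
```

Anchoring on the end of the string (`$`) is what keeps a digit such as the "1" in "A1" out of the units column.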

I haven't been able to find a way using =LEN(), because the string length isn't constant and the trailing number doesn't always have the same number of digits.

I'm currently using:

=SUMPRODUCT(MID(0&A2, LARGE(INDEX(ISNUMBER(--MID(A2, ROW(INDIRECT("1:"&LEN(A2))), 1)) * ROW(INDIRECT("1:"&LEN(A2))), 0), ROW(INDIRECT("1:"&LEN(A2))))+1, 1) * 10^ROW(INDIRECT("1:"&LEN(A2)))/10) 

This gives me all the digits in the cell, which works for about 90% of the data because most of the companies don't have numbers in their names. But for something like 'A1 Company 0' it returns 10 instead of just 0. I then manually edit the small number of companies this happens to.
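The 10 comes from the formula collecting every digit in the cell, not just the trailing ones. A small Python illustration (not Excel) of the difference, where `all_digits` is a hypothetical stand-in for what the SUMPRODUCT formula effectively computes and `trailing_digits` keeps only the run of digits at the end:

```python
import re


def all_digits(cell: str) -> str:
    # Roughly what the SUMPRODUCT formula does: concatenate every digit,
    # wherever it appears in the cell ("A1 Company 0" -> "1" + "0").
    return "".join(ch for ch in cell if ch in "0123456789")


def trailing_digits(cell: str) -> str:
    # Only the digits at the very end of the cell.
    match = re.search(r"[0-9]+$", cell)
    return match.group() if match else ""


for cell in ["name c 184", "A1 Company 0", "Big 17 Company 1887"]:
    print(cell, "->", all_digits(cell), "vs", trailing_digits(cell))
```

By the same logic, a row like "Big 17 Company 1887" would come out as 171887 rather than 1887.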

I then use a mixture of =LEN(), =LEFT() and =RIGHT() to split the information as required for further automated analysis.

I'd prefer a formula over VBA/macros.

I can't provide the actual data, but I hope the examples in the table above show the main problems: company names of different lengths, company names that contain numbers, and unit values with different numbers of digits.

FILTERXML() would be the best choice for this case. Try:

=FILTERXML("<t><s>"&SUBSTITUTE(A1:A6," ","</s><s>")&"</s></t>","//s[last()]")

For details on FILTERXML(), see JvdV's write-up here.
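The formula turns each space-separated piece into an `<s>` node under a `<t>` root and then asks XPath for the last `<s>`. The same construction can be sketched in Python with the standard library's ElementTree, whose limited XPath supports `[last()]`. This is an illustration of the idea, not Excel's implementation; `last_token_via_xml` is a hypothetical name, and escaping `&`/`<` is an addition (in Excel, an unescaped `&` in a company name would make FILTERXML fail).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def last_token_via_xml(cell: str) -> str:
    # "<t><s>" & SUBSTITUTE(A1," ","</s><s>") & "</s></t>"
    # strip() avoids an empty last node when the cell ends with a space.
    xml = "<t><s>" + escape(cell.strip()).replace(" ", "</s><s>") + "</s></t>"
    root = ET.fromstring(xml)
    # "//s[last()]": the last <s> child of the <t> root
    node = root.find("s[last()]")
    return node.text or ""


print(last_token_via_xml("Big 17 Company 1887"))
```

Unlike FILTERXML, which converts numeric-looking results to numbers, this returns a string.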


I'm using LibreOffice, but this formula finds the last space in the cell and returns everything after it:

=RIGHT(A1,LEN(A1)-FIND("@",SUBSTITUTE(A1," ","@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1))


Taken from: https://trumpexcel.com/find-characters-last-position/
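Step by step, the formula counts the spaces, replaces only the last one with a marker character "@", finds the marker, and takes everything to its right. A Python sketch of the same arithmetic, where `substitute_nth` is a hypothetical re-implementation of SUBSTITUTE with an instance number:

```python
def substitute_nth(text: str, old: str, new: str, n: int) -> str:
    # SUBSTITUTE(text, old, new, n): replace only the n-th occurrence (1-based).
    if n < 1:
        raise ValueError("instance number must be >= 1")  # Excel: #VALUE!
    pos = -len(old)
    for _ in range(n):
        pos = text.find(old, pos + len(old))
        if pos == -1:
            return text  # fewer than n occurrences: text unchanged
    return text[:pos] + new + text[pos + len(old):]


def after_last_space(cell: str) -> str:
    # LEN(A1)-LEN(SUBSTITUTE(A1," ","")) = number of spaces in the cell
    space_count = len(cell) - len(cell.replace(" ", ""))
    # Mark only the last space; a cell already containing "@" would confuse
    # this, in Python as in the original formula.
    marked = substitute_nth(cell, " ", "@", space_count)
    at_pos = marked.find("@") + 1  # FIND is 1-based, str.find is 0-based
    # RIGHT(A1, LEN(A1)-at_pos) is the text after the marker
    return cell[at_pos:]


print(after_last_space("Big 17 Company 1887"))
```

A cell with no spaces raises here, much as the formula errors in a spreadsheet.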

See if the following works for you:


Formula in B2:

=LEFT(A2,LEN(A2)-1-LEN(C2))

In C2:

=-LOOKUP(1,-RIGHT(A2,ROW($1:$5)))
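How the two cells work together: C2 tries the last 1 to 5 characters of A2, and because `LOOKUP(1, array)` returns the last numeric entry while ignoring errors (the negated values are all at most 0, so never exceed 1), it yields the longest suffix that converts to a number. B2 then drops that many characters plus the separating space. A simplified Python sketch of that logic follows; `split_like_b2_c2` is a hypothetical name, and treating only plain digits (plus the leading space Excel tolerates) as numeric is an assumption, since Excel's text-to-number coercion accepts more formats.

```python
import re
from typing import Tuple


def split_like_b2_c2(cell: str, max_digits: int = 5) -> Tuple[str, int]:
    # C2: =-LOOKUP(1,-RIGHT(A2,ROW($1:$5)))
    # Check suffixes of length 1..max_digits; the last one that parses wins,
    # mirroring LOOKUP returning the last numeric entry.
    units = None
    for length in range(1, max_digits + 1):
        suffix = cell[-length:].lstrip(" ")
        if re.fullmatch(r"[0-9]+", suffix):
            units = int(suffix)
    if units is None:
        raise ValueError(f"no trailing number in {cell!r}")
    # B2: =LEFT(A2,LEN(A2)-1-LEN(C2)), i.e. drop the number and one space.
    # LEN(C2) measures the number, so units with leading zeros ("007")
    # would be mis-measured here as in the spreadsheet.
    name = cell[: len(cell) - 1 - len(str(units))]
    return name, units


print(split_like_b2_c2("A1 Company 0"))
```

Capping the search at 5 characters matches the stated maximum of 5 digits; longer unit values would be truncated.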

For users on Microsoft 365 with the newest functions:

=HSTACK(TEXTBEFORE(A2," ",-1),TEXTAFTER(A2," ",-1))
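An instance number of -1 makes TEXTBEFORE/TEXTAFTER search from the end, so both split at the last space, and HSTACK places the two results side by side. In Python the closest analog is `str.rpartition`; the helper name below is hypothetical.

```python
from typing import Tuple


def split_at_last_space(cell: str) -> Tuple[str, str]:
    # TEXTBEFORE(A2," ",-1) -> before, TEXTAFTER(A2," ",-1) -> after
    before, sep, after = cell.rpartition(" ")
    if not sep:
        # Excel returns #N/A when the delimiter is missing (unless
        # if_not_found is given); raising is the Python analog here.
        raise ValueError(f"no space in {cell!r}")
    return before, after


print(split_at_last_space("Big 17 Company 1887"))
```

Both parts come back as text; convert the second with `int()` (or `--`/VALUE() in Excel) if the units are needed as numbers.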

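Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.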

© 2020-2024 STACKOOM.COM