
Parsing an Excel cell. How?

We have an Excel file. In it, the cells named "address" contain a line like this, for example:

The Accounts Department, National Bank Ltd, 20 Lombard Str., London 3 WRS, England

We need to split the information in the cell into groups. That is, we must end up with the following cells:

"country": England
"city": London
"street": Lombard Str.
... and so on.

That is, we need to analyze the contents of the cell and divide them into logical parts. Can you tell me where to start?

This really depends on whether your "logical parts" are delimited in some way, so that you can identify each part separately. I doubt you can use a comma (",") as the delimiter, because address components may themselves contain commas (e.g. the name of a firm or business). You may also have data-cleanliness issues: commas may be missing, in the wrong place, and so on.

If you have delimited data, your job is somewhat simpler, because you'll be able to identify each field independently. Even so, it's still not straightforward. If you don't have delimited data, it's going to be much harder. Either way, identifying the fields will probably go along these lines:
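Here is a small Python sketch of the comma problem. A plain split cuts any component that contains a comma. A CSV parser keeps that component whole, but only if the exporting tool quoted the field. Both the quoting and the company name "Smith, Jones & Co Ltd" are assumptions made up for this example.

```python
import csv

# Hypothetical address whose company name contains a comma and was
# quoted by the tool that exported it (an assumption, not given in the question).
line = '"Smith, Jones & Co Ltd", 20 Lombard Str., London, England'

# Naive approach: every comma is treated as a field delimiter.
naive = [part.strip() for part in line.split(",")]

# csv.reader honours the quotes; skipinitialspace drops the blank after each comma.
quoted = next(csv.reader([line], skipinitialspace=True))

print(naive)   # the company name is cut into two pieces
print(quoted)  # the company name survives as one field
```

If the data was exported without quotes, the CSV parser has nothing to go on and behaves just like the naive split.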

1) Postcode: there's a well-known regex for this, but again you may need to cope with malformed or invalid postcodes and typos.

2) Country, town and city: you can get these with a dictionary of UK towns and cities. Search online for one.

3) Villages: harder, but again a dictionary will get you 98% of the way there.

4) Streets, roads etc.: you can't really use a dictionary for these. You'll need some kind of keyword-based recognition, e.g. checking whether the field ends in "street", "road", "lane" or similar. However, there are a lot of such keywords. You may find that a Bayesian approach works well here.

5) Company name, department etc.: harder still. Again, certain keywords can flag these (e.g. "ltd"), but I'm guessing most of your entries aren't guaranteed to include a legal-entity suffix. And departments can be anything.
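The five steps above could be combined into a rough rule-based classifier. The Python sketch below is an illustration only, not a working address parser, and it rests on several assumptions:

* It still splits on commas, with all the caveats above.
* The town, country and keyword lists are tiny hypothetical stand-ins for a real gazetteer.
* The postcode regex is a simplified pattern rather than the full official one.

It does not attempt the Bayesian approach or people's names.

```python
import re

# Simplified UK postcode pattern: outward code, optional space, inward code.
# Assumption: this is NOT the official pattern and accepts some invalid codes.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?) ?([0-9][A-Z]{2})\b")

# Hypothetical stand-ins for a real gazetteer and keyword lists.
COUNTRIES = {"england", "scotland", "wales", "northern ireland"}
TOWNS = {"london", "manchester", "leeds", "bristol"}
STREET_WORDS = {"street", "str.", "st.", "road", "rd.", "lane", "avenue"}
COMPANY_WORDS = {"ltd", "limited", "plc", "llp"}
DEPARTMENT_WORDS = {"department", "dept."}


def find_postcode(text):
    """Step 1: return the first postcode-like match, normalised, or None."""
    match = POSTCODE_RE.search(text.upper())
    return f"{match.group(1)} {match.group(2)}" if match else None


def classify(part):
    """Steps 2-5: guess the role of one component by dictionary lookup or keyword."""
    lowered = part.lower()
    words = lowered.split()
    if not words:
        return "unknown"
    if lowered in COUNTRIES:
        return "country"
    if words[0] in TOWNS:
        return "city"
    if words[-1] in STREET_WORDS:
        return "street"
    if words[-1] in COMPANY_WORDS:
        return "company"
    if any(word in DEPARTMENT_WORDS for word in words):
        return "department"
    return "unknown"


def parse_address(cell):
    """Tag each comma-separated component, keeping the first hit per tag."""
    result = {}
    for part in (p.strip() for p in cell.split(",")):
        result.setdefault(classify(part), part)
    result["postcode"] = find_postcode(cell)
    return result


print(parse_address(
    "The Accounts Department, National Bank Ltd, 20 Lombard Str., London 3 WRS, England"
))
```

On the sample address every component gets a tag. However, the "city" entry is the whole component "London 3 WRS", so the trailing text would still need stripping. The postcode comes back as None because "3 WRS" does not look like a UK postcode.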

Also, what about people's names? Can you recognise those?

In short, this is quite a big and involved job to get right. There is no easy answer.

BTW, if you have access to the PAF (Royal Mail's Postcode Address File), that might help you: http://www.royalmail.com/portal/rm/jump2?mediaId=400085&catId=400084&campaignid=paf_redirect

But that still won't help you with department, business or people's names.

There is no sure-fire way to do this. Assuming (and this is a big assumption) that commas are only used to separate fields, you can open the Data menu, select Text to Columns, and choose comma as your delimiter.

That should give you something like the following:

A1                      | B1                | C1              | D1           | E1     
The Accounts Department | National Bank Ltd | 20 Lombard Str. | London 3 WRS | England

From there, in cell F1, you could do the following to try to extract the street name:

=RIGHT(TRIM(C1),LEN(TRIM(C1))-FIND(" ",TRIM(C1)))
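If you want to test the logic outside Excel, here is a rough Python equivalent of the street-name formula. It trims the cell, then drops everything up to and including the first space, which is assumed to follow the house number. `street_name` is a hypothetical helper name.

```python
def street_name(c1):
    """Drop the leading house number from a street cell such as ' 20 Lombard Str.'."""
    # Roughly like Excel's TRIM: strip both ends and collapse runs of whitespace.
    text = " ".join(c1.split())
    pos = text.find(" ")
    if pos == -1:
        return None  # the Excel FIND would return #VALUE! here
    return text[pos + 1:]


print(street_name(" 20 Lombard Str."))
```

An address without a house number, such as "Lombard Street", is one of the exceptions: the function returns just "Street".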

You can use this to find the city:

=LEFT(TRIM(D1),FIND(" ",TRIM(D1))-1)
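A matching Python sketch of the city formula trims the cell and keeps only its first word. `city_name` is again a hypothetical helper. Unlike the formula, it returns the whole trimmed text instead of an error when there is no space.

```python
def city_name(d1):
    """Return the first word of a trimmed cell such as ' London 3 WRS'."""
    # Roughly like Excel's TRIM: strip both ends and collapse runs of whitespace.
    text = " ".join(d1.split())
    pos = text.find(" ")
    return text if pos == -1 else text[:pos]


print(city_name(" London 3 WRS"))
```

Multi-word cities such as "Milton Keynes" come back as just "Milton". This is another exception of the kind the formulas will hit.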

You'll probably find exceptions to both formulas, and you'll just have to work around them.

If my first assumption is wrong and there are commas in the text other than the field delimiters, I'd ask for the file to be re-sent with a different delimiter (a pipe, for example).
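If the file does come back pipe-delimited, reading it is straightforward with Python's standard `csv` module. In the sketch below, the sample row and the comma-containing company name are hypothetical. A pipe character inside a field would break this in the same way a comma breaks the current file.

```python
import csv
import io

# Hypothetical pipe-delimited export: commas inside fields are now harmless.
sample = "The Accounts Department|Smith, Jones & Co Ltd|20 Lombard Str.|London 3 WRS|England\n"

rows = list(csv.reader(io.StringIO(sample), delimiter="|"))
print(rows[0])
```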


