Python: Extract a unique ID from names in a file

Question

I have a CSV file that contains a list of authors in MLA format.

import pandas as pd
df = pd.read_csv('file.csv')

If I check the column name I have:

df['name']

'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian'
'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei'
'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'

I want to extract each Firstname + Familyname pair and assign it a unique ID. For instance, I want Rob van der Hilst = 0, Min Chen = 1, and so on.

Answer 1

If I understand your question correctly, you can take advantage of Python's string splitting, list slicing, and a few other language features.

Here is the code with an explanation:

Load the names:

# each line except the last ends with ', ' so the rows join cleanly
names = 'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian, ' + \
        'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei, ' + \
        'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'

Split the names on a comma followed by a space:

names = names.split(', ')

Use Python slicing to extract the first and last names. After the split, names looks like this: ['van der Hilst', 'Rob', 'Chen', 'Min', 'Huang', ...]

Slicing takes the form sequence[start:stop:step]. The last names start at index 0 and the first names start at index 1, and a step of 2 picks up every other item from there. If 'stop' is empty it means 'continue until the end'.

last_names = names[::2]
first_names = names[1::2]
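As a quick check of the slicing step, here is a small sketch on a shortened list. The variable name `parts` is only illustrative and does not appear in the answer's code.

```python
# Illustrative list in "last, first, last, first, ..." order
parts = ['van der Hilst', 'Rob', 'Chen', 'Min', 'Huang', 'Hui']

last_names = parts[::2]    # indices 0, 2, 4: every second item from the start
first_names = parts[1::2]  # indices 1, 3, 5: every second item from index 1

print(last_names)   # ['van der Hilst', 'Chen', 'Huang']
print(first_names)  # ['Rob', 'Min', 'Hui']
```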

Finally, we use a dictionary comprehension to map names to IDs. To do this we use:

the zip function to pair up first and last names

the enumerate function to assign the numbers

the '%s %s' format string to concatenate the first and last name

names = {'%s %s' % (fn, ln) : _id for _id, (fn, ln) in enumerate(zip(first_names, last_names))}
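To see what each piece contributes, the comprehension can be unpacked into its intermediate steps. This is only a sketch on three names; `pairs`, `numbered`, and `ids` are illustrative names, not part of the answer's code.

```python
first_names = ['Rob', 'Min', 'Hui']
last_names = ['van der Hilst', 'Chen', 'Huang']

# zip pairs items position by position
pairs = list(zip(first_names, last_names))
# [('Rob', 'van der Hilst'), ('Min', 'Chen'), ('Hui', 'Huang')]

# enumerate attaches a running counter starting at 0
numbered = list(enumerate(pairs))
# [(0, ('Rob', 'van der Hilst')), (1, ('Min', 'Chen')), (2, ('Hui', 'Huang'))]

# '%s %s' joins first and last name with a space; the counter becomes the ID
ids = {'%s %s' % (fn, ln): _id for _id, (fn, ln) in numbered}
print(ids)  # {'Rob van der Hilst': 0, 'Min Chen': 1, 'Hui Huang': 2}
```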

The final code is:

# each line except the last ends with ', ' so the rows join cleanly
names = 'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian, ' + \
        'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei, ' + \
        'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'
names = names.split(', ')
last_names = names[::2]
first_names = names[1::2]

names = {'%s %s' % (fn, ln) : _id for _id, (fn, ln) in enumerate(zip(first_names, last_names))}
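The question has one author string per row rather than a single long string, so a variant that walks the rows one at a time may be closer to the real data. The sketch below is an assumption about how that could look, not part of the original answer: the function name `assign_author_ids` is hypothetical, and it additionally assumes that an author who appears in several rows should keep one ID.

```python
def assign_author_ids(rows):
    """Map 'First Last' -> integer ID across several 'Last, First, ...' strings."""
    ids = {}
    for row in rows:
        # Split on commas and trim surrounding whitespace from each piece
        parts = [p.strip() for p in row.split(',')]
        # Pair up (last, first); zip drops a trailing unpaired item
        for last, first in zip(parts[::2], parts[1::2]):
            full_name = '%s %s' % (first, last)
            # setdefault keeps the existing ID if the author was seen before
            ids.setdefault(full_name, len(ids))
    return ids


rows = [
    'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian',
    'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei',
    'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark',
]
author_ids = assign_author_ids(rows)
print(author_ids['Rob van der Hilst'])  # 0
print(author_ids['Min Chen'])           # 1
print(len(author_ids))                  # 12
```

With the DataFrame from the question, passing `df['name']` as `rows` should behave the same way, since iterating over a pandas Series yields its values.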

Answer 1: accepted, score 2, 2015-10-21 19:32:49
