
Python: extract unique id from file name

I have a CSV file that contains a list of authors in MLA format.

import pandas as pd
df = pd.read_csv('file.csv')

If I check the name column, I have:

df['name']

'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian'
'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei'
'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'

I want to extract the Firstname+Familyname and assign a unique ID to it. For instance, I want Rob van der Hilst = 0, Min Chen = 1, and so on.

If I understand your question correctly, you can take advantage of Python's string splitting, list slicing, and a few other language features.

Here is the code and explanation:

Load the names:

# each piece ends with ', ' so that the rows join into one comma-separated string
names = 'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian, ' + \
        'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei, ' + \
        'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'

Split the names on a comma followed by a space:

names = names.split(', ')

Use Python slicing to extract the first and last names. After the split, names looks as follows: ['van der Hilst', 'Rob', 'Chen', 'Min', 'Huang' ...]

Slicing takes the form sequence[start:stop:step]. We start at the first last name (index 0) or the first first name (index 1) and take steps of size 2 to get all the other last or first names. If 'stop' is empty, it means 'continue until the end'.

last_names = names[::2]
first_names = names[1::2]
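
As a small illustration of the slicing step, the sketch below uses a shortened list and adds a length check. The check is an addition for safety and not part of the original answer; it assumes every last name is followed by exactly one first name.

```python
# Illustrative data: a shortened version of the split list.
names = ['van der Hilst', 'Rob', 'Chen', 'Min', 'Huang', 'Hui']

# The list alternates last name, first name, so its length must be even.
if len(names) % 2 != 0:
    raise ValueError('expected an even number of name parts')

last_names = names[::2]    # indices 0, 2, 4, ...
first_names = names[1::2]  # indices 1, 3, 5, ...

print(last_names)   # ['van der Hilst', 'Chen', 'Huang']
print(first_names)  # ['Rob', 'Min', 'Hui']
```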

Finally, we use a dictionary comprehension to map names to IDs. To do this we use:

the zip function to pair the first and last names together

the enumerate function to assign numbers

the '%s %s' format string to concatenate the first and last name

names = {'%s %s' % (fn, ln) : _id for _id, (fn, ln) in enumerate(zip(first_names, last_names))}
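
To show what the resulting mapping looks like and how it might be used, here is a minimal sketch on three names. The variable names name_to_id and id_to_name are hypothetical and chosen only for this example; the inverse dictionary is an optional extra for looking a name up by its ID.

```python
first_names = ['Rob', 'Min', 'Hui']
last_names = ['van der Hilst', 'Chen', 'Huang']

# Same comprehension as in the answer, stored under a more descriptive name.
name_to_id = {'%s %s' % (fn, ln): _id
              for _id, (fn, ln) in enumerate(zip(first_names, last_names))}

# Invert the mapping to go from an ID back to a name.
id_to_name = {_id: name for name, _id in name_to_id.items()}

print(name_to_id['Rob van der Hilst'])  # 0
print(id_to_name[1])                    # Min Chen
```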

The final code is:

names = 'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian, ' + \
        'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei, ' + \
        'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'
names = names.split(', ')
last_names = names[::2]
first_names = names[1::2]

names = {'%s %s' % (fn, ln) : _id for _id, (fn, ln) in enumerate(zip(first_names, last_names))}
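
The question has one string per row, and the same author could appear in more than one row. A possible variant for that case is sketched below. The function name assign_author_ids is hypothetical, and the sketch assumes that no name part contains a comma itself (a suffix such as "Jr." written after a comma would break the pairing).

```python
def assign_author_ids(rows):
    """Map 'First Last' -> ID across several 'Last, First, Last, First' rows."""
    ids = {}
    for row in rows:
        parts = [p.strip() for p in row.split(',')]
        if len(parts) % 2 != 0:
            raise ValueError('row does not alternate last/first names: %r' % row)
        for last, first in zip(parts[::2], parts[1::2]):
            full_name = '%s %s' % (first, last)
            # setdefault keeps the ID assigned the first time a name is seen,
            # so a repeated author does not get a second ID.
            ids.setdefault(full_name, len(ids))
    return ids


rows = [
    'van der Hilst, Rob, Chen, Min',
    'Bowring, Samuel, Chen, Min',  # 'Min Chen' is repeated on purpose
]
author_ids = assign_author_ids(rows)
print(author_ids)
# {'Rob van der Hilst': 0, 'Min Chen': 1, 'Samuel Bowring': 2}
```

With the DataFrame from the question, the rows could be taken from the column directly, for example assign_author_ids(df['name']), since iterating over a pandas Series yields its values.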
