Python: extract unique id from file name
I have a csv file that contains a list of authors in MLA format.
import pandas as pd
df = pd.read_csv('file.csv')
If I check the name column, I have:
df['name']
'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian'
'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei'
'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'
I want to extract the Firstname+Familyname of each author and assign a unique ID to it. For instance, I want Rob van der Hilst = 0, Min Chen = 1, and so on.
If I understand your question correctly, you can take advantage of Python's string splitting, list slicing, and a few other language features.
Here is the code and explanation:
Load the names:
# each piece except the last ends with ', ' so that adjacent names do not run together when concatenated
names = 'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian, ' + \
'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei, ' + \
'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'
Split the names on a comma followed by a space:
names = names.split(', ')
Use Python slicing to extract the first and last names. After the split, names looks as follows: ['van der Hilst', 'Rob', 'Chen', 'Min', 'Huang' ...]
Slicing takes the form sequence[start:stop:step]. The list alternates last name, first name, so we start at index 0 for the last names and at index 1 for the first names, and take steps of size 2 to get all the other last or first names. If 'stop' is empty, it means 'continue until the end'.
last_names = names[::2]
first_names = names[1::2]
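As a quick check of the slicing step, here is a minimal sketch on a shortened list. The variable name parts is only illustrative and is not part of the answer's code.

```python
# A shortened version of the list produced by names.split(', ')
parts = ['van der Hilst', 'Rob', 'Chen', 'Min', 'Huang', 'Hui']

# Even indexes (0, 2, 4, ...) hold the last names
last_names = parts[::2]
# Odd indexes (1, 3, 5, ...) hold the first names
first_names = parts[1::2]

print(last_names)   # ['van der Hilst', 'Chen', 'Huang']
print(first_names)  # ['Rob', 'Min', 'Hui']
```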
Finally, we use a dictionary comprehension to map names to ids. To do this we use:
the zip function to pair up first and last names
the enumerate function to assign numbers
the '%s %s' format string to concatenate the first and last name
names = {'%s %s' % (fn, ln) : _id for _id, (fn, ln) in enumerate(zip(first_names, last_names))}
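To see what each piece of that comprehension contributes, the following sketch breaks it into intermediate steps on three authors. The names pairs, numbered, and name_to_id are introduced here for illustration only.

```python
first_names = ['Rob', 'Min', 'Hui']
last_names = ['van der Hilst', 'Chen', 'Huang']

# zip pairs the i-th first name with the i-th last name
pairs = list(zip(first_names, last_names))
# [('Rob', 'van der Hilst'), ('Min', 'Chen'), ('Hui', 'Huang')]

# enumerate attaches a running number, starting at 0, to each pair
numbered = list(enumerate(pairs))
# [(0, ('Rob', 'van der Hilst')), (1, ('Min', 'Chen')), (2, ('Hui', 'Huang'))]

# the comprehension formats each pair as 'First Last' and uses the number as the id
name_to_id = {'%s %s' % (fn, ln): _id for _id, (fn, ln) in numbered}
print(name_to_id)
# {'Rob van der Hilst': 0, 'Min Chen': 1, 'Hui Huang': 2}
```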
The final code is:
# each piece except the last ends with ', ' so that adjacent names do not run together when concatenated
names = 'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian, ' + \
'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei, ' + \
'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark'
names = names.split(', ')
last_names = names[::2]
first_names = names[1::2]
names = {'%s %s' % (fn, ln) : _id for _id, (fn, ln) in enumerate(zip(first_names, last_names))}
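The question stores the authors one row per entry rather than in a single string, and the same author could appear in more than one row. A possible variant, sketched under those assumptions, processes the rows one at a time and keeps the id an author received on first appearance. The function name assign_author_ids is hypothetical; it accepts any iterable of strings, so iterating over df['name'] should work as long as the column contains no missing values.

```python
def assign_author_ids(rows):
    """Map 'Firstname Familyname' to an integer id, in order of first appearance."""
    ids = {}
    for row in rows:
        # split on commas and trim surrounding whitespace
        parts = [p.strip() for p in row.split(',')]
        # parts alternate: family name, first name, family name, first name, ...
        for last, first in zip(parts[::2], parts[1::2]):
            full_name = f'{first} {last}'
            # setdefault keeps the existing id if this author was already seen
            ids.setdefault(full_name, len(ids))
    return ids


rows = [
    'van der Hilst, Rob, Chen, Min, Huang, Hui, Niu, Fenglin, Yao, Huajian',
    'Malanotte-Rizzoli, Paola, Eltahir, Elfatih, Wei, Jun, Xue, Pengfei',
    'Bowring, Samuel, Hoke, Gregory, Schmitz, Mark',
]
author_ids = assign_author_ids(rows)
print(author_ids['Rob van der Hilst'])  # 0
print(author_ids['Min Chen'])           # 1
```

Like the answer's code, this assumes every author is written as exactly 'Familyname, Firstname'; a name that itself contains a comma (for example a 'Jr.' suffix) would break the pairing.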