Indexing column in Pandas Dataframe returns NaN
I am running into a problem when trying to index my dataframe. As shown in the attached picture, I have a column in the dataframe called 'identifiers' that contains a lot of redundant information ({'print_isbn_canonical': '). I only want the ISBN that comes after it.
#Option 1 I tried
testdf2 = testdf2[testdf2['identifiers'].str[26:39]]
#Option 2 I tried
testdf2['identifiers_test'] = testdf2['identifiers'].str.replace("{'print_isbn_canonical': '","")
Unfortunately, both of these options turn the dataframe column into a column containing only NaN values.
Please help out! I cannot seem to find the solution, although I have tried several things. Thank you all in advance!
If the contents of your identifiers column are real dict / JSON objects rather than strings, you can use the string accessor str[] to access the dict value by key, as follows:
testdf2['identifiers_test'] = testdf2['identifiers'].str['print_isbn_canonical']
Demo
import pandas as pd

data = {'identifiers': [{'print_isbn_canonical': '9780721682167', 'eis': '1234'}]}
df = pd.DataFrame(data)
df['isbn'] = df['identifiers'].str['print_isbn_canonical']
print(df)
identifiers isbn
0 {'print_isbn_canonical': '9780721682167', 'eis': '1234'} 9780721682167
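If you are not sure whether the column holds real dicts or their string representations, a quick type check can tell you. The sketch below is only an illustration under assumptions: the sample data and the helper name to_dict are hypothetical, and it assumes any string cells are valid Python dict literals that ast.literal_eval can parse. A column of dicts would also be consistent with the NaN results in the question, since string methods such as slicing and replace generally yield NaN for cells that are not strings.

```python
import ast

import pandas as pd

# Hypothetical sample: the same record stored once as a dict
# and once as the string form of that dict.
df = pd.DataFrame({
    "identifiers": [
        {"print_isbn_canonical": "9780721682167", "eis": "1234"},
        "{'print_isbn_canonical': '9780721682167', 'eis': '1234'}",
    ]
})

# Inspect the Python type of each cell to see what the column really holds.
print(df["identifiers"].map(type).value_counts())


def to_dict(value):
    """Parse dict-like strings; pass real dicts through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value


# Normalize every cell to a dict, then look the key up with str[].
parsed = df["identifiers"].map(to_dict)
df["isbn"] = parsed.str["print_isbn_canonical"]
print(df["isbn"].tolist())
```

Once every cell is a dict, the str[] lookup from the answer above applies unchanged.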
Try this out:
testdf2['new_column'] = testdf2.apply(lambda r : r.identifiers[26:39],axis=1)
Here I assume that the identifiers column is of string type.
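If the column really does hold strings, the fixed positions 26:39 only work when 'print_isbn_canonical' is the first key and the ISBN is 13 characters long. A possibly more robust alternative is to capture the digits with a regular expression via str.extract. This is a minimal sketch, not a tested solution for the original data: the sample rows and the column names by_slice and by_regex are made up for illustration, and the pattern assumes the ISBN consists of digits only.

```python
import pandas as pd

# Hypothetical sample where the column holds plain strings, not dicts.
df = pd.DataFrame({
    "identifiers": [
        "{'print_isbn_canonical': '9780721682167', 'eis': '1234'}",
        "{'eis': '5678', 'print_isbn_canonical': '9780131103627'}",
    ]
})

# Positional slicing: correct only when the key comes first in the string.
df["by_slice"] = df["identifiers"].str[26:39]

# Regex capture: finds the digits after the key wherever it appears.
pattern = r"'print_isbn_canonical':\s*'(\d+)'"
df["by_regex"] = df["identifiers"].str.extract(pattern, expand=False)

print(df[["by_slice", "by_regex"]])
```

In the second row the key is not at the start, so the slice picks up the wrong characters while the regex still finds the ISBN.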