Fill one DataFrame column from a specific value in a list in another column
My dataframe has a column pairs that contains a list of key/value pairs. Each key is unique within its list. For example:
df = pd.DataFrame({
'id': ['1', '2', '3'],
'abc':None,
'pairs': [ ['abc/123', 'foo/345', 'xyz/789'], ['abc/456', 'foo/111', 'xyz/789'], ['xxx/222', 'foo/555', 'xyz/333'] ]
})
The dataframe is:
id | abc | pairs
------------------------------------
1 |None | [abc/123, foo/345, xyz/789]
2 |None | [abc/456, foo/111, xyz/789]
3 |None | [xxx/222, foo/555, xyz/333]
The column abc should be filled with the value from column pairs when the first element of the list (idx=0), split on '/', has the key 'abc'.
Expected df:
id | abc | pairs
------------------------------------
1 |123 | [abc/123, foo/345, xyz/789]
2 |456 | [abc/456, foo/111, xyz/789]
3 |None | [xxx/222, foo/555, xyz/333]
I am looking for something like:
df.loc[df['pairs'].map(lambda x: 'abc' in (l.split('/')[0] for l in x)), 'abc'] = 'FOUND'
My problem is how to replace 'FOUND' with the correct value, i.e. l.split('/')[1].
You can use .str repeatedly:
df['abc'] = df['pairs'].str[0].str.split('/').loc[lambda x: x.str[0] == 'abc'].str[1]
Output:
>>> df
id abc pairs
0 1 123 [abc/123, foo/345, xyz/789]
1 2 456 [abc/456, foo/111, xyz/789]
2 3 NaN [xxx/222, foo/555, xyz/333]
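The one-liner above works because pandas aligns on the index when a Series is assigned to a column: rows dropped by the .loc filter simply come back as NaN. A minimal sketch of that mechanism, using the question's data; the intermediate names parts and matched are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})

# First element of each list, split into [key, value]
parts = df['pairs'].str[0].str.split('/')

# Keep only the rows whose key is 'abc'; the original index labels survive
matched = parts.loc[lambda x: x.str[0] == 'abc']
print(matched.index.tolist())     # [0, 1]

# Assignment aligns on the index, so the missing row 2 becomes NaN
df['abc'] = matched.str[1]
print(df['abc'].isna().tolist())  # [False, False, True]
```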
More readable alternative:
x = df['pairs'].str[0].str.split('/')
df.loc[x.str[0] == 'abc', 'abc'] = x.str[1]
Use str.get as much as you like ;)
import numpy as np
s = df['pairs'].str.get(0).str.split('/')
df['abc'] = np.where(s.str.get(0) == 'abc', s.str.get(1), None)
Try this; you need neither apply nor lambda functions:
a = df['pairs'].str[0].str
df['abc'] = a.split('/').str[1].where(a.startswith('abc'))
Output:
id abc pairs
0 1 123 [abc/123, foo/345, xyz/789]
1 2 456 [abc/456, foo/111, xyz/789]
2 3 NaN [xxx/222, foo/555, xyz/333]
Note: str[0] is equivalent to str.get(0). As the pandas documentation puts it:
"Elements in the split lists can be accessed using get or [] notation:"
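A quick way to convince yourself of that equivalence is sketched below on a small made-up Series. It also shows that an out-of-range position gives NaN instead of raising, which is what makes chaining these accessors safe on ragged lists:

```python
import pandas as pd

pairs = pd.Series([['abc/123', 'foo/345'], ['xxx/222', 'foo/555']])

by_index = pairs.str[0]     # [] notation
by_get = pairs.str.get(0)   # explicit method call

print(by_index.equals(by_get))        # True

# Position 5 does not exist in either list, so every result is missing
print(pairs.str.get(5).isna().all())  # True
```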
Try this:
# data
df = pd.DataFrame({
'id': ['1', '2', '3'],
'abc':None,
'pairs': [ ['abc/123', 'foo/345', 'xyz/789'], ['abc/456', 'foo/111', 'xyz/789'], ['xxx/222', 'foo/555', 'xyz/333'] ]
})
# construct a dict in loop and get value of abc key
df['abc'] = df['pairs'].apply(lambda x: dict(e.split('/') for e in x).get('abc'))
df
On reading the question again, it seems you are only interested in the abc key when it is the first element of the list. In that case there is no need to read each whole list; just index the first element and split it:
df['abc'] = df['pairs'].apply(lambda x: dict([x[0].split('/')]).get('abc'))
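The two dict-based variants only differ when the abc key is not in first position. A small sketch on made-up data (the second row deliberately puts abc last) makes the difference visible; the names any_position and first_only are illustrative. Note that dict(e.split('/') ...) assumes every element contains exactly one '/'.

```python
import pandas as pd

df = pd.DataFrame({
    'pairs': [['abc/123', 'foo/345'],
              ['foo/111', 'abc/456'],   # 'abc' is present, but not first
              ['xxx/222', 'foo/555']],
})

# Variant 1: build a dict from the whole list, so position does not matter
any_position = df['pairs'].apply(lambda x: dict(e.split('/') for e in x).get('abc'))

# Variant 2: only look at the first element of each list
first_only = df['pairs'].apply(lambda x: dict([x[0].split('/')]).get('abc'))

print(any_position.notna().tolist())  # [True, True, False]
print(first_only.notna().tolist())    # [True, False, False]
```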
"You can use .str repeatedly" -> Yes, but it is quite slow!
In this context, it is much better to use a list comprehension:
df['abc'] = [x[1] if (x:=l[0].split('/'))[0].startswith('abc') else float('nan')
for l in df['pairs']]
Rule of thumb: if you need three or more chained .str calls, try a list comprehension instead!
A picture is worth a thousand words: a performance test of all current answers, from 3 to almost 1M rows. [Figure: benchmark plot, not included]
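The speed claim is easy to check locally. Below is a rough timing sketch, assuming a synthetic 10,000-row frame built by repeating two sample rows; the helper names with_str and with_listcomp are made up for illustration, and absolute timings will vary with the machine and the pandas version. It needs Python 3.8+ for the := operator.

```python
import timeit
import pandas as pd

# Synthetic data (an assumption for this sketch): two sample rows repeated
base = [['abc/123', 'foo/345', 'xyz/789'], ['xxx/222', 'foo/555', 'xyz/333']]
df = pd.DataFrame({'pairs': base * 5_000})  # 10,000 rows

def with_str(frame):
    # Chained .str accessors
    s = frame['pairs'].str[0].str.split('/')
    return s.str[1].where(s.str[0] == 'abc')

def with_listcomp(frame):
    # Plain Python list comprehension over the lists
    return pd.Series(
        [x[1] if (x := l[0].split('/'))[0] == 'abc' else float('nan')
         for l in frame['pairs']],
        index=frame.index,
    )

t_str = timeit.timeit(lambda: with_str(df), number=5)
t_lc = timeit.timeit(lambda: with_listcomp(df), number=5)
print(f"chained .str: {t_str:.4f}s, list comprehension: {t_lc:.4f}s")
```

Both helpers produce the same values, so only the elapsed time differs.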
df['abc'] = [next((x.split('/')[1] for x in l if x.startswith('abc')), None)
for l in df['pairs']]
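One caveat with x.startswith('abc'): it also matches longer keys such as 'abcd'. A hedged variant that matches the key exactly by including the separator in the prefix is sketched here; value_for is a hypothetical helper name and the sample lists are made up.

```python
pairs = [['abcd/999', 'abc/123'], ['foo/555', 'xyz/333']]

def value_for(key, items):
    """Return the value of the first 'key/value' item whose key equals `key`, else None."""
    prefix = key + '/'
    return next((x[len(prefix):] for x in items if x.startswith(prefix)), None)

# Loose match: 'abcd/999' is picked up because it starts with 'abc'
loose = [next((x.split('/')[1] for x in l if x.startswith('abc')), None) for l in pairs]
print(loose)   # ['999', None]

# Exact match on the key
exact = [value_for('abc', l) for l in pairs]
print(exact)   # ['123', None]
```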