
Fill one Dataframe Column from specific value in list of another column

My dataframe has a column pairs that contains a list of key/value pairs. Each key is unique within a list. E.g.:

import pandas as pd

df = pd.DataFrame({
        'id':  ['1', '2', '3'],
        'abc': None,
        'pairs': [ ['abc/123', 'foo/345', 'xyz/789'],  ['abc/456', 'foo/111', 'xyz/789'],  ['xxx/222', 'foo/555', 'xyz/333'] ]
      })

The dataframe is:

  id | abc  | pairs
  ------------------------------------
  1  |None  | [abc/123, foo/345, xyz/789]
  2  |None  | [abc/456, foo/111, xyz/789]
  3  |None  | [xxx/222, foo/555, xyz/333]

The column abc should be filled from column pairs. If the first element (idx=0), split by '/', has the key 'abc', the corresponding value goes into abc.

Expected df:

  id | abc  | pairs
  ------------------------------------
  1  |123   | [abc/123, foo/345, xyz/789]
  2  |456   | [abc/456, foo/111, xyz/789]
  3  |None  | [xxx/222, foo/555, xyz/333]

I'm looking for something like:

df.loc[df['pairs'].map(lambda x: 'abc' in (l.split('/')[0] for l in x)), 'abc'] = 'FOUND'

My problem is replacing FOUND with the correct value, i.e. the part after the slash (l.split('/')[1]) of the matching element.

You can use .str repeatedly:

df['abc'] = df['pairs'].str[0].str.split('/').loc[lambda x: x.str[0] == 'abc'].str[1]

Output:

>>> df
  id  abc                        pairs
0  1  123  [abc/123, foo/345, xyz/789]
1  2  456  [abc/456, foo/111, xyz/789]
2  3  NaN  [xxx/222, foo/555, xyz/333]
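To show why the chain works, here is the same one-liner split into named steps. This is a sketch that rebuilds the question's sample frame. The intermediate names (first, parts, matching) are illustrative only.

```python
import pandas as pd

# Sample frame from the question; intermediate names below are illustrative
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'abc': None,
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})

# Step 1: first element of each list -> 'abc/123', 'abc/456', 'xxx/222'
first = df['pairs'].str[0]
# Step 2: split on '/' -> ['abc', '123'], ['abc', '456'], ['xxx', '222']
parts = first.str.split('/')
# Step 3: keep only the rows whose key is 'abc' (index labels 0 and 1)
matching = parts.loc[lambda x: x.str[0] == 'abc']
# Step 4: take the value. Assignment aligns on the index,
# so row 2, which is missing from `matching`, becomes NaN
df['abc'] = matching.str[1]
print(df)
```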

More readable alternative:

x = df['pairs'].str[0].str.split('/')
df.loc[x.str[0] == 'abc', 'abc'] = x.str[1]
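The .loc version differs from the one-liner in one respect. .loc writes only the rows selected by the mask, so non-matching rows keep whatever abc held before instead of becoming NaN. In the sample frame that previous value is None. Here is a small sketch that rebuilds the sample data:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'abc': None,
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})

x = df['pairs'].str[0].str.split('/')
mask = x.str[0] == 'abc'
# Only rows where mask is True are written. The right-hand side is aligned
# on the index, so row 2 keeps its original value (None here)
df.loc[mask, 'abc'] = x.str[1]
print(df)
```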

Use str.get as much as you like ;)

import numpy as np

s = df['pairs'].str.get(0).str.split('/')
df['abc'] = np.where(s.str.get(0) == 'abc', s.str.get(1), None)
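np.where works positionally and returns a plain NumPy array. With None as the fallback, the result is an object array that keeps None for non-matches. If you would rather get NaN and keep index alignment, Series.where is one alternative. The sketch below compares the two. The abc_nan column is added only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'abc': None,
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})

s = df['pairs'].str.get(0).str.split('/')
is_abc = s.str.get(0) == 'abc'

# np.where: positional, object array, None where the key is not 'abc'
df['abc'] = np.where(is_abc, s.str.get(1), None)
# Series.where: index-aligned, NaN where the key is not 'abc'
# (illustrative extra column)
df['abc_nan'] = s.str.get(1).where(is_abc)
print(df)
```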

Try this. You need neither apply nor lambda functions:

a = df['pairs'].str[0].str
df['abc'] = a.split('/').str[1].where(a.startswith('abc'))

Output:

  id  abc                        pairs
0  1  123  [abc/123, foo/345, xyz/789]
1  2  456  [abc/456, foo/111, xyz/789]
2  3  NaN  [xxx/222, foo/555, xyz/333]
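One caveat: startswith('abc') also accepts keys that merely begin with abc, such as a hypothetical element abcd/999. Including the separator in the prefix avoids that. Here is a sketch with made-up data:

```python
import pandas as pd

# Made-up data: the second row's key is 'abcd', not 'abc'
pairs = pd.Series([['abc/123', 'foo/345'], ['abcd/999', 'foo/111']])
a = pairs.str[0].str

# Prefix test without the separator: 'abcd/999' is wrongly accepted
loose = a.split('/').str[1].where(a.startswith('abc'))
# Prefix test including '/': only the exact key 'abc' is accepted
strict = a.split('/').str[1].where(a.startswith('abc/'))
print(loose.tolist())
print(strict.tolist())
```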

Note: str[0] is equivalent to str.get(0).

As the pandas user guide puts it: "Elements in the split lists can be accessed using get or [] notation."

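Here is a quick check of that equivalence, using a made-up Series that includes an empty list. Both forms return NaN where the position does not exist, instead of raising IndexError.

```python
import pandas as pd

# Made-up data: the second list is empty
s = pd.Series([['abc/123', 'foo/345'], []])
by_index = s.str[0]      # [] notation on the .str accessor
by_get = s.str.get(0)    # explicit get

# Same values, including NaN for the empty list
pd.testing.assert_series_equal(by_index, by_get)
print(by_get.tolist())
```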
Try this:

import pandas as pd

# data
df = pd.DataFrame({
        'id':  ['1', '2', '3'],
        'abc':None,
        'pairs': [ ['abc/123', 'foo/345', 'xyz/789'],  ['abc/456', 'foo/111', 'xyz/789'],  ['xxx/222', 'foo/555', 'xyz/333'] ]
      })
# construct a dict in loop and get value of abc key
df['abc'] = df['pairs'].apply(lambda x: dict(e.split('/') for e in x).get('abc'))
df

Upon reading the question again, it seems you're only interested in the abc key when it is the first element of the list. So instead of reading each whole list, just index the first element and split it:

df['abc'] = df['pairs'].apply(lambda x: dict([x[0].split('/')]).get('abc'))
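The dict-based versions assume every element contains exactly one '/'. dict(e.split('/') for e in x) raises ValueError for an element such as 'noslash' or 'a/b/c'. A more forgiving sketch uses str.partition, which splits on the first '/' only. The helper name pairs_to_dict is hypothetical. Skipping elements that have no '/' is an assumption about what you want.

```python
import pandas as pd

def pairs_to_dict(items):
    """Hypothetical helper: turn 'key/value' strings into a dict.

    partition() splits on the first '/' only, so 'a/b/c' gives
    {'a': 'b/c'}; items with no '/' at all are skipped.
    """
    out = {}
    for item in items:
        key, sep, value = item.partition('/')
        if sep:  # sep is '' when the item contains no '/'
            out[key] = value
    return out

# Sample frame from the question
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'abc': None,
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})
df['abc'] = df['pairs'].apply(lambda x: pairs_to_dict(x).get('abc'))
print(df)
```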

"You can use .str repeatedly" -> Yes, but… it is quite slow!

In this context, it is much better to use a list comprehension:

# requires Python 3.8+ for the walrus operator (:=)
df['abc'] = [x[1] if (x:=l[0].split('/'))[0].startswith('abc') else float('nan')
            for l in df['pairs']]
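If the walrus operator is unavailable (Python < 3.8), or you find it hard to read, the same list comprehension can delegate to a small named function. This is a sketch. The helper name first_abc_value is hypothetical, and it compares the key exactly (== 'abc') rather than using startswith.

```python
import pandas as pd

def first_abc_value(pairs):
    # Hypothetical helper: inspect only the first element of the list,
    # returning its value if the key is exactly 'abc', else NaN
    key, _, value = pairs[0].partition('/')
    return value if key == 'abc' else float('nan')

# Sample frame from the question
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'abc': None,
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})
df['abc'] = [first_abc_value(l) for l in df['pairs']]
print(df)
```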

Rule of thumb: if you need 3 or more .str calls, try a list comprehension instead!

A picture is worth a thousand words. Here is a performance test of all current answers, from 3 to almost 1M rows:

[Image: benchmark plot of all answers, 3 to ~1M rows (not available)]
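To repeat a comparison like this yourself, here is a minimal timeit sketch. It compares only two of the approaches. The sample size (the three sample rows repeated 10,000 times) and the repetition count are arbitrary choices, and the helper names are hypothetical. Timings will vary with machine and pandas version.

```python
import timeit

import pandas as pd

def make_df(n):
    # Hypothetical helper: the question's three rows repeated n times
    base = [['abc/123', 'foo/345', 'xyz/789'],
            ['abc/456', 'foo/111', 'xyz/789'],
            ['xxx/222', 'foo/555', 'xyz/333']]
    return pd.DataFrame({'pairs': base * n})

def with_str(df):
    # Chained .str accessor version
    a = df['pairs'].str[0].str
    return a.split('/').str[1].where(a.startswith('abc'))

def with_listcomp(df):
    # List comprehension version (walrus needs Python 3.8+)
    return pd.Series(
        [x[1] if (x := l[0].split('/'))[0].startswith('abc') else float('nan')
         for l in df['pairs']],
        index=df.index,
    )

df = make_df(10_000)
for fn in (with_str, with_listcomp):
    t = timeit.timeit(lambda: fn(df), number=5)
    print(f'{fn.__name__}: {t / 5:.4f} s per call')
```

Checking that both functions return the same values before timing them guards against comparing approaches that do different things.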

Bonus: matching the first "abc" at any position (not only the 1st):
df['abc'] = [next((x.split('/')[1] for x in l if x.startswith('abc')), None)
             for l in df['pairs']]
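A vectorized alternative for the any-position case works in three steps. First, explode the lists into one row per element. Next, split each element into key and value. Finally, keep the first 'abc' value per original row. This sketch assumes every element contains a '/'. With n=1, anything after the first '/' is kept as the value.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'abc': None,
    'pairs': [['abc/123', 'foo/345', 'xyz/789'],
              ['abc/456', 'foo/111', 'xyz/789'],
              ['xxx/222', 'foo/555', 'xyz/333']],
})

# One row per list element; the original row label is repeated
exploded = df['pairs'].explode()
# Column 0 = key, column 1 = value (n=1: split on the first '/' only)
kv = exploded.str.split('/', n=1, expand=True)
# Values whose key is exactly 'abc', still labelled by the original row
abc = kv.loc[kv[0] == 'abc', 1]
# First match per original row. Assignment aligns on the index,
# so rows without an 'abc' key get NaN
df['abc'] = abc.groupby(level=0).first()
print(df)
```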


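Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.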

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM