
Expanding Pandas Column to Separate Rows based on calculation

I have a DataFrame that looks like this:

df1 = pd.DataFrame(columns=['ID', 'Divide', 'Object', 'List'], data=[ ['A, B', 2, 20, [0, 5]], ['C, D', 2, 40, [10, 15, 35]], ['E, F', 2, 20, [11, 15]], ['G', 1, 10, [1, 5]], ['H', 1, 10, ''], ['I, J', 2, 20, ''] ])

|    | ID   |   Divide |   Object | List         |
|---:|:-----|---------:|---------:|:-------------|
|  0 | A, B |        2 |       20 | [0, 5]       |
|  1 | C, D |        2 |       40 | [10, 15, 35] |
|  2 | E, F |        2 |       20 | [11, 15]     |
|  3 | G    |        1 |       10 | [1, 5]       |
|  4 | H    |        1 |       10 |              |
|  5 | I, J |        2 |       20 |              |

Each ID needs to have its own row. However, the List column holds data that belongs to each ID. The logic is the following:

  1. If there is a single ID in the column (no ","), no change is needed.
  2. If there are two IDs (ID contains ","):
  3. The first ID gets the items from List with values from 0 to Object/Divide - 1.
  4. The second ID gets the items from List with values from Object/Divide to Object - 1.
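
The rule above can be sketched for a single row in plain Python. This is only an illustration under stated assumptions: the helper name split_by_threshold is hypothetical (it does not come from the question), and it assumes exactly two IDs and numeric list items.

```python
def split_by_threshold(items, obj, divide):
    """Split items into (first_id_items, second_id_items).

    Hypothetical helper: assumes two IDs and numeric items.
    """
    threshold = obj / divide  # first ID covers 0 .. threshold - 1
    first = [x for x in items if x < threshold]
    second = [x for x in items if x >= threshold]
    return first, second


# Row "C, D": Object=40, Divide=2, so the threshold is 20
print(split_by_threshold([10, 15, 35], 40, 2))  # ([10, 15], [35])
```

An empty List cell (the '' in the sample data) simply yields two empty lists, because iterating over an empty string produces no items.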

So, the final table looks like this:

|    | ID   |   Divide |   Object | List   |
|---:|:-----|---------:|---------:|:-------|
|  0 | A    |        2 |       20 | 0, 5   |
|  1 | B    |        2 |       20 |        |
|  2 | C    |        2 |       40 | 10, 15 |
|  3 | D    |        2 |       40 | 35     |
|  4 | E    |        2 |       20 |        |
|  5 | F    |        2 |       20 | 11, 15 |
|  6 | G    |        1 |       10 | 1, 5   |
|  7 | H    |        1 |       10 |        |
|  8 | I    |        2 |       20 |        |
|  9 | J    |        2 |       20 |        |

If these were lists, explode could be used to flatten them out. But I don't know how to apply the calculation logic within the DataFrame to parse out the details. Thanks.
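
For reference, a minimal sketch of what explode does once the ID strings have been turned into lists. The frame demo and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Made-up two-row frame: the ID column already holds lists
demo = pd.DataFrame({"ID": [["A", "B"], ["G"]], "Object": [20, 10]})

# explode gives each list element its own row and repeats the other columns
exploded = demo.explode("ID")
print(exploded)
```

Note that explode repeats the original index label for rows that came from the same list, so a reset_index(drop=True) is usually needed afterwards if a clean 0..n-1 index is wanted.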

You can try this:

import pandas as pd
df = pd.DataFrame(columns=['ID', 'Divide', 'Object', 'List'], data=[ ['A, B', 2, 20, [0, 5]], ['C, D', 2, 40, [10, 15, 35]], ['E, F', 2, 20, [11, 15]], ['G', 1, 10, [1, 5]], ['H', 1, 10, ''], ['I, J', 2, 20, ''] ])

def split_list(lst, limit):
    l1 = list()
    l2 = list()
    for e in lst:
        if e < limit:  # first ID covers 0 .. limit - 1, as stated in the question
            l1.append(e)
        else:
            l2.append(e)
    return l1, l2

df['ID'] = df['ID'].str.split(', ')
df['Limit'] = df['Object'] / df['Divide']
df['List'] = df.apply(lambda row: dict(zip(row['ID'], split_list(row['List'], row['Limit']))), axis=1)
df = df.explode('ID')
df['List'] = df.apply(lambda row: row['List'].get(row['ID']), axis=1)

print(df)


# Out[192]:
#   ID  Divide  Object      List  Limit
# 0  A       2      20    [0, 5]   10.0
# 0  B       2      20        []   10.0
# 1  C       2      40  [10, 15]   20.0
# 1  D       2      40      [35]   20.0
# 2  E       2      20        []   10.0
# 2  F       2      20  [11, 15]   10.0
# 3  G       1      10    [1, 5]   10.0
# 4  H       1      10        []   10.0
# 5  I       2      20        []   10.0
# 5  J       2      20        []   10.0
  • Most of the code just gets your sample data into the right shape.
  • Assumption: an individual ID is 1 character, hence df.ID_r==df["ID"].str.strip().str[:1]
  • The logic is then as stated.
import io, json
import numpy as np
import pandas as pd
df = (pd.read_csv(io.StringIO("""|    | ID   |   Divide |   Object | List         |
|  0 | A, B |        2 |       20 | [0, 5]       |
|  1 | C, D |        2 |       40 | [10, 15, 35] |
|  2 | E, F |        2 |       20 | [11, 15]     |
|  3 | G    |        1 |       10 | [1, 5]       |
|  4 | H    |        1 |       10 |              |
|  5 | I, J |        2 |       20 |              |"""), sep="|")
      .pipe(lambda d: d.rename(columns={c:c.strip() for c in d.columns}))
      .pipe(lambda d: d.drop(columns=[c for c in d.columns if "Unnamed" in c or c==""]))
      .assign(List=lambda d: d["List"].apply(lambda l: json.loads(l) if "[" in l else []))
     )
### end make sample data work... NB List is a list and empty if no list..
# explode ID column
df = df.join(df["ID"].apply(lambda id: [t.strip() for t in id.split(",")]).explode(), rsuffix="_r")
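# real logic: take the first two list items for the first ID, else the remaining items
# NB: this slices by position, not by the Object/Divide threshold, so rows E and F
# come out swapped compared with the table requested in the question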
df["List"] = np.where(df.ID_r==df["ID"].str.strip().str[:1], df["List"].apply(lambda l: l[:2]),df["List"].apply(lambda l: l[2:]))
df.reset_index(drop=True).drop(columns=["ID"]).rename(columns={"ID_r":"ID"})

Output:

   Divide  Object      List ID
0       2      20    [0, 5]  A
1       2      20        []  B
2       2      40  [10, 15]  C
3       2      40      [35]  D
4       2      20  [11, 15]  E
5       2      20        []  F
6       1      10    [1, 5]  G
7       1      10        []  H
8       2      20        []  I
9       2      20        []  J

Try exploding both ID and List, then filtering conditionally based on the order in which the IDs appeared.

import pandas as pd

df1 = pd.DataFrame(columns=['ID', 'Divide', 'Object', 'List'],
                   data=[['A, B', 2, 20, [0, 5]],
                         ['C, D', 2, 40, [10, 15, 35]],
                         ['E, F', 2, 20, [11, 15]],
                         ['G', 1, 10, [1, 5]],
                         ['H', 1, 10, ''],
                         ['I, J', 2, 20, '']])

# Split and Explode ID
df1['ID'] = df1['ID'].str.split(', ')

# Group By Each ID and set index so that First and Second IDs are tracked
df1 = df1.explode('ID') \
    .groupby(level=0) \
    .apply(lambda x: x.reset_index()) \
    .droplevel(0)
# Calculate Cap For Later
df1['cap'] = df1['Object'] // df1['Divide'] - 1


def split_lists(g):
    # If more than 1 row and non-empty list
    if len(g) > 1 and not g['List'].empty:
        # Check if is the First ID
        if g['level_0'].iloc[0] == 0:
            # Filter Less Than Equal To Cap
            g['List'] = g['List'][g['List'] <= g['cap']]
        else:
            # Filter Greater Than Cap
            g['List'] = g['List'][g['List'] > g['cap']]
    return g


# Explode Lists Group By ID filter using function
# Regroup and convert back to lists
df2 = df1 \
    .explode('List') \
    .reset_index() \
    .groupby('ID') \
    .apply(split_lists) \
    .groupby('ID')['List'] \
    .apply(lambda x: x.dropna().tolist())

# Drop Extra Columns from df1 and merge back
out = df1.drop(columns=['List', 'index', 'cap']) \
    .merge(df2, left_on='ID', right_index=True, how='left') \
    .reset_index(drop=True)

print(out)

Out:

  ID  Divide  Object      List
0  A       2      20    [0, 5]
1  B       2      20        []
2  C       2      40  [10, 15]
3  D       2      40      [35]
4  E       2      20        []
5  F       2      20  [11, 15]
6  G       1      10    [1, 5]
7  H       1      10        []
8  I       2      20        []
9  J       2      20        []

df1 with additional columns:

   index ID  Divide  Object          List  cap
0      0  A       2      20        [0, 5]    9
1      0  B       2      20        [0, 5]    9
0      1  C       2      40  [10, 15, 35]   19
1      1  D       2      40  [10, 15, 35]   19
0      2  E       2      20      [11, 15]    9
1      2  F       2      20      [11, 15]    9
0      3  G       1      10        [1, 5]    9
0      4  H       1      10                  9
0      5  I       2      20                  9
1      5  J       2      20                  9

df2 after filter and regroup:

ID
A      [0, 5]
B          []
C    [10, 15]
D        [35]
E          []
F    [11, 15]
G      [1, 5]
H          []
I          []
J          []
Name: List, dtype: object
