How do I split one row of a dataset into several rows while repeating some variables on every row?

Question

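I have a dataset in which each row holds information that needs to be split up and printed on separate rows, but I need to keep the company name on every newly printed row: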

Example dataset. These are the headers:

company | marketing_budget | marketing_remaining | finance_budget | finance_remaining | sales_budget | sales_remaining

These are 2 rows of data:

Law Office | 450,000 | 150,000 | 300,000 | 100,000 | 200,000 | 50,000
Restaurant | 30,000  | 7,000   | null    | null    | 25,000  | 10,000

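I need to split one row into as many rows as required. Some companies may have a marketing budget but no finance budget, or any other combination, so the output should look like this (I also need to add a Department column; it is not one of the input columns, it is just taken from the header of the column the values came from):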

Company    | Department | Budget  | Amount Remaining
Law Office | Marketing  | 450,000 | 150,000
Law Office | Finance    | 300,000 | 100,000
Law Office | Sales      | 200,000 | 50,000
Restaurant | Marketing  | 30,000  | 7,000
Restaurant | Sales      | 25,000  | 10,000

Answer 1

You can use the Python package pandas to build the table, with list comprehensions and the str.split() method to parse the raw text:

import pandas as pd

d='''company | marketing_budget | marketing_remaining | finance_budget | finance_remaining | sales_budget | sales_remaining
Law Office | 450,000 | 150,000 | 300,000 | 100,000 | 200,000 | 50,000
Restaurant | 30,000 | 7,000 | null | null | 25,000 | 10,000'''

data = [e.strip().split('|') for e in d.split('\n')]
df = pd.DataFrame([[e.strip() for e in l] for l in data[1:]], columns=[e.strip() for e in data[0]])
print(df)

Output:

      company marketing_budget marketing_remaining finance_budget finance_remaining sales_budget sales_remaining
0  Law Office          450,000             150,000        300,000           100,000      200,000          50,000
1  Restaurant           30,000               7,000           null              null       25,000          10,000

After that, use the df.melt() and df.pivot() methods to get the final result:

df = df.melt(id_vars='company')
df[['department','value_type']] = df.variable.str.split('_', expand=True)
df = df.pivot(index=['company', 'department'], columns='value_type', values='value').sort_index().reset_index()
df = df[df['budget']!='null']
df = df.rename_axis(None, axis=1).reset_index(drop=True)
print(df)

Output:

      company department   budget remaining
0  Law Office    finance  300,000   100,000
1  Law Office  marketing  450,000   150,000
2  Law Office      sales  200,000    50,000
3  Restaurant  marketing   30,000     7,000
4  Restaurant      sales   25,000    10,000

Thanks @BeRT2me, this was a good learning exercise for me!
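To see what the reshaping step does before the pivot, here is a minimal sketch that runs melt on a cut-down copy of the data (marketing and finance columns only). The names `wide`, `long_df`, and `parts` are made up for this illustration and do not come from the answer.

```python
import pandas as pd

# Cut-down copy of the question's data (illustrative only).
wide = pd.DataFrame({
    "company": ["Law Office", "Restaurant"],
    "marketing_budget": ["450,000", "30,000"],
    "marketing_remaining": ["150,000", "7,000"],
    "finance_budget": ["300,000", "null"],
    "finance_remaining": ["100,000", "null"],
})

# melt keeps `company` on every row and stacks all other columns
# into two columns: `variable` (old column name) and `value`.
long_df = wide.melt(id_vars="company")

# The old column names look like '<department>_<value_type>',
# so splitting on '_' recovers both pieces.
parts = long_df["variable"].str.split("_", expand=True)
long_df["department"] = parts[0]
long_df["value_type"] = parts[1]

print(long_df)
```

Each company now appears once per original column, which is what lets pivot spread `value_type` back out into separate budget and remaining columns.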

Answer 2

Given a text file that looks like this (with the header row from the question as its first line):

Law Office | 450,000 | 150,000 | 300,000 | 100,000 | 200,000 | 50,000
Restaurant | 30,000  | 7,000   | null    | null    | 25,000  | 10,000

We can do:

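import pandas as pd

# Split on '|' and swallow the padding spaces around it, so that a padded 'null' is still read as missing (NaN).
df = pd.read_csv('file.txt', sep=r'\s*\|\s*', engine='python')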

# Reverse the column names on '_'.
df.columns = ['_'.join(reversed(x.split('_'))) for x in df.columns]

# Use pd.wide_to_long
df = pd.wide_to_long(df, ['budget', 'remaining'], i='company', j='department', sep='_', suffix=r'\w+').sort_index()
df = df.reset_index().dropna()
print(df)

Output:

      company department   budget remaining
0  Law Office    finance  300,000   100,000
1  Law Office  marketing  450,000   150,000
2  Law Office      sales  200,000    50,000
4  Restaurant  marketing   30,000     7,000
5  Restaurant      sales   25,000    10,000
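The column names are reversed because pd.wide_to_long expects each wide column to start with the stub name (here budget or remaining), followed by the separator and then the suffix that becomes the new `department` value. A small sketch of just the renaming step, using plain lists (the names `cols` and `renamed` are illustrative):

```python
cols = [
    "company",
    "marketing_budget", "marketing_remaining",
    "finance_budget", "finance_remaining",
    "sales_budget", "sales_remaining",
]

# 'marketing_budget' -> ['marketing', 'budget'] -> 'budget_marketing'.
# 'company' has no underscore, so it comes back unchanged.
renamed = ["_".join(reversed(c.split("_"))) for c in cols]
print(renamed)
```

After the rename, every value column starts with `budget_` or `remaining_`, which matches the stub names passed to wide_to_long.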

A test, and how I would make the values numeric for future calculations:

import pandas as pd
from io import StringIO

d='''company | marketing_budget | marketing_remaining | finance_budget | finance_remaining | sales_budget | sales_remaining
Law Office | 450,000 | 150,000 | 300,000 | 100,000 | 200,000 | 50,000
Restaurant | 30,000 | 7,000 | null | null | 25,000 | 10,000'''

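df = pd.read_csv(StringIO(d), sep=r'\s*\|\s*', engine='python')

# Strip the thousands separators and convert every column except `company` to numbers.
# 'null' was read as NaN, and errors='coerce' keeps it as NaN.
for col in df.columns.drop('company'):
    df[col] = pd.to_numeric(df[col].astype(str).str.replace(',', ''), errors='coerce')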

df.columns = ['_'.join(reversed(x.split('_'))) for x in df.columns]

df = pd.wide_to_long(df, ['budget', 'remaining'], i='company', j='department', sep='_', suffix=r'\w+').sort_index()
df = df.reset_index().dropna()
print(df)

Output:

      company department    budget  remaining
0  Law Office    finance  300000.0   100000.0
1  Law Office  marketing  450000.0   150000.0
2  Law Office      sales  200000.0    50000.0
4  Restaurant  marketing   30000.0     7000.0
5  Restaurant      sales   25000.0    10000.0
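If the frame is already in long form with string values, the same numeric conversion can be done afterwards. The sketch below is one possible approach, not part of the original answer; the sample frame `long_df` is hand-built for illustration. It relies on `pd.to_numeric(..., errors="coerce")`, which turns anything that cannot be parsed into NaN.

```python
import pandas as pd

# Hand-built long-format sample (illustrative); None marks a missing budget.
long_df = pd.DataFrame({
    "company": ["Law Office", "Law Office", "Restaurant"],
    "department": ["finance", "marketing", "finance"],
    "budget": ["300,000", "450,000", None],
    "remaining": ["100,000", "150,000", None],
})

for col in ["budget", "remaining"]:
    # Remove thousands separators, then parse; unparseable or missing -> NaN.
    cleaned = long_df[col].str.replace(",", "", regex=False)
    long_df[col] = pd.to_numeric(cleaned, errors="coerce")

# Drop departments that have no budget at all.
long_df = long_df.dropna(subset=["budget"]).reset_index(drop=True)

# Numeric columns can now be used in arithmetic, e.g. the amount already spent.
spent = long_df["budget"] - long_df["remaining"]
print(long_df)
print(spent)
```

Doing the conversion column by column leaves text columns such as `company` untouched.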
