How do I add data to a column only when a certain value exists in a previous column, using Python and Faker?

Question

I'm pretty new to Python and not sure what to even google for this. What I am trying to do is create a Pandas DataFrame filled with fake data using Faker. The problem I am having is that each column generates its fake data in a silo. I want to be able to create fake data based on something that exists in a prior column.

So in my example below, I have pc_type ["PC", "Apple"]. From there I have the operating system, and the options are Windows 10, Windows 11, and MacOS. I want only the rows where pc_type = "Apple" to be filled with MacOS. Everything of type PC should then be 50% Windows 10 and 50% Windows 11.

How would I write this code so that I can make that distinction in the function body and have the results reflect it?

from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random

pc_type = ['PC', 'Apple']
fake = Faker()


def create_data(x):
    project_data = {}
    for i in range(0, x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        project_data[i]['With Windows 10'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With Windows 11'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With MacOS'] = fake.boolean(chance_of_getting_true=50)

    return project_data


df = pd.DataFrame(create_data(10)).transpose()
df

Answer 1

To have coherent values, you can use something like:

from faker import Faker
import pandas as pd
import numpy as np


def create_data(x):
    pc_type = ['PC', 'Apple']
    fake = Faker()
    data = {'Name': [fake.name() for _ in range(x)],
            'PC Type': np.random.choice(pc_type, x)}
    df = pd.DataFrame(data)
    df['With MacOS'] = df['PC Type'] == 'Apple'

    pc = df['PC Type'] == 'PC'
    w10 = np.random.choice([True, False], len(df), p=(0.5, 0.5))
    df['With Windows 10'] = pc & w10
    df['With Windows 11'] = pc & ~w10

    return df

df = create_data(10)

Output:

>>> df
                Name PC Type  With MacOS  With Windows 10  With Windows 11
0     Charles Dawson      PC       False             True            False
1  Patricia Bautista      PC       False            False             True
2         Ruth Clark      PC       False             True            False
3       Justin Lopez      PC       False             True            False
4      Grace Russell      PC       False             True            False
5         Grant Moss      PC       False             True            False
6           Tracy Ho   Apple        True            False            False
7    Connie Mitchell   Apple        True            False            False
8  Catherine Nichols   Apple        True            False            False
9   Nathaniel Bryant      PC       False            False             True
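
The coherence this answer aims for can also be checked directly. The following is only a sketch: `is_coherent` is a hypothetical helper name, not part of pandas or Faker, and the sample frame is typed in by hand from three rows of the output above so that the check runs without Faker installed.

```python
import pandas as pd


def is_coherent(df):
    """Return True if every row has exactly one OS flag set and
    'With MacOS' is True exactly on the 'Apple' rows."""
    os_cols = ['With MacOS', 'With Windows 10', 'With Windows 11']
    # Boolean columns sum to the number of True flags in each row.
    exactly_one = df[os_cols].sum(axis=1).eq(1)
    apple_is_mac = df['With MacOS'].eq(df['PC Type'].eq('Apple'))
    return bool((exactly_one & apple_is_mac).all())


# Three rows copied from the output above.
sample = pd.DataFrame({
    'Name': ['Charles Dawson', 'Patricia Bautista', 'Tracy Ho'],
    'PC Type': ['PC', 'PC', 'Apple'],
    'With MacOS': [False, False, True],
    'With Windows 10': [True, False, False],
    'With Windows 11': [False, True, False],
})

print(is_coherent(sample))  # True
```

Calling `is_coherent(create_data(10))` on the real function should give the same result, since each PC row gets exactly one of the two Windows flags and each Apple row gets only the MacOS flag.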

Answer 2

I'd slightly change the approach and generate a single OS column. You can then transform this column into With MacOS etc. if needed.

With this approach it's easier to get the 0.5 / 0.5 split within Windows right:

from collections import OrderedDict

from faker import Faker
import pandas as pd

pc_type = ['PC', 'Apple']
wos_type = OrderedDict([('With Windows 10', 0.5), ('With Windows 11', 0.5)])
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        if project_data[i]['PC Type'] == 'PC':
            project_data[i]['OS'] = fake.random_element(elements = wos_type)
        else:
            project_data[i]['OS'] = 'MacOS'

    return project_data


df = pd.DataFrame(create_data(10)).transpose()
df

Output:

                     Name PC Type               OS
0         Nicholas Walker   Apple            MacOS
1               Eric Hull      PC  With Windows 10
2       Veronica Gonzales      PC  With Windows 11
3  Mrs. Krista Richardson   Apple            MacOS
4              Anne Craig      PC  With Windows 10
5            Joseph Hayes      PC  With Windows 10
6             Mary Nelson   Apple            MacOS
7               Jill Hunt   Apple            MacOS
8             Mark Taylor      PC  With Windows 11
9           Kyle Thompson      PC  With Windows 10
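
If the separate boolean columns from the question are still needed, the OS column can be expanded afterwards. A minimal sketch, assuming `pandas.get_dummies` is acceptable for this step; the small frame below is a hand-written stand-in for the Faker output, with made-up placeholder names.

```python
import pandas as pd

# Hand-written stand-in for the DataFrame built from create_data();
# the rows are illustrative only.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'PC Type': ['Apple', 'PC', 'PC'],
    'OS': ['MacOS', 'With Windows 10', 'With Windows 11'],
})

# One boolean column per distinct OS value, named after the value itself.
flags = pd.get_dummies(df['OS']).astype(bool)

# 'MacOS' lacks the 'With ' prefix the other values carry, so rename it.
flags = flags.rename(columns={'MacOS': 'With MacOS'})

# Swap the single OS column for the boolean flag columns.
wide = df.drop(columns='OS').join(flags)
print(wide)
```

Note that `get_dummies` only creates columns for values that actually occur, so a small sample with no Apple rows would come out without a With MacOS column.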
