How do I add data to a column only if a certain value exists in the previous column using Python and Faker?

I'm pretty new to Python and not sure what to even google for this. What I am trying to do is create a Pandas DataFrame filled with fake data using Faker. The problem I am having is that each column generates its fake data in a silo. I want to be able to create fake data based on something that exists in a prior column.

So in my example below, I have pc_type ["PC", "Apple"]. From there I have the operating system, and the options are Windows 10, Windows 11, and MacOS. Now I want only the rows where pc_type = "Apple" to be filled with the value MacOS. Then everything of type PC should be 50% Windows 10 and 50% Windows 11.

How would I write this code so that I can make that distinction clear in the function body and have the results reflect it?

from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random

pc_type = ['PC', 'Apple']
fake = Faker()


def create_data(x):
    project_data = {}
    for i in range(0, x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        project_data[i]['With Windows 10'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With Windows 11'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With MacOS'] = fake.boolean(chance_of_getting_true=50)

    return project_data


df = pd.DataFrame(create_data(10)).transpose()
df

To have coherent values, you can use something like:

from faker import Faker
import pandas as pd
import numpy as np


def create_data(x):
    pc_type = ['PC', 'Apple']
    fake = Faker()
    data = {'Name': [fake.name() for _ in range(x)],
            'PC Type': np.random.choice(pc_type, x)}
    df = pd.DataFrame(data)
    df['With MacOS'] = df['PC Type'] == 'Apple'

    pc = df['PC Type'] == 'PC'
    w10 = np.random.choice([True, False], len(df), p=(0.5, 0.5))
    df['With Windows 10'] = pc & w10
    df['With Windows 11'] = pc & ~w10

    return df

df = create_data(10)

Output:

>>> df
                Name PC Type  With MacOS  With Windows 10  With Windows 11
0     Charles Dawson      PC       False             True            False
1  Patricia Bautista      PC       False            False             True
2         Ruth Clark      PC       False             True            False
3       Justin Lopez      PC       False             True            False
4      Grace Russell      PC       False             True            False
5         Grant Moss      PC       False             True            False
6           Tracy Ho   Apple        True            False            False
7    Connie Mitchell   Apple        True            False            False
8  Catherine Nichols   Apple        True            False            False
9   Nathaniel Bryant      PC       False            False             True
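
One way to confirm that the flags really are coherent is to check that every row has exactly one OS flag set and that With MacOS agrees with PC Type. This is only a minimal sketch under some assumptions: it rebuilds the same columns with placeholder names instead of Faker output so that it runs on its own, and `make_flags` and `check_coherent` are hypothetical helper names, not part of pandas or Faker.

```python
import numpy as np
import pandas as pd


def make_flags(n, seed=0):
    """Build the same columns as above, with placeholder names instead of Faker."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        'Name': [f'Person {i}' for i in range(n)],  # placeholder, not Faker
        'PC Type': rng.choice(['PC', 'Apple'], n),
    })
    pc = df['PC Type'] == 'PC'
    w10 = rng.random(n) < 0.5  # True for roughly half of the rows
    df['With MacOS'] = ~pc
    df['With Windows 10'] = pc & w10
    df['With Windows 11'] = pc & ~w10
    return df


def check_coherent(df):
    """Return True if each row has exactly one OS and MacOS matches Apple."""
    os_cols = ['With MacOS', 'With Windows 10', 'With Windows 11']
    one_os = (df[os_cols].sum(axis=1) == 1).all()
    mac_matches = (df['With MacOS'] == (df['PC Type'] == 'Apple')).all()
    return bool(one_os and mac_matches)


df = make_flags(1000)
coherent = check_coherent(df)
```

The same `check_coherent` call should also work on the DataFrame returned by `create_data` above, since it only relies on the column names.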

I'd slightly change the approach and generate a single column OS. You can then transform this column into With MacOS etc. if needed.

With this approach it's easier to get the 0.5 / 0.5 split within Windows right:

from collections import OrderedDict

from faker import Faker
import pandas as pd

pc_type = ['PC', 'Apple']
wos_type = OrderedDict([('With Windows 10', 0.5), ('With Windows 11', 0.5)])
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        if project_data[i]['PC Type'] == 'PC':
            project_data[i]['OS'] = fake.random_element(elements = wos_type)
        else:
            project_data[i]['OS'] = 'MacOS'

    return project_data


df = pd.DataFrame(create_data(10)).transpose()
df

Output:

                     Name PC Type               OS
0         Nicholas Walker   Apple            MacOS
1               Eric Hull      PC  With Windows 10
2       Veronica Gonzales      PC  With Windows 11
3  Mrs. Krista Richardson   Apple            MacOS
4              Anne Craig      PC  With Windows 10
5            Joseph Hayes      PC  With Windows 10
6             Mary Nelson   Apple            MacOS
7               Jill Hunt   Apple            MacOS
8             Mark Taylor      PC  With Windows 11
9           Kyle Thompson      PC  With Windows 10
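
If the separate With MacOS etc. columns are still needed, the OS column can be expanded afterwards. A possible sketch using `pd.get_dummies`, assuming the OS values shown in the output above; the three-row DataFrame is hand-written placeholder data so that the snippet runs without Faker.

```python
import pandas as pd

# Placeholder rows shaped like the output above (not Faker data)
df = pd.DataFrame({
    'Name': ['Person 0', 'Person 1', 'Person 2'],
    'PC Type': ['Apple', 'PC', 'PC'],
    'OS': ['MacOS', 'With Windows 10', 'With Windows 11'],
})

# One indicator column per distinct OS value; astype(bool) keeps the
# result boolean even on pandas versions that return 0/1 integers
flags = pd.get_dummies(df['OS']).astype(bool)

# Match the "With ..." naming used for the Windows columns
flags = flags.rename(columns={'MacOS': 'With MacOS'})

# Swap the single OS column for the boolean columns
wide = pd.concat([df.drop(columns='OS'), flags], axis=1)
```

Note that `get_dummies` only creates columns for values that actually occur, so in a small sample with no Windows 11 rows that column would be missing. Converting OS to a categorical dtype with a fixed list of categories first is one way around that.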
