How do I add data to a column only if a certain value exists in a previous column using Python and Faker?
I'm pretty new to Python and not sure what to even google for this. What I am trying to do is create a Pandas DataFrame filled with fake data using Faker. The problem is that each column generates its fake data in a silo. I want to be able to create fake data based on something that exists in a prior column.
So in my example below, I have pc_type ["PC", "Apple"]. From there I have the operating system, and the options are Windows 10, Windows 11, and MacOS. I want only the rows where pc_type = "Apple" to be filled with the value MacOS. Then for everything that is type PC, it should be 50% Windows 10 and 50% Windows 11.
How would I write this code so that the function body makes that distinction clear and the results reflect it?
from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random

pc_type = ['PC', 'Apple']
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(0, x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        # Each OS flag below is drawn independently of 'PC Type' (the problem described above)
        project_data[i]['With Windows 10'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With Windows 11'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With MacOS'] = fake.boolean(chance_of_getting_true=50)
    return project_data

df = pd.DataFrame(create_data(10)).transpose()
df
To have coherent values, you can use something like:
from faker import Faker
import pandas as pd
import numpy as np

def create_data(x):
    pc_type = ['PC', 'Apple']
    fake = Faker()
    data = {'Name': [fake.name() for _ in range(x)],
            'PC Type': np.random.choice(pc_type, x)}
    df = pd.DataFrame(data)
    # Apple rows always get MacOS
    df['With MacOS'] = df['PC Type'] == 'Apple'
    # PC rows get exactly one of Windows 10 / Windows 11, each with probability 0.5
    pc = df['PC Type'] == 'PC'
    w10 = np.random.choice([True, False], len(df), p=(0.5, 0.5))
    df['With Windows 10'] = pc & w10
    df['With Windows 11'] = pc & ~w10
    return df

df = create_data(10)
Output:
>>> df
                Name PC Type  With MacOS  With Windows 10  With Windows 11
0     Charles Dawson      PC       False             True            False
1  Patricia Bautista      PC       False            False             True
2         Ruth Clark      PC       False             True            False
3       Justin Lopez      PC       False             True            False
4      Grace Russell      PC       False             True            False
5         Grant Moss      PC       False             True            False
6           Tracy Ho   Apple        True            False            False
7    Connie Mitchell   Apple        True            False            False
8  Catherine Nichols   Apple        True            False            False
9   Nathaniel Bryant      PC       False            False             True
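A result like the one above can be sanity-checked by asserting that every row has exactly one OS flag set and that the MacOS flag agrees with the PC type. The following is only a minimal sketch: `check_coherent` and `OS_COLUMNS` are hypothetical names that do not appear in the answer, and the sample frame is built by hand so the snippet runs without Faker.

```python
import pandas as pd

# Column names taken from the answer above; the list name itself is an assumption.
OS_COLUMNS = ["With MacOS", "With Windows 10", "With Windows 11"]

def check_coherent(df: pd.DataFrame) -> bool:
    """Return True if each row has exactly one OS flag and it matches 'PC Type'."""
    # Boolean columns sum to the number of True values per row.
    one_flag_per_row = (df[OS_COLUMNS].sum(axis=1) == 1).all()
    # 'With MacOS' must be True exactly on the Apple rows.
    apple_has_macos = (df["With MacOS"] == (df["PC Type"] == "Apple")).all()
    return bool(one_flag_per_row and apple_has_macos)

# Hand-built sample standing in for the output of create_data().
sample = pd.DataFrame({
    "PC Type": ["PC", "Apple", "PC"],
    "With MacOS": [False, True, False],
    "With Windows 10": [True, False, False],
    "With Windows 11": [False, False, True],
})
print(check_coherent(sample))
```

Running the same check on the frame returned by `create_data` should also report a coherent result, since the Windows flags there are masked by the PC rows.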
I'd slightly change the approach and generate a single column, OS. You can then transform this column into With MacOS etc. if needed.
With this approach it's easier to get the 0.5 / 0.5 split within Windows right:
from collections import OrderedDict

from faker import Faker
import pandas as pd

pc_type = ['PC', 'Apple']
# An OrderedDict maps each element to its weight for fake.random_element
wos_type = OrderedDict([('With Windows 10', 0.5), ('With Windows 11', 0.5)])
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        # Pick the OS based on the PC type already chosen for this row
        if project_data[i]['PC Type'] == 'PC':
            project_data[i]['OS'] = fake.random_element(elements=wos_type)
        else:
            project_data[i]['OS'] = 'MacOS'
    return project_data

df = pd.DataFrame(create_data(10)).transpose()
df
Output:
                     Name PC Type               OS
0         Nicholas Walker   Apple            MacOS
1               Eric Hull      PC  With Windows 10
2       Veronica Gonzales      PC  With Windows 11
3  Mrs. Krista Richardson   Apple            MacOS
4              Anne Craig      PC  With Windows 10
5            Joseph Hayes      PC  With Windows 10
6             Mary Nelson   Apple            MacOS
7               Jill Hunt   Apple            MacOS
8             Mark Taylor      PC  With Windows 11
9           Kyle Thompson      PC  With Windows 10
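The answer mentions that the OS column can be transformed into With MacOS etc. if needed. One possible way to do that, sketched under assumptions, is to compare the OS column against each expected value. `add_os_flags` and `FLAG_TO_OS` are hypothetical names introduced here, and the sample frame is built by hand (using the OS values shown in the output above) so the snippet runs without Faker.

```python
import pandas as pd

# Maps each boolean column to the OS value that should set it to True.
# The values mirror those in the output above; the mapping is an assumption.
FLAG_TO_OS = {
    "With MacOS": "MacOS",
    "With Windows 10": "With Windows 10",
    "With Windows 11": "With Windows 11",
}

def add_os_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with one boolean column per OS value."""
    out = df.copy()
    for flag_column, os_value in FLAG_TO_OS.items():
        out[flag_column] = out["OS"] == os_value
    return out

# Hand-built sample standing in for the output of create_data().
sample = pd.DataFrame({
    "PC Type": ["Apple", "PC", "PC"],
    "OS": ["MacOS", "With Windows 10", "With Windows 11"],
})
flags = add_os_flags(sample)
print(flags)
```

Because each row holds a single OS value, exactly one of the derived flags is True per row, which keeps the columns coherent by construction.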