How do I add data to a column only if a certain value exists in previous column using Python and Faker?
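I'm very new to Python and don't even know what to Google. What I'm trying to do is create a Pandas DataFrame filled with fake data using Faker. The problem I'm having is that each column generates its fake data in a silo. I want to be able to create fake data based on what exists in a previous column.
So in the example below, I have pc_type = ["PC", "Apple"].
From there I have the operating systems, and the options are Windows 10, Windows 11, and MacOS. I want only the rows where pc_type = "Apple" to have the MacOS column populated. Everything of type PC should then be 50% Windows 10 and 50% Windows 11.
How would I write this code so that I can clearly make that distinction in the function body and have the results reflect it?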
from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random

pc_type = ['PC', 'Apple']
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(0, x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        # Each OS flag below is drawn independently of 'PC Type' (this is the problem)
        project_data[i]['With Windows 10'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With Windows 11'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With MacOS'] = fake.boolean(chance_of_getting_true=50)
    return project_data

df = pd.DataFrame(create_data(10)).transpose()
df
To get consistent values, you can use the following:
from faker import Faker
import pandas as pd
import numpy as np

def create_data(x):
    pc_type = ['PC', 'Apple']
    fake = Faker()
    data = {'Name': [fake.name() for _ in range(x)],
            'PC Type': np.random.choice(pc_type, x)}
    df = pd.DataFrame(data)
    # Apple rows always get MacOS
    df['With MacOS'] = df['PC Type'] == 'Apple'
    pc = df['PC Type'] == 'PC'
    # One 50/50 draw per row decides Windows 10 vs. Windows 11 for PC rows
    w10 = np.random.choice([True, False], len(df), p=(0.5, 0.5))
    df['With Windows 10'] = pc & w10
    df['With Windows 11'] = pc & ~w10
    return df

df = create_data(10)
Output:
>>> df
                Name PC Type  With MacOS  With Windows 10  With Windows 11
0     Charles Dawson      PC       False             True            False
1  Patricia Bautista      PC       False            False             True
2         Ruth Clark      PC       False             True            False
3       Justin Lopez      PC       False             True            False
4      Grace Russell      PC       False             True            False
5         Grant Moss      PC       False             True            False
6           Tracy Ho   Apple        True            False            False
7    Connie Mitchell   Apple        True            False            False
8  Catherine Nichols   Apple        True            False            False
9   Nathaniel Bryant      PC       False            False             True
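A quick way to confirm that the flags really are consistent is to check that every row has exactly one OS flag set and that the MacOS flag lines up with 'PC Type'. The helper below is only a sketch: check_consistency is a hypothetical name, and the small sample frame is hand-built to mirror the columns above rather than generated with Faker.

```python
import pandas as pd

def check_consistency(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the OS flags agree with 'PC Type'."""
    os_cols = ['With MacOS', 'With Windows 10', 'With Windows 11']
    # Every row must have exactly one OS flag set
    exactly_one = df[os_cols].sum(axis=1).eq(1).all()
    # Apple rows must be MacOS, PC rows must not be
    apple_is_mac = df.loc[df['PC Type'] == 'Apple', 'With MacOS'].all()
    pc_not_mac = not df.loc[df['PC Type'] == 'PC', 'With MacOS'].any()
    return bool(exactly_one and apple_is_mac and pc_not_mac)

# Hand-built sample with the same columns as the output above
sample = pd.DataFrame({
    'PC Type': ['PC', 'Apple', 'PC'],
    'With MacOS': [False, True, False],
    'With Windows 10': [True, False, False],
    'With Windows 11': [False, False, True],
})
print(check_consistency(sample))
```

The same function can be called on the DataFrame returned by create_data.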
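I would change the approach slightly and generate a single OS column. If needed, you can convert this column into With MacOS, etc.
With this approach, it is easier to get the 0.5 / 0.5 split among the Windows versions: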
from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random
from collections import OrderedDict

pc_type = ['PC', 'Apple']
# The OrderedDict values are the weights used when picking a Windows version
wos_type = OrderedDict([('With Windows 10', 0.5), ('With Windows 11', 0.5)])
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        # The OS depends on the PC type chosen just above
        if project_data[i]['PC Type'] == 'PC':
            project_data[i]['OS'] = fake.random_element(elements=wos_type)
        else:
            project_data[i]['OS'] = 'MacOS'
    return project_data

df = pd.DataFrame(create_data(10)).transpose()
df
Output:
                     Name PC Type               OS
0         Nicholas Walker   Apple            MacOS
1               Eric Hull      PC  With Windows 10
2       Veronica Gonzales      PC  With Windows 11
3  Mrs. Krista Richardson   Apple            MacOS
4              Anne Craig      PC  With Windows 10
5            Joseph Hayes      PC  With Windows 10
6             Mary Nelson   Apple            MacOS
7               Jill Hunt   Apple            MacOS
8             Mark Taylor      PC  With Windows 11
9           Kyle Thompson      PC  With Windows 10
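If the separate With MacOS / With Windows 10 / With Windows 11 columns are still wanted, one possible way to derive them from the single OS column is pd.get_dummies. This is only a sketch: the three-row frame is hand-built from the first rows of the output above, and flags and wide are illustrative names.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame produced by create_data
df = pd.DataFrame({
    'Name': ['Nicholas Walker', 'Eric Hull', 'Veronica Gonzales'],
    'PC Type': ['Apple', 'PC', 'PC'],
    'OS': ['MacOS', 'With Windows 10', 'With Windows 11'],
})

# One indicator column per distinct OS value, cast to plain booleans
flags = pd.get_dummies(df['OS']).astype(bool)
# Rename so the MacOS column matches the naming of the Windows ones
flags = flags.rename(columns={'MacOS': 'With MacOS'})

# Swap the single OS column for the indicator columns
wide = df.drop(columns='OS').join(flags)
print(wide)
```

Because each row has exactly one OS value, exactly one of the resulting flags is True per row, so the columns cannot contradict 'PC Type'.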